Dr Kristina Nilsson Björkenstam

Email: kristina.nilsson@ling.su.se
Phone: +46 8 16 17 61
Address: Department of Linguistics, Stockholm University, SE-10691 Stockholm, Sweden


My main research interests are hybrid approaches to natural language processing combining data-driven methods and linguistic knowledge.

I am currently involved in the MINGLE project ("Modelling the emergence of linguistic structures in early childhood"), where the long-term goal is to model the emergence of linguistic structures in child language based on distributional information in the utterances directed at the infant within its ecological environment. To that end, we are developing a longitudinal corpus of video and audio recordings of parent-child interaction with verbal and non-verbal annotation (transcription, eye gaze, object-related actions, gestures) and discourse information. Within this project, we have studied the distribution of disfluency in child-directed speech in comparison to adult-directed speech, and synchrony across modalities (that is, recurring patterns or structural regularities that may reduce the complexity of the language learning task, e.g., the parent shaking an object in the infant's visual field while verbally naming it). As part of this project, a longitudinal corpus of child-directed speech (LONG-MINGLE) has been made available for research. This corpus consists of 57 transcripts from longitudinal dyads with 13 children between 2 and 33 months of age.

My previous work includes coreference resolution, named entity recognition, cross-language information retrieval, and computer-assisted language learning. Past corpus construction projects that have resulted in publicly available resources include the Stockholm University Strindberg Corpus, consisting of seven autobiographical novels by August Strindberg annotated with part-of-speech tags, morpho-syntactic information, and lemmas, and SUC-CORE, the gold standard section of the Swedish Treebank annotated with coreference relations between noun phrases.


SUC-CORE: SUC 2.0 Annotated with NP Coreference
23/01/2012 - 08/06/2012
SUC-CORE is a 20,000-word subset of the Stockholm-Umeå Corpus (SUC 2.0) annotated with coreference relations between noun phrases. The corpus covers a wide range of genres and domains, and is freely available for research.


Collaborating with

De Lacerda, Francisco

My research profile is focused on explaining the infant's acquisition of the ambient language's linguistic structure as an emergent information-structuring process anchored on independently motivated biological and social components. My specific goal is to present a coherent a...


Last updated on 2017-06-16 at 11:14