Modelling the emergence of linguistic structures in early childhood

Start date: 01/01/2012
End date: 31/12/2014


Dnr: 2011-2263
Total: 6 354 000 SEK

Principal investigator: Francisco Lacerda
Co-applicant: Mats Wirén

The project will investigate the young child’s transition from holistic phrases to the use of syntactic constructions. It will combine phonetic and computational linguistic methods to determine how linguistic structure can be derived from the distributional information available in the utterances the infant hears in its ecological environment. The project assumes no linguistic knowledge at the early stages of the language acquisition process and avoids a priori notions such as words or grammatical categories.

The model’s input will consist of phonetic transcriptions, which provide string-like representations of the utterances heard by the infant, together with "context descriptors" that convey information about the language learner’s ecological environment associated with each transcribed utterance.

The model’s general working principle is the hierarchical detection of recurrent patterns based on a simple notion of similarity. At the first level, "sound strings" that take part in recurrent sound-context patterns will function as potential lexical items or lexicalized phrases. At the second level, the distributional properties of recurring sequences of these lexical items will convey syntactic-like information, which is expected to converge to semantic-like information when fed into the next level of processing. The proposed model is intended to test the power and the limits of data-driven distributional processes in the special case of early language acquisition.
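To make the first level more concrete, the following is a minimal, hypothetical sketch and not the project's model. It assumes that phonetic transcriptions are plain character strings, that context descriptors are sets of labels, and that the "simple notion of similarity" is exact string match. The names `Utterance`, `recurrent_sound_context_patterns`, `maximal_patterns`, the parameters `min_len` and `min_count`, and the toy utterances are all illustrative inventions. A sound string becomes a candidate lexical item when it recurs together with the same context label in several utterances; shorter fragments contained in a longer candidate with the same support are then discarded.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    """Hypothetical input record: one heard utterance and its context."""
    sounds: str               # phonetic transcription, one character per segment
    context: frozenset[str]   # hypothetical context descriptors (e.g. objects in view)


def recurrent_sound_context_patterns(utterances, min_len=3, min_count=2):
    """Count, per (substring, context label), how many utterances contain both.

    Similarity here is exact string match (an assumption). Only pairs seen in
    at least `min_count` utterances are returned.
    """
    counts = defaultdict(int)
    for utt in utterances:
        # Each distinct substring of length >= min_len is counted once per utterance.
        subs = {
            utt.sounds[i:j]
            for i in range(len(utt.sounds))
            for j in range(i + min_len, len(utt.sounds) + 1)
        }
        for s in subs:
            for label in utt.context:
                counts[(s, label)] += 1
    return {key: n for key, n in counts.items() if n >= min_count}


def maximal_patterns(patterns):
    """Drop a pattern if a longer pattern with the same label and count contains it."""
    kept = {}
    for (s, label), n in patterns.items():
        dominated = any(
            label == l2 and n == n2 and s != s2 and s in s2
            for (s2, l2), n2 in patterns.items()
        )
        if not dominated:
            kept[(s, label)] = n
    return kept


if __name__ == "__main__":
    # Toy, made-up utterances: "tita" recurs in the sound stream but with
    # different contexts, so it is not proposed as a lexical candidate.
    toy = [
        Utterance("titadoll", frozenset({"DOLL"})),
        Utterance("nicedoll", frozenset({"DOLL"})),
        Utterance("titaball", frozenset({"BALL"})),
        Utterance("bigball", frozenset({"BALL"})),
    ]
    result = maximal_patterns(recurrent_sound_context_patterns(toy))
    for (s, label), n in sorted(result.items()):
        print(s, label, n)
    # prints:
    # ball BALL 2
    # doll DOLL 2
```

The quadratic substring enumeration and exact matching keep the sketch short; a realistic implementation would need a graded similarity measure over phonetic segments and a more efficient index, and the second, distributional level is not shown here.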

