Authors: Kalenkovich E, Koorathota S, Tor S, Amatuni A, Egan-Dailey S, Moore C, Laing C, Garrison H, Baudet G, Bulgarelli F, Uner S, Righter L, Bergelson E
This paper describes a dataset of manually annotated nouns from a corpus of longitudinal day-long audio and hour-long video recordings, collected monthly from 44 babies between the ages of 6 and 17 months. The dataset was created as part of a larger project, SEEDLingS, that examines the development of infants' language comprehension before and after their first birthday, from earliest comprehension to the early days of word production. This paper provides an overview of the corpus, describes how and why the nouns from the corpus were annotated, and discusses considerations for reusing the dataset in future work. The described annotations and relevant metadata are publicly available alongside this manuscript.
Keywords: Corpus; Home recordings; Infancy; Language acquisition
PubMed: https://pubmed.ncbi.nlm.nih.gov/41034519/
DOI: 10.3758/s13428-025-02826-9