Research topics
Statistical Estimation of Vocabulary Size Including "Unseen" Words
The number of unique words in children’s speech is one of the most basic statistics indicating their language development. It is difficult, however, to accurately evaluate the number of unique words in a child’s growing corpus over time when the sample size is limited. This study proposes a novel technique to estimate the latent number of words from a series of words uttered by children. The technique utilizes statistical properties of the number of types as a function of the number of sampled tokens. We tested the practical effectiveness of the proposed method in empirical analyses of cross-sectional and longitudinal samples. The converging empirical evidence suggests that the proposed estimator improves the accuracy of vocabulary size estimation over naïve type-counting estimators. Utilizing this efficient estimator, we propose a new sampling scheme for vocabulary assessment that has lower cost and higher accuracy than existing methods.
Keywords
Vocabulary growth; Small sample size; Number of latent types; Type–token ratio
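To illustrate why a naïve type count underestimates vocabulary size in small samples, the following is a minimal sketch and not the estimator proposed in the papers listed on this page. It assumes tokens are drawn independently from a fixed vocabulary, in which case the expected number of types in N tokens is the sum over words of 1 - (1 - p_i)^N. The Zipf-like word probabilities, the vocabulary size of 1000, and all function names are hypothetical choices made only for this example.

```python
import random


def zipf_probabilities(vocab_size, exponent=1.0):
    """Hypothetical word probabilities: p_i proportional to 1 / rank**exponent."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, vocab_size + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_types(probs, n_tokens):
    """Expected number of distinct types among n_tokens independent draws."""
    return sum(1.0 - (1.0 - p) ** n_tokens for p in probs)


def observed_types(probs, n_tokens, rng):
    """Naive type count from one simulated sample of n_tokens tokens."""
    sample = rng.choices(range(len(probs)), weights=probs, k=n_tokens)
    return len(set(sample))


if __name__ == "__main__":
    rng = random.Random(0)           # fixed seed so the run is repeatable
    probs = zipf_probabilities(1000)  # assumed true vocabulary of 1000 words
    for n in (100, 1000, 10000):
        # columns: tokens sampled, observed types, expected types
        print(n, observed_types(probs, n, rng), round(expected_types(probs, n), 1))
```

Under these assumptions the type count stays below the true vocabulary size at every sample size and approaches it only slowly as more tokens are sampled, which is the gap between observed and latent types that the abstract describes.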
Estimation Method for Vocabulary Size Including Unobserved Words
- Hidaka, S. (2013). General Type Token Distribution. arXiv:1305.0328.
- Proof of the theoretical relationship between the number of sampled words and the number of word types acquired by young children: estimating latent vocabulary size from the tip of the iceberg. Japan Advanced Institute of Science and Technology (September 10, 2014).
- Measuring young children's vocabulary with statistics. Hokkoku Shimbun (September 11, 2014).
- Hidaka, S. (2014). General type-token distribution. Biometrika, 101(4), 999-1002. doi: 10.1093/biomet/asu035. (First published online: August 17, 2014)
- Hidaka, S. (2016). Estimating the latent number of types in growing corpora with reduced cost–accuracy trade-off. Journal of Child Language, 43, 107-134.
- Hidaka, S. (2009). A Sample-size-invariant Estimation of Lexical Diversity. In Proceedings of the Thirty-First Annual Meeting of the Cognitive Science Society.