Design and Construction of the "Showa Speech Corpus"

MARUYAMA Takehiko (丸山 岳彦), KOISO Hanae (小磯 花絵), NISHIKAWA Ken'ya
Construction of the "Showa Speech Corpus" (SSC) began in 2016 and was completed in March 2021; the corpus has been made publicly available online through the corpus search application Chunagon. The SSC consists of recordings made from the 1950s to the 1970s by the National Institute for Japanese Language and Linguistics. It is thus a speech corpus built with modern technology from old recordings. The SSC is innovative in that it can be used as a "diachronic speech corpus": by linking, comparing, and contrasting it with modern spoken language corpora such as the Corpus of Spontaneous Japanese (CSJ) and the Corpus of Everyday Japanese Conversation (CEJC), researchers can explore changes in [spok]en language over time. In this paper, we describe the origins of the recorded materials stored in the SSC, the process of corpus construction and annotation, and the results of a preliminary analysis.
doi:10.15084/00003522 fatcat:xx4jqmymyfannpb4xnhmbqqrb4