CANELC: constructing an e-language corpus

Dawn Knight, Svenja Adolphs, Ronald Carter
Corpora, 2014
This paper reports on the construction of CANELC: the Cambridge and Nottingham e-language Corpus. CANELC is a one-million-word corpus of digital communication in English, taken from online discussion boards, blogs, tweets, emails and SMS messages. The paper outlines the approaches used when planning the corpus: obtaining consent, collecting the data and compiling the corpus database. This is followed by a detailed analysis of some of the patterns of language used in the corpus. The analysis includes a discussion of the key words and phrases used, as well as the common themes and semantic associations connected with the data. These discussions form the basis of an investigation of how e-language operates in ways both similar to and different from spoken and written records of communication (as evidenced by the BNC, the British National Corpus).
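The abstract does not say which keyness statistic was used to identify key words against the BNC. One widely used measure is Dunning's log-likelihood (G2), which compares a word's observed frequency in a study corpus and a reference corpus with the frequencies expected if both corpora used it at the same rate. The sketch below assumes that statistic. The function names `log_likelihood` and `keywords` and the toy token lists are hypothetical illustrations, not the authors' pipeline.

```python
import math
from collections import Counter


def log_likelihood(freq_study, freq_ref, size_study, size_ref):
    """Dunning's log-likelihood (G2) for one word across two corpora (assumed keyness measure)."""
    total = freq_study + freq_ref
    combined = size_study + size_ref
    # Expected counts if the word occurred at the same rate in both corpora
    expected_study = size_study * total / combined
    expected_ref = size_ref * total / combined
    g2 = 0.0
    for observed, expected in ((freq_study, expected_study), (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def keywords(study_tokens, ref_tokens, top_n=10):
    """Rank words that are relatively more frequent in the study corpus by G2."""
    study = Counter(study_tokens)
    ref = Counter(ref_tokens)
    n_study = sum(study.values())
    n_ref = sum(ref.values())
    scores = []
    for word, freq in study.items():
        # Keep only positive keywords (overused relative to the reference corpus)
        if freq / n_study > ref[word] / n_ref:
            scores.append((word, log_likelihood(freq, ref[word], n_study, n_ref)))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Hypothetical toy data: an "e-language" sample against a tiny reference sample
study_sample = "lol u r so funny lol".split()
reference_sample = "you are so funny indeed".split()
print(keywords(study_sample, reference_sample))
```

In a real comparison the study corpus would be CANELC and the reference corpus the BNC, and tokenisation, case handling and significance thresholds would all need deciding; the toy lists here only show the mechanics.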
doi:10.3366/cor.2014.0050