INTRODUCTION
[chapter]
2018
A Guide to Using Corpora for English Language Learners
If you were to ask a group of native English speakers to list the three most common adjectives used to describe the noun fisherman, it is highly probable that many would list the adjective avid. Despite having countless adjectives from which to choose, the native speaker is much more likely to list avid than other seemingly appropriate adjectives such as skilled, enthusiastic, passionate, or zealous. In corpus linguistics, avid is considered a collocate of fisherman because the two co-occur with great frequency. If you put the same question to English language learners, it is unlikely that many would write avid on their list. Since avid is not an especially common adjective, it is entirely possible that a language learner has never encountered the word. Native speakers build up this kind of collocational knowledge through years of experience with the language, but learners do not acquire it as easily. Importantly, this example of words preferring particular partners is not an anomaly; it is a common occurrence in language.

So, what is a corpus? A corpus is simply a large collection of authentic language, gathered from sources such as newspapers, blogs, and academic essays, that has been compiled, organised, and made searchable. For example, there are corpora of student essays, political speeches, academic lectures, newspaper articles, blogs, and much more. By collecting and analysing real language from real contexts, we can learn a great deal about language and how it is used. Corpus study has shown repeatedly that words often appear together in chunks and bundles and often display a preference for each other. For example, native speakers implicitly know that strong coffee is preferred to powerful coffee. Another example that highlights the importance of collocational knowledge for language users is the pair of synonyms attractive and beautiful. The dictionary definitions of these words suggest that they are essentially interchangeable in speaking and writing. However, these words occur in rather distinct contexts.
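To make the idea of a searchable corpus concrete, the following is a minimal sketch of how collocates might be counted. It is not the method of any particular corpus tool: the five sentences are invented toy data standing in for real collected text, and the function name left_neighbours is hypothetical. It simply counts which word appears immediately before a chosen noun.

```python
from collections import Counter
import re

# Toy "corpus": invented sentences, NOT real corpus data (an assumption
# made purely for illustration).
TOY_CORPUS = [
    "My uncle is an avid fisherman and an avid reader.",
    "She ordered a strong coffee before the lecture.",
    "An avid fisherman rises before dawn.",
    "He is a skilled fisherman.",
    "Nothing beats strong coffee on a cold morning.",
]


def left_neighbours(sentences, node):
    """Count the words that appear immediately before `node`.

    `left_neighbours` is a hypothetical helper name, not part of any library.
    """
    counts = Counter()
    for sentence in sentences:
        # Lowercase the sentence and keep only runs of letters/apostrophes.
        tokens = re.findall(r"[a-z']+", sentence.lower())
        # Walk over adjacent word pairs: (previous word, current word).
        for previous, current in zip(tokens, tokens[1:]):
            if current == node:
                counts[previous] += 1
    return counts


fisherman_neighbours = left_neighbours(TOY_CORPUS, "fisherman")
coffee_neighbours = left_neighbours(TOY_CORPUS, "coffee")

print(fisherman_neighbours.most_common())
print(coffee_neighbours.most_common())
```

In this toy data, avid precedes fisherman more often than skilled does, and strong precedes coffee while powerful never does. A real corpus would contain millions of words, and dedicated corpus tools typically use statistical measures of association rather than raw counts of adjacent words.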
doi:10.1515/9781474427180-003
fatcat:ld25ccvtzfdfbhyyumsy6qi3l4