Towards Identification of Nominal Multiword Expressions in Bengali Language

Tanmoy Chakraborty
2014 OALib  
Noun-Noun compounds, a subset of Compound Nouns and of Nominal Compounds, play an important role in NLP applications such as Machine Translation and Information Retrieval because of their token frequency, their type frequency and their occurrence across the world's languages. Recognition of Multi-Word Expressions (MWEs) requires deep or shallow syntactic preprocessing tools and large corpora. The problem is especially difficult in Bengali because such tools and corpora are lacking. This paper investigates Noun-Noun bigram collocations in a medium-size untagged Bengali corpus of articles by Rabindranath Tagore. It uses a simple unsupervised approach in which several statistical measures quantify the affinity between the constituents of each bigram candidate as evidence that the candidate is an MWE, and it combines these measures into a weighted score that separates MWEs from non-MWEs. We describe different taxonomies of compound-noun MWEs in Bengali based on morpho-syntactic flexibility. We also identify major Noun-Noun semantic collocations that are not MWEs. This initial approach for Bengali is promising in terms of Precision, Recall and F-score.
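The general idea of scoring bigram candidates by statistical affinity and then combining several measures into one weighted score can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the measures or the weights used in the paper, so pointwise mutual information (PMI) is chosen here as one common association measure, and the function names `bigram_pmi` and `weighted_score` are hypothetical.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Return {(w1, w2): PMI} for every adjacent token pair.

    PMI is an assumed stand-in for the paper's statistical measures,
    which the abstract does not specify.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        # PMI compares the observed pair probability with the
        # probability expected if the two words were independent.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


def weighted_score(measures, weights):
    """Combine several association measures into one score.

    `measures` and `weights` are dicts keyed by measure name;
    the weights here are illustrative, not taken from the paper.
    """
    return sum(weights[name] * value for name, value in measures.items())
```

In such a setup, a candidate would be labelled an MWE when its combined score exceeds a threshold tuned on held-out data. The threshold, like the weights, is an assumption of this sketch rather than a detail given in the abstract.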
doi:10.4236/oalib.1100582 fatcat:7atcaopxkjdsbhh5pbdsk4skhm