15,228 Hits in 4.7 sec

Very large annotated database of American English

Mitch Marcus
1990 Proceedings of the workshop on Speech and Natural Language - HLT '90   unpublished
Objective To construct a data base (the "Penn Treebank") of written and transcribed spoken American English annotated with detailed grammatical structure.  ...  components of natural language understanding systems, and a research tool for the investigation of the grammar and prosodic structure of naturally spoken English.  ... 
doi:10.3115/116580.1138612 fatcat:zpzhcbinznholft57bnfzwxy6u

Very large annotated database of American English

Mitch Marcus
1991 Proceedings of the workshop on Speech and Natural Language - HLT '91   unpublished
Objective To construct a data base (the "Penn Treebank") of written and transcribed spoken American English annotated with detailed grammatical structure.  ...  components of natural language understanding systems, and a research tool for the investigation of the grammar and prosodic structure of naturally spoken English.  ... 
doi:10.3115/112405.1138667 fatcat:7753hjo2pbbxjhssavowoudayu

Building a Database of Political Speech

Ailbhe Cullen, Andrew Hines, Naomi Harte
2014 Proceedings of the 4th International Workshop on Audio/Visual Emotion Challenge - AVEC '14  
To this end, we have collected a new database of political speech from YouTube and other on-line resources. Annotation is performed both by native listeners, and Amazon Mechanical Turk (AMT) workers.  ...  The impact of these different annotations on charisma prediction from political speech is also investigated.  ...  Annotators were aged from 20 to 59, with a median of 30, and a standard deviation of 9 years. 90% of annotators were American and native English speakers. The remaining 10% were Indian.  ... 
doi:10.1145/2661806.2661808 dblp:conf/mm/CullenHH14 fatcat:fnwnztye6zbqbo2mxmrbh6nq2q

Frame Semantics and legal corpora annotation

Anderson Bertoldi, Rove Chishman
2012 Linguistic Issues in Language Technology  
The objective of this paper is to evaluate the applicability of Frame Semantics theory and FrameNet paradigm for the semantic annotation of legal texts.  ...  This paper presents a theoretical discussion about the use of Frame Semantics as corpora annotation paradigm.  ...  Considering the verb acusar in Portuguese, an annotator will very easily identify to accuse as an English equivalent for acusar.  ... 
doi:10.33011/lilt.v7i.1277 fatcat:xmb5344s65ddzbwzqe4k7aycmq

Elements of Bibliography: A Simplified Approach, by Robert B. Harmon

Bertrum H. MacDonald
1991 Papers of the Bibliographical Society of Canada. Cahiers de la Societe bibliographique du Canada. Bibliographical Society of Canada  
very brief definitions of selected terms.  ...  The latter, or narrow view, is seen in his discussion of only English-language scholarship (all the work in histoire du livre published in France is missed) and in the listing of only North-American schools  ... 
doi:10.33137/pbsc.v29i1.17795 fatcat:sv3zhhiwbfhwvdsj44a2neftoe

UniMorph 2.0: Universal Morphology [article]

Christo Kirov, Ryan Cotterell, John Sylak-Glassman, Géraldine Walther, Ekaterina Vylomova, Patrick Xia, Manaal Faruqui, Sabrina J. Mielke, Arya D. McCarthy, Sandra Kübler, David Yarowsky, Jason Eisner, Mans Hulden
2020 arXiv   pre-print
This paper details advances made to the collection, annotation, and dissemination of project resources since the initial UniMorph release described at LREC 2016.  ...  The project releases annotated morphological data using a universal tagset, the UniMorph schema.  ...  Wiktionary Extraction In Kirov et al. (2016), we introduced version 1.0 of the UniMorph morphological database, based on a very large-scale parsing and normalization of Wiktionary.  ... 
arXiv:1810.11101v2 fatcat:gck5gcxwszbtrigkng5mbjgbfe

CancerMine: Knowledge Base Construction for Personalised Cancer Treatment

Jake Lever, Martin Jones, Steven J. Jones
2016 International Conference on Biomedical Ontology  
We have annotated a large body of literature which reports oncogenic aberrations using a custom designed annotation tool.  ...  Knowledge of the relevant genomic aberrations that drive a particular cancer type is necessary to accelerate efficient interpretation of genomic data and enable large-scale endeavours in precision medicine  ...  This word list was built from the stop words from the NLTK toolkit [9], the most frequent 5,000 words based on the Corpus of Contemporary American English [10] and a stop word list associated with  ... 
dblp:conf/icbo/LeverJJ16 fatcat:ppsa7blwergc3bsp653esfzpka

IMI - A Multilingual Semantic Annotation Environment

Francis Bond, Luís Morgado da Costa, Tuan Anh Lê
2015 Proceedings of ACL-IJCNLP 2015 System Demonstrations  
For the past six years, our tools have been tested and developed in parallel with the semantic annotation of a portion of this corpus in Chinese, English, Japanese and Indonesian.  ...  The system includes interfaces to help coordinating the annotation project and a corpus browsing interface designed specifically to meet the needs of a semantically annotated corpus.  ...  We would also like to thank our annotators for their hard work and patience during this system's development.  ... 
doi:10.3115/v1/p15-4002 dblp:conf/acl/BondCL15 fatcat:xlniluin7veqvdrsxxtof6xbee

Development of a Large Spontaneous Speech Database of Agglutinative Hungarian Language [chapter]

Tilda Neuberger, Dorottya Gyarmathy, Tekla Etelka Gráczi, Viktória Horváth, Mária Gósy, András Beke
2014 Lecture Notes in Computer Science  
In this paper, a large Hungarian spoken language database is introduced.  ...  This phonetically-based multi-purpose database contains various types of spontaneous and read speech from 333 monolingual speakers (about 50 minutes of speech sample per speaker).  ...  Development of a large spontaneous speech database of agglutinative Hungarian  ... 
doi:10.1007/978-3-319-10816-2_51 fatcat:4ucbmrlewvfwtbkfcjyp54nxbe

Methods for eliciting, annotating, and analyzing databases for child speech development

Mary E. Beckman, Andrew R. Plummer, Benjamin Munson, Patrick F. Reidy
2017 Computer Speech and Language  
Methods from automatic speech recognition (ASR), such as segmentation and forced alignment, have facilitated the rapid annotation and analysis of very large adult speech databases and databases of caregiver-infant  ...  The second problem then is to determine what annotation schemas and analysis techniques can most usefully capture relevant aspects of this variability.  ...  contributions to the ideas about the development and interpretation of elicitation and annotation protocols for children's speech that are described here.  ... 
doi:10.1016/j.csl.2017.02.010 pmid:28943715 pmcid:PMC5608260 fatcat:hz3mug564bgq7lvtri6fuk35ji

VESUS: A Crowd-Annotated Database to Study Emotion Production and Perception in Spoken English

Jacob Sager, Ravi Shankar, Jacob Reinhold, Archana Venkataraman
2019 Interspeech 2019  
We use crowd sourcing to obtain ten human ratings for the perceived emotional content of each utterance. Our unique database construction enables a multitude of scientific and technical explorations.  ...  VESUS is a lexically controlled database, in which a semantically neutral script is portrayed with different emotional inflections.  ...  Here, the relevant resources for North American English include the RAVDESS [15], SAVEE [16] and MSP-IMPROV [17] databases.  ... 
doi:10.21437/interspeech.2019-1413 dblp:conf/interspeech/SagerSRV19 fatcat:tyreru5fivfarlirwx4lw7dq4a

The Virtual Institute for Integrative Biology (VIIB) [article]

Gustavo Rivera, Fernando González-Nilo, Tomás Perez-Acle, Raul Isea, David S. Holmes
2010 arXiv   pre-print
The scientific agenda of VIIB includes: construction of databases for comparative genomics, the AlterORF database for alternate open reading frames discovery in genomes, bioinformatics services and protein  ...  visibility of Latin American science It may provide a useful paradigm for developing further e-Science initiatives in Latin America and other emerging regions.  ...  Comparisons involving the use of very demanding algorithms based on Hidden Markov Models. In all, about a terabyte of information has been stockpiled in the AlterORF database.  ... 
arXiv:1012.3437v1 fatcat:vyefihirunbnncb3bn6fiehjii

From archive to corpus: Transcription and annotation in the creation of signed language corpora

Trevor Johnston
2010 International Journal of Corpus Linguistics  
However, unique identifiers of sign types (or 'ID-glosses') can only be used if a comprehensive reference lexical database of the language already exists.  ...  In fact, the most important feature of a modern signed language corpus should be that it has been annotated rather than simply transcribed.  ...  Using the large databases of the most well-described and documented languages, such as English, this process is able to yield accuracy rates of up to 98% (Garside and Smith, 1997).  ... 
doi:10.1075/ijcl.15.1.05joh fatcat:zkjx6atkxjd3vhhuxdi6cov4ce

The Emotional Voices Database: Towards Controlling the Emotion Dimension in Voice Generation Systems [article]

Adaeze Adigwe, Noé Tits, Kevin El Haddad, Sarah Ostadabbas and Thierry Dutoit
2018 arXiv   pre-print
Even though the system is a very simple one, the tests show the efficiency of the data, which is promising for future work.  ...  In this paper, we present a database of emotional speech intended to be open-sourced and used for synthesis and generation purposes.  ...  Database Content The data was recorded in 2 different languages: English (North American) and French (Belgian).  ... 
arXiv:1806.09514v1 fatcat:u5dgo2ocrvdt7itrx4l4njgrpi

The Belfast storytelling database: A spontaneous social interaction database with laughter focused annotation

Gary McKeown, William Curran, Johannes Wagner, Florian Lingenfelser, Elisabeth Andre
2015 2015 International Conference on Affective Computing and Intelligent Interaction (ACII)  
We thank all our participants and the members of the ILHAIRE consortium.  ...  Laugh particles have also been annotated, with a laugh particle being defined as a very short laugh aspiration that occurs as part of speech.  ...  The English speakers were all from Ireland; the Spanish speaking group contained people from Spain and Latin America-the Latin Americans had all been living for several years within the European Union.  ... 
doi:10.1109/acii.2015.7344567 dblp:conf/acii/McKeownCWLA15 fatcat:6yd5to5ixnf5jm3gkzkcyuis4e