Genre identification for office document search and browsing

Francine Chen, Andreas Girgensohn, Matthew Cooper, Yijuan Lu, Gerry Filby
2011 International Journal on Document Analysis and Recognition  
to improve the performance of genre identification.  ...  These results provide support for a topic-independent approach to identification of coarse office document genres.  ...  Related work Many sets of genre categories have been proposed for text genre identification and web genre identification. For web page genre, Roussinov et al.  ... 
doi:10.1007/s10032-011-0163-7 fatcat:ogby7vevq5h2lf5kwg55emlkb4

Web Genre Analysis: Use Cases, Retrieval Models, and Implementation Issues [chapter]

Benno Stein, Sven Meyer zu Eissen, Nedim Lipka
2010 Text, Speech and Language Technology  
Special focus is put on the generalization capability of Web genre retrieval models, for which we present new evaluation measures and, for the first time, a quantitative analysis.  ...  It presents relevant use cases, discusses existing and new technology for the construction of Web genre retrieval models, and outlines implementation aspects for a genre-enabled Web search.  ...  Evaluation This section addresses evaluation-related issues of Web genre identification.  ... 
doi:10.1007/978-90-481-9178-9_8 fatcat:ej7pnhz5lrg7tos2c5uxw7zqzu

Learning to recognize webpage genres

Ioannis Kanaris, Efstathios Stamatatos
2009 Information Processing & Management  
for the given corpus.  ...  Moreover, we perform a series of cross-check experiments (e.g., training using a genre palette and testing using a different genre palette as well as using the features extracted from one corpus to discriminate  ...  EXPERIMENTS Webpage Genre Corpora Although there is not yet a large reference corpus covering a wide variety of web genres, several small webpage corpora appropriate for evaluating genre identification  ... 
doi:10.1016/j.ipm.2009.05.003 fatcat:3sqi3jb7erebteomnsxxrpdnue

Toward Multilingual Identification of Online Registers

Veronika Laippala, Roosa Kyllönen, Jesse Egbert, Douglas Biber, Sampo Pyysalo
2019 Nordic Conference of Computational Linguistics  
The data set consists of 2,237 Finnish documents and follows the register taxonomy developed for the Corpus of Online Registers of English (CORE), the largest manually annotated language collection of  ...  We consider cross- and multilingual text classification approaches to the identification of online registers (genres), i.e. text varieties with specific situational characteristics.  ...  Acknowledgements We thank Fulbright Finland, Kone foundation and Emil Aaltonen Foundation for financial support.  ... 
dblp:conf/nodalida/LaippalaKEBP19 fatcat:ecbill2hnjaqjjp6pwsazt22fi

Overview of the PAN/CLEF 2015 Evaluation Lab [chapter]

Efstathios Stamatatos, Martin Potthast, Francisco Rangel, Paolo Rosso, Benno Stein
2015 Lecture Notes in Computer Science  
A new corpus was built for this challenging, yet realistic, task covering four languages.  ...  In plagiarism detection, community-driven corpus construction is introduced as a new way of developing evaluation resources with diversity.  ...  Our special thanks go to all of PAN's participants. This work was partially supported by the WIQ-EI IRSES project (Grant No. 269180) within the FP7 Marie Curie action.  ... 
doi:10.1007/978-3-319-24027-5_49 fatcat:fcpf2p7nujet5ez4zswoiscatq

Corpus-Based Study of Discursive Representation of 'Other' in Diachronic Perspective

Larisa Kochetova, Anastasya Plavina, S. Cindori, O. Larouk, E.Yu. Malushko, L.N. Rebrina, N.L. Shamne
2018 SHS Web of Conferences  
of attributive and verbal evaluative lexemes that are used for discursive construing of other cultural society representatives in the consciousness of Western people.  ...  to the end of the 19th century and the corpus of present day blogs on travel.  ...  This corpus is considered to be representative for the modern British variant of language in web genres, what determined its choice as the empirical base of our research.  ... 
doi:10.1051/shsconf/20185001132 fatcat:v7a42aicgrdd7fndosxegks32u

From Print form to Digital Communication: the One-way Journey of Academic Research

Rosa Lorés
2021 Dyskursy o Kulturze  
Conference announcements or journal calls for papers are instances of genres whose relevance lies in their function as enablers of other genres (e.g. conference proposals, reviews from evaluators, research  ...  papers), thus being part of a longer genre chain (Räisänen, 1999; Swales, 2004).  ...  research: A linguistic, rhetorical and pragmatic study of digital genres in English as a language of international communication" (project code FFI2017-84205-P).  ... 
doi:10.36145/doc2021.04 doaj:632c87e70b354aa68580964339ba4601 fatcat:l4uom2yfhvciddr74ucjitpkya

Crowdsourcing for web genre annotation

Noushin Rezapour Asheghi, Serge Sharoff, Katja Markert
2016 Language Resources and Evaluation  
Recently, genre collection and automatic genre identification for the web has attracted much attention.  ...  In this paper, we tackle these problems by using crowd-sourcing for genre annotation, leading to the Leeds Web Genre Corpus, the first web corpus which is demonstrably reliably annotated for genre and  ...  Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution  ... 
doi:10.1007/s10579-015-9331-6 fatcat:w3z6woisqvhi5f4wtjfpvvnm6i

Retrieval Models for Genre Classification

Benno Stein, Sven Meyer zu Eissen
2008 Scandinavian Journal of Information Systems  
We present a comprehensive survey which contrasts the genre retrieval models that have been developed for Web and non-Web corpora.  ...  The presented concepts go beyond the existing utilization of vocabulary-centered, genre-revealing features and open new possibilities for the construction of genre classifiers that operate in real-time  ...  Web genre identification is a key factor for reducing inadequate results of search engines, as the user would be able to specify the desired Web genre along with the keywords (Santini 2004).  ... 
dblp:journals/sjis/SteinE08 fatcat:z5vdeavmtncwtpq5br6cxophz4

A corpus-based approach to online materials development for writing research articles

Ching-Fen Chang, Chih-Hua Kuo
2011 English for Specific Purposes  
A word frequency list derived from the corpus was analyzed to develop a vocabulary profile for the genre.  ...  Move analysis was also conducted based on a self-developed coding scheme of rhetorical moves in the target genre.  ...  Acknowledgements We would like to thank the editor and the two anonymous reviewers for their valuable comments on earlier versions of the manuscript.  ... 
doi:10.1016/j.esp.2011.04.001 fatcat:uaqrgs6avffxlnmh2pj3pvr3ge

ESP corpus construction: a plea for a needs-driven approach

Hilary Nesi
2015 ASp  
, narrow, genre categories in general corpora, or by using the web-as-corpus. A central premise in applied linguistics is that "the meaning of a sentence is more than a combination of the meaning of  ...  The genres that they most need to engage with may be impossible to locate on the web. Some progress is being made in the field of information and language technology towards the automatic differentiation  ... 
doi:10.4000/asp.4682 fatcat:ohvfpvf6ordmvdv4s3oa2xapem

Can Document-Genre Metadata Improve Information Access to Large Digital Collections?

Kevin Crowston, Barbara H. Kwasnik
2003 Library Trends  
We outline a research protocol that would provide guidance for identifying Web document genres, for observing how genre is used in searching and evaluating search results, and finally for representing and  ...  Explicit identification of genre seems particularly important for such collections because any search usually retrieves documents with a diversity of genres that are undifferentiated by obvious clues as  ...  For example, in a study of Web documents, Crowston and Williams (2000) were able to identify documents of many familiar genres and of a few genres that seemed to be new to the Web, such as the home  ... 
dblp:journals/libt/CrowstonK03 fatcat:5gyhuvlmtfaxteby6ja55gv3vq

Genre Identification Based on SFL Principles: The Representation of Text Types and Genres in English Language Teaching Material

Maria N. Melissourgou, Katerina T. Frantzi
2017 Corpus Pragmatics  
At the same time, the rapid development of corpus linguistics studies has caused a reconsideration of methodological issues such as the classification of texts during corpus building.  ...  Reporting on experience from the Writing Model Answers corpus classification of texts, we explain how we identified genres based on Systemic Functional Linguistics principles.  ...  A contrastive study, however, using two corpora, one for past 'writing' papers (prompts) and one similar to the WriMA corpus would be interesting and helpful towards the evaluation and improvement of material  ... 
doi:10.1007/s41701-017-0013-z fatcat:lnzrxx3ccrgxbn4m5wej5yu64i

Scalable Construction of High-Quality Web Corpora

Chris Biemann, Felix Bildhauer, Stefan Evert, Dirk Goldhahn, Uwe Quasthoff, Roland Schäfer, Johannes Simon, Leonard Swiezinski, Torsten Zesch
2013 Journal for Language Technology and Computational Linguistics  
As we are working with web data, controlling the quality of the resulting corpus is an important issue, which we address by showing how corpus statistics and a linguistic evaluation can be used to assess  ...  Then, we describe how the crawled data can be linguistically pre-processed in a parallelized way that allows the processing of web-scale input data.  ...  Acknowledgments The second evaluation study reported in Section 4.2 is based on joint work with Sabine Bartsch.  ... 
dblp:journals/ldvf/BiemannBEGQSSSZ13 fatcat:eciovvcvazewnfuhk7shiksuiy

Language-Independent Text Parsing of Arbitrary HTML-Documents. Towards A Foundation For Web Genre Identification

Georg Rehm
2005 Journal for Language Technology and Computational Linguistics  
This text parser is being developed for a novel kind of search engine that aims to classify web pages into web genres so that the search engine user will be able to specify one or more keywords, as well  ...  as one or more web genres of the documents to be found.  ...  Towards Automatic Web Genre Identification Section 3 describes our theoretical framework and briefly mentions our ultimate goal: devising a web genre-enabled search engine.  ... 
dblp:journals/ldvf/Rehm05 fatcat:4lqq25sjlvfqdfmxt3naxhccle