Discovering structure in social networks of 19th century fiction

Siobhán Grayson, Karen Wade, Gerardine Meaney, Jennie Rothwell, Maria Mulvany, Derek Greene
2016 Proceedings of the 8th ACM Conference on Web Science - WebSci '16  
In this paper, we examine detailed social networks of characters, extracted from several works of 19th century fiction by Jane Austen and Charles Dickens.  ...  Inspired by the increasing availability of large text corpora online, digital humanities scholars are adopting computational approaches to explore questions in the field of literature from new perspectives  ...  This research was partly supported by Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289, in collaboration with the Nation, Genre and Gender project funded by the Irish Research Council  ... 
doi:10.1145/2908131.2908196 dblp:conf/websci/GraysonWMRMG16 fatcat:vs25kdoeurhzpcsdhkfmsvw26y

Reliable Editions from Unreliable Components: Estimating Ebooks from Print Editions Using Profile Hidden Markov Models [article]

A. B. Riddell
2022 arXiv   pre-print
A profile hidden Markov model, a popular model in biological sequence analysis, can be used to model related sequences of characters transcribed from books, magazines, and other printed materials.  ...  This paper documents one application of a profile HMM: automatically producing an ebook edition from distinct print editions.  ...  ACKNOWLEDGEMENTS Thanks in particular to Christof Schöch, whose work on text comparison [14] prompted the search that led me to learn about the profile HMM.  ... 
arXiv:2204.01638v2 fatcat:owaprowa3raijdhj62asdtr3se

The New Zealand Digital Library Project

Ian H. Witten, Sally Jo Cunningham, Mark D. Apperley
1996 D-Lib Magazine  
The migration of information from paper to computers promises to change the whole nature of research, and in particular the methods by which people locate information.  ...  The goal of the New Zealand Digital Library project is to explore the potential of Internet-based digital libraries, by which we mean large collections of electronic, predominantly textual, documents,  ...  For example, we have incorporated two small collections of English literature (totalling 550 books): the Oxford Text Archive (from the UK) and the Gutenberg collection (from the US).  ... 
doi:10.1045/november96-witten fatcat:g47vkqgudbd4bgi2yndc3b5vei

Illustrations Segmentation in Digitized Documents Using Local Correlation Features

Dalia Coppi, Costantino Grana, Rita Cucchiara
2014 Procedia Computer Science  
We identify and extract illustrations in digitized documents by learning the discriminative patterns of textual and pictorial regions.  ...  The proposal has been demonstrated to be effective on historical datasets and to outperform the state-of-the-art in presence of challenging documents with a large variety of pictorial elements.  ...  If Optical Character Recognition (OCR) methods almost yield completely reliable results, the task of identifying textual regions and separate them from other components of the page is more challenging  ... 
doi:10.1016/j.procs.2014.10.014 fatcat:32xydpox4je5nemm2hsngpaihi

Stroke-Like Pattern Noise Removal in Binary Document Images

Mudit Agrawal, David Doermann
2011 2011 International Conference on Document Analysis and Recognition  
In order to perform text extraction, and hence noise removal, at diacritic-level, this divide-and-conquer technique does not assume the availability of accurate and large amounts of ground-truth data at  ...  The method was tested on a collection of degraded and noisy, machine-printed and handwritten binary Arabic text documents. Results show pixel-level precision and recall of 98% and 97% respectively.  ...  Acknowledgment The partial support of this research by DARPA through BBN/DARPA Award and the US Government through NSF Award is gratefully acknowledged.  ... 
doi:10.1109/icdar.2011.13 dblp:conf/icdar/AgrawalD11 fatcat:iqkiol6cjbezfivfbpb5hsabnu

Crowdsourcing Model for Multilingual Corpus and Knowledge Construction: The Case of Transnational Mark Twain

Amel Fraisse, Ronald Jenn, Quoc-Tan Tran
2018 Zagadnienia informacji naukowej  
RESULTS AND CONCLUSIONS: The model promotes a dynamic approach to archives that increases the impact of traditional research by presenting the text from a new angle, accessible to a global public. PRACTICAL  ...  APPROACH/METHODS: We use a crowdsourcing model to collect and annotate translations of the same literary text.  ...  Acknowledgments This paper results from an ongoing research developed with the support of the European Center for the Humanities and Social Sciences in Lille (MESHS) as part of the Global Huck project.  ... 
doi:10.36702/zin.379 fatcat:oq6qreks4ngovlztyer7qkspke

Layout analysis and content enrichment of digitized books

Costantino Grana, Giuseppe Serra, Marco Manfredi, Dalia Coppi, Rita Cucchiara
2014 Multimedia tools and applications  
Moreover we present a solution to help the user in finding contemporary content connected to what is automatically extracted from the ancient documents.  ...  We propose a supervised learning approach to segment text and illustration of digitized old documents using a texture feature based on local correlation aimed at detecting the repeating patterns of text  ...  This dataset was created using a set of publicly available e-books from Project Gutenberg 3 .  ... 
doi:10.1007/s11042-014-2360-0 fatcat:mywxvsu53jhfxd7bqjc26xh2ve

A Survey on Sentiment and Emotion Analysis for Computational Literary Studies [article]

Evgeny Kim, Roman Klinger
2019 arXiv   pre-print
In the past, the affective dimension of literature was mainly studied in the context of literary hermeneutics.  ...  The research under review deals with a variety of topics including tracking dramatic changes of a plot development, network analysis of a literary text, and understanding the emotionality of texts, among  ...  Acknowledgements We thank Laura Ana Maria Bostan, Sebastian Padó, and Enrica Troiano for fruitful discussions and the ZfDG team for their help in preparation of this article.  ... 
arXiv:1808.03137v2 fatcat:by5csiqpefgexnlmntlgagretm

Document Analysis Systems for Digital Libraries: Challenges and Opportunities [chapter]

Henry S. Baird, Venugopal Govindaraju, Daniel P. Lopresti
2004 Lecture Notes in Computer Science  
The state-of-the-art is summarized, including a digest of themes that emerged during the recent International Workshop on Document Image Analysis for Libraries.  ...  We attempt to specify, in considerable detail, the essential features of document analysis systems that can assist in: (a) the creation of DL's; (b) automatic indexing and retrieval of doc-images within  ...  Scalar and profile features are extracted from the images and an entire historical document is modeled as a HMM, with words constituting the state sequence.  ... 
doi:10.1007/978-3-540-28640-0_1 fatcat:3szb2elcm5amvlhvma3kbwmzza

How to carry over historic books into social networks

Heimo Müller, Hermann Maurer
2011 Proceedings of the 4th ACM workshop on Online books, complementary social media and crowdsourcing - BooksOnline '11  
The scans have to be of high quality, allow good OCR to permit full text searches; books need not only be "packaged" but also need meta-data and functionalities that one can expect from a computer supported  ...  We claim that the quality and the enhancements of an Interactive Internet Book go far beyond what is traditionally assumed: it is not enough to scan books.  ...  This implies the generation of a search index from the text version, extraction of the book's table of contents and of indices of item/persons, extraction of images and image captions, the possibility  ... 
doi:10.1145/2064058.2064065 dblp:conf/cikm/MullerM11 fatcat:y362bxfldff4niuq33c2avnjxa

An Approach to Document Fingerprinting [chapter]

Yunhyong Kim, Seamus Ross
2015 Lecture Notes in Computer Science  
The identifying features, regardless of whether the document content is textual, aural or visual, are often delineated in terms of descriptions about the document, for example, intended audience, coverage  ...  To secure a comprehensive view of a document, therefore, we must draw heavily on cognitive and/or computational resources not only to extract and classify information at multiple scales, but also to interlink  ...  Acknowledgement This research was supported in part by the Universities of Glasgow and Toronto, and the European Commission through Blogforever (FP7-ICT-2009-6-269963).  ... 
doi:10.1007/978-3-319-27974-9_11 fatcat:sn5pk3wt6jftjmg4ajzg4xicom

Stylometric Identification in Electronic Markets: Scalability and Robustness

Ahmed Abbasi, Hsinchun Chen, Jay F. Nunamaker
2008 Journal of Management Information Systems  
received his Ph.D. in systems engineering and operations research from Case Institute of Technology, an M.S. and B.S. in engineering from the University of Pittsburgh, and a B.S. from Carnegie Mellon  ...  Nunamaker received the LEO Award from the Association of Information Systems at ICIS in Barcelona, Spain, December 2002.  ...  For the n-gram models, we used character-level n-grams, with profile sizes of 5,000 n-grams per identity.  ... 
doi:10.2753/mis0742-1222250103 fatcat:wnrsmmuzm5cf5lskb5562trjgi

Open Set Authorship Attribution toward Demystifying Victorian Periodicals [article]

Sarkhan Badirli, Mary Borgo Ton, Abdulmecit Gungor, Murat Dundar
2019 arXiv   pre-print
In this paper, we study AA in historical texts using a new data set compiled from the Victorian literature.  ...  We investigate the predictive capacity of most common English words in distinguishing writings of most prominent Victorian novelists.  ...  articles, marked files of the periodicals, publishers' lists and account books, and the correspondence of editors and leading contributors in British archives."  ... 
arXiv:1912.08259v1 fatcat:l6tt7u2tozbpxavayjxjaptqci

Thinking Outside the Box at Open-Air Archeological Contexts: Examples From Loess Landscapes in Southeast Romania

Kathryn E. Fitzsimmons, Adrian Doboş, Mathias Probst, Radu Iovita
2020 Frontiers in Earth Science  
Here we test the idea of aggregating "off-sites" (human traces which provide isolated evidence of activity in an area) to maximize the information which can meaningfully be extracted from Paleolithic open-air  ...  We present two case studies from the sediment-rich loess steppe of southeast Romania, Lipniţa and Dealul Peşterica.  ...  Surface survey at this site identified a c. 50 cm diameter block of sediment which had detached from the lower part of the quarry profile.  ... 
doi:10.3389/feart.2020.561207 fatcat:ik3hlm7d6fhsxopj4bx26cd47y

Feature-finding for text classification

RS Forsyth, DI Holmes
1996 Digital Scholarship in the Humanities  
Abstract Stylometrists have proposed and used a wide variety of textual features or markers, but until recently very little attention has been focused on the question: where do textual features come from  ...  In many text-categorization tasks the choice of textual features is a crucial determinant of success, yet is typically left to the intuition of the analyst.  ...  Project Gutenberg and the Oxford Text Archive, stylometry currently lacks an equivalent set of accepted test problems. Thus we have been forced to compile our own.  ... 
doi:10.1093/llc/11.4.163 fatcat:vhgrwbkhvbbixkrtvx4yzhzffm