Discovering structure in social networks of 19th century fiction
2016
Proceedings of the 8th ACM Conference on Web Science - WebSci '16
In this paper, we examine detailed social networks of characters, extracted from several works of 19th century fiction by Jane Austen and Charles Dickens. ...
Inspired by the increasing availability of large text corpora online, digital humanities scholars are adopting computational approaches to explore questions in the field of literature from new perspectives ...
This research was partly supported by Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289, in collaboration with the Nation, Genre and Gender project funded by the Irish Research Council ...
doi:10.1145/2908131.2908196
dblp:conf/websci/GraysonWMRMG16
fatcat:vs25kdoeurhzpcsdhkfmsvw26y
Reliable Editions from Unreliable Components: Estimating Ebooks from Print Editions Using Profile Hidden Markov Models
[article]
2022
arXiv
pre-print
A profile hidden Markov model, a popular model in biological sequence analysis, can be used to model related sequences of characters transcribed from books, magazines, and other printed materials. ...
This paper documents one application of a profile HMM: automatically producing an ebook edition from distinct print editions. ...
ACKNOWLEDGEMENTS Thanks in particular to Christof Schöch, whose work on text comparison [14] prompted the search that led me to learn about the profile HMM. ...
arXiv:2204.01638v2
fatcat:owaprowa3raijdhj62asdtr3se
The New Zealand Digital Library Project
1996
D-Lib Magazine
The migration of information from paper to computers promises to change the whole nature of research, and in particular the methods by which people locate information. ...
The goal of the New Zealand Digital Library project is to explore the potential of Internet-based digital libraries, by which we mean large collections of electronic, predominantly textual, documents, ...
For example, we have incorporated two small collections of English literature (totalling 550 books): the Oxford Text Archive (from the UK) and the Gutenberg collection (from the US). ...
doi:10.1045/november96-witten
fatcat:g47vkqgudbd4bgi2yndc3b5vei
Illustrations Segmentation in Digitized Documents Using Local Correlation Features
2014
Procedia Computer Science
We identify and extract illustrations in digitized documents by learning the discriminative patterns of textual and pictorial regions. ...
The proposal has been demonstrated to be effective on historical datasets and to outperform the state-of-the-art in presence of challenging documents with a large variety of pictorial elements. ...
If Optical Character Recognition (OCR) methods yield almost completely reliable results, the task of identifying textual regions and separating them from other components of the page is more challenging ...
doi:10.1016/j.procs.2014.10.014
fatcat:32xydpox4je5nemm2hsngpaihi
Stroke-Like Pattern Noise Removal in Binary Document Images
2011
2011 International Conference on Document Analysis and Recognition
In order to perform text extraction, and hence noise removal, at diacritic-level, this divide-and-conquer technique does not assume the availability of accurate and large amounts of ground-truth data at ...
The method was tested on a collection of degraded and noisy, machine-printed and handwritten binary Arabic text documents. Results show pixel-level precision and recall of 98% and 97% respectively. ...
Acknowledgment The partial support of this research by DARPA through BBN/DARPA Award and the US Government through NSF Award is gratefully acknowledged. ...
doi:10.1109/icdar.2011.13
dblp:conf/icdar/AgrawalD11
fatcat:iqkiol6cjbezfivfbpb5hsabnu
Crowdsourcing Model for Multilingual Corpus and Knowledge Construction: The Case of Transnational Mark Twain
2018
Zagadnienia informacji naukowej
RESULTS AND CONCLUSIONS: The model promotes a dynamic approach to archives that increases the impact of traditional research by presenting the text from a new angle, accessible to a global public. PRACTICAL ...
APPROACH/METHODS: We use a crowdsourcing model to collect and annotate translations of the same literary text. ...
Acknowledgments This paper results from an ongoing research developed with the support of the European Center for the Humanities and Social Sciences in Lille (MESHS) as part of the Global Huck project. ...
doi:10.36702/zin.379
fatcat:oq6qreks4ngovlztyer7qkspke
Layout analysis and content enrichment of digitized books
2014
Multimedia tools and applications
Moreover we present a solution to help the user in finding contemporary content connected to what is automatically extracted from the ancient documents. ...
We propose a supervised learning approach to segment text and illustration of digitized old documents using a texture feature based on local correlation aimed at detecting the repeating patterns of text ...
This dataset was created using a set of publicly available e-books from Project Gutenberg. ...
doi:10.1007/s11042-014-2360-0
fatcat:mywxvsu53jhfxd7bqjc26xh2ve
A Survey on Sentiment and Emotion Analysis for Computational Literary Studies
[article]
2019
arXiv
pre-print
In the past, the affective dimension of literature was mainly studied in the context of literary hermeneutics. ...
The research under review deals with a variety of topics including tracking dramatic changes of a plot development, network analysis of a literary text, and understanding the emotionality of texts, among ...
Acknowledgements We thank Laura Ana Maria Bostan, Sebastian Padó, and Enrica Troiano for fruitful discussions and the ZfDG team for their help in preparation of this article. ...
arXiv:1808.03137v2
fatcat:by5csiqpefgexnlmntlgagretm
Document Analysis Systems for Digital Libraries: Challenges and Opportunities
[chapter]
2004
Lecture Notes in Computer Science
The state-of-the-art is summarized, including a digest of themes that emerged during the recent International Workshop on Document Image Analysis for Libraries. ...
We attempt to specify, in considerable detail, the essential features of document analysis systems that can assist in: (a) the creation of DL's; (b) automatic indexing and retrieval of doc-images within ...
Scalar and profile features are extracted from the images and an entire historical document is modeled as a HMM, with words constituting the state sequence. ...
doi:10.1007/978-3-540-28640-0_1
fatcat:3szb2elcm5amvlhvma3kbwmzza
How to carry over historic books into social networks
2011
Proceedings of the 4th ACM workshop on Online books, complementary social media and crowdsourcing - BooksOnline '11
The scans have to be of high quality, allow good OCR to permit full text searches; books need not only be "packaged" but also need meta-data and functionalities that one can expect from a computer supported ...
We claim that the quality and the enhancements of an Interactive Internet Book go far beyond what is traditionally assumed: it is not enough to scan books. ...
This implies the generation of a search index from the text version, extraction of the book's table of contents and of indices of item/persons, extraction of images and image captions, the possibility ...
doi:10.1145/2064058.2064065
dblp:conf/cikm/MullerM11
fatcat:y362bxfldff4niuq33c2avnjxa
An Approach to Document Fingerprinting
[chapter]
2015
Lecture Notes in Computer Science
The identifying features, regardless of whether the document content is textual, aural or visual, are often delineated in terms of descriptions about the document, for example, intended audience, coverage ...
To secure a comprehensive view of a document, therefore, we must draw heavily on cognitive and/or computational resources not only to extract and classify information at multiple scales, but also to interlink ...
Acknowledgement This research was supported in part by the Universities of Glasgow and Toronto, and the European Commission through Blogforever (FP7-ICT-2009-6-269963). ...
doi:10.1007/978-3-319-27974-9_11
fatcat:sn5pk3wt6jftjmg4ajzg4xicom
Stylometric Identification in Electronic Markets: Scalability and Robustness
2008
Journal of Management Information Systems
received his Ph.D. in systems engineering and operations research from Case Institute of Technology, an M.S. and B.S. in engineering from the University of Pittsburgh, and a B.S. from Carnegie Mellon ...
Nunamaker received the LEO Award from the Association of Information Systems at ICIS in Barcelona, Spain, December 2002. ...
For the n-gram models, we used character-level n-grams, with profile sizes of 5,000 n-grams per identity. ...
doi:10.2753/mis0742-1222250103
fatcat:wnrsmmuzm5cf5lskb5562trjgi
Open Set Authorship Attribution toward Demystifying Victorian Periodicals
[article]
2019
arXiv
pre-print
In this paper, we study AA in historical texts using a new data set compiled from the Victorian literature. ...
We investigate the predictive capacity of most common English words in distinguishing writings of most prominent Victorian novelists. ...
articles, marked files of the periodicals, publishers' lists and account books, and the correspondence of editors and leading contributors in British archives." ...
arXiv:1912.08259v1
fatcat:l6tt7u2tozbpxavayjxjaptqci
Thinking Outside the Box at Open-Air Archeological Contexts: Examples From Loess Landscapes in Southeast Romania
2020
Frontiers in Earth Science
Here we test the idea of aggregating "off-sites" (human traces which provide isolated evidence of activity in an area) to maximize the information which can meaningfully be extracted from Paleolithic open-air ...
We present two case studies from the sediment-rich loess steppe of southeast Romania, Lipniţa and Dealul Peşterica. ...
Surface survey at this site identified a c. 50 cm diameter block of sediment which had detached from the lower part of the quarry profile. ...
doi:10.3389/feart.2020.561207
fatcat:ik3hlm7d6fhsxopj4bx26cd47y
Feature-finding for text classification
1996
Digital Scholarship in the Humanities
Abstract Stylometrists have proposed and used a wide variety of textual features or markers, but until recently very little attention has been focused on the question: where do textual features come from ...
In many text-categorization tasks the choice of textual features is a crucial determinant of success, yet is typically left to the intuition of the analyst. ...
Project Gutenberg and the Oxford Text Archive, stylometry currently lacks an equivalent set of accepted test problems. Thus we have been forced to compile our own. ...
doi:10.1093/llc/11.4.163
fatcat:vhgrwbkhvbbixkrtvx4yzhzffm