
The Danish Gigaword Project [article]

Leon Strømberg-Derczynski, Manuel R. Ciosici, Rebekah Baglini, Morten H. Christiansen, Jacob Aarup Dalsgaard, Riccardo Fusaroli, Peter Juel Henrichsen, Rasmus Hvingelby, Andreas Kirkedal, Alex Speed Kjeldsen, Claus Ladefoged, Finn Årup Nielsen (+3 others)
2021 arXiv   pre-print
This paper describes the Danish Gigaword Corpus, the result of a focused effort to provide a diverse and freely-available one billion word corpus of Danish text.  ...  The Danish Gigaword corpus covers a wide array of time periods, domains, speakers' socio-economic status, and Danish dialects.  ...  For example, the social media Twitter section is currently limited to politics, a domain already well-represented in DAGW; thus, more short social media on other topics is a natural extension.  ... 
arXiv:2005.03521v3 fatcat:y6wqfycpf5hk7d534pjyte2key

Page 8 of Current Vol. , Issue 532 [page]

2011 Current  
whose servers are in Sweden), Tuenti (a Spanish social network), and Naver (a Korean one), are among the sites used most for political speech, conversation, and coordination.  ...  Services based in the United States, such as Facebook, Twitter, Wikipedia, and YouTube, and those based overseas, such as QQ (a Chinese instant-messaging service), WikiLeaks (a repository of leaked documents  ... 


Pei-Yu Chi, Henry Lieberman
2011 Proceedings of the 2011 annual conference on Human factors in computing systems - CHI '11  
However, today's social media force a choice between real-time communication, and authoring a coherent story illustrated with digital media.  ...  We present Raconteur, which introduces a new style of social media combining aspects of the real-time and authored styles of communication.  ...  CONCLUSION AND FUTURE WORK We have presented Raconteur, an enhanced text chat system that introduces a new style of social media, assisted conversation.  ... 
doi:10.1145/1978942.1979411 dblp:conf/chi/ChiL11 fatcat:3oipug564zayndhzn2brs6udge

A Telegram Corpus for Hate Speech, Offensive Language, and Online Harm

Veronika Solopova, Tatjana Scheffler, Mihaela Popa-Wyatt
2021 Journal of Open Humanities Data  
We provide a new text corpus from the social medium Telegram, which is rich in indirect forms of divisive speech.  ...  own manual annotations of harmful language for a portion of the posts leading up to the January 2021 Capitol riot and its aftermath.  ...  ACKNOWLEDGEMENTS The authors would like to thank Lesley-Ann Kern for help with annotations.  ... 
doi:10.5334/johd.32 fatcat:n3w6tpjmp5fjla3elrfild4hm4

Bots in software engineering: a systematic mapping study

Sivasurya Santhanam, Tobias Hecking, Andreas Schreiber, Stefan Wagner
2022 PeerJ Computer Science  
This study aims to provide an introduction and a broad overview of bots used in software engineering.  ...  The spectrum of applications of bots in software engineering is so wide and diverse, that a comprehensive overview and categorization of such bots is needed.  ...  This facilitates bot developers and designers to model collaboratively through social media channels.  ... 
doi:10.7717/peerj-cs.866 pmid:35494879 pmcid:PMC9044364 fatcat:xmkvbhry3bg77g3c4l333aavfa

Building and curating conversational corpora for diversity-aware language science and technology [article]

Andreas Liesenfeld, Mark Dingemanse
2022 arXiv   pre-print
Surveying language documentation corpora and other resources that cover 67 languages and varieties from 28 phyla, we describe the compilation and curation process, specify minimal properties of a unified  ...  We present an analysis pipeline and best practice guidelines for building and curating corpora of everyday conversation in diverse languages.  ...  media files (Figure 3).  ... 
arXiv:2203.03399v3 fatcat:av6imoyu2rfstkzh6ndjxk53fa

Automatic analysis of multiparty meetings

2011 Sadhana (Bangalore)  
We discuss the capture and annotation of the AMI meeting corpus, the development of a meeting speech recognition system, and systems for the automatic segmentation, summarisation and social processing  ...  This is a challenging task since the meetings consist of spontaneous and conversational interactions between a number of participants: it is a multimodal, multiparty, multistream problem.  ...  of acoustic models for conversational telephone speech (Hain et al. 2005).  ... 
doi:10.1007/s12046-011-0051-3 fatcat:mdhaduibcnck3b3yi763snppdm

Managing the Preservation of Records for Digital Primary Data: A Case of Malaysia Institution

Alwi Mohd Yunus, Irwan Kamaruddin Abdul Kadir
2017 International Journal of Academic Research in Business and Social Sciences  
The aim of the paper is to examine the needs for the surveyed social science research institutions in Malaysian government for the preservation of their digital primary research data.  ...  Concerns raised about the lack of preservation initiatives and poor access to global scientific knowledge in the form of primary research data and records for research in the turn of the 20th century  ...  conversion for play back purpose); speech synthesis (the translation by computers of a coded description of a message into speech, i.e. computers 'talking'); and speech recognition and understanding,  ... 
doi:10.6007/ijarbss/v7-i11/3441 fatcat:zdylbljiqbeevm4ptuuprhbzr4

Report on the workshop on web archiving and digital libraries (WADL 2013)

Edward A. Fox, Mohamed M. Farag
2013 SIGIR Forum  
), (focused) crawling, curation, indexing, exploration (including searching and browsing), (text) analysis, archiving, and up through long-term preservation.  ...  scale, working with big data, mobile Web archiving, temporal issues, Memento, and SiteStory.  ...  a digital repository for UN documentation (parliamentary documents, conference related documents, publications, etc).  ... 
doi:10.1145/2568388.2568408 fatcat:uzzgtstg3rgprjoh6njnzbru7u

Event Detection and Summarization Based on Social Networks and Semantic Query Expansion

Sathiyamurthy K, Shanmugavalli G, Udayalakshmi N
2014 International Journal on Natural Language Computing  
Intuitively, documents describing the same event will contain similar sets of keywords, and the graph for a document collection will contain clusters individual events.  ...  The important data source for event detection is a Web search log because the information it contains reflects users' activities and interestingness to various real world events.  ...  The detection of events in web image document stream on social media based on clustering technique integrates with Kleinberg's burst detection [9].  ... 
doi:10.5121/ijnlc.2014.3602 fatcat:okd6f2bpdberrpdvzan3ac6ix4

Multi-Document Information Consolidation (Dagstuhl Seminar 19182)

Ido Dagan, Iryna Gurevych, Dan Roth, Amanda Stent, Michael Wagner
2019 Dagstuhl Reports  
and visualize multi-document repositories for decision support; and 4) how to do information validation on multi-document repositories.  ...  This report documents the program and the outcomes of Dagstuhl Seminar 19182 "Multi-Document Information Consolidation".  ...  visualize multi-document repositories for decision support; and 4) how to do information validation on multi-document repositories.  ... 
doi:10.4230/dagrep.9.4.124 dblp:journals/dagstuhl-reports/DaganGRS19 fatcat:lej5rn6i4vhphjun6hw4uj4sra

Corpus Sharing Strategy for Descriptive Linguistics

Kazushi Ohya
2015 Journal of the Japanese Association for Digital Humanities  
This paper introduces the idea of data sharing strategy based on a conversion service, not on a sharing application, scheme, or ontology, that are dominant in proposals for language documentation.  ...  an idea of personal diachronic data sharing; and finally, we propose an idea for sharing data based on data conversion services.  ...  Data conversion will be a key service or function for the regional-scale repository system.  ... 
doi:10.17928/jjadh.1.1_68 fatcat:jrbldprsanb4pppjj3xdcdydkm

Verbal Communication in Robotics: A Study on Salient Terms, Research Fields and Trends in the Last Decades Based on a Computational Linguistic Analysis

Alessandro Marin Vargas, Lorenzo Cominelli, Felice Dell'Orletta, Enzo Pasquale Scilingo
2021 Frontiers in Computer Science  
In particular, verbal communication resulted in being highly relevant for social robotics.  ...  We highlighted positive and negative trends for the most coherent topics and the distribution over the years for the most significant ones.  ...  As Nass and colleagues demonstrated in their book "The Media Equation" (Reeves and Nass, 1996), people often respond socially to computers in ways similar to how they would interact socially with other  ... 
doi:10.3389/fcomp.2020.591164 fatcat:y3bg445k7bagze4crbl6noq5za

A systematic review of Hate Speech automatic detection using Natural Language Processing [article]

Md Saroar Jahan, Mourad Oussalah
2021 arXiv   pre-print
With the multiplication of social media platforms, which offer anonymity, easy access and online community formation, and online debate, the issue of hate speech detection and tracking becomes a growing  ...  Despite efforts for leveraging automatic techniques for automatic detection and monitoring, their performances are still far from satisfactory, which constantly calls for future research on the issue.  ...  of hate speech in social media forums as social media platforms constitute by far the dominant agora of hate speech because of easy access, fast spread and societal impact.  ... 
arXiv:2106.00742v1 fatcat:qwxjwgma4zaynemge57cu7xqlm

From Textual Information Sources to Linked Data in the Agatha Project [article]

Paulo Quaresma, Vitor Beires Nogueira, Kashyap Raiyani, Roy Bayot, and Teresa Gonçalves
2019 arXiv   pre-print
In this work we describe our proposal for representing and reasoning about Portuguese documents by means of Linked Data like ontologies and thesauri.  ...  Our approach resorts to a specialized pipeline of natural language processing (part-of-speech tagger, named entity recognition, semantic role labeling) to populate an ontology for the domain of criminal  ...  Acknowledgments The authors would like to thank COMPETE 2020, PORTUGAL 2020 Program, the European Union, and ALENTEJO 2020 for supporting this research as part of Agatha Project SI & IDT number 18022 (  ... 
arXiv:1909.05359v1 fatcat:377ymxrnvjanhn44vofx7vq6iu