1,041 Hits in 2.4 sec

Tracking Knowledge Propagation Across Wikipedia Languages [article]

Rodolfo Valentim, Giovanni Comarela, Souneil Park, Diego Saez-Trumper
2021 arXiv   pre-print
Together with the dataset, a holistic overview of the propagation and key insights about the underlying structural factors are provided to aid future research.  ...  Covering the entire 309 language editions and 33M articles, the dataset aims to track the full propagation history of Wikipedia concepts, and allow follow-up research on building predictive models of them  ...  Giovanni Comarela is financed in part by CAPES (Finance Code 001), CNPq, and FAPES.  ... 
arXiv:2103.16613v1 fatcat:davbdexcrjhrximaqvyurbi7ie

Wiki2Prop: A Multimodal Approach for Predicting Wikidata Properties from Wikipedia

Michael Luggen, Julien Audiffren, Djellel Difallah, Philippe Cudré-Mauroux
2021 Proceedings of the Web Conference 2021  
In this work, we focus on entities with a dedicated Wikipedia page in any language to make predictions directly based on textual content.  ...  With a Wikidata gadget, the relevant new properties can be presented directly on the page of the respective Wikidata entry to support editors with completing the entry.  ...  For future work, we plan to consider alternative sources of information, beyond Wikipedia and Wikimedia Commons.  ... 
doi:10.1145/3442381.3450082 fatcat:iski74moj5hqxdaev6fs2jcnzi

Building Automated Vandalism Detection Tools for Wikidata

Amir Sarabadani, Aaron Halfaker, Dario Taraborelli
2017 Proceedings of the 26th International Conference on World Wide Web Companion - WWW '17 Companion  
However, it exposes the knowledge base to the risk of vandalism and low-quality contributions. In this work, we build on past work detecting vandalism in Wikipedia to detect vandalism in Wikidata.  ...  Wikidata, like Wikipedia, is a knowledge base that anyone can edit.  ...  Adam Wight, Helder Lima, Arthur Tilley and Gediz Aksit for their help. We also want to thank the community of Wikidata editors for providing feedback and reporting mistakes.  ... 
doi:10.1145/3041021.3053366 dblp:conf/www/SarabadaniHT17 fatcat:kokdkkyuubelbduhbwotac3cqq

The Evolution of Power and Standard Wikidata Editors: Comparing Editing Behavior over Time to Predict Lifespan and Volume of Edits

Cristina Sarasua, Alessandro Checco, Gianluca Demartini, Djellel Difallah, Michael Feldman, Lydia Pintscher
2018 Computer Supported Cooperative Work (CSCW)  
Wikidata is an open general-interest knowledge base that is collaboratively developed and maintained by a community of thousands of volunteers.  ...  One of the major challenges faced in such a crowdsourcing project is to attain a high level of editor engagement. In order to intervene and  ...  Gianluca Demartini was affiliated with the University of Sheffield (United Kingdom) and Djellel Difallah was affiliated with the University of Fribourg (Switzerland).  ... 
doi:10.1007/s10606-018-9344-y fatcat:jr5aocsrdrbyxpr5pmikif3eoe

Improving Knowledge Base Construction from Robust Infobox Extraction

Boya Peng, Yejin Huh, Xiao Ling, Michele Banko
2019 Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics  
We demonstrate the empirical effectiveness of the proposed method in both precision and recall compared to a strong IBE baseline, DBpedia, with an absolute improvement of 41.3% in average F1.  ...  ; 2) Over-trusting Wikipedia anchor links can lead to entity disambiguation errors; 3) Heuristic-based extraction of unlinkable entities yields low precision, hurting both accuracy and completeness of the  ...  , Mark Biswas for proofreading our manuscript, Thomas Semere for leading the annotation project, and Eric Chahin, Vivek Raghuram, and Eric Choi for the engineering support.  ... 
doi:10.18653/v1/n19-2018 dblp:conf/naacl/PengHLB19 fatcat:6jypr37kljhtdlzjghrhibnaem

Mathematics in Wikidata

Philipp Scharpf, Moritz Schubotz, Bela Gipp
2021 Zenodo  
In recent years, there have been efforts to define several properties and seed formulae together with their constituting identifiers into Wikidata.  ...  Finally, we discuss community feedback and issues related to integrating Mathematical Entity Linking (MathEL) into Wikidata and Wikipedia, which was rejected in 33% and 12% of the test cases, for Wikidata  ...  Acknowledgment This work was supported by the German Research Foundation (DFG grant GI-1259-1).  ... 
doi:10.5281/zenodo.5589639 fatcat:isu6lmir5ve2hmgvvec4xibmfe

Knowledge Enhanced Pretrained Language Models: A Comprehensive Survey [article]

Xiaokai Wei, Shen Wang, Dejiao Zhang, Parminder Bhatia, Andrew Arnold
2021 arXiv   pre-print
Finally, we discuss challenges that face KE-PLMs and also promising directions for future research.  ...  This new paradigm has revolutionized the entire field of natural language processing, and set the new state-of-the-art performance for a wide variety of NLP tasks.  ...  Yes entity category/relation type/masked entity Wikipedia/Wikidata LUKE [100] Yes entity prediction Wikipedia WKLM [99] Yes entity replacement detection Wikipedia/Wikidata CoLAKE [82] Yes masked  ... 
arXiv:2110.08455v1 fatcat:b2nw5jdu7neo3brveddmah6mra

Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps [article]

Xanh Ho, Anh-Khoa Duong Nguyen, Saku Sugawara, Akiko Aizawa
2020 arXiv   pre-print
The evidence information has two benefits: (i) providing a comprehensive explanation for predictions and (ii) evaluating the reasoning skills of a model.  ...  We also exploit the structured format in Wikidata and use logical rules to create questions that are natural but still require multi-hop reasoning.  ...  We thank the anonymous reviewers for suggestions on how to improve the dataset and the paper. This work was supported by JSPS KAKENHI Grant Number 18H03297.  ... 
arXiv:2011.01060v2 fatcat:cptoirhk2fe57f4pjpqjxfcgde

Developing an automated mechanism to identify medical articles from Wikipedia for knowledge extraction

Lishan Yu, Sheng Yu
2020 International Journal of Medical Informatics  
Structured assertional knowledge in Infoboxes and Wikidata items associated with the identified medical articles were also extracted.  ...  This automatic mechanism is intended to run periodically to update the results and share them with the informatics community.  ...  Acknowledgment The authors thank the following people for their help in data collection and preliminary analyses.  ... 
doi:10.1016/j.ijmedinf.2020.104234 pmid:32693245 pmcid:PMC7357526 fatcat:sbjaaell3jaxtfua3pq6azlft4

WikiUMLS: Aligning UMLS to Wikipedia via Cross-lingual Neural Ranking [article]

Afshin Rahimi, Timothy Baldwin, Karin Verspoor
2020 arXiv   pre-print
We present our work on aligning the Unified Medical Language System (UMLS) to Wikipedia, to facilitate manual alignment of the two resources.  ...  We propose a cross-lingual neural reranking model to match a UMLS concept with a Wikipedia page, which achieves a recall@1 of 72%, a substantial improvement of 20% over word- and char-level BM25, enabling  ...  Acknowledgements This work was funded by the Australian Research Council through the ARC Training Centre in Cognitive Computing for Medical Technologies (project number ICI70200030), and completed while  ... 
arXiv:2005.01281v3 fatcat:nxbiiu4c6nevve3t25bmun3ibm

Provenance Information in a Collaborative Knowledge Graph: An Evaluation of Wikidata External References [chapter]

Alessandro Piscopo, Lucie-Aimée Kaffee, Chris Phethean, Elena Simperl
2017 Lecture Notes in Computer Science  
The machine learning models outperformed the baseline and were able to accurately predict non-relevant and non-authoritative references.  ...  Wikidata is a collaboratively-edited knowledge graph; it expresses knowledge in the form of subject-property-value triples, which can be enhanced with references to add provenance information.  ...  This project is supported by funding received from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 642795 (WDAqua ITN).  ... 
doi:10.1007/978-3-319-68288-4_32 fatcat:krbn7y6usbgydmyew5dbucmc6q

OpenTapioca: Lightweight Entity Linking for Wikidata [article]

Antonin Delpeuch
2020 arXiv   pre-print
Our model is lightweight to train, to run and to keep synchronous with Wikidata in real time.  ...  This demonstrates the strengths and weaknesses of this data source for this task and provides an easily reproducible baseline to compare other systems against.  ...  Note that Wikidata items do not need to be associated with any Wikipedia page: in fact, Wikidata's policy on the notability of the subjects it covers is much more permissive than in Wikipedia.  ... 
arXiv:1904.09131v2 fatcat:gt2hmgnbezcetha2gthkkxtw7a

Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles [article]

Malte Ostendorff, Terry Ruas, Moritz Schubotz, Georg Rehm, Bela Gipp
2020 arXiv   pre-print
We perform our experiments on a newly proposed dataset of 32,168 Wikipedia article pairs and Wikidata properties that define the semantic document relations.  ...  Our results show vanilla BERT as the best performing system with an F1-score of 0.93, which we manually examine to better understand its applicability to other domains.  ...  ACKNOWLEDGMENTS The research presented in this article is funded by the German Federal Ministry of Education and Research (BMBF) through the project QURATOR (Unternehmen Region, Wachstumskern, no. 03WKDA1A  ... 
arXiv:2003.09881v1 fatcat:g4coocrinfgxtcnkpenofn34u4

Mining Knowledge for Natural Language Inference from Wikipedia Categories [article]

Mingda Chen, Zewei Chu, Karl Stratos, Kevin Gimpel
2020 arXiv   pre-print
We conduct systematic comparisons with phrases extracted from other knowledge bases such as WordNet and Wikidata to find that pretraining on WikiNLI gives the best performance.  ...  We show that we can improve strong baselines such as BERT and RoBERTa by pretraining them on WikiNLI and transferring the models on downstream tasks.  ...  Stratos and K. Gimpel.  ... 
arXiv:2010.01239v1 fatcat:ifgtrylycngyveks3yee6a24mi

The ApposCorpus: A new multilingual, multi-domain dataset for factual appositive generation [article]

Yova Kementchedjhieva, Di Lu, Joel Tetreault
2020 arXiv   pre-print
Polish), two entity types (person and organization) and two domains (Wikipedia and News).  ...  The results we obtain with standard language generation methods show that the task is indeed non-trivial and leaves plenty of room for improvement.  ...  Here, we use the same architecture as above, but initialize the embedding matrix of the model with the NTEE (Neural Text-Entity Encoder) word embeddings, trained on Wikipedia with Wikidata grounding (  ... 
arXiv:2011.03287v1 fatcat:qozon6w7v5fpbgej62rj74rsle
Showing results 1–15 of 1,041