Cleaning Noisy and Heterogeneous Metadata for Record Linking across Scholarly Big Datasets

Athar Sefid, Jian Wu, Allen C. Ge, Jing Zhao, Lu Liu, Cornelia Caragea, Prasenjit Mitra, C. Lee Giles
2019 Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence and the Twenty-Eighth Innovative Applications of Artificial Intelligence Conference
We introduce a system designed to match scholarly document entities with noisy metadata against a reference dataset.  ...  One common way of cleaning metadata is to use a bibliographic reference dataset. The challenge is to match records between corpora with high precision.  ...  Introduction: Since the advent of Scholarly Big Data (SBD) (Giles 2013), there has been a growing interest in topics related to this big data instance, such as scholarly article discovery (Wesley-Smith  ...
doi:10.1609/aaai.v33i01.33019601 fatcat:6b47fnxhbvcith6k7f6y5wzbwi

CoCoScore: context-aware co-occurrence scoring for text mining applications using distant supervision

Alexander Junge, Lars Juhl Jensen
2019 Bioinformatics
CoCoScore is trained using distant supervision based on a gold-standard set of associations between entities of interest.  ...  Information extraction by mining the scientific literature is key to uncovering relations between biomedical entities.  ...  Intomics A/S is a contract research organization specialized in deriving core biological insight from big data.  ... 
doi:10.1093/bioinformatics/btz490 pmid:31199464 pmcid:PMC6956794 fatcat:exveu52lavgdpebnzzglzvzdwm

An Effective Entity Resolution Approach for Big Data

Randa Mohamed Abd El-ghafar (Department of Computer Science, Faculty of Graduate Studies for Statistical Research, Cairo University, Cairo, Egypt), Ali H. El-Bastawissy (Faculty of Computer Science, Modern Sciences and Arts University, Cairo, Egypt), Eman S. Nasr (Independent Researcher, Cairo, Egypt), Mervat H. Gheith (Department of Computer Science, Faculty of Graduate Studies for Statistical Research, Cairo University, Cairo, Egypt)
2021 VOLUME-8 ISSUE-10, AUGUST 2019, REGULAR ISSUE  
To overcome such issues, we propose a novel and efficient ER approach for big data implemented in Apache Spark.  ...  In addition, schema alignment of multiple datasets is not an easy task and may require either domain expert or ML algorithm to select which attributes to match.  ...  Supervised learning approaches aim to classify pairs of records as matched and unmatched based on a labeled dataset that will be trained using a machine learning classification model.  ... 
doi:10.35940/ijitee.k9503.09101121 fatcat:ghpexqsj2zbhtnixb7ordefoii

Extraction and Evaluation of Knowledge Entities from Scientific Documents

Chengzhi Zhang, Philipp Mayr, Wei Lu, Yi Zhang
2021 Journal of Data and Information Science  
In scientific documents, knowledge entities (KEs) refer to the knowledge mentioned or cited by authors, such as algorithms, models, theories, datasets and software, diseases, drugs, and genes, reflecting  ...  As a core resource of scientific knowledge, academic documents have been frequently used by scholars, especially newcomers to a given field.  ...  Experiments compared word-level and character-level sequence labeling approaches on supervised machine learning models and BERT-based models.  ... 
doi:10.2478/jdis-2021-0025 fatcat:zevtlmfdbrenfpxrxb6cqefga4

Automatic Machine Learning Derived from Scholarly Big Data [article]

Asnat Greenstein-Messica, Roman Vainshtein, Gilad Katz, Bracha Shapira, Lior Rokach
2020 arXiv   pre-print
One of the challenging aspects of applying machine learning is the need to identify the algorithms that will perform best for a given dataset.  ...  to be used on the dataset.  ...  Unlike the meta-learning approach, which requires a large amount of datasets for each dataset cluster to train a machine learning model for algorithm recommendation, Sommelier relies on the scholarly big  ... 
arXiv:2003.03470v1 fatcat:yptnuspb2vealimgbdsgidoyfq

Neural Entity Linking: A Survey of Models Based on Deep Learning [article]

Ozge Sevgili, Artem Shelmanov, Mikhail Arkhipov, Alexander Panchenko, Chris Biemann
2021 arXiv   pre-print
techniques including zero-shot and distant supervision methods, and cross-lingual approaches.  ...  In this survey, we provide a comprehensive description of recent neural entity linking (EL) systems developed since 2015 as a result of the "deep learning revolution" in NLP.  ...  The work of Artem Shelmanov in the current study (preparation of sections related to application of entity linking to neural language models, entity ranking, context-mention encoding, and overall harmonization  ...
arXiv:2006.00575v3 fatcat:ra3kwc4tmbfhlmgtlevkcshcqq

Cloud-Based Big Data Management and Analytics for Scholarly Resources: Current Trends, Challenges and Scope for Future Research [article]

Samiya Khan, Kashish A. Shakil, Mansaf Alam
2016 arXiv   pre-print
This research paper reviews the current trends and identifies the challenges existing in the architecture, services and applications of big scholarly data platform with a specific focus on directions for  ...  As a result, there is a growing need for scholarly applications like collaborator discovery, expert finding and research recommendation systems.  ...  It is important to understand that the big scholarly dataset is not just limited to scholarly documents.  ... 
arXiv:1606.01808v1 fatcat:pl6eoais75dxpckxw7xiz5cdri

A survey on scholarly data: From big data perspective

Samiya Khan, Xiufeng Liu, Kashish A. Shakil, Mansaf Alam
2017 Information Processing & Management  
It is important to understand that the big scholarly dataset is not just limited to scholarly documents.  ...  As a result, one canonical name can be used to cluster several entities, giving rise to name-entity resolution problem [99].  ...
doi:10.1016/j.ipm.2017.03.006 fatcat:3asm74kqwrg4bdqe6l7u2wgseq

Neural entity linking: A survey of models based on deep learning

Özge Sevgili, Artem Shelmanov, Mikhail Arkhipov, Alexander Panchenko, Chris Biemann, Mehwish Alam, Davide Buscaldi, Michael Cochez, Francesco Osborne, Diego Reforgiato Recupero, Harald Sack
2022 Semantic Web Journal  
including zero-shot and distant supervision methods, and cross-lingual approaches.  ...  This survey presents a comprehensive description of recent neural entity linking (EL) systems developed since 2015 as a result of the "deep learning revolution" in natural language processing.  ...  The work of Artem Shelmanov in the current study (preparation of sections related to application of entity linking to neural language models, entity ranking, context-mention encoding, and overall harmonization  ... 
doi:10.3233/sw-222986 fatcat:6gwmbtev7ngbliovf6cpf5hyde

Large-Scale Extraction of Canonical References: Challenges and Prospects

Matteo Romanello
2020 Zenodo  
Pre-print of a chapter to appear in an edited volume of proceedings.  ...  A supervised machine learning approach was adopted for the extraction of named entities from text, as this makes the system more easily extendable to new types of entities (e.g. references to papyri or  ...  This consideration applies to the steps performed using a machine learning approach (currently, only the extraction of named entities).  ... 
doi:10.5281/zenodo.3736454 fatcat:hzjgh7nv3revth44g5hd2xajua

CoCoScore: Context-aware co-occurrence scoring for text mining applications using distant supervision [article]

Alexander Junge, Lars Juhl Jensen
2018 bioRxiv   pre-print
CoCoScore is trained using distant supervision based on a gold-standard set of associations between entities of interest.  ...  Information extraction by mining the scientific literature is key to uncovering relations between biomedical entities.  ...  Acknowledgements: We would like to thank Rūdolfs Bērziņš for helpful discussions of this work as well as the team of the Danish National Supercomputer for Life Sciences (Computerome) for HPC support.  ...
doi:10.1101/444398 fatcat:knvlwgax2jdo7evlkgcrb3jzk4

Generating Knowledge Graphs by Employing Natural Language Processing and Machine Learning Techniques within the Scholarly Domain [article]

Danilo Dessì, Francesco Osborne, Diego Reforgiato Recupero, Davide Buscaldi, Enrico Motta
2020 arXiv   pre-print
entities and relationships generated by these tools, iii) show the advantage of such a hybrid system over alternative approaches, and iv) as a chosen use case, we generated a scientific knowledge graph  ...  As such, in this paper, we present a new architecture that takes advantage of Natural Language Processing and Machine Learning methods for extracting entities and relationships from research publications  ...  It contributes to the IdEx Université de Paris - ANR-18-IDEX-0001. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research.  ...
arXiv:2011.01103v1 fatcat:zhyjap4ukvgixdvapx6frk7yce

Information extraction from digital social trace data with applications to social media and scholarly communication data

Shubhanshu Mishra
2020 SIGIR Forum  
The thesis aims to act as a bridge between research questions and techniques used in DSTD from different domains.  ...  datasets.  ...  Connection with DSTD: This chapter described an approach to improve named entity recognition using semi-supervised learning.  ...
doi:10.1145/3451964.3451981 fatcat:36djwlckprhl5hymzhivrbnscu

Constrained Semi-supervised Learning in the Presence of Unanticipated Classes

Bhavana Bharat Dalvi
2016 SIGIR Forum  
A third approach to SSL involves unsupervised dimensionality reduction followed by supervised learning (e.g., [122, 125]).  ...  Can a semi-supervised learning method leverage these class constraints to learn better classifiers?  ...  The hyponym dataset was constructed by finding all fillers that match one of the regular expressions in Table A.4.  ...
doi:10.1145/2888422.2888447 fatcat:nqtcg5n5brbvphvh4c3clyrcei

Feature Selection For Web Page Classification Using Swarm Optimization

B. Leela Devi, A. Sankar
2015 Zenodo  
The extracted features were tested on the WebKB dataset using a parallel Neural Network to reduce the computational cost.  ...  The web's increased popularity has included a huge amount of information, due to which automated web page classification systems are essential to improve search engines' performance.  ...  Conclusion: Automatic Web-page classification by using hypertext is a big approach to categorize large Webpage quantities.  ...
doi:10.5281/zenodo.1099636 fatcat:bkkhpgxj55blpd2rpbga7uh3ny