968 Hits in 3.9 sec

Valentine: Evaluating Matching Techniques for Dataset Discovery [article]

Christos Koutras, George Siachamis, Andra Ionescu, Kyriakos Psarakis, Jerry Brons, Marios Fragkoulis, Christoph Lofi, Angela Bonifati, Asterios Katsifodimos
2021 arXiv pre-print
... dataset discovery methods and iii) the most comprehensive evaluation of schema matching techniques to date, offering insight on the strengths and weaknesses of existing techniques, that can serve as a ... guide for employing schema matching in future dataset discovery methods. ... Evaluating Matching Techniques for Discovery. We use Valentine to evaluate the performance of multiple schema matching methods by applying them each time on a pair of denormalized tabular datasets with ...
arXiv:2010.07386v2 fatcat:rfxhcwrn6veqnofbxixutgoeke

Disentangled Representations from Non-Disentangled Models [article]

Valentin Khrulkov, Leyla Mirvakhabova, Ivan Oseledets, Artem Babenko
2021 arXiv pre-print
These terms, however, introduce additional hyperparameters responsible for the trade-off between disentanglement and generation quality. ... While tuning these hyperparameters is crucial for proper disentanglement, it is often unclear how to tune them without external supervision. ... As a separate technical contribution, we propose a new simple technique, which outperforms the existing prior methods of controllable generation. We extensively evaluate all the methods on several popular ...
arXiv:2102.06204v1 fatcat:yeg24kjkkzdnhhcxv4igm5mz3u

Dataset Search In Biodiversity Research: Do Metadata In Data Repositories Reflect Scholarly Information Needs? [article]

Felicitas Löffler, Valentin Wesp, Birgitta König-Ries, Friederike Klan
2020 arXiv pre-print
We analyze the primary source in dataset search - metadata - and determine if they reflect scholarly search interests. ... However, recent studies have shown that retrieving relevant data for data reuse is a time-consuming task in daily research practice. ... The authors would also like to thank the annotators and reviewers for their time and valuable comments. ...
arXiv:2002.12021v1 fatcat:rnegb77pyvawjhpkupofib3m44

Developing a machine learning model to identify protein–protein interaction hotspots to facilitate drug discovery

Rohit Nandakumar, Valentin Dinu
2020 PeerJ
Throughout the history of drug discovery, an enzymatic-based approach for identifying new drug molecules has been primarily utilized. ... Herein, (1) a machine learning model has been created and (2) its ability to integrate multiple features, such as those associated with amino-acid chains, has been evaluated to enhance the ability to predict ... Irina Moreira, for providing the code and existing dataset that this study is built on top of. I would also like to thank Dr. ...
doi:10.7717/peerj.10381 pmid:33354416 pmcid:PMC7727375 fatcat:5caz7tsfbnawnekkry6wz4745i

Dataset search in biodiversity research: Do metadata in data repositories reflect scholarly information needs?

Felicitas Löffler, Valentin Wesp, Birgitta König-Ries, Friederike Klan, Hussein Suleman
2021 PLoS ONE
Keywords support scholars in a full text search only if the provided terms syntactically match or their semantic relationship to terms used in a user query is known. ... In particular, we focus on scholarly search interests and metadata, the primary source of data in a dataset retrieval system. ... Acknowledgments. The authors would like to thank the annotators for their time and valuable comments. Author Contributions. Conceptualization: Felicitas Löffler, Friederike Klan. ...
doi:10.1371/journal.pone.0246099 pmid:33760822 fatcat:75vgcuhzibgbxeqrabb76uruje

The CMS data aggregation system

Valentin Kuznetsov, Dave Evans, Simon Metson
2010 Procedia Computer Science
Even though we can apply an information retrieval approach to non-relational data sources, we can't do so for relational ones, where information is accessed via a pre-established set of data-services. ... It requires that all keywords are matched in a row from a single table or joined tables. Authors of [2] make a step forward and evaluate information around matched values. ... For example, the query dataset, file=abc, run > 10 is represented as {"fields": ["dataset"], "spec": {"file": "abc", "run": {"$gt": 10}}} Therefore users' queries are stored into Analytics DB similar to ...
doi:10.1016/j.procs.2010.04.172 fatcat:u5m2debfsnexplralnti64c674
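The CMS entry quotes how a query such as dataset, file=abc, run > 10 is stored as a MongoDB-style document. The sketch below shows one way such a translation could work, assuming a comma-separated syntax in which bare words select output fields and key-operator-value terms become conditions. The function name parse_das_query, the operator table, and the integer conversion are illustrative assumptions, not the DAS implementation.

```python
import re

# Hypothetical table mapping comparison operators to MongoDB-style operators.
OPERATORS = {">": "$gt", ">=": "$gte", "<": "$lt", "<=": "$lte"}

# One condition: key, operator, value. Two-character operators come first in the
# alternation so that ">=" is not matched as ">" followed by a value "=...".
CONDITION = re.compile(r"^\s*(\w+)\s*(>=|<=|=|>|<)\s*(\S+)\s*$")


def _convert(value: str):
    """Turn purely numeric strings into ints; leave everything else as text."""
    return int(value) if value.isdigit() else value


def parse_das_query(query: str) -> dict:
    """Split a comma-separated query into selected fields and a condition spec."""
    fields, spec = [], {}
    for part in query.split(","):
        part = part.strip()
        if not part:
            continue
        match = CONDITION.match(part)
        if match is None:
            # A bare word names a field the user wants returned.
            fields.append(part)
            continue
        key, op, raw = match.groups()
        value = _convert(raw)
        existing = spec.get(key)
        if op == "=":
            # Equality is stored directly, e.g. {"file": "abc"}.
            if existing is not None:
                raise ValueError(f"conflicting conditions on {key!r}")
            spec[key] = value
        else:
            # Range conditions on one key are merged, e.g. {"run": {"$gt": 10}}.
            if existing is None:
                existing = spec[key] = {}
            elif not isinstance(existing, dict):
                raise ValueError(f"conflicting conditions on {key!r}")
            existing[OPERATORS[op]] = value
    return {"fields": fields, "spec": spec}


if __name__ == "__main__":
    print(parse_das_query("dataset, file=abc, run > 10"))
```

Called on the example query, this sketch yields the same structure as the quoted snippet. Several range conditions on one key (run >= 5, run < 20) merge into one sub-document, while mixing an equality and a range on the same key is rejected rather than silently overwritten.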

CRYSPNet: Crystal Structure Predictions via Neural Network [article]

Haotong Liang, Valentin Stanev, A. Gilad Kusne, Ichiro Takeuchi
2020 arXiv pre-print
Standard theoretical tools for this task are computationally expensive and at times inaccurate. Here we present an alternative approach utilizing machine learning for crystal structure prediction. ... Made available to the public (at ...), it can be used both as an independent prediction engine or as a method to generate candidate structures for further computational ... ACKNOWLEDGEMENTS. The authors are grateful to Peter Zavalij, Jason Hattrick-Simpers, Brian DeCost, and Johnpierre Paglione for valuable discussions and suggestions, and to Stephan Rühl for help with ICSD ...
arXiv:2003.14328v1 fatcat:gty2zyjjwjgfjolumu4karpyom

A simple and effective predictive resource scaling heuristic for large-scale cloud applications [article]

Valentin Flunkert, Quentin Rebjock, Joel Castellon, Laurent Callot, Tim Januschowski
2020 arXiv pre-print
We propose a simple yet effective policy for the predictive auto-scaling of horizontally scalable applications running in cloud environments, where compute resources can only be added with a delay, and ... Our contributions to predictive auto-scaling for cloud applications are as follows. (i) We develop an approach for analyzing and evaluating auto scaling policies for cloud applications. ... Datasets and predictors. ...
arXiv:2008.01215v1 fatcat:xgknfdvqr5ef7i76lpeoiqm3sm

Web monitoring of emerging animal infectious diseases integrated in the French Animal Health Epidemic Intelligence System

Elena Arsevska, Sarah Valentin, Julien Rabatel, Jocelyn de Goër de Hervé, Sylvain Falala, Renaud Lancelot, Mathieu Roche, Fernanda C. Dórea
2018 PLoS ONE
We evaluated the combined method for IE on a dataset of 352 disease-related news reports mentioning the diseases involved, locations, dates, hosts and the number of cases. ... The core component of PADI-web is a combined information extraction (IE) method founded on rule-based systems and data mining techniques. ... Chavernac for his contribution in developing the Data collection step in PADI-web. We thank B. Dufour for her expertise in epidemiological surveillance. ...
doi:10.1371/journal.pone.0199960 pmid:30074992 pmcid:PMC6075742 fatcat:broylolvtzbpvio4jxlednyhqm

Bayesian Nonparametrics for Offline Skill Discovery [article]

Valentin Villecroze, Harry J. Braviner, Panteha Naderian, Chris J. Maddison, Gabriel Loaiza-Ganem
2022 arXiv pre-print
Recent work in offline reinforcement learning and imitation learning has proposed several techniques for skill discovery from a set of expert trajectories. ... We first propose a method for offline learning of options (a particular skill framework) exploiting advances in variational inference and continuous relaxations. ... Acknowledgements. We thank the anonymous reviewers for their feedback, which helped improve our paper. We also thank Junfeng Wen for a useful suggestion on an ablation experiment. ...
arXiv:2202.04675v3 fatcat:3qpvhu3qmvbpbjfofpxzzmydmy

Differential expression of microRNAs as predictors of glioblastoma phenotypes

Barrie S Bradley, Joseph C Loftus, Clinton J Mielke, Valentin Dinu
2014 BMC Bioinformatics
Results: Our research data comprise gene expression values for a set of 805 human miRs collected from matched pairs of migratory and migration-restricted cell populations from seven different glioblastoma ... We identified 62 down-regulated and 2 up-regulated miRs that exhibit significant differential expression in the migratory (edge) cell population compared to matched migration-restricted (core) cells. ... SAS 9.2 was used for all statistical analysis. Statistical output for each analyzed miR included tests for normalcy to ensure appropriateness of analytical techniques. ...
doi:10.1186/1471-2105-15-21 pmid:24438171 pmcid:PMC3901345 fatcat:kce4yol57fhvtb3oxmqwwggdju

BING: Biomedical informatics pipeline for Next Generation Sequencing

Jeffrey Kriseman, Christopher Busick, Szabolcs Szelinger, Valentin Dinu
2010 Journal of Biomedical Informatics
This manuscript introduces a biomedical informatics pipeline (BING) for the analysis of NGS data that offers several novel computational approaches to 1. image alignment, 2. signal correlation, compensation ... David Craig for his contributions to this research effort. ... This approach is a more efficient and simplified technique, removing the need for complex base calling algorithms. ...
doi:10.1016/j.jbi.2009.11.003 pmid:19925883 fatcat:v3m4nfcqujgqpftf5r6ck7vvma

CuBlock: a cross-platform normalization method for gene-expression microarrays

Valentin Junet, Judith Farrés, José M Mas, Xavier Daura
2021 Bioinformatics
... applicable in systematic studies aimed at extracting knowledge from the wealth of microarray data available in public repositories; for example, for the extraction of Real-World Data to complement data ... Our main focus or criterion for performance was on the capacity of the algorithm to properly separate samples from different biological groups. ... Acknowledgements. The authors thank Malu Calle for useful discussions during the preparation of the manuscript. ...
doi:10.1093/bioinformatics/btab105 pmid:33609102 fatcat:6nvcjwpzurgoxezyznrvt4nel4

Agronomic Linked Data (AgroLD): a Knowledge-based System to Enable Integrative Biology in Agronomy [article]

Aravind Venkatesan, Gildas Tagny Ngompe, Nordine El Hassouni, Imene Chentli, Valentin Guignon, Clement Jonquet, Manuel Ruiz, Pierre Larmande
2018 bioRxiv pre-print
Our evaluation results show users appreciate the multiple query modes which support different use cases. ... We are facing an urgent need to effectively integrate and assimilate complementary datasets to understand the biological system as a whole. ... Acknowledgments. Authors thank the technical staffs of the South Green Bioinformatics platform for their support. Authors thank the providers of databases listed in ...
doi:10.1101/325423 fatcat:57xxtwu3uvc55jflrjmycsw2ja

Development of a Framework to Understand Tables in Engineering Specification Documents

Valentin Agossou, Hyo-Won Suh, Heejung Lee, Jae Hyun Lee
2020 Applied Sciences
The proposed framework could be used for searching product specification and for discovering hidden knowledge from tables in engineering specification documents. ... Several works have been done in the last decades for understanding tables in documents, but most of them were not specifically designed to understand tables in engineering specification documents. ... After training, a set of test data is provided to evaluate the neural network. A dataset for 80 tables was used, and 60 of them were used for training and the rest were used for testing. ...
doi:10.3390/app10186182 fatcat:osrpfdf7xngmdgjrdjmxxugaki
Showing results 1–15 out of 968 results