85 Hits in 8.6 sec

An Investigation of Alternatives to Transform Protein Sequence Databases to a Columnar Index Schema

Roman Zoun, Kay Schallert, David Broneske, Ivayla Trifonova, Xiao Chen, Robert Heyer, Dirk Benndorf, Gunter Saake
2021 Algorithms  
Hence, the radix tree is a suitable data structure for transforming protein sequences into the indexed schema.  ...  Hence, in this work, we present a schema for distributed column-based database management systems using a column-oriented index to store sequence data.  ...  This work was a cooperation with Bruker Daltonik GmbH and is dedicated to the memory of Mikhail Zoun.  ... 
doi:10.3390/a14020059 fatcat:xopdgbyrizerrm3ymza7zq43xu

Benchmarking distributed data warehouse solutions for storing genomic variant information

Marek S. Wiewiórka, Dawid P. Wysakowicz, Michał J. Okoniewski, Tomasz Gambin
2017 Database: The Journal of Biological Databases and Curation  
To investigate the effectiveness of modern columnar storage [column-oriented Database Management System (DBMS)] and query engines, we have developed a prototypic genomic variant data warehouse, populated  ...  At a time when thousands of patients' sequenced exomes and genomes are becoming available, there is a growing need for efficient database storage and querying.  ...  Supplementary data Conflict of interest. None declared.  ... 
doi:10.1093/database/bax049 pmid:29220442 pmcid:PMC5504537 fatcat:hgwwc2buifbjfj5i77jrxeh6xi

Rethinking Data-Intensive Science Using Scalable Analytics Systems

Frank Austin Nothaft, Michael Linderman, Michael J. Franklin, Anthony D. Joseph, David A. Patterson, Matt Massie, Timothy Danford, Zhao Zhang, Uri Laserson, Carl Yeksigian, Jey Kottalam, Arun Ahuja (+1 other)
2015 Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data - SIGMOD '15  
To demonstrate the generality of our architecture, we then implement a scalable astronomy image processing system which achieves a 2.8-8.9× improvement over the state-of-the-art MPI-based system.  ...  In this paper, we describe ADAM, an example genomics pipeline that leverages the open-source Apache Spark and Parquet systems to achieve a 28× speedup over current genomics pipelines, while reducing cost  ...  As ADAM is an open source project, we also would like to thank the community members who have contributed code and use cases to the project, and would especially like to thank Neil Ferguson, Andy Petrella  ... 
doi:10.1145/2723372.2742787 dblp:conf/sigmod/NothaftMDZLYKAH15 fatcat:nokfli3y4fe6zi6avrluhncvau

IBM Functional Genomics Platform, A Cloud-Based Platform for Studying Microbial Life at Scale [article]

Edward E. Seabolt, Gowri Nayar, Harsha Krishnareddy, Akshay Agarwal, Kristen L. Beck, Ignacio Terrizzano, Eser Kandogan, Mary Roth, Vandana Mukherjee, James H. Kaufman
2020 arXiv   pre-print
The database can be queried across hundreds of millions of entities and returns results in a fraction of the time required by traditional methods.  ...  The rapid growth in biological sequence data is revolutionizing our understanding of genotypic diversity and challenging conventional approaches to informatics.  ...  ACKNOWLEDGMENT The authors would like to acknowledge Dr. C. A.  ... 
arXiv:1911.02095v3 fatcat:bn6kssh2ezbqnashkxqzga3faq

An overview of graph databases and their applications in the biomedical domain

Santiago Timón-Reina, Mariano Rincón, Rafael Martínez-Tomás
2021 Database: The Journal of Biological Databases and Curation  
In this work, we survey the literature to explore the evolution, performance and how the most recent graph database solutions are applied in the biomedical domain, compiling a great variety of use cases  ...  From early graph models to more recent native graph databases, the landscape of implementations has evolved to cover enterprise-ready requirements.  ...  In this case, the modeling strategy follows a protein-centric approach without a rigid schema or upper model (such as an ontology).  ... 
doi:10.1093/database/baab026 pmid:34003247 pmcid:PMC8130509 fatcat:xku5npedwzgs3ayzsuvz6iattq

Report from the 3rd Workshop on Extremely Large Databases

Jacek Becla, Kian-Tat Lim, Daniel Liwei Wang
2010 Data Science Journal  
MonetDB presented a successful port of the SDSS multi-terabyte database. Cloudera discussed activities to support the Hadoop community.  ...  SciDB demonstrated a from-scratch prototype supporting an n-dimensional array data model, running in a shared-nothing environment.  ...  MR developers have also explored the addition of indexes, schemas, and other database-ish features [12]. Some are building a complete relational database system on top of MR [13].  ... 
doi:10.2481/dsj.xldb09 fatcat:574dpairjbb6zh2l5qywipgm4m

Advancing clinical cohort selection with genomics analysis on a distributed platform

Jaclyn M. Smith, Melvin Lathara, Hollis Wright, Brian Hill, Nalini Ganapati, Ganapati Srinivasa, Christopher T. Denny, Jianjiong Gao
2020 PLoS ONE  
The affordability of next-generation genomic sequencing and the improvement of medical data management have contributed largely to the evolution of biological analysis from both a clinical and research  ...  This work highlights the integration of a distributed genomic database with a distributed compute environment to support scalable and efficient precision medicine queries from a HIPAA-compliant, cohort  ...  Additional thanks to the Digital technology team at DGSOM and the UCLA CTSA I2B2 support team who helped in the deployment and testing of the framework at UCLA.  ... 
doi:10.1371/journal.pone.0231826 pmid:32324802 fatcat:tmjjrbkfjjb4hdhfaboh367sbu

29th International Conference on Data Engineering [book of abstracts]

2013 2013 IEEE 29th International Conference on Data Engineering Workshops (ICDEW)  
of relational data let us find an alternative to cope with them.  ...  It presents users with an alternative interface to browse tweets more effectively. TUe/9 a generic Database Benchmarking service Martin Kaufmann (ETH Zürich / SAP AG) Peter M.  ...  Volunteers: ICDE-13 would like to extend our warm appreciation to our conference volunteers who assisted before, during and after the conference, to help make sure that everyone enjoys a great conference  ... 
doi:10.1109/icdew.2013.6547409 fatcat:wadzpuh3b5htli4mgb4jreoika

Graphical pangenomics

Erik Garrison, Richard Durbin
2018 Zenodo  
By indexing the topology, sequence space, and haplotype space of these graphs and developing generalizations of sequence alignment suitable to them, I am able to use them as reference systems in the analysis  ...  Completely sequencing genomes is expensive, and to save costs we often analyze new genomic data in the context of a reference genome.  ...  When aligning a short sequence against a large database we expect to obtain a sensitive alignment, but provided sufficient homology between the sequence and the database it is unlikely that we need to  ... 
doi:10.5281/zenodo.3269840 fatcat:glracsk2jvgb3lehvepq2jl5l4

Isthmal stem cells sustain intestinal homeostasis and regeneration [article]

E. Malagola, A. Vasciaveo, Y. Ochiai, W. Kim, M. Middelhoff, H. Nienhüser, B. Belin, J. LaBella, LB. Zamechek, M.H. Wong, L. Li, C. Guha (+3 others)
2022 bioRxiv   pre-print
To test if an alternative model may help reconcile these perspectives, we studied the hierarchical organization of crypt epithelial cells in an unbiased fashion, by combining high-resolution, single-cell  ...  However, recent intestinal regeneration studies have uncovered limitations of the 'Lgr5-CBC' model [2, 3], leading to two major views: one favoring the presence of a quiescent reserve stem cell population [4]  ...  PPIs were modeled using PrePPI [15], a large-scale database of human PPIs, by retaining the top 5% of high-confidence interactions.  ... 
doi:10.1101/2022.04.26.489611 fatcat:apgweph35vdqpj4h3vm2fljiua

Proteomics as a Complementary Technique to Characterize Bladder Cancer

Rubén López-Cortés, Sergio Vázquez-Estévez, Javier Álvarez Fernández, Cristina Núñez
2021 Cancers  
However, innovations in next-generation sequencing have led to molecular classifications of BC.  ...  In parallel, immunohistochemistry is still the clinical reference to discriminate histological layers and to stage BC. Key contributions have been made to enlarge the panel of protein immunomarkers.  ...  In addition, SRM-MS and MRM-MS offer great sensitivity and accuracy for quantifying a selection of proteins of interest; hence, they are an alternative to antibody-based methods with the added advantage  ... 
doi:10.3390/cancers13215537 pmid:34771699 pmcid:PMC8582709 fatcat:h4ufnvu64vdzbaufitkgn5fevy

Addressing big data variety using an automated approach for data characterization

Georgios Vranopoulos, Nathan Clarke, Shirley Atkinson
2022 Journal of Big Data  
A principal challenge with Variety is being able to understand and comprehend the data.  ...  The focus of the experiments was to confirm that repetitive manual tasks can be automated, thus reducing the focus of a Data Scientist on data identification and thereby providing more focus towards the  ...  A file can be delimited with any character or sequence of characters.  ... 
doi:10.1186/s40537-021-00554-3 fatcat:66xln54bajd37gzz3m3ln2fdga

The Medaka Inbred Kiyosu-Karlsruhe (MIKK) Panel [article]

Tomas W Fitzgerald, Ian Brettell, Adrien Leger, Nadeshda Wolf, Natalja Kusminski, Jack Monahan, Carl Barton, Cathrin Herder, Narendar Aadepu, Jakob Gierten, Clara Becker, Omar Hammouda (+12 others)
2021 bioRxiv   pre-print
The teleost medaka (Oryzias latipes) is an established genetic model system with a long history of genetic research and a high tolerance to inbreeding from the wild.  ...  To address this, we have established a vertebrate genetic resource specifically to allow for robust genotype-to-phenotype investigations.  ...  The RepeatModeler library of repeats was filtered to remove non-TE protein coding sequences by using a protein BLAST (Altschul et al., 1990) to align (E-value ≤ 1e-5) the Oryzias latipes proteome (Ensembl  ... 
doi:10.1101/2021.05.17.444412 fatcat:qqsdf3li4fc5xege3abqmgq7ke

Management and Administration [chapter]

2012 Advances in Police Theory and Practice  
Another focus is the imaging of custom-developed artificial substrates to challenge the cells with defined biophysical stimuli (Yoshikawa et al. 2011) or microfluidic chambers for chemical stimulation  ...  Acquisition can be guided by image analysis: during a low-resolution scan of the 96-well plate, the positions of interesting cells can be identified for high-content imaging.  ...  Mergers of core facilities from other institutions might be an alternative that retains the advantages of a local facility at an affordable price.  ... 
doi:10.1201/b12530-8 fatcat:mkscpdo24nbznpk3xdhp4i6pdm

The complexity of pancreatic ductal cancers and multidimensional strategies for therapeutic targeting

Scott E Kern, Chanjuan Shi, Ralph H Hruban
2010 Journal of Pathology  
As a result, multiple variables affect success when individualizing screening or therapy.  ...  Simplistic expectations, often falsely optimistic, for individualized care may fail to 'pan out' in the real world.  ...  We can use the resultant database of mutations (an average of 63 somatic mutations per tumour!)  ... 
doi:10.1002/path.2813 pmid:21125682 pmcid:PMC3767138 fatcat:mjg3lf5hmnevno5d7wxpod6x3e
Showing results 1–15 out of 85 results