A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is application/pdf.
Databases of discovery
2005
Queue
The National Center for Biotechnology Information (NCBI) is the part of the National Institutes of Health (NIH) responsible for the largest public bibliographic database in biomedicine (PubMed), the U.S. national DNA sequence database (GenBank), an online free full text research article database, PubMed Central (PMC), assembly, annotation, and distribution of a reference set of genes, genomes, and chromosomes (RefSeq) from human to viruses, through online text search and retrieval systems
doi:10.1145/1059791.1059806
pmid:16467894
pmcid:PMC1343446
fatcat:3rnhanaw6rao3o5ul5ktmulyym
... z) and specialized molecular biology data search engines (BLAST, CDD search, others), as well as dozens of other resources. At the time of writing this article, NCBI receives about 50 million web hits a day, at peak rates of about 1900 hits a second, and about 400,000 BLAST searches a day from about 2.5 million users a day. The web site transfers about 0.6 terabytes a day and people interested in local copies of bulk data ftp about 1.2 terabytes a day.
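The traffic figures quoted in this abstract can be put on a common scale with a little arithmetic. The sketch below is illustrative only: the variable and function names are hypothetical, and it assumes decimal terabytes (10^12 bytes) and an even spread of hits over 24 hours when computing the mean.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures taken from the abstract above.
HITS_PER_DAY = 50_000_000
PEAK_HITS_PER_SECOND = 1900
WEB_BYTES_PER_DAY = 0.6e12  # assumption: 1 terabyte = 10**12 bytes


def traffic_summary():
    """Derive mean load, peak-to-mean ratio, and average bytes per hit."""
    mean_hits_per_second = HITS_PER_DAY / SECONDS_PER_DAY
    return {
        "mean_hits_per_second": mean_hits_per_second,
        "peak_to_mean": PEAK_HITS_PER_SECOND / mean_hits_per_second,
        "bytes_per_hit": WEB_BYTES_PER_DAY / HITS_PER_DAY,
    }


if __name__ == "__main__":
    for name, value in traffic_summary().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions the mean load is roughly 579 hits a second, so the quoted peak is a little over three times the mean, and an average hit transfers about 12 kilobytes.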
Digital BioCuration: A Question of Balance
2009
Nature Precedings
doi:10.1038/npre.2009.3257.1
fatcat:ys2pz6sj2bbyxkbxbx3r5v45xu
GenBank
1993
Nucleic Acids Research
The GenBank sequence database has undergone an expansion in data coverage, annotation content and the development of new services for the scientific community. In addition to nucleotide sequences, data from the major protein sequence and structural databases, and from U.S. and European patents is now included in an integrated system. MEDLINE abstracts from published articles describing the sequences provide an important new source of biological annotation for sequence entries. In addition to
doi:10.1093/nar/21.13.2963
pmid:8332518
pmcid:PMC309721
fatcat:4oqnfqywpnfybm66uzinsgbmwi
... continued support of existing services, new CD-ROM and network-based systems have been implemented for literature retrieval and sequence similarity searching. Major releases of GenBank are now more frequent and the data are distributed in several new forms for both end users and software developers.
The NCBI Data Model
[chapter]
2006
Methods of biochemical analysis
Detailed discussions about the choice of ASN.1 for this task and its overall form can be found elsewhere (Ostell, 1995).
What to Define? ...
doi:10.1002/9780470110607.ch6
fatcat:ainfsykafrg2lalrdz6tdeoj6u
GenBank
1994
Nucleic Acids Research
The GenBank sequence database continues to expand its data coverage, quality control, annotation content and retrieval services for the scientific community. Besides handling direct submissions of sequence data from authors, GenBank also incorporates DNA sequences from all available public sources; an integrated retrieval system, known as Entrez, also makes available data from the major protein sequence and structural databases, and from U.S. and European patents. MEDLINE abstracts from
doi:10.1093/nar/22.17.3441
pmid:7937042
pmcid:PMC308298
fatcat:ihc23fsh7rgixocbbqu6rfweyi
... published articles describing the sequences are also included as an additional source of biological annotation for sequence entries. GenBank supports distribution of the data via FTP, CD-ROM, and E-mail servers. Network server-client programs provide access to an integrated database for literature retrieval and sequence similarity searching.
GenBank
2015
Nucleic Acids Research
GenBank® (www.ncbi.nlm.nih.gov/genbank/) is a comprehensive database that contains publicly available nucleotide sequences for over 340 000 formally described species. Recent developments include a new starting page for submitters, a shift toward using accession.version identifiers rather than GI numbers, a wizard for submitting 16S rRNA sequences, and an Identical Protein Report to address growing issues of data redundancy. GenBank organizes the sequence data received from individual
doi:10.1093/nar/gkv1276
pmid:26590407
pmcid:PMC4702903
fatcat:czyvsrb4gffjfop3wlvryqj7km
... laboratories and large-scale sequencing projects into 18 divisions, and GenBank staff assign unique accession.version identifiers upon data receipt. Most submitters use the web-based BankIt or standalone Sequin programs. Daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. GenBank is accessible through the nuccore, nucest, and nucgss databases of the Entrez retrieval system, which integrates these records with a variety of other data including taxonomy nodes, genomes, protein structures, and biomedical journal literature in PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP.
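The accession.version identifiers mentioned in this abstract pair a stable accession with a version number after a dot. A minimal sketch of splitting such an identifier follows; the function name and the sample identifier are hypothetical, and the only format assumption is "accession, dot, integer version", which is narrower than whatever validation GenBank itself applies.

```python
def split_accession_version(identifier):
    """Split 'ACCESSION.VERSION' into (accession, version_number).

    Assumes the version is the run of digits after the last dot.
    """
    accession, dot, version = identifier.rpartition(".")
    if not dot or not accession or not version.isdigit():
        raise ValueError(f"not an accession.version identifier: {identifier!r}")
    return accession, int(version)


if __name__ == "__main__":
    # "AB000001.2" is a made-up example used only to show the shape.
    print(split_accession_version("AB000001.2"))
```

Keeping the version as an integer makes it straightforward to compare two versions of the same accession.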
GenBank
2009
Nucleic Acids Research
GenBank® is a comprehensive database that contains publicly available nucleotide sequences for more than 300 000 organisms named at the genus level or lower, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole genome shotgun (WGS) and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and accession numbers are assigned by GenBank staff
doi:10.1093/nar/gkp1024
pmid:19910366
pmcid:PMC2808980
fatcat:5hx2nirjdjbrfj3khqjbfp5avq
... upon receipt. Daily data exchange with the European Molecular Biology Laboratory Nucleotide Sequence Database in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through the NCBI Entrez retrieval system, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bi-monthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI homepage: www.ncbi.nlm.nih.gov.
GenBank
2013
Nucleic Acids Research
GenBank® is a comprehensive database that contains publicly available nucleotide sequences for over 280 000 formally described species. These sequences are obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole-genome shotgun and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and GenBank staff assign accession numbers upon data receipt.
doi:10.1093/nar/gkt1030
pmid:24217914
pmcid:PMC3965104
fatcat:wjdyeblmtbh2nchbemgybvfsle
... Daily data exchange with the European Nucleotide Archive and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through the National Center for Biotechnology Information (NCBI) Entrez retrieval system, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI home page: www.ncbi.nlm.nih.gov.
GenBank
2014
Nucleic Acids Research
GenBank® (http://www.ncbi.nlm.nih.gov/genbank/) is a comprehensive database that contains publicly available nucleotide sequences for over 300 000 formally described species. These sequences are obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole-genome shotgun and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and GenBank staff assign
doi:10.1093/nar/gku1216
pmid:25414350
pmcid:PMC4383990
fatcat:rrzuod6pgrah3hq4nyuy6bzpyy
... accession numbers upon data receipt. Daily data exchange with the European Nucleotide Archive and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through the NCBI Entrez retrieval system, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP.
GenBank
2016
Nucleic Acids Research
GenBank ® (www.ncbi.nlm.nih.gov/genbank/) is a comprehensive database that contains publicly available nucleotide sequences for 370 000 formally described species. These sequences are obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole genome shotgun (WGS) and environmental sampling projects. Most submissions are made using the web-based BankIt or the NCBI Submission Portal. GenBank staff assign
doi:10.1093/nar/gkw1070
pmid:27899564
pmcid:PMC5210553
fatcat:u2pjbbcpwvgifdolo6wmqt5vvi
... accession numbers upon data receipt. Daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. GenBank is accessible through the NCBI Nucleotide database, which links to related information such as taxonomy, genomes, protein sequences and structures, and biomedical journal literature in PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. Recent updates include changes to policies regarding sequence identifiers, an improved 16S submission wizard, targeted loci studies, the ability to submit methylation and BioNano mapping files, and a database of anti-microbial resistance genes.
GenBank
2012
Nucleic Acids Research
GenBank® (http://www.ncbi.nlm.nih.gov) is a comprehensive database that contains publicly available nucleotide sequences for almost 260 000 formally described species. These sequences are obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole-genome shotgun (WGS) and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and GenBank staff assigns
doi:10.1093/nar/gks1195
pmid:23193287
pmcid:PMC3531190
fatcat:jjf53eywsndvpbfees2ly4jmxi
... accession numbers upon data receipt. Daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. GenBank is accessible through the NCBI Entrez retrieval system, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI home page: www.ncbi.nlm.nih.gov.
GenBank
2017
Nucleic Acids Research
GenBank ® (www.ncbi.nlm.nih.gov/genbank/) is a comprehensive database that contains publicly available nucleotide sequences for 400 000 formally described species. These sequences are obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole genome shotgun and environmental sampling projects. Most submissions are made using BankIt, the National Center for Biotechnology Information (NCBI) Submission Portal, or
doi:10.1093/nar/gkx1094
pmid:29140468
pmcid:PMC5753231
fatcat:tcapoinydngldd7erbnmv3x2aq
... the tool tbl2asn. GenBank staff assign accession numbers upon data receipt. Daily data exchange with the European Nucleotide Archive and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through the NCBI Nucleotide database, which links to related information such as taxonomy, genomes, protein sequences and structures, and biomedical journal literature in PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. Recent updates include changes to sequence identifiers, submission wizards for 16S and Influenza sequences, and an Identical Protein Groups resource.
A tool for aligning very similar DNA sequences
1997
Bioinformatics
Results: We have produced a computer program, named sim3, that solves the following computational problem. Two DNA sequences are given, where the shorter sequence is very similar to some contiguous region of the longer sequence. Sim3 determines such a similar region of the longer sequence, and then computes an optimal set of single-nucleotide changes (i.e. insertions, deletions or substitutions) that will convert the shorter sequence to that region. Thus, the alignment scoring scheme is
doi:10.1093/bioinformatics/13.1.75
fatcat:3brmm33avjfcholmsfkmdyz4am
... to model sequencing errors, rather than evolutionary processes. The program can align a 100 kb sequence to a 1 megabase sequence in a few seconds on a workstation, provided that there are very few differences between the shorter sequence and some region in the longer sequence. The program has been used to assemble sequence data for the Genomes Division at the National Center for Biotechnology Information. Availability: A version of sim3 for UNIX machines can be obtained by anonymous ftp from ncbi.nlm.nih.gov, in the pub/sim3 directory.
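The problem statement in this abstract (find the region of the longer sequence closest to the shorter one, counting single-nucleotide insertions, deletions, and substitutions) can be illustrated with a textbook dynamic program. The sketch below is not the sim3 algorithm: it runs in time proportional to the product of the two lengths, so it would not reach the speeds the abstract reports. The function name and the unit cost per edit are assumptions made for illustration.

```python
def best_region_edit_distance(short, long_seq):
    """Return (edits, end) for the region of long_seq closest to short.

    Each insertion, deletion, or substitution costs 1. Bases of long_seq
    before and after the chosen region cost nothing. `end` is the index
    just past the last base of the best region.
    """
    # prev[j]: fewest edits aligning the processed prefix of `short`
    # to some region of long_seq that ends at position j.
    prev = [0] * (len(long_seq) + 1)  # an empty prefix fits anywhere for free
    for i, a in enumerate(short, start=1):
        curr = [i] + [0] * len(long_seq)
        for j, b in enumerate(long_seq, start=1):
            curr[j] = min(
                prev[j - 1] + (a != b),  # match or substitution
                prev[j] + 1,             # base of `short` with no partner
                curr[j - 1] + 1,         # base of `long_seq` with no partner
            )
        prev = curr
    end = min(range(len(prev)), key=prev.__getitem__)
    return prev[end], end


if __name__ == "__main__":
    print(best_region_edit_distance("ACGT", "TTACGTTT"))
```

Recovering the actual list of edits would require keeping traceback information, which this sketch omits to stay short.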
A local alignment tool for very long DNA sequences
1995
Bioinformatics
This paper presents a practical program, called sim2, for building local alignments of two sequences, each of which may be hundreds of kilobases long. Sim2 first constructs n best non-intersecting chains of "fragments," such as all occurrences of identical 5-tuples in each of two DNA sequences, for any specified n ≥ 1. Each chain is then refined by delivering an optimal alignment in a region delimited by the chain. Sim2 requires only space proportional to the size of the input sequences and the
doi:10.1093/bioinformatics/11.2.147
fatcat:ly57dervybbele25ya567kbxkm
... output alignments, and the same source code runs on UNIX machines, on Macintosh, on PC, and on DEC ALPHA PC. We also describe an application of sim2 for aligning long DNA sequences from E. coli. Sim2 facilitates contig-building by providing a complete view of the related sequences, so differences can be analyzed and inconsistencies resolved. Examples are shown using the alignment display and editing functions from the software tool, ChromoScope.
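The "fragments" this abstract describes, such as identical 5-tuples occurring in both sequences, can be found with a simple index over one sequence. The sketch below covers only that fragment-finding step; the chaining and refinement stages of sim2 are not shown, and the function name and the choice of a dictionary index are assumptions made for illustration rather than details taken from the paper.

```python
from collections import defaultdict


def shared_ktuples(a, b, k=5):
    """Return sorted (i, j) pairs such that a[i:i+k] == b[j:j+k]."""
    # Index every k-tuple of `a` by its start positions.
    index = defaultdict(list)
    for i in range(len(a) - k + 1):
        index[a[i:i + k]].append(i)

    # Look up every k-tuple of `b` in the index.
    hits = []
    for j in range(len(b) - k + 1):
        for i in index.get(b[j:j + k], ()):
            hits.append((i, j))
    return sorted(hits)


if __name__ == "__main__":
    print(shared_ktuples("ACGTACGT", "GTACGTA"))
```

Each returned pair marks one fragment; hits that lie on the same diagonal (constant i - j) are natural candidates to be linked into a chain.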
Spidey: A Tool for mRNA-to-Genomic Alignments
2001
Genome Research
METHODS
Spidey: Design and Overview. Spidey is written in C and is incorporated in the NCBI Toolkit (Ostell 1996). ...
It relies heavily on the alignment manager (Wheelan and Ostell, unpubl.), which is an indexing system used for easy management of and quick access to alignments and sets of alignments. ...
doi:10.1101/gr.195301
pmid:11691860
pmcid:PMC311166
fatcat:ibsx3vkkc5g7bjdwp6ovrnysxq