Glossary

F LEWITTER
1998 Trends in Biotechnology  
Affine gap costs - A scoring system for gaps within alignments that charges a penalty for the existence of a gap and an additional per-residue penalty proportional to the gap's length.
Algorithm - A fixed procedure, embodied in a computer program.
Alignment score - A numerical value that describes the overall quality of an alignment. Higher numbers correspond to higher similarity.
Batch Entrez - A feature in Entrez that allows the retrieval of many sequences at once and saves the sequences to a ... on a local computer. This is particularly useful for downloading a set of sequences to analyze locally (for example, to build a multiple alignment on a local computer).
bfind mode - One of two modes used by DBGET, in which keywords can be entered in a search box. See 'bget mode'.
bget mode - One of two modes used by DBGET, in which the entry name or the accession number of an entry can be entered in a search box. See 'bfind mode'.
Biochemical pathway - A network of interacting molecules that is responsible for a specific biochemical function, such as a metabolic pathway or a signal transduction pathway.
Bit score - A scaled version of an alignment's raw score that accounts for the statistical properties of the scoring system used.
BLAST - Basic Local Alignment Search Tool. A heuristic sequence comparison algorithm, developed by researchers at the National Center for Biotechnology Information (NCBI) and others, that is used to search sequence databases for optimal local alignments to a query.
Bootstrapping - A statistical method that is often used to estimate the reproducibility of specific features of phylogenetic trees.
Cluster analysis - The process of assigning data points (sequences) to groups (clusters), starting from pairwise distances. It is useful for identifying outliers and weak links between groups, and is fairly easy to do by hand for small datasets.
Command line - A way of interacting with software by typing specific commands. Generally considered less 'user friendly' than a 'graphical user interface'.
Comparative genomics - The comparison of complete genome sequences, often by computational methods, to understand general principles of genome structure and function.
Content - An extended or variable-length region of genomic DNA with a particular function, such as an exon.
Controlled vocabulary - A vocabulary of specific terms that are applied consistently to all entries in a database. The MeSH system is an example of a controlled vocabulary.
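The affine gap cost, alignment score and bit score entries above can be made concrete with a short sketch. It assumes a made-up match/mismatch scheme (+2/-1) in place of a real substitution matrix such as BLOSUM62, and gap costs of 11 for the existence of a gap plus 1 per residue; the function names `affine_gap_cost`, `alignment_score` and `bit_score` are hypothetical. The bit score uses the normalization S' = (lambda*S - ln K) / ln 2 familiar from BLAST, where lambda and K are statistical parameters of the scoring system that the caller must supply; no real values are derived here.

```python
import math

# Hypothetical toy scoring scheme for illustration only: the match/mismatch
# values are made up (not BLOSUM62); gap open 11 / extend 1 are assumptions.
MATCH, MISMATCH = 2, -1
GAP_OPEN, GAP_EXTEND = 11, 1


def affine_gap_cost(length: int) -> int:
    """Penalty for one gap of `length` residues: existence cost plus per-residue cost."""
    if length <= 0:
        return 0
    return GAP_OPEN + GAP_EXTEND * length


def alignment_score(row1: str, row2: str) -> int:
    """Raw score of a gapped pairwise alignment given as two equal-length rows ('-' = gap)."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    score, gap_len, gap_row = 0, 0, None
    for a, b in zip(row1, row2):
        if a == "-" and b == "-":
            raise ValueError("a column cannot be a gap in both rows")
        if a == "-" or b == "-":
            row = 0 if a == "-" else 1
            if gap_len and row != gap_row:
                # The gap switches rows, so the previous gap is closed and charged.
                score -= affine_gap_cost(gap_len)
                gap_len = 0
            gap_row = row
            gap_len += 1
        else:
            if gap_len:
                score -= affine_gap_cost(gap_len)  # charge the gap that just ended
                gap_len = 0
            score += MATCH if a == b else MISMATCH
    return score - affine_gap_cost(gap_len)  # charge a trailing gap (0 if none)


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalize a raw score with the scoring system's statistical parameters:
    S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# 12 matches (+24) minus one 2-residue gap (11 + 2 = 13)
print(alignment_score("GATTACAGATTACA", "GATTA--GATTACA"))  # → 11
```

Because the existence penalty is charged once per gap, one 2-residue gap (cost 13) is cheaper under this scheme than two separate 1-residue gaps (cost 24), which is why affine costs favour fewer, longer gaps.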
Deductive database - A database that contains both facts (often in the form of a relational database) and rules for reasoning (often in logic programming), so that new facts can be generated dynamically from stored facts.
DNA chip technology - New technology for processing thousands of DNA segments in parallel, for example to detect mutation patterns in genomic DNA or expression patterns of mRNAs.
Domain - A portion of a protein that folds independently of the rest of the protein, or is at least assumed to do so.
DUST - A program for filtering low-complexity regions of DNA sequences.
Dynamic programming - A type of algorithm widely used for constructing sequence alignments and for evaluating all possible candidate gene structures.
E value - Expectation value. The number of distinct alignments, with scores equivalent to or better than the one of interest, that are expected to occur in a database search purely by chance. The lower the E value, the more significant the score.
EST - Expressed Sequence Tag. A short cDNA (complementary DNA) sequence.
Extreme value distribution - The probability distribution applicable to the scores of optimal local alignments.
Family, subfamily, superfamily - A family groups sequences or domains that are clearly related and usually have a similar function. A superfamily groups several families that are related by (divergent) evolution and usually still share some functional elements. A subfamily groups sequences within a family that are particularly closely related. There is no generally accepted consensus on the use of these terms.
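The dynamic programming, E value and extreme value distribution entries above can be illustrated together. This sketch computes a best local alignment score with the Smith-Waterman recurrence, using a simple linear gap penalty rather than affine costs to keep it short, and then turns a score into an E value with the Karlin-Altschul formula E = K * m * n * exp(-lambda * S), where m and n are the query and database lengths. The function names, the +2/-1/2 scores and the lambda and K values in the usage line are illustrative assumptions; real lambda and K depend on the scoring system, and programs such as BLAST also apply length corrections that are omitted here.

```python
import math


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = 2) -> int:
    """Best local alignment score of a and b (Smith-Waterman, linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    # h[i][j] = best score of a local alignment ending at a[i-1] and b[j-1]
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = h[i - 1][j] - gap    # a[i-1] aligned to a gap
            left = h[i][j - 1] - gap  # b[j-1] aligned to a gap
            # The 0 option lets a local alignment start fresh at any cell.
            h[i][j] = max(0, diag, up, left)
            best = max(best, h[i][j])
    return best


def e_value(score: float, m: int, n: int, lam: float, k: float) -> float:
    """Karlin-Altschul expected number of chance hits: E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * score)


def p_value(e: float) -> float:
    """Chance of at least one hit scoring >= S under the extreme value distribution: 1 - exp(-E)."""
    return 1.0 - math.exp(-e)


score = smith_waterman("TTGATTACATT", "CCGATTACACC")
print(score)  # → 14 (the shared "GATTACA": 7 matches x 2)
# Hypothetical statistical parameters, NOT derived from the toy scores above:
print(e_value(score, m=11, n=1_000_000, lam=0.3, k=0.1))
```

The `p_value` helper reflects the extreme value picture: when E is small, the probability of seeing at least one chance hit at that score is approximately E itself.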
doi:10.1016/s0167-7799(98)00136-x