- Biological database
Biological databases are libraries of life sciences information, collected from scientific experiments, published literature, high-throughput experiment technology, and computational analyses. They contain information from research areas including genomics, proteomics, metabolomics, microarray gene expression, and phylogenetics. [Altman RB (March 2004). "Building successful biological databases". Brief. Bioinformatics 5 (1): 4–5. PMID 15153301. http://bib.oxfordjournals.org/cgi/pmidlookup?view=long&pmid=15153301] Information contained in biological databases includes gene function, structure, localization (both cellular and chromosomal), and clinical effects of mutations, as well as similarities of biological sequences and structures.
Relational database concepts from computer science and information retrieval concepts from digital libraries are important for understanding biological databases. Biological database design, development, and long-term management form a core area of the discipline of bioinformatics. [Bourne P (August 2005). "Will a biological database be different from a biological journal?". PLoS Comput. Biol. 1 (3): 179–81. PMID 16158097. doi:10.1371/journal.pcbi.0010034] Data contents include gene sequences, textual descriptions, attributes and ontology classifications, citations, and tabular data. These are often described as semi-structured data, and can be represented as tables, key-delimited records, and XML structures. Cross-references among databases are common, using database accession numbers.
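As a rough illustration of the key-delimited record representation mentioned above, the following Python sketch parses one record into a dictionary and pulls out its cross-references. The two-letter keys, the `//` terminator, the record contents, and the accession numbers are all invented for this example; real databases each define their own flat-file layout.

```python
from collections import defaultdict

# Hypothetical record in a key-delimited layout.
# Keys, values, and accession numbers are invented for illustration only.
RECORD = """\
ID   DEMO0001
DE   Hypothetical example protein
OS   Escherichia coli
DR   OtherDB; X00001
DR   StructDB; 1ABC
//
"""


def parse_record(text):
    """Parse one key-delimited record into a dict mapping key -> list of values."""
    fields = defaultdict(list)
    for line in text.splitlines():
        # Skip blank lines and the assumed end-of-record marker.
        if not line.strip() or line.startswith("//"):
            continue
        key, _, value = line.partition(" ")
        fields[key].append(value.strip())
    return dict(fields)


def cross_references(fields, key="DR"):
    """Split 'database; accession' values into (database, accession) pairs."""
    pairs = []
    for value in fields.get(key, []):
        database, _, accession = value.partition(";")
        pairs.append((database.strip(), accession.strip()))
    return pairs


record = parse_record(RECORD)
print(record["ID"])
print(cross_references(record))
```

Keeping every value in a list reflects the semi-structured nature of such records: a key such as the assumed `DR` line may repeat any number of times.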
Biological databases have become an important tool in assisting scientists to understand and explain a host of biological phenomena, from the structure of biomolecules and their interaction, to the whole metabolism of organisms, to the evolution of species. This knowledge helps facilitate the fight against diseases, assists in the development of medications, and aids in discovering basic relationships among species in the history of life.
Biological knowledge is distributed among many different general and specialized databases, which sometimes makes it difficult to ensure the consistency of information. Biological databases cross-reference other databases with accession numbers as one way of linking their related knowledge together.
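The consistency problem can be made concrete with a small Python sketch. It models two databases as dictionaries keyed by accession number and reports cross-references that point nowhere or that disagree on a shared field. The database names, accession numbers, and the `organism` field are hypothetical and chosen only for illustration.

```python
# Two toy "databases" keyed by accession number.
# All accessions and field values are invented for this sketch.
sequence_db = {
    "SEQ001": {"organism": "Escherichia coli", "xrefs": {"protein_db": "PRT001"}},
    "SEQ002": {"organism": "Homo sapiens", "xrefs": {"protein_db": "PRT002"}},
    "SEQ003": {"organism": "Mus musculus", "xrefs": {"protein_db": "PRT999"}},
}
protein_db = {
    "PRT001": {"organism": "Escherichia coli"},
    "PRT002": {"organism": "Homo sapiens sapiens"},
}


def check_links(source, target, target_name, field):
    """Follow cross-references from source to target.

    Returns two lists of source accessions: those whose cross-reference
    has no matching target record (dangling), and those whose linked
    record disagrees on the given field (conflicting).
    """
    dangling, conflicting = [], []
    for accession, rec in source.items():
        linked = rec["xrefs"].get(target_name)
        if linked is None:
            continue  # this record does not reference the target database
        if linked not in target:
            dangling.append(accession)
        elif target[linked][field] != rec[field]:
            conflicting.append(accession)
    return dangling, conflicting


dangling, conflicting = check_links(sequence_db, protein_db, "protein_db", "organism")
print("dangling:", dangling)
print("conflicting:", conflicting)
```

A plain string comparison is deliberately naive: the conflict it flags for `SEQ002` is a naming difference rather than a biological disagreement, which is the kind of inconsistency that arises when knowledge is maintained independently in several places.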
An important resource for finding biological databases is a special yearly issue of the journal Nucleic Acids Research (NAR). The [http://www3.oup.co.uk/nar/database/c/ Database Issue of NAR] is freely available, and categorizes many of the publicly available online databases related to biology and bioinformatics.
Example public databases for molecular biology
(from [http://www.kokocinski.net/bioinformatics/databases.php www.kokocinski.net] )
Primary sequence databases
The International Nucleotide Sequence Database (INSD) consists of the following databases.
# [http://www.ddbj.nig.ac.jp/Welcome-e.html DDBJ] (DNA Data Bank of Japan)
# [http://www.ebi.ac.uk/embl/index.html EMBL Nucleotide DB] (European Molecular Biology Laboratory)
# [http://www.ncbi.nlm.nih.gov/Genbank/index.html GenBank] (National Center for Biotechnology Information)
These databanks represent the current knowledge about the sequences of all organisms. They interchange the stored information and are the source for many other databases.
Strictly speaking, a meta-database can be considered a database of databases, rather than any one integration project or technology. Meta-databases collect data from different sources and usually make them available in a new and more convenient form, or with an emphasis on a particular disease or organism.
# [http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi Entrez] (National Center for Biotechnology Information)
# [http://eugenes.org euGenes] (Indiana University)
# [http://www.genecards.org GeneCards] (Weizmann Inst.)
# [http://genome-www4.stanford.edu/cgi-bin/SMD/source/sourceSearch SOURCE] (
# [http://www.cyber-indian.com/bioperl/index.html mGen] Contains four of the world's biggest databases (GenBank, RefSeq, EMBL and DDBJ), with simple, program-friendly gene extraction
# [http://harvester.fzk.de Bioinformatic Harvester] (Karlsruhe Institute of Technology) Integrates 26 major protein/gene resources
# [http://BioDatabase.Org MetaBase] (KOBIC) A user-contributed database of biological databases
Genome databases
These databases collect organism genome sequences, annotate and analyze them, and provide public access. Some add curation of experimental literature to improve computed annotations. These databases may hold many species' genomes, or a single model organism genome.
# Ensembl provides automatic annotation databases for human, mouse, other vertebrate and eukaryote genomes.
# [http://genome.jgi.doe.gov/ JGI Genomes] of the DOE-Joint Genome Institute provides databases of many eukaryote and microbial genomes.
# [http://camera.calit2.net/index.php/ CAMERA] Resource for microbial genomics and metagenomics
# [http://www.informatics.jax.org MGI Mouse Genome] (Jackson Lab.)
# [http://www.maizegdb.org/ Corn] , the Maize Genetics and Genomics Database
# Saccharomyces Genome Database, genome of the yeast model organism
# Wormbase, genome of the model organism Caenorhabditis elegans
# Flybase, genome of the model organism Drosophila melanogaster
# Zebrafish Information Network, genome of this fish model organism
# [http://troy.bioc.uvic.ca/ Viral Bioinformatics Resource Center] Curated database containing annotated genome data for eleven virus families.
# [http://www.ericbrc.org/ ERIC (Enteropathogen Resource Integration Center)] Curated database containing annotated genome data for five enteropathogens: Escherichia coli, Shigella, Salmonella, Yersinia enterocolitica, and Y. pestis.
Genome browsers enable researchers to visualize and browse entire genomes (most have many complete genomes) with annotated data including gene prediction and structure, proteins, expression, regulation, variation, comparative analysis, etc. Annotated data usually comes from multiple diverse sources.
# [http://img.jgi.doe.gov/ Integrated Microbial Genomes] (IMG) system by the DOE-Joint Genome Institute
# [http://genome.ucsc.edu UCSC Genome Bioinformatics] Genome Browser and Tools (UCSC)
# [http://www.ensembl.org/ Ensembl] The Ensembl Genome Browser (Sanger Institute and EBI)
# [http://www.gmod.org/?q=node/71 GBrowse] The GMOD GBrowse Project
# [http://bioinformatics.ai.sri.com/ptools/ Pathway Tools] Genome Browser
# [http://xmap.picr.man.ac.uk X:Map] A genome browser that shows Affymetrix Exon Microarray hit locations alongside the gene, transcript and exon data on a Google Maps API
# [http://troy.bioc.uvic.ca/tools/VGO Viral Genome Organizer (VGO)] A genome browser providing visualization and analysis tools for annotated whole genomes from the eleven virus families in the VBRC (Viral Bioinformatics Resource Center) databases
# [http://apollo.berkeleybop.org/current/index.html Apollo Genome Annotation Curation Tool] A cross-platform, Java-based standalone genome viewer with enterprise-level functionality and customizations. The standard for many model organism databases.
# [http://www.uniprot.org UniProt] Universal Protein Resource (UniProt Consortium: EBI, Expasy, PIR)
# [http://www-nbrf.georgetown.edu/pir/searchdb.html PIR] Protein Information Resource (Georgetown University Medical Center (GUMC))
# [http://www.expasy.org/sprot/ Swiss-Prot] Protein Knowledgebase (Swiss Institute of Bioinformatics)
# [http://pedant.gsf.de PEDANT] Protein Extraction, Description and ANalysis Tool (Forschungszentrum f. Umwelt & Gesundheit)
# [http://www.expasy.org/prosite/ PROSITE] Database of Protein Families and Domains
# [http://dip.doe-mbi.ucla.edu DIP] Database of Interacting Proteins (Univ. of California)
# [http://www.sanger.ac.uk/Software/Pfam Pfam] Protein families database of alignments and HMMs (
# [http://protein.foulouse.inra.fr/prodom/current/html/home.php ProDom] Comprehensive set of Protein Domain Families (INRA/CNRS)
# [http://www.cbs.dtu.dk/services/SignalP/ SignalP 3.0] Server for signal peptide prediction (including cleavage site prediction), based on artificial neural networks and HMMs
# [http://supfam.org/SUPERFAMILY/ SUPERFAMILY] Library of HMMs representing superfamilies and database of (superfamily and family) annotations for all completely sequenced organisms
# [http://www.rcsb.org/pdb/ Protein Data Bank] (PDB) (Research Collaboratory for Structural Bioinformatics (RCSB))
# [http://www.cathdb.info/ CATH] Protein Structure Classification
# [http://scop.mrc-lmb.cam.ac.uk/scop/ SCOP] Structural Classification of Proteins
# [http://swissmodel.expasy.org//SWISS-MODEL.html SWISS-MODEL] Server and Repository for Protein Structure Models
# [http://salilab.org/modbase ModBase] Database of Comparative Protein Structure Models (Sali Lab, UCSF)
# [http://www.thebiogrid.org BioGRID] A General Repository for Interaction Datasets (Samuel Lunenfeld Research Institute)
# [http://string.embl.de STRING] Database of known and predicted protein-protein interactions (EMBL)
# [http://dip.doe-mbi.ucla.edu/ DIP Database of Interacting Proteins]
# BioCyc Database Collection, including EcoCyc and MetaCyc
# [http://www.genome.ad.jp/kegg/pathway.html KEGG PATHWAY Database] (Univ. of Kyoto)
# [http://www.manet.uiuc.edu/ MANET database] (University of Illinois)
# [http://www.reactome.org Reactome] (Cold Spring Harbor Laboratory, EBI, Gene Ontology Consortium)
Microarray databases
# [http://www.ebi.ac.uk/arrayexpress ArrayExpress] (European Bioinformatics Institute)
# [http://www.ncbi.nlm.nih.gov/geo Gene Expression Omnibus] (National Center for Biotechnology Information)
# [http://www.bioinf.man.ac.uk/microarray/maxd/index.html maxd] (Univ. of Manchester)
# [http://genome-www5.stanford.edu/MicroArray/SMD SMD] (
# [http://www.gti.ed.ac.uk/GPX GPX] (Scottish Centre for Genomic Technology and Informatics)
Mathematical Model Databases
# [http://www.cellml.org/models CellML]
# [http://www.ebi.ac.uk/biomodels/ Biomodels Database]
PCR / Real time PCR primer databases
# [http://www.pathooligodb.com/ PathoOligoDB] A free QPCR oligo database for pathogens
# [http://biomovie.ethz.ch BIOMOVIE] (ETH Zurich) Movies related to biology and biotechnology
# [http://cgap.nci.nih.gov/Genes/GeneFinder CGAP Cancer Genes] (National Cancer Institute)
# [http://www.ncbi.nlm.nih.gov/genome/clone Clone Registry Clone Collections] (National Center for Biotechnology Information)
# [http://www.genome.ad.jp/dbget-bin/www_bfind?h.sapiens DBGET H.sapiens] (Univ. of Kyoto)
# [http://www.gdb.org/gdb GDB Hum. Genome Db] (Human Genome Organisation)
# [http://shmpd.bii.a-star.edu.sg SHMPD] The Singapore Human Mutation and Polymorphism Database
# [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=unigene NCBI-UniGene] (National Center for Biotechnology Information)
# [http://www.ncbi.nlm.nih.gov/Omim OMIM Inherited Diseases] (Online Mendelian Inheritance in Man)
# [http://www.gene.ucl.ac.uk/nomenclature Off. Hum. Genome Db] (HUGO Gene Nomenclature Committee)
# [http://www.hgmd.cf.ac.uk/ HGMD disease-causing mutations] (HGMD Human Gene Mutation Database)
# [http://www.bx.psu.edu/phencode/ PhenCode] linking human mutations with phenotype
# [http://hgvbase.cgb.ki.se/databases.htm List with SNP-Databases]
# [http://p53.bii.a-star.edu.sg p53] The p53 Knowledgebase
# [http://genex.hgu.mrc.ac.uk/ Edinburgh Mouse Atlas]
# [http://www.hvrbase.org/ HvrBase++] Human and primate mitochondrial DNA
# [http://www.polygenicpathways.co.uk/ PolygenicPathways] Genes and risk factors implicated in Alzheimer's disease, Bipolar disorder or Schizophrenia
# [http://www.broad.mit.edu/cmap/ Connectivity map] Transcriptional expression data and correlation tools for drugs
# [http://ctd.mdibl.org/ CTD] The Comparative Toxicogenomics Database describes chemical-gene-disease interactions
Wiki style databases
# [http://ecoliwiki.net/ EcoliWiki]
# [http://openwetware.org/ OpenWetWare]
# [http://pdbwiki.org/ PDBWiki]
# [http://www.proteopedia.org/ Proteopedia]
# [http://www.topsan.org/ Topsan]
# [http://www.wikigenes.org/ WikiGenes]
# [http://www.wikipathways.org/ WikiPathways]
* [http://www.gpse.org Genome Proteome Search Engine] to search across biological databases
* [http://www.biodbs.info DBD: Database of Biological Databases]
* [http://camera.calit2.net/index.php CAMERA] Cyberinfrastructure for Metagenomics, free data repository and bioinformatics tools for metagenomics.
Wikimedia Foundation. 2010.