Molecular Biology Databases

What is a database?

Types of databases

(A) Flat-file Databases

(B) Relational Databases

(C) World Wide Web access to databases

(D) The historical problem

(E) Unifying approaches to link databases

================================================================================

A. Molecular Biology Databases

Bioinformatics scientists collect and organize the sequence data that is generated and make it available to all biologists.

Today data is shared and integrated between the three major data repositories, namely GenBank, which forms part of the NCBI, the European Molecular Biology Laboratory (EMBL) and the DNA Data Bank of Japan (DDBJ).

In October 1996, GenBank contained 1,021,211 sequence records, amounting to 652,000,000 bases of DNA sequence and 3.1 gigabytes of computer storage space. By June 1997 this had escalated to 1,491,000 records and 967,000,000 bases. Check out the sequence record counts for 2005.

The contents of GenBank are now doubling in less than a year, and the doubling time is still shrinking, i.e. the data generated and collected is growing at least exponentially.
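As a rough worked example, the doubling time implied by the October 1996 and June 1997 figures quoted above can be estimated as follows. This is only a sketch: it assumes steady exponential growth between the two dates and takes the interval to be 8 months, which is an approximation since exact release dates are not given. The function name is illustrative.

```python
import math

def doubling_time(initial, final, elapsed_months):
    """Doubling time in months, assuming steady exponential growth."""
    return elapsed_months * math.log(2) / math.log(final / initial)

# Figures quoted in the text; the 8-month interval is an assumption.
ELAPSED_MONTHS = 8
bases_doubling = doubling_time(652_000_000, 967_000_000, ELAPSED_MONTHS)
records_doubling = doubling_time(1_021_211, 1_491_000, ELAPSED_MONTHS)

print(f"Bases double roughly every {bases_doubling:.1f} months")
print(f"Records double roughly every {records_doubling:.1f} months")
```

Under these assumptions both estimates come out at roughly 14 to 15 months for 1996-97, which fits the statement that the doubling time has since shortened to under a year.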

Entire genomes have been sequenced since 1995. Institutes generating whole genome sequences make these sequences publicly available electronically. In addition, data and analysis tools for all genomes sequenced to date can be accessed with a commercial license at ERGO, and a complete listing of all past, present and future genomes being investigated can be found at the Genomes OnLine Database (GOLD).

B. The Resources at NCBI

Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information, all for the better understanding of molecular processes affecting human health and disease.

The NCBI can be summarised as having 3 arms:

The various sequence databases and the PubMed literature database are linked as shown below.

[Figure: Entrez database links]
ENTREZ is the text-based search and retrieval system used at NCBI for the major databases, including PubMed, Nucleotide and Protein Sequences, Protein Structures, Complete Genomes, Taxonomy, and others. ENTREZ is therefore at the core of the search and retrieval system that integrates and links the various databases. To get the most out of the various databases, it is imperative that you read and learn from the ENTREZ help file.
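ENTREZ searches can also be issued programmatically. The sketch below builds a search URL for NCBI's E-utilities web interface, which is not covered in these notes and is brought in here as an assumption about how one might script a query. The function name and the example search term are illustrative only, and the code builds the URL without contacting NCBI.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities (an addition, not from these notes).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def esearch_url(db, term, retmax=20):
    """Build an ESearch URL that asks one Entrez database for matching record IDs."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"

# Illustrative query: PubMed articles with "16S rRNA" in the title.
url = esearch_url("pubmed", "16S rRNA[Title]", retmax=5)
print(url)
```

Opening the printed URL in a browser (or fetching it with a library such as urllib.request) should return a list of matching record identifiers. Changing the db value to another Entrez database name searches that database instead.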

C. The resources at the Protein Data Bank (PDB)

PDB is the single worldwide repository for the processing and distribution of 3-D biological macromolecular structure data. It provides access to a range of protein crystallography data. The rate at which crystal data is submitted to PDB is growing exponentially, indicating the high level of interest in understanding the structure and function relationships of macromolecules. However, the number of new folds found in structures appears to have plateaued.

D. Ribosomal Database Project (RDP)

The RDP-II database contains aligned and unaligned small subunit ribosomal RNA (rRNA) sequences. Most of the sequences have been extracted from the GenBank database, and RDP is now updated on a regular basis. It can therefore be regarded as a specialist 16S rRNA database that is a subset of GenBank.

In addition, the database contains a set of integrated online bioinformatics analysis tools useful for aligning user input sequences based on rRNA secondary structural constraints and for constructing phylogenies. It is also possible to download sequences in the aligned and unaligned forms. The sequences are in GenBank format.
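To illustrate what a downloaded GenBank-format record looks like and how it might be read, here is a minimal sketch of a parser. It handles only the ACCESSION and DEFINITION lines, the ORIGIN sequence block, and the // end-of-record marker, and it ignores multi-line definitions and feature tables. The function name and the example record are made up for illustration and are not a real database entry.

```python
def parse_genbank_record(text):
    """Extract the accession, definition and sequence from one GenBank-format record."""
    record = {"accession": None, "definition": None, "sequence": ""}
    in_origin = False
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        if in_origin:
            # Sequence lines: a position number followed by blocks of bases.
            record["sequence"] += "".join(line.split()[1:])
        elif line.startswith("ACCESSION"):
            record["accession"] = line.split()[1]
        elif line.startswith("DEFINITION"):
            record["definition"] = line[len("DEFINITION"):].strip()
        elif line.startswith("ORIGIN"):
            in_origin = True
    return record

# Made-up record for illustration only.
EXAMPLE = """\
LOCUS       EXAMPLE01                 20 bp    DNA     linear   BCT 01-JAN-2000
DEFINITION  Hypothetical 16S ribosomal RNA gene fragment.
ACCESSION   EXAMPLE01
ORIGIN
        1 agagtttgat cctggctcag
//
"""

record = parse_genbank_record(EXAMPLE)
print(record["accession"], len(record["sequence"]))
```

For real work an established library parser is a safer choice, since full GenBank records carry many more fields than this sketch reads.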

E. KEGG Database

The Kyoto Encyclopedia of Genes and Genomes (KEGG) database is an excellent database which links the metabolic pathways of all the organisms whose genomes have been sequenced. It also has links to the genes involved in the metabolic pathways.

NOTE: KEGG is part of GenomeNet and Bioinformatics in Japan. It also houses a range of online tools such as BLAST and CLUSTAL, and is worth looking through.


Send comments to Professor Bharat Patel: b.patel@griffith.edu.au
[Created: 03 March 2003]
[Updated: 09 March 2008]