Finding published research datasets for biology and biochemistry research
Data sharing has long been embedded in the research culture of the biological sciences, which has resulted in the creation of numerous discipline- and sub-domain-specific data archives. The most widely used interdisciplinary data archive for the life sciences is Dryad, which is indexed in the Web of Science Data Citation Index.
University of Bath Library Subject Homepages
Model organism data repositories
The Arabidopsis Information Resource (TAIR)
The Arabidopsis Information Resource (TAIR) maintains a database of genetic and molecular biology data for the model higher plant Arabidopsis thaliana.
Eukaryotic Pathogen Database Resources (EuPathDB)
EuPathDB (formerly ApiDB) is an integrated database covering the eukaryotic pathogens.
FlyBase is a database of genetic, genomic and functional data for Drosophila species.
Mouse Genome Informatics
MGI is the international database resource for the laboratory mouse, providing integrated genetic, genomic, and biological data to facilitate the study of human health and disease.
Rat Genome Database
The Rat Genome Database was established to collect, consolidate, and integrate data generated from ongoing rat genetic and genomic research efforts and to make these data widely available to the scientific community.
VectorBase provides data on arthropod vectors of human pathogens. Sequence data, gene expression data, images, population data, and insecticide resistance data for arthropod vectors are available for download.
International collaboration to provide accurate, current, accessible information concerning the genetics, genomics and biology of C. elegans and some related nematodes.
Database providing access to the diverse and rich genomic, expression and functional data available from Xenopus research.
Zebrafish Model Organism Database (ZFIN)
Model organism database for the Zebrafish research community.
Sequencing data archives
Database of Genomic Variants archive (DGVa)
Data archive of genomic structural variants.
NCBI Short Genetic Variations database (dbSNP) contains data on short variations in nucleotide sequences from a wide range of organisms.
NCBI database of genomic structural variation with data from multiple gene studies using a variety of organisms and techniques.
DNA Databank of Japan
The sole nucleotide sequence databank for Asia containing international data on nucleotide sequences.
MGnify (formerly EBI Metagenomics)
Archive of microbiome data to help determine taxonomic diversity and functional and metabolic potential.
European Nucleotide Archive (ENA)
Data repository for experimental workflows based around nucleotide sequencing.
European Variation Archive
Database of genetic variation data from all species, including highly detailed, granular raw human genetic data.
Annotated collection of publicly-available DNA sequences. GenBank is part of the International Nucleotide Sequence Database Collaboration, which comprises the DNA DataBank of Japan (DDBJ), the European Nucleotide Archive (ENA), and GenBank at NCBI. These three organizations exchange data on a daily basis.
Searchable database of published annotated miRNA sequences.
NCBI Sequence Read Archive
Raw sequencing data from platforms including the Roche 454 GS System, the Illumina Genome Analyzer, the Applied Biosystems SOLiD System, the Helicos HeliScope and Complete Genomics.
NCBI Trace Archive
Archive of DNA sequence chromatograms, base calls and quality estimates for single pass reads from various large-scale sequencing projects.
Universal Protein Resource (UniProt)
Data archive of annotated protein sequence data.
Biochemical data archives
Ecological and environmental data archives
Structural data repositories
The Binding Database
Database of measured binding affinities, focusing chiefly on the interactions of proteins considered to be candidate drug-targets with ligands that are small, drug-like molecules.
Biological Magnetic Resonance Data Bank (BMRB)
BioMagResBank repository for NMR results from peptides, proteins and nucleic acids.
Cambridge Structural Database
Web access to the Cambridge Structural Database, with access to over 900,000 organic and organometallic crystal structures.
Archive for crystal structures generated by the Southampton Chemical Crystallography Group and the EPSRC UK National Crystallography Service.
Electron Microscopy Data Bank (EMDB)
Electron microscopy density maps of macromolecular complexes and subcellular structures, covering single-particle analysis, electron tomography and electron crystallography.
IntAct Molecular Interaction Database
Database and analysis tool for molecular interaction data curated from the literature or direct deposits.
Protein Circular Dichroism Data Bank (PCDDB)
The Protein Circular Dichroism Data Bank (PCDDB) provides and accepts circular dichroism spectral data.
Structural Biology Data Grid
Open access to macromolecular X-ray diffraction and MicroED datasets. The repository complements the Worldwide Protein Data Bank.
WorldWide Protein Data Bank
The data contained in the archive include atomic coordinates, crystallographic structure factors and NMR experimental data.
Many of the archive descriptions used on this page are adapted from those provided in the re3data registry of data repositories.