About Metalog

Metalog is a repository of manually curated metadata for metagenomic samples. This includes all contextual data, for example clinical or demographic data for human subjects or water depth and chemical composition for water samples. We focus on samples with metagenomic sequencing data and do not include other data such as metatranscriptomic data or amplicon sequencing data.

Metalog is being developed in the Bork Group at EMBL Heidelberg. We manually annotate datasets of interest and verify the quality of the extracted metadata using automated and manual checks. If a study you are interested in is not on Metalog, it is probably in our queue of studies to annotate. We aim to provide metadata for all studies in the SPIRE database (and its upcoming next version). If you find an error, please go to the respective study page and use the links towards the bottom to report the error in our GitHub repository.

The purpose of this website is to provide an overview of the data that is available in Metalog. For further analysis, please download the metadata and associated taxonomic profiles. We have prepared an R script as a usage example that downloads data from Metalog and looks for associations between bacteria and medication in the gut microbiome of adults.
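To give a rough idea of what such an association analysis involves, here is a minimal Python sketch. It is not the R script mentioned above and does not download anything from Metalog. The two inline tables, the column names (`sample_id`, `medication`, `taxon_a`), and all values are invented for illustration; the real download files may be structured differently. The sketch joins metadata with a taxonomic profile by sample ID and runs a simple permutation test on the difference in mean abundance between two groups.

```python
import csv
import io
import random
from statistics import mean

# Synthetic stand-ins for a metadata table and a taxonomic profile table.
# Column names and values are hypothetical, not taken from Metalog.
METADATA_CSV = """sample_id,medication
S1,yes
S2,yes
S3,yes
S4,no
S5,no
S6,no
"""

PROFILE_CSV = """sample_id,taxon_a
S1,0.30
S2,0.25
S3,0.35
S4,0.10
S5,0.05
S6,0.15
"""


def load_by_sample(text, value_column, convert=str):
    """Return {sample_id: converted value} for one column of a CSV string."""
    return {
        row["sample_id"]: convert(row[value_column])
        for row in csv.DictReader(io.StringIO(text))
    }


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference of group means."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:len(group_a)]) - mean(pooled[len(group_a):])
        # Small tolerance so ties with the observed value are counted.
        if abs(diff) >= abs(observed) - 1e-12:
            hits += 1
    # Add-one correction keeps the estimated p-value above zero.
    return observed, (hits + 1) / (n_permutations + 1)


medication = load_by_sample(METADATA_CSV, "medication")
abundance = load_by_sample(PROFILE_CSV, "taxon_a", float)

# Join the two tables on sample ID and split samples by medication status.
treated = [abundance[s] for s, m in medication.items() if m == "yes" and s in abundance]
untreated = [abundance[s] for s, m in medication.items() if m == "no" and s in abundance]

observed_diff, p_value = permutation_test(treated, untreated)
print(f"difference in means: {observed_diff:.3f}, p = {p_value:.3f}")
```

A real analysis would need to handle many taxa, multiple testing, and confounders such as age or study of origin, none of which this sketch attempts.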

Raw sequence data is available from sequencing archives such as ENA or SRA. A small number of studies have sequencing data on MG-RAST. Please note that in some cases we had to split or merge samples from the sequencing repositories, for example because some authors uploaded data from different biological samples under the same sample accession. A mapping file can be found on the download page.
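Because samples can be split or merged, the relationship between archive accessions and curated samples is not necessarily one-to-one. The following Python sketch shows one way to work with such a mapping. The format of the actual mapping file is not described here, so the column names (`archive_accession`, `curated_sample_id`) and the placeholder identifiers are assumptions made purely for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical mapping table: one row per (archive accession, curated sample) pair.
# The identifiers are placeholders, not real accessions.
MAPPING_CSV = """archive_accession,curated_sample_id
ACC_0001,sample_A
ACC_0001,sample_B
ACC_0002,sample_C
ACC_0003,sample_C
"""


def build_lookups(text):
    """Return two dicts: accession -> curated samples, curated sample -> accessions."""
    by_accession = defaultdict(list)
    by_sample = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        by_accession[row["archive_accession"]].append(row["curated_sample_id"])
        by_sample[row["curated_sample_id"]].append(row["archive_accession"])
    return dict(by_accession), dict(by_sample)


by_accession, by_sample = build_lookups(MAPPING_CSV)

# An accession that maps to several curated samples was split.
split_accessions = sorted(a for a, samples in by_accession.items() if len(samples) > 1)
# A curated sample that draws on several accessions was merged.
merged_samples = sorted(s for s, accs in by_sample.items() if len(accs) > 1)

print("split accessions:", split_accessions)
print("merged samples:", merged_samples)
```

Keeping both lookup directions makes it straightforward to go from a downloaded archive record to the curated sample or samples it contributes to, and back again.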

Metadata updates are usually published every weekend. Study pages show when the study was last updated, and this information is also included in the metadata downloads. However, because we may be asked to remove a study from the database, we do not keep an archive of previous versions.

Citing Metalog

Metalog: curated and harmonised contextual data for global metagenomics samples
Michael Kuhn, Thomas Sebastian B Schmidt, Pamela Ferretti, Anna Głazek, Shahriyar Mahdi Robbani, Wasiu Akanni, Anthony Fullam, Christian Schudoma, Ela Cetin, Mariam Hassan, Kasimir Noack, Anna Schwarz, Roman Thielemann, Leonie Thomas, Moritz von Stetten, Renato Alves, Anandhi Iyappan, Ece Kartal, Ivan Kel, Marisa I Keller, Oleksandr Maistrenko, Anna Mankowski, Suguru Nishijima, Daniel Podlesny, Jonas Schiller, Sarah Schulz, Thea Van Rossum, Peer Bork
Nucleic Acids Research, 2025; https://doi.org/10.1093/nar/gkaf1118

Statistics

Category                 Count
Studies                   1058
All samples             148144
Human samples           104495
Animal samples           11218
Ocean samples             5608
Environmental samples    26823
Human subjects           75428

Metadata counts for human samples

Sample counts for human
Heatmap of pairwise sample counts for human

Metadata counts for animal samples

Sample counts for animal
Heatmap of pairwise sample counts for animal

Metadata counts for environmental samples

Sample counts for environmental
Heatmap of pairwise sample counts for environmental

Metadata counts for ocean samples

Sample counts for ocean
Heatmap of pairwise sample counts for ocean