Genomics and metagenomics have become core scientific tools for investigating patterns in the evolution and ecology of microbes. The genetic content revealed by sequencing provides invaluable information about the identity, abundance, diversity, and distribution of microbes in different ecosystems. An important feature revealed by a microbial genome is its set of genome-encoded metabolic reactions. Based on these reactions, one can reconstruct the biochemical landscape of a microorganism and identify the pathways by which it imports and uses environmental metabolites. From their metabolic reactions, microorganisms obtain both the free energy to perform their functions and the matter to build their biomass. Biochemical reaction networks reconstructed from genomes are termed genome-scale metabolic models (GSMMs). In this thesis, I combined GSMMs with three different computational frameworks to predict and describe three important patterns of microbial systems: (i) the frequency distribution of genes in pan-genomes (Chapter 2); (ii) metagenomic signatures in human colorectal cancer (CRC) (Chapters 3 and 4); and (iii) the species abundance distribution in metagenomes of the human microbiome (Chapter 5). GSMMs served as the basic building blocks for explaining these patterns and were integrated with the composition of the external environment into frameworks that mechanistically connect each pattern with the genome-encoded metabolic reactions and the external environment. Overall, this thesis provides novel tools and frameworks for modeling and explaining microbial systems starting from DNA sequences.
Chapter 1.1 is a general introduction to some important patterns of metabolism and explains the three patterns of microbial systems listed above. Chapter 1.2 is a second introductory chapter that reviews in detail most of the microbial systems and approaches used in the subsequent studies, including cultured and uncultured microorganisms, microbial genomics, metagenomics, patterns in microbial assembly, and the development of mechanistic models of microbial systems. Chapter 2 reports an investigation of the patterns in the dynamics and composition of pan-genomes. In this chapter, metabolic reactions were used as functional proxies for genes, and a framework was developed that mechanistically assesses the major drivers of gene frequency in pan-genomes. Chapters 3 and 4 report investigations of the patterns found in metagenomic signatures of human CRC. Chapter 3 experimentally assesses the potential effects of secreted bacterial molecules and surface proteins on CRC cells and associates these effects with bacterial genomes. Chapter 4 then describes a framework that associates metabolites enriched in CRC with the bacteria that are also enriched in CRC metagenomes. Chapter 5 reports an investigation of the patterns in species abundance distributions in microbiomes and describes a framework that predicts environmental metabolomes by associating the growth rates predicted from GSMMs with the species abundance distributions measured by metagenomics. Chapter 6 concludes the thesis by discussing how the chapters are integrated and identifying their main limitations, and closes by summarizing important future steps for the development of general, unified, predictive, and informative models of microbial systems.
Supplementary material:
| Chapter | Description of supplementary file |
|---|---|
| 2 | Reactions in the toy example. List of toy reactions used in the in silico simulations of the toy model displayed in Figure 2.1 |
| 2 | Bacterial and archaeal strains used in this study |
| 2 | Environment ball |
| 2 | Environment-driven reaction scores |
| 2 | Elastic net predictions of the metabolite usage of 46 prokaryote families |
| 2 | Correlation of variables related to FIRS, pan-reactomes, and metagenomes |
| 3 | Bacterial strains used in this study |
| 3 | Cancer mutational profiles of the six human cell lines used in this study |
| 3 | Growth rate scores, z-scores, and p-values measured from human cells incubated with bacterial cells. These values were computed from the average of four experimental replicates (see "cell growth analysis" in the Methods section) |
| 3 | Growth rate scores, z-scores, and p-values measured from human cells incubated with bacterial secretomes. These values were computed from the average of four experimental replicates (see "cell growth analysis" in the Methods section) |
| 3 | Literature summary of microbial virulence factors potentially associated with cancer |
| 3 | Correlation between growth rate scores of cells and secretomes. The correlation values were obtained from the growth rate scores computed from the average of four experimental replicates of bacterial cells and secretomes for the group of strains belonging to the indicated bacterial family |
| 3 | Statistical significance analysis of family-specific clustering of the growth rate scores |
| 3 | Correlation between the pairwise phylogenetic distance between the bacterial strains used in this study and the pairwise Euclidean distance of their growth rate scores |
| 3 | Distribution of genes coding for virulence factors in the genomes of the bacteria used in this study. The TcdA toxin was present in bacteria of the order Clostridiales, while other toxins were present within bacterial families |
| 3 | Functional genomic terms significantly associated with growth rate scores within bacterial families |
| 4 | MAMBO, Western diet, and high-fiber diet basal environments |
| 4 | MI, SGA, and MR scores, CRC enrichment p-values, AUC, and mOTU prediction for all GSMMs |
| 4 | Important metabolites for GSMMs |
| 5 | Top 20 metabolites for human skin predicted by averaging the normalized results of MAMBO on 50 human skin metagenomes. Myristic acid is used as a fragrance ingredient, cleansing agent, and emulsifier, and is readily absorbed by the skin. Citrate is a commonly used ingredient for adjusting the acidity of cosmetics. Nicotinamide ribonucleotide, aspartate, and N-acetyl glucosamine are used in skin-conditioning products. N-acetyl glucosamine is a precursor of hyaluronic acid, a major component of skin structure, in a pathway that responds to UV irradiation in skin [37]. Complete lists of all predicted metabolomes for the 175 metagenomes are provided in Supplementary File 1 |
| 5 | Metabolomic profiles predicted by MAMBO and by a genes-only approach based on 37 oral, 50 skin, 39 stool, and 49 vaginal metagenomes, and 6 experimentally measured metabolomic profiles. Values are normalized predicted abundances |
| 5 | Pearson correlations between the 6 measured metabolomic profiles and the 175 metabolomic profiles predicted by MAMBO and by the genes-only approach. Correlations are shown only if more than 5 of the predicted metabolites were measured, and vice versa |