Mgpipe

adaptVMHDietToAGORA(VMHDiet, setupUsed, AGORAPath)

Part of the Microbiome Modeling Toolbox. This function adapts a diet generated by the Diet Designer on https://www.vmh.life such that microbiome community models created from the AGORA resource can generate biomass. All metabolites required by at least one AGORA model are added. Note that the adapted diet that is the output of this function is specific to the AGORA resource. It is not guaranteed that other constraint-based models can produce biomass on this diet. Units are given in mmol/day/person.

USAGE

[adaptedDietConstraints, growthOK] = adaptVMHDietToAGORA (VMHDiet, setupUsed, AGORAPath)

INPUTS
  • VMHDiet – Name of text file with VMH exchange reaction IDs and values on lower bounds generated by Diet Designer on https://www.vmh.life (or manually).

  • setupUsed – Model setup for which the adapted diet will be used. Allowed inputs are AGORA (the single AGORA models), Pairwise (the microbe-microbe models generated by the pairwise modeling module), and Microbiota (the microbe community models generated by MgPipe).

OPTIONAL INPUTS

AGORAPath – Path to the AGORA model resource. If entered, growth of all single models on the adapted diet will be tested.

OUTPUT

adaptedDietConstraints – Cell array of exchange reaction IDs, values on lower bounds, and values on upper bounds that can serve as input for the function useDiet.

OPTIONAL OUTPUT

growthOK – Variable indicating whether all AGORA models could grow on the adapted diet (1 if yes).

addMicrobeCommunityBiomass(model, microbeNames, abundances)

Adds a community biomass reaction to a model structure with multiple microbes based on their relative abundances. If no abundance values are provided, all n microbes get equal weights (1/n). Assumes a lumen compartment [u] and a fecal secretion compartment [fe]. Creates a community biomass metabolite ‘microbeBiomass’ that is secreted from [u] to [fe] and exchanged from the fecal compartment.

USAGE

model = addMicrobeCommunityBiomass (model, microbeNames, abundances)

INPUTS
  • model – COBRA model structure with n joined microbes with biomass metabolites ‘Microbe_biomass[c]’.

  • microbeNames – nx1 cell array of n unique strings that represent each microbe in the model.

OPTIONAL INPUT

abundances – nx1 vector with the relative abundance of each microbe.

OUTPUT

model – COBRA model structure
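
The default weighting described for addMicrobeCommunityBiomass can be sketched as follows. This is a minimal illustration, not the toolbox implementation: the microbe names are hypothetical, and assembling the biomass metabolite IDs by string concatenation is an assumption based on the ‘Microbe_biomass[c]’ naming pattern given in the inputs.

```matlab
% Hypothetical microbe names; in practice these come from the joined model.
microbeNames = {'Ecoli'; 'Bfragilis'; 'Lreuteri'};
abundances = [];  % empty: no abundance values were provided

n = numel(microbeNames);
if isempty(abundances)
    % Fall back to equal weights (1/n) for all n microbes
    abundances = ones(n, 1) / n;
end

% Biomass metabolite IDs following the 'Microbe_biomass[c]' pattern (assumed)
biomassMets = strcat(microbeNames, '_biomass[c]');

for i = 1:n
    fprintf('%s\t%.4f\n', biomassMets{i}, abundances(i));
end
```

Passing a non-empty abundances vector would skip the fallback branch, so the weights would then be whatever relative abundances the caller supplies.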

buildModelStorage(microbeNames, modPath, dietFilePath, adaptMedium, includeHumanMets, numWorkers, pruneModels, biomasses)

This function builds the internal exchange space and the coupling constraints for the models to be joined within mgPipe so that they can be merged into microbiome models afterwards. Exchanges that can never carry flux on the given diet are removed to reduce computation time.

USAGE

[activeExMets,couplingMatrix] = buildModelStorage(microbeNames,modPath,dietFilePath,adaptMedium,includeHumanMets,numWorkers,pruneModels,biomasses)

INPUTS
  • microbeNames – list of microbe models included in the microbiome models

  • modPath – char with path of directory where models are stored

  • dietFilePath – char with path of directory where the diet is saved

  • adaptMedium – boolean indicating if the medium should be adapted through the adaptVMHDietToAGORA function or used as is (default=true)

  • includeHumanMets – boolean indicating if human-derived metabolites present in the gut should be provided to the models (default: true)

  • numWorkers – integer indicating the number of cores to use for parallelization

  • pruneModels – boolean indicating whether reactions that do not carry flux on the input diet should be removed from the microbe models. Recommended for large datasets (default: false)

  • biomasses – Cell array containing names of biomass objective functions of models to join. Needs to be the same length as the length of models in the abundance file.

OUTPUTS
  • activeExMets – list of exchanged metabolites present in at least one microbe model that can carry flux

  • couplingMatrix – matrix containing coupling constraints for each model to join

createPanModels(agoraPath, panPath, taxonLevel, agoraVersion, numWorkers, builtTaxa)

This function creates pan-models for all unique taxa (e.g., species) included in the AGORA resource. If reconstructions of multiple strains in a given taxon are present, the reactions in these reconstructions will be combined into a pan-reconstruction. The pan-biomass reactions will be built from the average of all biomasses. Futile cycles that result from the newly combined reaction content are removed by setting certain reactions irreversible. These reactions have been determined manually. NOTE: Futile cycle removal has only been tested at the species and genus level. Pan-models at higher taxonomical levels (e.g., family) may contain futile cycles and produce unrealistically high ATP flux. The pan-models can be used as input for mgPipe if taxon abundance data is available at a higher level than strain, e.g., species or genus.

USAGE

createPanModels (agoraPath, panPath, taxonLevel, agoraVersion, numWorkers, builtTaxa)

INPUTS
  • agoraPath – String containing the path to the AGORA reconstructions. Must end with a file separator.

  • panPath – String containing the path to an empty folder that the created pan-models will be stored in. Must end with a file separator.

  • taxonLevel – String with desired taxonomical level of the pan-models. Allowed inputs are ‘Species’, ‘Genus’, ‘Family’, ‘Order’, ‘Class’, ‘Phylum’.

  • agoraVersion – Version of AGORA that will be used (allowed inputs: ‘AGORA’, ‘AGORA2’; alternatively, the path to a custom table with reconstruction information).

OPTIONAL INPUTS
  • numWorkers – Number of workers for parallel pool (default: no pool)

  • builtTaxa – Names of taxa in table that will be built (default: all). Need to be entered as a cell array of strings with names written exactly as in the corresponding column in the table.

createPersonalizedModel(abundance, resPath, model, sampNames, orglist, couplingMatrix, host, hostBiomassRxn)

This function creates personalized models by integrating the given organism abundances into the previously built global setup. Coupling constraints are also added for each organism. All operations are parallelized, and the generated personalized models are saved directly in .mat format.

USAGE

[createdModels] = createPersonalizedModel (abundance, resPath, model, sampNames, orglist, couplingMatrix, host, hostBiomassRxn)

INPUTS
  • abundance – table with abundance information

  • resPath – char with path of directory where results are saved

  • model – model in COBRA model structure format

  • sampNames – cell array with names of individuals in the study

  • orglist – cell array with names of organisms in the study

  • couplingMatrix – cell array containing pre-created coupling matrices for each organism to be joined (created by buildModelStorage function)

  • host – Contains the host model if path to host model was defined. Otherwise empty.

  • hostBiomassRxn – char with name of biomass reaction in host (default: empty)

OUTPUT

createdModels – created personalized models

detectOutput(resPath, objNam)

This function checks the existence of a specific file in the results folder.

USAGE

mapP = detectOutput (resPath, objNam)

INPUTS
  • resPath – char with path of directory where results are saved

  • objNam – char with name of object to find in the results folder

OUTPUTS

mapP – double indicating if object was found in the result folder

extractFullRes(resPath, ID, dietType, sampName, fvaCt, nsCt)

This function is called from the MgPipe pipeline. Its purpose is to retrieve and export, in a comprehensive way, all the results (fluxes) computed during the simulations for a specified diet. Since FVA is computed on diet and fecal exchanges, every metabolite has four values for each individual: the minimum and maximum of uptake and of secretion.

USAGE

[finRes]= extractFullRes (resPath, ID, dietType, sampName, fvaCt, nsCt)

INPUTS
  • resPath – char with path of directory where results are saved

  • ID – cell array with list of all unique exchanges to the diet/fecal compartment

  • dietType – char indicating under which diet to extract results: rDiet (rich diet), sDiet (previously specified diet; the default), or pDiet (personalized diet), if available

  • sampName – nx1 cell array with names of individuals in the study

  • fvaCt – cell array containing FVA values for maximal uptake

  • nsCt – cell array containing FVA values for minimal uptake and secretion for setup lumen / diet exchanges

OUTPUTS

finRes – cell array with min and max value of uptake and secretion for each metabolite

fastSetupCreator(exMets, microbeNames, host)

Creates a microbiota model (min. 1 microbe) that can be coupled with a host model. Microbes and host are connected through a lumen compartment [u]; the host can secrete metabolites into body fluids [b]. Diet is simulated as uptake through the compartment [d]; transporters are unidirectional from [d] to [u]. Secretion goes through the fecal compartment [fe]; transporters are unidirectional from [u] to [fe].

Reaction types
  • Diet exchange – ‘EX_met[d]’: ‘met[d] <=>’

  • Diet transporter – ‘DUt_met’: ‘met[d] -> met[u]’

  • Fecal transporter – ‘UFEt_met’: ‘met[u] -> met[fe]’

  • Fecal exchanges – ‘EX_met[fe]’: ‘met[fe] <=>’

  • Microbe uptake/secretion – ‘Microbe_IEX_met[c]tr’: ‘Microbe_met[c] <=> met[u]’

  • Host uptake/secretion lumen – ‘Host_IEX_met[c]tr’: ‘Host_met[c] <=> met[u]’

  • Host exchange body fluids – ‘Host_EX_met(e)b’: ‘Host_met[b] <=>’

INPUTS
  • exMets – cell array with all unique extracellular metabolites contained in the models

  • microbeNames – nx1 cell array of n unique strings that represent each microbe model. Reactions and metabolites of each microbe will get the corresponding microbeNames (e.g., ‘Ecoli’) prefix. Reactions will be named ‘Ecoli_RxnAbbr’ and metabolites ‘Ecoli_MetAbbr[c]’.

  • host – Host COBRA model structure; can be left empty if there is no host model

OUTPUT

model – COBRA model structure with all models combined
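
The reaction naming scheme listed for fastSetupCreator can be illustrated by building the IDs and formulas for a single metabolite. This is only a sketch of the naming pattern as documented above, not the function's actual code; the metabolite abbreviation ‘glc_D’ is a hypothetical example, and ‘Ecoli’ is the example prefix used in the text.

```matlab
met = 'glc_D';      % hypothetical metabolite abbreviation
microbe = 'Ecoli';  % example microbe prefix from the text

% Column 1: reaction ID, column 2: reaction formula
rxns = {
    sprintf('EX_%s[d]', met),               sprintf('%s[d] <=>', met)
    sprintf('DUt_%s', met),                 sprintf('%s[d] -> %s[u]', met, met)
    sprintf('UFEt_%s', met),                sprintf('%s[u] -> %s[fe]', met, met)
    sprintf('EX_%s[fe]', met),              sprintf('%s[fe] <=>', met)
    sprintf('%s_IEX_%s[c]tr', microbe, met), sprintf('%s_%s[c] <=> %s[u]', microbe, met, met)
};

for i = 1:size(rxns, 1)
    fprintf('%-24s %s\n', rxns{i, 1}, rxns{i, 2});
end
```

The diet and fecal transporters use ‘->’ because they are described as unidirectional, whereas the exchanges and the microbe uptake/secretion reaction are reversible.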

getIndividualSizeName(abunFilePath, modPath)

This function automatically detects organisms, names and number of individuals present in the study.

USAGE

[sampNames, organisms, exMets] = getIndividualSizeName (abunFilePath,modPath)

INPUTS
  • abunFilePath – char with path and name of file from which to retrieve information

  • modPath – char with path of directory where models are stored

OUTPUTS
  • sampNames – nx1 cell array with names of individuals in the study

  • organisms – nx1 cell array with names of organisms in the study

  • exMets – cell array with all unique extracellular metabolites contained in the models

getMappingInfo(modPath, organisms, abunFilePath)

This function automatically extracts information from strain abundances in different individuals and combines this information into different tables.

USAGE

[reac, exMets, micRea, binOrg, patOrg, reacPat, reacNumb, reacSet, reacTab, reacAbun, reacNumber] = getMappingInfo (modPath, organisms, abunFilePath, patNumb)

INPUTS
  • organisms – nx1 cell array with names of organisms in the study

  • modPath – char with path of directory where models are stored

  • abunFilePath – char with path and name of file from which to retrieve abundance information

  • patNumb – number of individuals in the study

OUTPUTS
  • reac – cell array with the set of all unique reactions contained in the models

  • exMets – cell array with all unique extracellular metabolites contained in the models

  • micRea – binary matrix assessing presence of set of unique reactions for each of the microbes

  • binOrg – binary matrix assessing presence of specific strains in different individuals

  • reacPat – matrix with number of reactions per individual (organism resolved)

  • reacSet – matrix with names of reactions of each individual

  • reacTab – char with names of individuals in the study

  • reacAbun – binary matrix with presence/absence of reaction per individual: to compare different individuals

  • reacNumber – number of unique reactions of each individual
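
How binary presence matrices of this kind relate to each other can be sketched with a small example. This is an illustration under stated assumptions rather than the function's implementation: the data are made up, the matrix orientations (reactions x microbes, microbes x individuals) are assumed, and reacPresence is a hypothetical variable name.

```matlab
% Hypothetical data: 4 reactions, 3 microbes, 2 individuals
micRea = [1 0 0; 1 1 0; 0 1 1; 0 0 1];   % reactions x microbes (binary)
abundance = [0.7 0; 0.3 0.5; 0 0.5];     % microbes x individuals

% A strain counts as present in an individual if its abundance is non-zero
binOrg = double(abundance > 0);          % microbes x individuals

% A reaction is present in an individual if at least one present microbe has it
reacPresence = double(micRea * binOrg > 0);  % reactions x individuals

% Number of unique reactions per individual
reacNumber = sum(reacPresence, 1);
disp(reacNumber);
```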

guidedSim(model, rl)

This function is part of the MgPipe pipeline and runs FVAs on a series of selected reactions with different possible FVA functions. Solver is automatically set to ‘cplex’, objective function is maximized, and optPercentage set to 99.99.

USAGE

[minFlux, maxFlux] = guidedSim (model, rl)

INPUTS
  • model – COBRA model structure with n joined microbes with biomass metabolites ‘Microbe_biomass[c]’.

  • rl – nx1 vector with the reactions of interest.

  • solver – char with solver name to use.

OUTPUTS
  • minFlux – Minimum flux for each reaction

  • maxFlux – Maximum flux for each reaction

Author: Federico Baldini, 2017-2018

initMgPipe(modPath, abunFilePath, computeProfiles, varargin)

This function initializes the mgPipe pipeline and sets the optional input variables if not defined.

USAGE

[init, netSecretionFluxes, netUptakeFluxes, Y, modelStats, summary, statistics, modelsOK] = initMgPipe(modPath, abunFilePath, computeProfiles, varargin)

INPUTS
  • modPath – char with path of directory where models are stored

  • abunFilePath – char with path and name of file from which to retrieve abundance information

  • computeProfiles – boolean defining whether flux variability analysis to compute the metabolic profiles should be performed.

OPTIONAL INPUTS
  • resPath – char with path of directory where results are saved

  • dietFilePath – char with path of directory where the diet is saved. Can also be a character array with a separate diet for each individual. In that case, size(dietFilePath,1) needs to equal the number of samples, the first row needs to contain the sample names, and the second row needs to contain the respective files with diet information.

  • infoFilePath – char with path to stratification criteria if available

  • biomasses – Cell array containing names of biomass objective functions of models to join. Needs to be the same length as the length of models in the abundance file.

  • hostPath – char with path to host model, e.g., Recon3D (default: empty)

  • hostBiomassRxn – char with name of biomass reaction in host (default: empty)

  • hostBiomassRxnFlux – double with the desired upper bound on flux through the host biomass reaction (default: 1)

  • numWorkers – integer indicating the number of cores to use for parallelization

  • rDiet – boolean indicating whether to also enable rich diet simulations (default: false)

  • pDiet – boolean indicating whether to also enable personalized diet simulations (default: false)

  • lowerBMBound – lower bound on community biomass (default=0.4)

  • upperBMBound – upper bound on community biomass (default=1)

  • includeHumanMets – boolean indicating if human-derived metabolites present in the gut should be provided to the models (default: true)

  • adaptMedium – boolean indicating if the medium should be adapted through the adaptVMHDietToAGORA function or used as is (default=true)

  • pruneModels – boolean indicating whether reactions that do not carry flux on the input diet should be removed from the microbe models. Recommended for large datasets (default: false)

OUTPUTS
  • init – status of initialization

  • netSecretionFluxes – Net secretion fluxes by microbiome community models

  • netUptakeFluxes – Net uptake fluxes by microbiome community models

  • Y – Classical multidimensional scaling

  • modelStats – Reaction and metabolite numbers for each model

  • summary – Table with average, median, minimal, and maximal reactions and metabolites

  • statistics – If an info file with stratification is provided, reports whether reaction and metabolite numbers differ significantly between groups.

  • modelsOK – Boolean indicating if the created microbiome models passed verifyModel. If true, all models passed.

loadUncModels(modPath, organisms, objre, printLevel)

This function loads and unconstrains metabolic models from a specific folder.

USAGE

models = loadUncModels (modPath, organisms, objre, printLevel)

INPUTS
  • organisms – nx1 cell array with names of organisms in the study

  • modPath – char with path of directory where models are stored

  • objre – char with reaction name of objective function of organisms

  • printLevel – Verbose level (default: printLevel = 1)

OUTPUT

models – nx1 cell array with models of organisms in the study

makeDummyModel(numMets, numRxns)

Makes an empty model with numMets rows for metabolites and numRxns columns for reactions. Includes all fields that are necessary to join models.

USAGE

dummy = makeDummyModel (numMets, numRxns)

INPUTS
  • numMets – Number of metabolites

  • numRxns – Number of reactions

OUTPUT

dummy – Empty COBRA model structure
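
What an empty model of this shape looks like can be sketched with a plain struct. This is a minimal sketch, not the toolbox function: it fills only a small subset of standard COBRA model fields (S, mets, rxns, b, c, lb, ub), chosen here for illustration, whereas makeDummyModel is documented to include all fields necessary to join models.

```matlab
numMets = 3;  % hypothetical sizes
numRxns = 5;

dummy = struct();
dummy.S = sparse(numMets, numRxns);  % empty stoichiometric matrix (mets x rxns)
dummy.mets = cell(numMets, 1);       % metabolite IDs, not yet filled
dummy.rxns = cell(numRxns, 1);       % reaction IDs, not yet filled
dummy.b = zeros(numMets, 1);         % right-hand side of S*v = b
dummy.c = zeros(numRxns, 1);         % objective coefficients
dummy.lb = zeros(numRxns, 1);        % lower flux bounds
dummy.ub = zeros(numRxns, 1);        % upper flux bounds

disp(size(dummy.S));
```

Preallocating the sparse matrix and the vectors to their final size means the individual models can then be copied into the corresponding row and column blocks without resizing.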

mgPipe(modPath, abunFilePath, computeProfiles, resPath, dietFilePath, infoFilePath, biomasses, hostPath, hostBiomassRxn, hostBiomassRxnFlux, figForm, numWorkers, rDiet, pDiet, lowerBMBound, upperBMBound, includeHumanMets, adaptMedium, pruneModels)

mgPipe is a MATLAB-based pipeline that integrates microbial abundances (from metagenomic data) with constraint-based modeling to create personalized models for individuals. The pipeline is divided into three parts:

  • [PART 1] Analysis of the individual-specific microbe abundances.

  • [PART 2] (1) Construction of a global metabolic model (setup) containing all the microbes listed in the study. (2) Building of individual-specific models by integrating abundance data retrieved from metagenomics. For each organism, reactions are coupled to the objective function.

  • [PART 3] Simulations under different diet regimes.

USAGE

[netSecretionFluxes, netUptakeFluxes, Y, modelStats, summary, statistics, modelsOK] = mgPipe (modPath, abunFilePath, computeProfiles, resPath, dietFilePath, infoFilePath, biomasses, hostPath, hostBiomassRxn, hostBiomassRxnFlux, figForm, numWorkers, rDiet, pDiet, lowerBMBound, upperBMBound, includeHumanMets, adaptMedium, pruneModels)

INPUTS
  • modPath – char with path of directory where models are stored

  • abunFilePath – char with path and name of file from which to retrieve abundance information

  • computeProfiles – boolean defining whether flux variability analysis to compute the metabolic profiles should be performed.

  • resPath – char with path of directory where results are saved

  • dietFilePath – char with path of directory where the diet is saved

  • infoFilePath – char with path to stratification criteria if available

  • biomasses – Cell array containing names of biomass objective functions of models to join. Needs to be the same length as the length of models in the abundance file.

  • hostPath – char with path to host model, e.g., Recon3D (default: empty)

  • hostBiomassRxn – char with name of biomass reaction in host (default: empty)

  • hostBiomassRxnFlux – double with the desired flux through the host biomass reaction (default: zero)

  • figForm – format to use for saving figures

  • numWorkers – integer indicating the number of cores to use for parallelization

  • rDiet – boolean indicating whether to also enable rich diet simulations (default: false)

  • pDiet – boolean indicating whether to also enable personalized diet simulations (default: false)

  • lowerBMBound – lower bound on community biomass (default=0.4)

  • upperBMBound – upper bound on community biomass (default=1)

  • includeHumanMets – boolean indicating if human-derived metabolites present in the gut should be provided to the models (default: true)

  • adaptMedium – boolean indicating if the medium should be adapted through the adaptVMHDietToAGORA function or used as is (default=true)

  • pruneModels – boolean indicating whether reactions that do not carry flux on the input diet should be removed from the microbe models. Recommended for large datasets (default: false)

OUTPUTS
  • init – status of initialization

  • netSecretionFluxes – Net secretion fluxes by microbiome community models

  • netUptakeFluxes – Net uptake fluxes by microbiome community models

  • Y – Classical multidimensional scaling

  • modelStats – Reaction and metabolite numbers for each model

  • summary – Table with average, median, minimal, and maximal reactions and metabolites

  • statistics – If an info file with stratification is provided, reports whether reaction and metabolite numbers differ significantly between groups.

  • modelsOK – Boolean indicating if the created microbiome models passed verifyModel. If true, all models passed.

AUTHORS

  • Federico Baldini, 2017-2018

  • Almut Heinken, 07/20: converted to function

  • Almut Heinken, 01/21: adapted inputs

mgSimResCollect(resPath, sampNames, exchanges, rDiet, pDiet, infoFilePath, netProduction, netUptake, figForm)

This function is called from the MgPipe pipeline. Its purpose is to compute net maximal production capabilities (NMPCs) from simulations with different diets on multiple microbiota models. Results are written to .csv files. A PCoA on the NMPCs, which groups the microbiota models of individuals by similarity of metabolic profile, is also computed and saved.

USAGE

[netSecretionFluxes, netUptakeFluxes, Y] = mgSimResCollect (resPath, sampNames, exchanges, rDiet, pDiet, infoFilePath, netProduction, netUptake, figForm)

INPUTS
  • resPath – char with path of directory where results are saved

  • sampNames – nx1 cell array with names of individuals in the study

  • exchanges – cell array with list of all unique exchanges to the diet/fecal compartment that were interrogated in simulations

  • rDiet – number (double) indicating whether to simulate a rich diet

  • pDiet – number (double) indicating whether a personalized diet is available and should be simulated

  • infoFilePath – char with the full path to and name of the file documenting the stratification criteria, if available (default: no)

  • netProduction – cell array containing FVA values for maximal uptake

  • figForm – char indicating the format of figures

OUTPUTS
  • netSecretionFluxes – cell array with computed NMPCs

  • netUptakeFluxes – cell array with computed uptake potential

  • Y – classical multidimensional scaling

microbiotaModelSimulator(resPath, exMets, sampNames, dietFilePath, hostPath, hostBiomassRxn, hostBiomassRxnFlux, numWorkers, rDiet, pDiet, computeProfiles, lowerBMBound, upperBMBound, includeHumanMets, adaptMedium)

This function is called from the MgPipe pipeline. Its purpose is to apply different diets (according to the user’s input) to the microbiota models and run simulations computing FVAs on the exchange reactions of the microbiota models. The output is saved in multiple .mat objects. Intermediate saving checkpoints are present.

USAGE

[exchanges, netProduction, netUptake, growthRates, infeasModels] = microbiotaModelSimulator (resPath, exMets, sampNames, dietFilePath, hostPath, hostBiomassRxn, hostBiomassRxnFlux, numWorkers, rDiet, pDiet, computeProfiles, lowerBMBound, upperBMBound, includeHumanMets, adaptMedium)

INPUTS
  • resPath – char with path of directory where results are saved

  • exMets – list of exchanged metabolites present in at least one microbe model that can carry flux

  • sampNames – cell array with names of individuals in the study

  • dietFilePath – path to and name of the text file with dietary information. Can also be a list of the sample names with individual diet files.

  • hostPath – char with path to host model, e.g., Recon3D (default: empty)

  • hostBiomassRxn – char with name of biomass reaction in host (default: empty)

  • hostBiomassRxnFlux – double with the desired upper bound on flux through the host biomass reaction (default: 1)

  • numWorkers – integer indicating the number of cores to use for parallelization

  • rDiet – boolean indicating if to simulate a rich diet

  • pDiet – boolean indicating if a personalized diet is available and should be simulated

  • computeProfiles – boolean defining whether flux variability analysis to compute the metabolic profiles should be performed.

  • lowerBMBound – minimal amount of community biomass in mmol/person/day enforced (default=0.4)

  • upperBMBound – maximal amount of community biomass in mmol/person/day enforced (default=1)

  • includeHumanMets – boolean indicating if human-derived metabolites present in the gut should be provided to the models (default: true)

  • adaptMedium – boolean indicating if the medium should be adapted through the adaptVMHDietToAGORA function or used as is (default=true)

OUTPUTS
  • exchanges – cell array with list of all unique exchanges to diet/ fecal compartment that were interrogated in simulations

  • netProduction – cell array containing FVA values for maximal uptake and secretion for setup lumen / diet exchanges

  • netUptake – cell array containing FVA values for minimal uptake and secretion for setup lumen / diet exchanges

  • growthRates – array containing values of microbiota models objective function

  • infeasModels – cell array with names of infeasible microbiota models

normalizeCoverage(abunFilePath, cutoff)

This function normalizes the coverage in a given file with organism coverages such that they sum up to 1 for each sample.

USAGE

[normalizedCoverage,normalizedCoveragePath] = normalizeCoverage(abunFilePath,cutoff)

INPUT

abunFilePath – Path to table with not yet normalized relative coverages

OPTIONAL INPUT

cutoff – Cutoff for normalized coverages that are considered below detection limit; the respective organisms will be removed from the samples (default: 0.0001)

OUTPUTS
  • normalizedCoverage – Table with normalized coverages

  • normalizedCoveragePath – Path to csv file with normalized coverages
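
The normalization and cutoff steps can be sketched on a small numeric matrix. This is an illustration only, not the function's code: the coverage values are made up, the organisms x samples orientation is assumed, and the order of operations (normalize, apply the cutoff, then renormalize) is an assumption that the description above does not spell out.

```matlab
% Hypothetical coverage matrix: rows = organisms, columns = samples
coverage = [50 10; 30 0; 20 90; 0.001 0];
cutoff = 0.0001;  % default stated in the description

% Step 1: normalize each sample (column) so that it sums to 1
normalized = bsxfun(@rdivide, coverage, sum(coverage, 1));

% Step 2: treat values below the cutoff as below detection limit
normalized(normalized < cutoff) = 0;

% Step 3 (assumed): renormalize so each sample sums to 1 again
normalized = bsxfun(@rdivide, normalized, sum(normalized, 1));

disp(normalized);
```

In this example, the fourth organism falls below the cutoff in the first sample and is removed from it, so the remaining three organisms end up at 0.5, 0.3, and 0.2.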

parsave(fname, data)

Saves a data variable (e.g., model) from a parfor loop. Might not work in R2015b.

USAGE

parsave (fname, data)

INPUTS
  • fname – name of file

  • data – name of variable

plotMappingInfo(resPath, patOrg, reacPat, reacTab, reacNumber, infoFilePath, figForm, sampNames, organisms)

This function computes and automatically plots information derived from the mapping data, such as metabolic diversity and classical multidimensional scaling of individuals’ reaction repertoires. If the last two arguments are specified, the MDS plots will be annotated with sample and organism names.

USAGE

Y = plotMappingInfo (resPath, patOrg, reacPat, reacTab, reacNumber, infoFilePath, figForm, sampNames, organisms)

INPUTS
  • resPath – char with path of directory where results are saved

  • reac – nx1 cell array with all the unique set of reactions contained in the models

  • micRea – binary matrix assessing presence of set of unique reactions for each of the microbes

  • reacSet – matrix with names of reactions of each individual

  • reacTab – binary matrix with presence/absence of reaction per individual.

  • reacAbun – matrix with abundance of reaction per individual

  • reacNumber – number of unique reactions of each individual

  • infoFilePath – char with the full path to and name of the file documenting the stratification criteria, if available (default: no)

  • figForm – format to use for saving figures

  • sampNames – nx1 cell array with names of individuals in the study

  • organisms – nx1 cell array with names of organisms in the study

OUTPUTS

Y – classical multidimensional scaling of individuals’ reactions repertoire

retrieveModelStats(modelPath, modelList, abunFilePath, numWorkers, infoFilePath)

This function retrieves statistics on the number of reactions and metabolites across microbiome models. If a file with stratification information on individuals is provided, it will also determine if reaction and metabolites numbers are significantly different between groups.

USAGE

[modelStats,summary,statistics]=retrieveModelStats (modelPath, modelList, abunFilePath, numWorkers, infoFilePath)

INPUTS
  • modelPath – Path to models for which statistics should be retrieved

  • modelList – Cell array with names of models for which statistics should be retrieved

  • abunFilePath – char with path and name of file from which to retrieve abundance information

  • numWorkers – integer indicating the number of cores to use for parallelization

OPTIONAL INPUT

infoFilePath – char with path to stratification criteria if available

OUTPUT
  • modelStats – Reaction and metabolite numbers for each model

  • summary – Table with average, median, minimal, and maximal reactions and metabolites

OPTIONAL OUTPUT

statistics – If info file with stratification is provided, will determine if there is a significant difference.

translateMetagenome2AGORA(MetagenomeAbundancePath, sequencingDepth, reconstructionResource)

Translates organism identifiers in a published metagenomic or 16S rRNA data file with organism abundances (e.g., retrieved from MetaPhlAn) to AGORA pan-model IDs. This will not catch every case since the format of input files with abundance data varies greatly. Feel free to modify this function and submit a pull request to enable more input files to be translated to AGORA. Moreover, slight spelling variations in taxa across input files may result in taxa not being mapped. Check the unmappedRows output to identify these cases and modify the function accordingly. Pan-models that can be used to create microbiome models in mgPipe can be created with the function createPanModels. Consider running the function updateTaxonomyInfoAGORA to retrieve the most recent taxonomic assignment for the taxa to map. You might need to update your input file based on updated taxonomic classifications.

USAGE

[translatedAbundances,normalizedAbundances,unmappedRows]=translateMetagenome2AGORA (MetagenomeAbundancePath,sequencingDepth,reconstructionResource)

INPUT

MetagenomeAbundancePath – String containing the path to csv file with organism abundance data retrieved from 16S rRNA or metagenomic samples (example: ‘SRP065497_taxonomy_abundances_v3.0.tsv’).

OPTIONAL INPUTS
  • sequencingDepth – Sequencing depth on the taxonomical level in the input data (e.g., genus, species). Allowed inputs are ‘Species’, ‘Genus’, ‘Family’, ‘Order’, ‘Class’, ‘Phylum’. Default: ‘Species’.

  • reconstructionResource – Name of the reconstruction resource to map the abundances to. Allowed inputs are ‘AGORA’, ‘AGORA2’. Default: ‘AGORA’.

OUTPUTS
  • translatedAbundances – Abundances with organism names from the input file translated to AGORA pan-model IDs

  • normalizedAbundances – Translated abundances normalized so they sum up to 1 for each sample

  • unmappedRows – Taxa on the selected taxonomical level that could not be mapped to AGORA pan-models

updateTaxonomyInfoAGORA()

This function retrieves the newest taxonomy information for each AGORA strain from NCBI Taxonomy to facilitate mapping taxonomic assignments from metagenomic sequencing data to AGORA. An updated version of the AGORA information table is saved in spreadsheet format.