Mgpipe

adaptVMHDietToAGORA(VMHDiet, setupUsed, AGORAPath)

Part of the Microbiome Modeling Toolbox. This function adapts a diet generated by the Diet Designer on https://www.vmh.life such that microbiome community models created from the AGORA resource can generate biomass. All metabolites required by at least one AGORA model are added. Note that the adapted diet that is the output of this function is specific to the AGORA resource. It is not guaranteed that other constraint-based models can produce biomass on this diet. Units are given in mmol/day/person.

USAGE

[adaptedDietConstraints, growthOK] = adaptVMHDietToAGORA (VMHDiet, setupUsed, AGORAPath)

INPUTS

VMHDiet – Name of text file with VMH exchange reaction IDs and values on lower bounds generated by Diet Designer on https://www.vmh.life (or manually).
setupUsed – Model setup for which the adapted diet will be used. Allowed inputs are AGORA (the single AGORA models), Pairwise (the microbe-microbe models generated by the pairwise modeling module), and Microbiota (the microbe community models generated by MgPipe).

OPTIONAL INPUTS

AGORAPath – Path to the AGORA model resource. If entered, growth of all single models on the adapted diet will be tested.

OUTPUT

adaptedDietConstraints – Cell array of exchange reaction IDs, values on lower bounds, and values on upper bounds that can serve as input for the function useDiet.

OPTIONAL OUTPUT

growthOK – Variable indicating whether all AGORA models could grow on the adapted diet (1 if yes).

addMicrobeCommunityBiomass(model, microbeNames, abundances)

Adds a community biomass reaction to a model structure with multiple microbes based on their relative abundances. If no abundance values are provided, all n microbes get equal weights (1/n). Assumes a lumen compartment [u] and a fecal secretion compartment [fe]. Creates a community biomass metabolite ‘microbeBiomass’ that is secreted from [u] to [fe] and exchanged from the fecal compartment.

USAGE

model = addMicrobeCommunityBiomass (model, microbeNames, abundances)

INPUTS

model – COBRA model structure with n joined microbes with biomass metabolites ‘Microbe_biomass[c]’.
microbeNames – nx1 cell array of n unique strings that represent each microbe in the model.

OPTIONAL INPUT

abundances – nx1 vector with the relative abundance of each microbe.

OUTPUT

model – COBRA model structure
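
The weighting rule described for addMicrobeCommunityBiomass can be illustrated with a short Python sketch. This is not the toolbox's MATLAB code: the helper name `community_biomass_weights` is hypothetical, and how the toolbox turns these weights into the stoichiometry of the community biomass reaction is not shown here. The sketch only reproduces the documented default of equal weights (1/n) and the ‘Microbe_biomass[c]’ naming.

```python
def community_biomass_weights(microbe_names, abundances=None):
    """Map each microbe's biomass metabolite ID to its weight.

    Hypothetical helper: if no abundances are given, every one of the
    n microbes receives the same weight 1/n, as described above.
    """
    n = len(microbe_names)
    if n == 0:
        raise ValueError("at least one microbe is required")
    if len(set(microbe_names)) != n:
        raise ValueError("microbe names must be unique")
    if abundances is None:
        abundances = [1.0 / n] * n
    elif len(abundances) != n:
        raise ValueError("abundances must have one entry per microbe")
    # Biomass metabolites follow the 'Microbe_biomass[c]' pattern.
    return {
        f"{name}_biomass[c]": float(weight)
        for name, weight in zip(microbe_names, abundances)
    }


equal = community_biomass_weights(["Ecoli", "Bfragilis"])
weighted = community_biomass_weights(["Ecoli", "Bfragilis"], [0.8, 0.2])
```

Passing an explicit abundance vector simply replaces the equal weights; the sketch does not check that the abundances sum to 1.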

buildModelStorage(microbeNames, modPath, dietFilePath, adaptMedium, includeHumanMets, numWorkers, pruneModels, biomasses)

This function builds the internal exchange space and the coupling constraints for models to join within mgPipe so they can be merged into microbiome models afterwards. Exchanges that can never carry flux on the given diet are removed to reduce computation time.

USAGE

[activeExMets,couplingMatrix] = buildModelStorage(microbeNames,modPath,dietFilePath,adaptMedium,includeHumanMets,numWorkers,pruneModels,biomasses)

INPUTS

microbeNames – list of microbe models included in the microbiome models
modPath – char with path of directory where models are stored
dietFilePath – char with path of directory where the diet is saved
adaptMedium – boolean indicating if the medium should be adapted through the adaptVMHDietToAGORA function or used as is (default: true)
includeHumanMets – boolean indicating if human-derived metabolites present in the gut should be provided to the models (default: true)
numWorkers – integer indicating the number of cores to use for parallelization
pruneModels – boolean indicating whether reactions that do not carry flux on the input diet should be removed from the microbe models. Recommended for large datasets (default: false)
biomasses – Cell array containing names of biomass objective functions of models to join. Needs to be the same length as the length of models in the abundance file.

OUTPUTS

activeExMets – list of exchanged metabolites present in at least one microbe model that can carry flux
couplingMatrix – matrix containing coupling constraints for each model to join

createPanModels(agoraPath, panPath, taxonLevel, agoraVersion, numWorkers, builtTaxa)

This function creates pan-models for all unique taxa (e.g., species) included in the AGORA resource. If reconstructions of multiple strains in a given taxon are present, the reactions in these reconstructions will be combined into a pan-reconstruction. The pan-biomass reactions will be built from the average of all biomasses. Futile cycles that result from the newly combined reaction content are removed by setting certain reactions irreversible. These reactions have been determined manually. NOTE: Futile cycle removal has only been tested at the species and genus level. Pan-models at higher taxonomical levels (e.g., family) may contain futile cycles and produce unrealistically high ATP flux. The pan-models can be used as input for mgPipe if taxon abundance data is available at a higher level than strain, e.g., species or genus.

USAGE

createPanModels (agoraPath, panPath, taxonLevel, agoraVersion, numWorkers, builtTaxa)

INPUTS

agoraPath – String containing the path to the AGORA reconstructions. Must end with a file separator.
panPath – String containing the path to an empty folder that the created pan-models will be stored in. Must end with a file separator.
taxonLevel – String with desired taxonomical level of the pan-models. Allowed inputs are ‘Species’, ‘Genus’, ‘Family’, ‘Order’, ‘Class’, ‘Phylum’.
agoraVersion – Version of AGORA that will be used. Allowed inputs are ‘AGORA’, ‘AGORA2’, or alternatively the path to a custom table with reconstruction information.

OPTIONAL INPUTS

numWorkers – Number of workers for parallel pool (default: no pool)
builtTaxa – Names of taxa in table that will be built (default: all). Need to be entered as a cell array of strings with names written exactly as in the corresponding column in the table.
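
As a rough illustration of "built from the average of all biomasses", the Python sketch below averages stoichiometric coefficients across strain biomass reactions. It is not the toolbox's implementation: the function name `average_biomass`, the example metabolite IDs and coefficients, and the choice to treat a metabolite missing from a strain's biomass as a coefficient of 0 are all assumptions made for the example.

```python
def average_biomass(strain_biomasses):
    """Average biomass coefficients over strains.

    strain_biomasses: list of dicts mapping metabolite ID -> coefficient.
    Assumption: a metabolite absent from one strain counts as 0 there.
    """
    if not strain_biomasses:
        raise ValueError("need at least one strain biomass reaction")
    n = len(strain_biomasses)
    metabolites = sorted({met for b in strain_biomasses for met in b})
    return {
        met: sum(b.get(met, 0.0) for b in strain_biomasses) / n
        for met in metabolites
    }


# Made-up coefficients for two hypothetical strains of one species.
strain_a = {"ala_L[c]": -0.5, "atp[c]": -40.0, "adp[c]": 40.0}
strain_b = {"ala_L[c]": -0.3, "atp[c]": -42.0, "adp[c]": 42.0, "trp_L[c]": -0.1}
pan_biomass = average_biomass([strain_a, strain_b])
```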

createPersonalizedModel(abundance, resPath, model, sampNames, orglist, couplingMatrix, host, hostBiomassRxn)

This function creates personalized models by integrating the given organism abundances into the previously built global setup. Coupling constraints are also added for each organism. All operations are parallelized, and the generated personalized models are saved directly in .mat format.

USAGE

[createdModels] = createPersonalizedModel (abundance, resPath, model, sampNames, orglist, couplingMatrix, host, hostBiomassRxn)

INPUTS

abundance – table with abundance information
resPath – char with path of directory where results are saved
model – model in COBRA model structure format
sampNames – cell array with names of individuals in the study
orglist – cell array with names of organisms in the study
couplingMatrix – cell array containing pre-created coupling matrices for each organism to be joined (created by buildModelStorage function)
host – Contains the host model if path to host model was defined. Otherwise empty.
hostBiomassRxn – char with name of biomass reaction in host (default: empty)

OUTPUT

createdModels – created personalized models

detectOutput(resPath, objNam)

This function checks the existence of a specific file in the results folder.

USAGE

mapP = detectOutput (resPath, objNam)

INPUTS

resPath – char with path of directory where results are saved
objNam – char with name of object to find in the results folder

OUTPUTS

mapP – double indicating if object was found in the results folder

extractFullRes(resPath, ID, dietType, sampName, fvaCt, nsCt)

This function is called from the MgPipe pipeline. Its purpose is to retrieve and export, in a comprehensive way, all the results (fluxes) computed during the simulations for a specified diet. Since FVA is computed on diet and fecal exchanges, every metabolite will have four different values for each individual, corresponding to the minimum and maximum of uptake and secretion.

USAGE

[finRes]= extractFullRes (resPath, ID, dietType, sampName, fvaCt, nsCt)

INPUTS

resPath – char with path of directory where results are saved
ID – cell array with list of all unique exchanges to diet/fecal compartment
dietType – char indicating under which diet to extract results: rDiet (rich diet), sDiet (previously specified diet, set by default), and pDiet (personalized), if available
sampName – nx1 cell array with names of individuals in the study
fvaCt – cell array containing FVA values for maximal uptake
nsCt – cell array containing FVA values for minimal uptake and secretion for setup lumen / diet exchanges

OUTPUTS

finRes – cell array with min and max value of uptake and secretion for each metabolite

fastSetupCreator(exMets, microbeNames, host)

Creates a microbiota model (min 1 microbe) that can be coupled with a host model. Microbes and host are connected with a lumen compartment [u]; the host can secrete metabolites into body fluids [b]. Diet is simulated as uptake through the compartment [d]; transporters are unidirectional from [d] to [u]. Secretion goes through the fecal compartment [fe]; transporters are unidirectional from [u] to [fe].

Reaction types

Diet exchange – ‘EX_met[d]’: ‘met[d] <=>’
Diet transporter – ‘DUt_met’: ‘met[d] -> met[u]’
Fecal transporter – ‘UFEt_met’: ‘met[u] -> met[fe]’
Fecal exchanges – ‘EX_met[fe]’: ‘met[fe] <=>’
Microbe uptake/secretion – ‘Microbe_IEX_met[c]tr’: ‘Microbe_met[c] <=> met[u]’
Host uptake/secretion lumen – ‘Host_IEX_met[c]tr’: ‘Host_met[c] <=> met[u]’
Host exchange body fluids – ‘Host_EX_met(e)b’: ‘Host_met[b] <=>’

INPUTS

exMets – cell array with all unique extracellular metabolites contained in the models
microbeNames – nx1 cell array of n unique strings that represent each microbe model. Reactions and metabolites of each microbe will get the corresponding microbeNames (e.g., ‘Ecoli’) prefix. Reactions will be named ‘Ecoli_RxnAbbr’ and metabolites ‘Ecoli_MetAbbr[c]’.
host – Host COBRA model structure; can be left empty if there is no host model

OUTPUT

model – COBRA model structure with all models combined
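
The reaction naming scheme listed for fastSetupCreator can be spelled out for a single metabolite. The Python sketch below only generates the reaction IDs and formula strings exactly as given in the reaction types above; the function name `setup_reactions` is hypothetical, and the sketch does not build a COBRA model structure.

```python
def setup_reactions(met, microbe_names, host=False):
    """Return {reaction ID: formula} for one metabolite abbreviation.

    Follows the naming patterns listed in the fastSetupCreator
    description; 'met' is a metabolite abbreviation without compartment.
    """
    rxns = {
        f"EX_{met}[d]": f"{met}[d] <=>",            # diet exchange
        f"DUt_{met}": f"{met}[d] -> {met}[u]",      # diet transporter
        f"UFEt_{met}": f"{met}[u] -> {met}[fe]",    # fecal transporter
        f"EX_{met}[fe]": f"{met}[fe] <=>",          # fecal exchange
    }
    # One uptake/secretion reaction per microbe, prefixed with its name.
    for name in microbe_names:
        rxns[f"{name}_IEX_{met}[c]tr"] = f"{name}_{met}[c] <=> {met}[u]"
    if host:
        rxns[f"Host_IEX_{met}[c]tr"] = f"Host_{met}[c] <=> {met}[u]"
        rxns[f"Host_EX_{met}(e)b"] = f"Host_{met}[b] <=>"
    return rxns


acetate_rxns = setup_reactions("ac", ["Ecoli", "Bfragilis"])
```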

getIndividualSizeName(abunFilePath, modPath)

This function automatically detects organisms, names and number of individuals present in the study.

USAGE

[sampNames, organisms, exMets] = getIndividualSizeName (abunFilePath,modPath)

INPUTS

abunFilePath – char with path and name of file from which to retrieve information
modPath – char with path of directory where models are stored

OUTPUTS

sampNames – nx1 cell array with names of individuals in the study
organisms – nx1 cell array with names of organisms in the study
exMets – cell array with all unique extracellular metabolites contained in the models

getMappingInfo(modPath, organisms, abunFilePath)

This function automatically extracts information from strain abundances in different individuals and combines this information into different tables.

USAGE

[reac, exMets, micRea, binOrg, patOrg, reacPat, reacNumb, reacSet, reacTab, reacAbun, reacNumber] = getMappingInfo (modPath, organisms, abunFilePath, patNumb)

INPUTS

organisms – nx1 cell array with names of organisms in the study
modPath – char with path of directory where models are stored
abunFilePath – char with path and name of file from which to retrieve abundance information
patNumb – number of individuals in the study

OUTPUTS

reac – cell array with all the unique set of reactions contained in the models
exMets – cell array with all unique extracellular metabolites contained in the models
micRea – binary matrix assessing presence of set of unique reactions for each of the microbes
binOrg – binary matrix assessing presence of specific strains in different individuals
reacPat – matrix with number of reactions per individual (organism resolved)
reacSet – matrix with names of reactions of each individual
reacTab – char with names of individuals in the study
reacAbun – binary matrix with presence/absence of reaction per individual, to compare different individuals
reacNumber – number of unique reactions of each individual

guidedSim(model, rl)

This function is part of the MgPipe pipeline and runs FVAs on a series of selected reactions with different possible FVA functions. Solver is automatically set to ‘cplex’, objective function is maximized, and optPercentage set to 99.99.

USAGE

[minFlux, maxFlux] = guidedSim (model, rl)

INPUTS

model – COBRA model structure with n joined microbes with biomass metabolites ‘Microbe_biomass[c]’.
rl – nx1 vector with the reactions of interest.
solver – char with name of the solver to use.

OUTPUTS

minFlux – Minimum flux for each reaction
maxFlux – Maximum flux for each reaction

Author: Federico Baldini, 2017-2018

initMgPipe(modPath, abunFilePath, computeProfiles, varargin)

This function initializes the mgPipe pipeline and sets the optional input variables if not defined.

USAGE

[init, netSecretionFluxes, netUptakeFluxes, Y, modelStats, summary, statistics, modelsOK] = initMgPipe(modPath, abunFilePath, computeProfiles, varargin)

INPUTS

modPath – char with path of directory where models are stored
abunFilePath – char with path and name of file from which to retrieve abundance information
computeProfiles – boolean defining whether flux variability analysis to compute the metabolic profiles should be performed.

OPTIONAL INPUTS

resPath – char with path of directory where results are saved
dietFilePath – char with path of directory where the diet is saved. Can also be a character array with a separate diet for each individual. In that case, size(dietFilePath,1) needs to equal the number of samples, the first row needs to be sample names, and the second row needs to be the respective files with diet information.
infoFilePath – char with path to stratification criteria if available
biomasses – Cell array containing names of biomass objective functions of models to join. Needs to be the same length as the length of models in the abundance file.
hostPath – char with path to host model, e.g., Recon3D (default: empty)
hostBiomassRxn – char with name of biomass reaction in host (default: empty)
hostBiomassRxnFlux – double with the desired upper bound on flux through the host biomass reaction (default: 1)
numWorkers – integer indicating the number of cores to use for parallelization
rDiet – boolean indicating whether to also enable rich diet simulations (default: false)
pDiet – boolean indicating whether to also enable personalized diet simulations (default: false)
lowerBMBound – lower bound on community biomass (default=0.4)
upperBMBound – upper bound on community biomass (default=1)
includeHumanMets – boolean indicating if human-derived metabolites present in the gut should be provided to the models (default: true)
adaptMedium – boolean indicating if the medium should be adapted through the adaptVMHDietToAGORA function or used as is (default=true)
pruneModels – boolean indicating whether reactions that do not carry flux on the input diet should be removed from the microbe models. Recommended for large datasets (default: false)

OUTPUTS

init – status of initialization
netSecretionFluxes – Net secretion fluxes by microbiome community models
netUptakeFluxes – Net uptake fluxes by microbiome community models
Y – Classical multidimensional scaling
modelStats – Reaction and metabolite numbers for each model
summary – Table with average, median, minimal, and maximal reactions and metabolites
statistics – If info file with stratification is provided, will determine if there is a significant difference.
modelsOK – Boolean indicating if the created microbiome models passed verifyModel. If true, all models passed.

loadUncModels(modPath, organisms, objre, printLevel)

This function loads and unconstrains metabolic models from a specific folder

USAGE

models = loadUncModels (modPath, organisms, objre)

INPUTS

organisms – nx1 cell array with names of organisms in the study
modPath – char with path of directory where models are stored
objre – char with reaction name of objective function of organisms
printLevel – Verbose level (default: printLevel = 1)

OUTPUT

models – nx1 cell array with models of organisms in the study

makeDummyModel(numMets, numRxns)

Makes an empty model with numMets rows for metabolites and numRxns columns for reactions. Includes all fields that are necessary to join models.

USAGE

dummy = makeDummyModel (numMets, numRxns)

INPUTS

numMets – Number of metabolites
numRxns – Number of reactions

OUTPUT

dummy – Empty COBRA model structure

mgPipe(modPath, abunFilePath, computeProfiles, resPath, dietFilePath, infoFilePath, biomasses, hostPath, hostBiomassRxn, hostBiomassRxnFlux, figForm, numWorkers, rDiet, pDiet, lowerBMBound, upperBMBound, includeHumanMets, adaptMedium, pruneModels)

mgPipe is a MATLAB-based pipeline to integrate microbial abundances (coming from metagenomic data) with constraint-based modeling, creating individuals’ personalized models. The pipeline is divided into 3 parts:

[PART 1] Analysis of individuals’ specific microbe abundances.
[PART 2] (1) Constructing a global metabolic model (setup) containing all the microbes listed in the study. (2) Building individuals’ specific models integrating abundance data retrieved from metagenomics. For each organism, reactions are coupled to the objective function.
[PART 3] Simulations under different diet regimes.

USAGE

[netSecretionFluxes, netUptakeFluxes, Y, modelStats, summary, statistics, modelsOK] = mgPipe (modPath, abunFilePath, computeProfiles, resPath, dietFilePath, infoFilePath, biomasses, hostPath, hostBiomassRxn, hostBiomassRxnFlux, figForm, numWorkers, rDiet, pDiet, lowerBMBound, upperBMBound, includeHumanMets, adaptMedium, pruneModels)

INPUTS

modPath – char with path of directory where models are stored
abunFilePath – char with path and name of file from which to retrieve abundance information
computeProfiles – boolean defining whether flux variability analysis to compute the metabolic profiles should be performed.
resPath – char with path of directory where results are saved
dietFilePath – char with path of directory where the diet is saved
infoFilePath – char with path to stratification criteria if available
biomasses – Cell array containing names of biomass objective functions of models to join. Needs to be the same length as the length of models in the abundance file.
hostPath – char with path to host model, e.g., Recon3D (default: empty)
hostBiomassRxn – char with name of biomass reaction in host (default: empty)
hostBiomassRxnFlux – double with the desired flux through the host biomass reaction (default: zero)
figForm – format to use for saving figures
numWorkers – integer indicating the number of cores to use for parallelization
rDiet – boolean indicating whether to also enable rich diet simulations (default: false)
pDiet – boolean indicating whether to also enable personalized diet simulations (default: false)
lowerBMBound – lower bound on community biomass (default=0.4)
upperBMBound – upper bound on community biomass (default=1)
includeHumanMets – boolean indicating if human-derived metabolites present in the gut should be provided to the models (default: true)
adaptMedium – boolean indicating if the medium should be adapted through the adaptVMHDietToAGORA function or used as is (default=true)
pruneModels – boolean indicating whether reactions that do not carry flux on the input diet should be removed from the microbe models. Recommended for large datasets (default: false)

OUTPUTS

netSecretionFluxes – Net secretion fluxes by microbiome community models
netUptakeFluxes – Net uptake fluxes by microbiome community models
Y – Classical multidimensional scaling
modelStats – Reaction and metabolite numbers for each model
summary – Table with average, median, minimal, and maximal reactions and metabolites
statistics – If info file with stratification is provided, will determine if there is a significant difference.
modelsOK – Boolean indicating if the created microbiome models passed verifyModel. If true, all models passed.

AUTHORS

Federico Baldini, 2017-2018
Almut Heinken, 07/20: converted to function
Almut Heinken, 01/21: adapted inputs

mgSimResCollect(resPath, sampNames, exchanges, rDiet, pDiet, infoFilePath, netProduction, netUptake, figForm)

This function is called from the MgPipe pipeline. Its purpose is to compute net maximal production capabilities (NMPCs) from simulations with different diets on multiple microbiota models. Results are written to .csv files. A PCoA on the NMPCs, which groups the microbiota models of individuals by similar metabolic profile, is also computed and output.

USAGE

[netSecretionFluxes, netUptakeFluxes, Y] = mgSimResCollect (resPath, sampNames, exchanges, rDiet, pDiet, infoFilePath, netProduction, netUptake, figForm)

INPUTS

resPath – char with path of directory where results are saved
sampNames – nx1 cell array with names of individuals in the study
exchanges – cell array with list of all unique exchanges to diet/fecal compartment that were interrogated in simulations
rDiet – number (double) indicating if to simulate a rich diet
pDiet – number (double) indicating if a personalized diet is available and should be simulated
infoFilePath – char with full path and name of the file with stratification criteria, if available (default: no)
netProduction – cell array containing FVA values for maximal uptake
figForm – char indicating the format of figures

OUTPUTS

netSecretionFluxes – cell array with computed NMPCs
netUptakeFluxes – cell array with computed uptake potential
Y – classical multidimensional scaling

microbiotaModelSimulator(resPath, exMets, sampNames, dietFilePath, hostPath, hostBiomassRxn, hostBiomassRxnFlux, numWorkers, rDiet, pDiet, computeProfiles, lowerBMBound, upperBMBound, includeHumanMets, adaptMedium)

This function is called from the MgPipe pipeline. Its purpose is to apply different diets (according to the user’s input) to the microbiota models and run simulations computing FVAs on exchange reactions of the microbiota models. The output is saved in multiple .mat objects. Intermediate saving checkpoints are present.

USAGE

[exchanges, netProduction, netUptake, growthRates, infeasModels] = microbiotaModelSimulator (resPath, exMets, sampNames, dietFilePath, hostPath, hostBiomassRxn, hostBiomassRxnFlux, numWorkers, rDiet, pDiet, computeProfiles, lowerBMBound, upperBMBound, includeHumanMets, adaptMedium)

INPUTS

resPath – char with path of directory where results are saved
exMets – list of exchanged metabolites present in at least one microbe model that can carry flux
sampNames – cell array with names of individuals in the study
dietFilePath – path to and name of the text file with dietary information. Can also be a list of the sample names with individual diet files.
hostPath – char with path to host model, e.g., Recon3D (default: empty)
hostBiomassRxn – char with name of biomass reaction in host (default: empty)
hostBiomassRxnFlux – double with the desired upper bound on flux through the host biomass reaction (default: 1)
numWorkers – integer indicating the number of cores to use for parallelization
rDiet – boolean indicating if to simulate a rich diet
pDiet – boolean indicating if a personalized diet is available and should be simulated
computeProfiles – boolean defining whether flux variability analysis to compute the metabolic profiles should be performed.
lowerBMBound – Minimal amount of community biomass in mmol/person/day enforced (default=0.4)
upperBMBound – Maximal amount of community biomass in mmol/person/day enforced (default=1)
includeHumanMets – boolean indicating if human-derived metabolites present in the gut should be provided to the models (default: true)
adaptMedium – boolean indicating if the medium should be adapted through the adaptVMHDietToAGORA function or used as is (default=true)

OUTPUTS

exchanges – cell array with list of all unique exchanges to diet/ fecal compartment that were interrogated in simulations
netProduction – cell array containing FVA values for maximal uptake and secretion for setup lumen / diet exchanges
netUptake – cell array containing FVA values for minimal uptake and secretion for setup lumen / diet exchanges
growthRates – array containing values of microbiota models objective function
infeasModels – cell array with names of infeasible microbiota models

normalizeCoverage(abunFilePath, cutoff)

This function normalizes the coverage in a given file with organism coverages such that they sum up to 1 for each sample.

USAGE

[normalizedCoverage,normalizedCoveragePath] = normalizeCoverage(abunFilePath,cutoff)

INPUT

abunFilePath – Path to table with not yet normalized relative coverages

OPTIONAL INPUT

cutoff – Cutoff for normalized coverages that are considered below detection limit; respective organisms will be removed from the samples (default: 0.0001)

OUTPUTS

normalizedCoverage – Table with normalized coverages
normalizedCoveragePath – Path to csv file with normalized coverages
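
The normalization and cutoff described for normalizeCoverage can be sketched in a few lines of Python. This is an illustration of the idea, not the toolbox's MATLAB code. The function name `normalize_coverage` and the nested-dict input format are assumptions, and so is the decision to drop organisms below the cutoff without renormalizing the remaining values, since the description above does not say whether a second normalization follows.

```python
def normalize_coverage(coverage, cutoff=0.0001):
    """Normalize per-sample coverages so they sum to 1.

    coverage: dict mapping sample name -> {organism: raw coverage}.
    Organisms whose normalized coverage is below `cutoff` are removed
    from that sample (assumption: no renormalization afterwards).
    """
    normalized = {}
    for sample, organisms in coverage.items():
        total = sum(organisms.values())
        if total <= 0:
            raise ValueError(f"sample {sample!r} has no coverage")
        kept = {}
        for organism, value in organisms.items():
            fraction = value / total
            if fraction >= cutoff:
                kept[organism] = fraction
        normalized[sample] = kept
    return normalized


# Made-up raw coverages for two samples and two organisms.
raw = {
    "sample1": {"orgA": 30.0, "orgB": 70.0},
    "sample2": {"orgA": 99999.0, "orgB": 1.0},
}
result = normalize_coverage(raw)
```

In this example, orgB makes up 0.00001 of sample2, which is below the default cutoff of 0.0001, so it is removed from that sample.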

parsave(fname, data)

Saves a data variable (e.g., model) from a parfor loop. Might not work in R2015b.

USAGE

parsave (fname, data)

INPUTS

fname – name of file
data – name of variable

plotMappingInfo(resPath, patOrg, reacPat, reacTab, reacNumber, infoFilePath, figForm, sampNames, organisms)

This function computes and automatically plots information coming from the mapping data, such as metabolic diversity and classical multidimensional scaling of individuals’ reaction repertoires. If the last 2 arguments are specified, MDS plots will be annotated with sample and organism names.

USAGE

Y = plotMappingInfo (resPath, patOrg, reacPat, reacTab, reacNumber, infoFilePath, figForm, sampNames, organisms)

INPUTS

resPath – char with path of directory where results are saved
reac – nx1 cell array with all the unique set of reactions contained in the models
micRea – binary matrix assessing presence of set of unique reactions for each of the microbes
reacSet – matrix with names of reactions of each individual
reacTab – binary matrix with presence/absence of reaction per individual.
reacAbun – matrix with abundance of reaction per individual
reacNumber – number of unique reactions of each individual
infoFilePath – char with full path and name of the file with stratification criteria, if available (default: no)
figForm – format to use for saving figures
sampNames – nx1 cell array with names of individuals in the study
organisms – nx1 cell array with names of organisms in the study

OUTPUTS

Y – classical multidimensional scaling of individuals’ reactions repertoire

retrieveModelStats(modelPath, modelList, abunFilePath, numWorkers, infoFilePath)

This function retrieves statistics on the number of reactions and metabolites across microbiome models. If a file with stratification information on individuals is provided, it will also determine if reaction and metabolite numbers are significantly different between groups.

USAGE

[modelStats,summary,statistics]=retrieveModelStats (modelPath, modelList, abunFilePath, numWorkers, infoFilePath)

INPUTS

modelPath – Path to models for which statistics should be retrieved
modelList – Cell array with names of models for which statistics should be retrieved
abunFilePath – char with path and name of file from which to retrieve abundance information
numWorkers – integer indicating the number of cores to use for parallelization

OPTIONAL INPUT

infoFilePath – char with path to stratification criteria if available

OUTPUT

modelStats – Reaction and metabolite numbers for each model
summary – Table with average, median, minimal, and maximal reactions and metabolites

OPTIONAL OUTPUT

statistics – If info file with stratification is provided, will determine if there is a significant difference.
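
The summary output (average, median, minimal, and maximal reaction and metabolite counts) amounts to simple descriptive statistics over the per-model counts. The Python sketch below shows that computation on made-up numbers; the function name `summarize_model_stats` and the input layout are assumptions and do not mirror the toolbox's table format.

```python
from statistics import mean, median


def summarize_model_stats(model_stats):
    """Summarize reaction and metabolite counts across models.

    model_stats: dict mapping model name -> (n_reactions, n_metabolites).
    Returns average, median, minimum, and maximum for each count.
    """
    reactions = [rxns for rxns, _ in model_stats.values()]
    metabolites = [mets for _, mets in model_stats.values()]

    def describe(values):
        return {
            "average": mean(values),
            "median": median(values),
            "min": min(values),
            "max": max(values),
        }

    return {
        "reactions": describe(reactions),
        "metabolites": describe(metabolites),
    }


# Made-up counts for three hypothetical microbiome models.
example_stats = {
    "sample1": (1200, 900),
    "sample2": (1500, 1100),
    "sample3": (1800, 1000),
}
example_summary = summarize_model_stats(example_stats)
```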

translateMetagenome2AGORA(MetagenomeAbundancePath, sequencingDepth, reconstructionResource)

Translates organism identifiers in a published metagenomic or 16S rRNA data file with organism abundances (retrieved e.g., from MetaPhlAn) to AGORA pan-model IDs. This will not catch every case since the format of input files with abundance data varies greatly. Feel free to modify this function and submit a pull request to enable more input files to be translated to AGORA. Moreover, slight spelling variations in taxa across input files may result in taxa not being mapped. Check the unmappedRows output to identify these cases and modify the function accordingly. Pan-models that can be used to create microbiome models in mgPipe can be created with the function createPanModels. Consider running the function updateTaxonomyInfoAGORA to retrieve the most recent taxonomic assignment for the taxa to map. You might need to update your input file based on updated taxonomic classifications.

USAGE

[translatedAbundances,normalizedAbundances,unmappedRows]=translateMetagenome2AGORA (MetagenomeAbundancePath,sequencingDepth,reconstructionResource)

INPUT

MetagenomeAbundancePath – String containing the path to csv file with organism abundance data retrieved from 16S rRNA or metagenomic samples (example: ‘SRP065497_taxonomy_abundances_v3.0.tsv’).

OPTIONAL INPUTS

sequencingDepth – Sequencing depth on the taxonomical level in the input data (e.g., genus, species). Allowed inputs are ‘Species’, ‘Genus’, ‘Family’, ‘Order’, ‘Class’, ‘Phylum’. Default: ‘Species’.
reconstructionResource – Name of the reconstruction resource to map the abundances to. Allowed inputs are ‘AGORA’, ‘AGORA2’. Default: ‘AGORA’.

OUTPUTS

translatedAbundances – Abundances with organism names from the input file translated to AGORA pan-model IDs
normalizedAbundances – Translated abundances normalized so they sum up to 1 for each sample
unmappedRows – Taxa on the selected taxonomical level that could not be mapped to AGORA pan-models

updateTaxonomyInfoAGORA()

This function retrieves the newest taxonomy information for each AGORA strain from NCBI Taxonomy to facilitate mapping taxonomic assignments from metagenomic sequencing data to AGORA. An updated version of the AGORA information table is saved in spreadsheet format.