mgPipe

adaptVMHDietToAGORA(VMHDiet, setupUsed, AGORAPath)

Part of the Microbiome Modeling Toolbox. This function adapts a diet generated by the Diet Designer on https://www.vmh.life such that microbiome community models created from the AGORA resource can generate biomass. All metabolites required by at least one AGORA model are added. Note that the adapted diet that is the output of this function is specific to the AGORA resource. It is not guaranteed that other constraint-based models can produce biomass on this diet. Units are given in mmol/day/person.

USAGE:

[adaptedDietConstraints, growthOK] = adaptVMHDietToAGORA(VMHDiet, setupUsed, AGORAPath)

INPUTS:

VMHDiet: Name of text file with VMH exchange reaction IDs and values on lower bounds generated by Diet Designer on https://www.vmh.life (or manually).

setupUsed: Model setup for which the adapted diet will be used. Allowed inputs are AGORA (the single AGORA models), Pairwise (the microbe-microbe models generated by the pairwise modeling module), and Microbiota (the microbe community models generated by MgPipe).

OPTIONAL INPUTS:

AGORAPath: Path to the AGORA model resource. If entered, growth of all single models on the adapted diet will be tested.

OUTPUT:

adaptedDiet: Cell array of exchange reaction IDs, values on lower bounds, and values on upper bounds that can serve as input for the function useDiet.

OPTIONAL OUTPUT:

growthOK: Variable indicating whether all AGORA models could grow on the adapted diet (if 1 then yes).
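The shape of the output can be illustrated with a minimal sketch in plain MATLAB. It is not the toolbox implementation: the exchange IDs, the list of required metabolites, the default uptake of 1, the sign convention (negative lower bound means uptake), and the upper bound of 0 are all assumptions made for illustration.

```matlab
% Hypothetical diet: exchange reaction IDs and uptake values (mmol/day/person).
dietIDs    = {'EX_glc_D(e)'; 'EX_ala_L(e)'};
dietUptake = [100; 5];

% Hypothetical list of exchanges required by at least one model.
essential = {'EX_ala_L(e)'; 'EX_h2o(e)'};

% Add every required exchange that the diet does not yet contain.
missing = setdiff(essential, dietIDs);
ids     = [dietIDs; missing];

% Assumption: uptake is a negative lower bound; added compounds get an
% arbitrary placeholder uptake of 1, and all upper bounds are set to 0.
lbValues = -[dietUptake; ones(numel(missing), 1)];
ubValues = zeros(numel(ids), 1);

% Three columns: exchange ID, lower bound, upper bound.
adapted = [ids, num2cell(lbValues), num2cell(ubValues)];
disp(adapted);
```

The three-column layout mirrors the description of the output above; the actual values and data types returned by the toolbox function may differ.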

addMicrobeCommunityBiomass(model, microbeNames, abundances)

Adds a community biomass reaction to a model structure with multiple microbes based on their relative abundances. If no abundance values are provided, all n microbes get equal weights (1/n). Assumes a lumen compartment [u] and a fecal secretion compartment [fe]. Creates a community biomass metabolite ‘microbeBiomass’ that is secreted from [u] to [fe] and exchanged from the fecal compartment.

USAGE:

model = addMicrobeCommunityBiomass(model, microbeNames, abundances)

INPUTS:

model: COBRA model structure with n joined microbes with biomass metabolites ‘Microbe_biomass[c]’.

microbeNames: nx1 cell array of n unique strings that represent each microbe in the model.

OPTIONAL INPUT:

abundances: nx1 vector with the relative abundance of each microbe.

OUTPUT:

model: COBRA model structure
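The default weighting can be sketched in plain MATLAB as follows. The microbe names are hypothetical, and the stoichiometry shown (each microbe's biomass metabolite consumed in proportion to its weight, one unit of community biomass produced in [u]) is an assumption about how such a reaction could be laid out, not a copy of the toolbox code.

```matlab
% Hypothetical inputs for illustration only.
microbeNames = {'Ecoli'; 'Bfragilis'; 'Lreuteri'};
abundances   = [];                    % left empty to trigger the default

n = numel(microbeNames);
if isempty(abundances)
    abundances = ones(n, 1) / n;      % equal weights 1/n
end

% Assumed layout of the community biomass reaction:
% weighted microbe biomass metabolites -> 1 microbeBiomass[u]
biomassMets = strcat(microbeNames, '_biomass[c]');
rxnMets     = [biomassMets; {'microbeBiomass[u]'}];
rxnCoeffs   = [-abundances; 1];

for k = 1:numel(rxnMets)
    fprintf('%8.4f  %s\n', rxnCoeffs(k), rxnMets{k});
end
```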

buildModelStorage(microbeNames, modPath, dietFilePath, adaptMedium, includeHumanMets, numWorkers, pruneModels, biomasses)

This function builds the internal exchange space and the coupling constraints for the models to be joined within mgPipe so that they can be merged into microbiome models afterwards. Exchanges that can never carry flux on the given diet are removed to reduce computation time.

USAGE

[activeExMets,couplingMatrix] = buildModelStorage(microbeNames,modPath,dietFilePath,adaptMedium,includeHumanMets,numWorkers,pruneModels)

INPUTS

microbeNames: list of microbe models included in the microbiome models

modPath: char with path of directory where models are stored

dietFilePath: char with path of directory where the diet is saved

adaptMedium: boolean indicating if the medium should be adapted through the adaptVMHDietToAGORA function or used as is (default=true)

includeHumanMets: boolean indicating if human-derived metabolites present in the gut should be provided to the models (default: true)

numWorkers: integer indicating the number of cores to use for parallelization

pruneModels: boolean indicating whether reactions that do not carry flux on the input diet should be removed from the microbe models. Recommended for large datasets (default: false)

biomasses: Cell array containing names of biomass objective functions of models to join. Needs to be the same length as the length of models in the abundance file.

OUTPUTS

activeExMets: list of exchanged metabolites present in at least one microbe model that can carry flux

couplingMatrix: matrix containing coupling constraints for each model to join

createPanModels(agoraPath, panPath, taxonLevel, numWorkers, taxTable)

This function creates pan-models for all unique taxa (e.g., species) included in the AGORA resource. If reconstructions of multiple strains in a given taxon are present, the reactions in these reconstructions will be combined into a pan-reconstruction. The pan-biomass reactions will be built from the average of all biomasses. Futile cycles that result from the newly combined reaction content are removed by setting certain reactions irreversible. These reactions have been determined manually. NOTE: Futile cycle removal has only been tested at the species and genus level. Pan-models at higher taxonomical levels (e.g., family) may contain futile cycles and produce unrealistically high ATP flux. The pan-models can be used as input for mgPipe if taxon abundance data is available at a higher level than strain, e.g., species or genus.

USAGE:

createPanModels(agoraPath,panPath,taxonLevel)

INPUTS:

agoraPath: String containing the path to the AGORA reconstructions. Must end with a file separator.

panPath: String containing the path to an empty folder that the created pan-models will be stored in. Must end with a file separator.

taxonLevel: String with desired taxonomical level of the pan-models. Allowed inputs are ‘Species’, ‘Genus’, ‘Family’, ‘Order’, ‘Class’, ‘Phylum’.

OPTIONAL INPUTS

numWorkers: Number of workers for parallel pool (default: no pool)

taxTable: File with information on taxonomy of reconstruction resource (default: ‘AGORA_infoFile.xlsx’)
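The two combining steps described above (taking the union of reaction content and averaging the biomass reactions) can be sketched in plain MATLAB. The reaction names, metabolite names, and coefficients are made up for illustration, and the sketch assumes all strains share the same biomass metabolites; it does not cover futile cycle removal.

```matlab
% Hypothetical reaction lists of two strains of the same species.
rxnsStrainA = {'PGI'; 'PFK'};
rxnsStrainB = {'PFK'; 'FBA'};

% Pan-reconstruction content: union of the strains' reactions.
panRxns = unique([rxnsStrainA; rxnsStrainB]);

% Hypothetical biomass stoichiometries (rows: metabolites, columns: strains).
biomassMets   = {'ala_L[c]'; 'atp[c]'; 'adp[c]'};
strainBiomass = [ -0.50  -0.48  -0.52;
                 -40.00 -41.00 -39.00;
                  40.00  41.00  39.00];

% Pan-biomass: average coefficient of each metabolite across strains.
panBiomass = mean(strainBiomass, 2);

disp(panRxns');
for k = 1:numel(biomassMets)
    fprintf('%8.3f  %s\n', panBiomass(k), biomassMets{k});
end
```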

createPersonalizedModel(abundance, resPath, model, sampNames, orglist, couplingMatrix, host, hostBiomassRxn)

This function creates personalized models by integrating the given organism abundances into the previously built global setup. Coupling constraints are also added for each organism. All operations are parallelized, and the generated personalized models are saved directly in .mat format.

USAGE:

[createdModels] = createPersonalizedModel(abundance, resPath, model, sampNames, orglist, host, hostBiomassRxn)

INPUTS:

abundance: table with abundance information

resPath: char with path of directory where results are saved

model: model in COBRA model structure format

sampNames: cell array with names of individuals in the study

orglist: cell array with names of organisms in the study

couplingMatrix: cell array containing pre-created coupling matrices for each organism to be joined (created by buildModelStorage function)

host: Contains the host model if path to host model was defined. Otherwise empty.

hostBiomassRxn: char with name of biomass reaction in host (default: empty)

OUTPUT:

createdModels: created personalized models

detectOutput(resPath, objNam)

This function checks the existence of a specific file in the results folder.

USAGE:

mapP = detectOutput(resPath, objNam)

INPUTS:

resPath: char with path of directory where results are saved

objNam: char with name of object to find in the results folder

OUTPUTS:

mapP: double indicating if object was found in the result folder
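A file check of this kind can be sketched with the built-in exist function, which returns a double (0 if nothing is found, 2 for a file). Whether detectOutput uses exist internally is an assumption; the folder and file name below are hypothetical.

```matlab
% Hypothetical results folder and object name.
resPath = tempname;
mkdir(resPath);
objNam  = 'mapInfo.mat';

% Before the file exists, exist() returns 0.
mapPBefore = exist(fullfile(resPath, objNam), 'file');

% Create the file, then check again; exist() returns 2 for a file.
x = 1;
save(fullfile(resPath, objNam), 'x');
mapPAfter = exist(fullfile(resPath, objNam), 'file');

fprintf('before: %d, after: %d\n', mapPBefore, mapPAfter);

% Clean up the temporary folder.
rmdir(resPath, 's');
```

A check like this lets a pipeline skip a step whose output was already saved in an earlier run.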

extractFullRes(resPath, ID, dietType, sampName, fvaCt, nsCt)

This function is called from the MgPipe pipeline. Its purpose is to retrieve and export, in a comprehensive way, all the results (fluxes) computed during the simulations for a specified diet. Since FVA is computed on diet and fecal exchanges, every metabolite will have four different values for each individual, corresponding to the minimum and maximum of uptake and secretion.

USAGE:

[finRes]= extractFullRes(resPath, ID, dietType, sampName, fvaCt, nsCt)

INPUTS:

resPath: char with path of directory where results are saved

ID: cell array with list of all unique exchanges to diet/fecal compartment

dietType: char indicating under which diet to extract results: rDiet (rich diet), sDiet (previously specified diet; set by default), and pDiet (personalized diet), if available

sampName: nx1 cell array with names of individuals in the study

fvaCt: cell array containing FVA values for maximal uptake

nsCt: cell array containing FVA values for minimal uptake and secretion for setup lumen / diet exchanges

OUTPUTS:

finRes: cell array with min and max value of uptake and secretion for each metabolite

fastSetupCreator(exMets, microbeNames, host)

Creates a microbiota model (min 1 microbe) that can be coupled with a host model. Microbes and host are connected with a lumen compartment [u]; the host can secrete metabolites into body fluids [b]. Diet is simulated as uptake through the compartment [d]; transporters are unidirectional from [d] to [u]. Secretion goes through the fecal compartment [fe]; transporters are unidirectional from [u] to [fe].

Reaction types:

  • Diet exchange: ‘EX_met[d]’: ‘met[d] <=>’

  • Diet transporter: ‘DUt_met’: ‘met[d] -> met[u]’

  • Fecal transporter: ‘UFEt_met’: ‘met[u] -> met[fe]’

  • Fecal exchanges: ‘EX_met[fe]’: ‘met[fe] <=>’

  • Microbe uptake/secretion: ‘Microbe_IEX_met[c]tr’: ‘Microbe_met[c] <=> met[u]’

  • Host uptake/secretion lumen: ‘Host_IEX_met[c]tr’: ‘Host_met[c] <=> met[u]’

  • Host exchange body fluids: ‘Host_EX_met(e)b’: ‘Host_met[b] <=>’

INPUTS:

exMets: cell array with all unique extracellular metabolites contained in the models

microbeNames: nx1 cell array of n unique strings that represent each microbe model. Reactions and metabolites of each microbe will get the corresponding microbeNames (e.g., ‘Ecoli’) prefix. Reactions will be named ‘Ecoli_RxnAbbr’ and metabolites ‘Ecoli_MetAbbr[c]’.

host: Host COBRA model structure, can be left empty if there is no host model

OUTPUT:

model: COBRA model structure with all models combined
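The naming scheme for the diet, fecal, and microbe reactions can be made concrete by generating the names and formulas for one metabolite. This is only a string-building sketch in plain MATLAB; the metabolite abbreviation is hypothetical, and no model is actually constructed.

```matlab
met     = 'ac';       % hypothetical metabolite abbreviation
microbe = 'Ecoli';    % example prefix for a microbe model

% Column 1: reaction ID, column 2: reaction formula.
rxns = {
    sprintf('EX_%s[d]', met),                sprintf('%s[d] <=>', met)
    sprintf('DUt_%s', met),                  sprintf('%s[d] -> %s[u]', met, met)
    sprintf('UFEt_%s', met),                 sprintf('%s[u] -> %s[fe]', met, met)
    sprintf('EX_%s[fe]', met),               sprintf('%s[fe] <=>', met)
    sprintf('%s_IEX_%s[c]tr', microbe, met), sprintf('%s_%s[c] <=> %s[u]', microbe, met, met)
};

for i = 1:size(rxns, 1)
    fprintf('%-24s %s\n', rxns{i, 1}, rxns{i, 2});
end
```

Reading the rows top to bottom traces the path of a dietary compound: exchange into [d], transport to the lumen [u], exchange with a microbe, transport to [fe], and fecal exchange.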

getIndividualSizeName(abunFilePath, modPath)

This function automatically detects the organisms as well as the names and number of individuals present in the study.

USAGE:

[sampNames, organisms, exMets] = getIndividualSizeName(abunFilePath,modPath)

INPUTS:

abunFilePath: char with path and name of file from which to retrieve information

modPath: char with path of directory where models are stored

OUTPUTS:

sampNames: nx1 cell array with names of individuals in the study

organisms: nx1 cell array with names of organisms in the study

exMets: cell array with all unique extracellular metabolites contained in the models

getMappingInfo(modPath, organisms, abunFilePath)

This function automatically extracts information from strain abundances in different individuals and combines this information into different tables.

USAGE:

[reac, exMets, micRea, binOrg, patOrg, reacPat, reacNumb, reacSet, reacTab, reacAbun, reacNumber] = getMappingInfo(modPath, organisms, abunFilePath, patNumb)

INPUTS:

organisms: nx1 cell array with names of organisms in the study

modPath: char with path of directory where models are stored

abunFilePath: char with path and name of file from which to retrieve abundance information

patNumb: number of individuals in the study

OUTPUTS:

reac: cell array with all the unique set of reactions contained in the models

exMets: cell array with all unique extracellular metabolites contained in the models

micRea: binary matrix assessing presence of set of unique reactions for each of the microbes

binOrg: binary matrix assessing presence of specific strains in different individuals

reacPat: matrix with number of reactions per individual (organism resolved)

reacSet: matrix with names of reactions of each individual

reacTab: char with names of individuals in the study

reacAbun: binary matrix with presence/absence of reaction per individual: to compare different individuals

reacNumber: number of unique reactions of each individual
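How presence/absence tables of this kind relate to each other can be shown with a small sketch in plain MATLAB. The variable names follow the outputs listed above, but the toy data and the matrix orientations (microbes by reactions, microbes by individuals) are assumptions for illustration and may not match the toolbox.

```matlab
% Hypothetical data: four reactions, three microbes, two individuals.
reac        = {'R1'; 'R2'; 'R3'; 'R4'};
microbeRxns = {{'R1'; 'R2'}, {'R2'; 'R3'}, {'R4'}};
abundance   = [0.7 0.0;      % rows: microbes, columns: individuals
               0.3 0.5;
               0.0 0.5];

% micRea: which unique reactions occur in which microbe.
nMic   = numel(microbeRxns);
micRea = zeros(nMic, numel(reac));
for i = 1:nMic
    micRea(i, :) = ismember(reac, microbeRxns{i})';
end

% binOrg: which microbes are present in which individual.
binOrg = double(abundance > 0);

% reacAbun: a reaction is present in an individual if at least one
% microbe carrying it is present (reactions x individuals).
reacAbun = double((micRea' * binOrg) > 0);

% reacNumber: number of unique reactions per individual.
reacNumber = sum(reacAbun, 1);
disp(reacNumber);
```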

guidedSim(model, rl)

This function is part of the MgPipe pipeline and runs FVAs on a series of selected reactions with different possible FVA functions. Solver is automatically set to ‘cplex’, objective function is maximized, and optPercentage set to 99.99.

USAGE:

[minFlux, maxFlux] = guidedSim(model, fvaType, rl)

INPUTS:

model: COBRA model structure with n joined microbes with biomass metabolites ‘Microbe_biomass[c]’.

rl: nx1 vector with the reactions of interest.

solver: char with solver name to use.

OUTPUTS:

minFlux: Minimum flux for each reaction

maxFlux: Maximum flux for each reaction

AUTHOR: Federico Baldini, 2017-2018

initMgPipe(modPath, abunFilePath, computeProfiles, varargin)

This function initializes the mgPipe pipeline and sets the optional input variables if not defined.

USAGE

[init, netSecretionFluxes, netUptakeFluxes, Y, modelStats, summary, statistics, modelsOK] = initMgPipe(modPath, abunFilePath, computeProfiles, varargin)

INPUTS:

modPath: char with path of directory where models are stored

abunFilePath: char with path and name of file from which to retrieve abundance information

computeProfiles: boolean defining whether flux variability analysis to compute the metabolic profiles should be performed.

OPTIONAL INPUTS:

resPath: char with path of directory where results are saved

dietFilePath: char with path of directory where the diet is saved. Can also be a character array with a separate diet for each individual; in that case, size(dietFilePath,1) needs to equal the number of samples, and the first row needs to be sample names and the second row needs to be the respective files with diet information.

infoFilePath: char with path to stratification criteria if available

biomasses: Cell array containing names of biomass objective functions of models to join. Needs to be the same length as the length of models in the abundance file.

hostPath: char with path to host model, e.g., Recon3D (default: empty)

hostBiomassRxn: char with name of biomass reaction in host (default: empty)

hostBiomassRxnFlux: double with the desired upper bound on flux through the host biomass reaction (default: 1)

numWorkers: integer indicating the number of cores to use for parallelization

rDiet: boolean indicating whether rich diet simulations should also be enabled (default: ‘false’)

pDiet: boolean indicating whether personalized diet simulations should also be enabled (default: ‘false’)

lowerBMBound: lower bound on community biomass (default=0.4)

upperBMBound: upper bound on community biomass (default=1)

includeHumanMets: boolean indicating if human-derived metabolites present in the gut should be provided to the models (default: true)

adaptMedium: boolean indicating if the medium should be adapted through the adaptVMHDietToAGORA function or used as is (default=true)

pruneModels: boolean indicating whether reactions that do not carry flux on the input diet should be removed from the microbe models. Recommended for large datasets (default: false)

OUTPUTS:

init: status of initialization

netSecretionFluxes: Net secretion fluxes by microbiome community models

netUptakeFluxes: Net uptake fluxes by microbiome community models

Y: Classical multidimensional scaling

modelStats: Reaction and metabolite numbers for each model

summary: Table with average, median, minimal, and maximal reactions and metabolites

statistics: If info file with stratification is provided, will determine if there is a significant difference.

modelsOK: Boolean indicating if the created microbiome models passed verifyModel. If true, all models passed.

loadUncModels(modPath, organisms, objre, printLevel)

This function loads and unconstrains metabolic models from a specific folder

USAGE:

models = loadUncModels(modPath, organisms, objre)

INPUTS:

organisms: nx1 cell array with names of organisms in the study

modPath: char with path of directory where models are stored

objre: char with reaction name of objective function of organisms

printLevel: Verbose level (default: printLevel = 1)

OUTPUT:

models: nx1 cell array with models of organisms in the study

makeDummyModel(numMets, numRxns)

Makes an empty model with numMets rows for metabolites and numRxns columns for reactions. Includes all fields that are necessary to join models.

USAGE:

dummy = makeDummyModel(numMets, numRxns)

INPUTS:

numMets: Number of metabolites

numRxns: Number of reactions

OUTPUT:

dummy: Empty COBRA model structure
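What an empty model of the requested size might look like can be sketched with a plain struct. The selection of fields below (S, mets, rxns, b, c, lb, ub) is an assumption based on the basic fields commonly found in COBRA model structures; the actual function may populate additional fields.

```matlab
numMets = 3;   % hypothetical number of metabolites (rows of S)
numRxns = 2;   % hypothetical number of reactions (columns of S)

dummy      = struct();
dummy.S    = sparse(numMets, numRxns);   % all-zero stoichiometric matrix
dummy.mets = cell(numMets, 1);           % metabolite IDs, to be filled later
dummy.rxns = cell(numRxns, 1);           % reaction IDs, to be filled later
dummy.b    = zeros(numMets, 1);          % right-hand side of S*v = b
dummy.c    = zeros(numRxns, 1);          % objective coefficients
dummy.lb   = zeros(numRxns, 1);          % lower flux bounds
dummy.ub   = zeros(numRxns, 1);          % upper flux bounds

disp(size(dummy.S));
```

Preallocating every field at its final size means that joining many models only fills in blocks of an existing structure instead of growing arrays step by step.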

mgPipe(modPath, abunFilePath, computeProfiles, resPath, dietFilePath, infoFilePath, biomasses, hostPath, hostBiomassRxn, hostBiomassRxnFlux, figForm, numWorkers, rDiet, pDiet, lowerBMBound, upperBMBound, includeHumanMets, adaptMedium, pruneModels)

mgPipe is a MATLAB-based pipeline to integrate microbial abundances (coming from metagenomic data) with constraint-based modeling, creating individuals’ personalized models. The pipeline is divided into 3 parts:

  • [PART 1] Analysis of individuals’ specific microbe abundances.

  • [PART 2] 1. Constructing a global metabolic model (setup) containing all the microbes listed in the study. 2. Building individuals’ specific models integrating abundance data retrieved from metagenomics. For each organism, reactions are coupled to the objective function.

  • [PART 3] Simulations under different diet regimes.

USAGE:

[netSecretionFluxes, netUptakeFluxes, Y, modelStats, summary, statistics, modelsOK] = mgPipe(modPath, abunFilePath, computeProfiles, resPath, dietFilePath, infoFilePath, biomasses, hostPath, hostBiomassRxn, hostBiomassRxnFlux, figForm, numWorkers, rDiet, pDiet, lowerBMBound, upperBMBound, includeHumanMets, adaptMedium, pruneModels)

INPUTS:

modPath: char with path of directory where models are stored

abunFilePath: char with path and name of file from which to retrieve abundance information

computeProfiles: boolean defining whether flux variability analysis to compute the metabolic profiles should be performed.

resPath: char with path of directory where results are saved

dietFilePath: char with path of directory where the diet is saved

infoFilePath: char with path to stratification criteria if available

biomasses: Cell array containing names of biomass objective functions of models to join. Needs to be the same length as the length of models in the abundance file.

hostPath: char with path to host model, e.g., Recon3D (default: empty)

hostBiomassRxn: char with name of biomass reaction in host (default: empty)

hostBiomassRxnFlux: double with the desired flux through the host biomass reaction (default: zero)

figForm: format to use for saving figures

numWorkers: integer indicating the number of cores to use for parallelization

rDiet: boolean indicating whether rich diet simulations should also be enabled (default: ‘false’)

pDiet: boolean indicating whether personalized diet simulations should also be enabled (default: ‘false’)

lowerBMBound: lower bound on community biomass (default=0.4)

upperBMBound: upper bound on community biomass (default=1)

includeHumanMets: boolean indicating if human-derived metabolites present in the gut should be provided to the models (default: true)

adaptMedium: boolean indicating if the medium should be adapted through the adaptVMHDietToAGORA function or used as is (default=true)

pruneModels: boolean indicating whether reactions that do not carry flux on the input diet should be removed from the microbe models. Recommended for large datasets (default: false)

OUTPUTS:

init: status of initialization

netSecretionFluxes: Net secretion fluxes by microbiome community models

netUptakeFluxes: Net uptake fluxes by microbiome community models

Y: Classical multidimensional scaling

modelStats: Reaction and metabolite numbers for each model

summary: Table with average, median, minimal, and maximal reactions and metabolites

statistics: If info file with stratification is provided, will determine if there is a significant difference.

modelsOK: Boolean indicating if the created microbiome models passed verifyModel. If true, all models passed.

AUTHORS:
  • Federico Baldini, 2017-2018

  • Almut Heinken, 07/20: converted to function

  • Almut Heinken, 01/21: adapted inputs

mgSimResCollect(resPath, sampNames, exchanges, rDiet, pDiet, infoFilePath, netProduction, netUptake, figForm)

This function is called from the MgPipe pipeline. Its purpose is to compute NMPCs from simulations with different diets on multiple microbiota models. Results are written to .csv files, and a PCoA on the NMPCs, which groups the microbiota models of individuals by similar metabolic profile, is also computed and saved.

USAGE:

[netSecretionFluxes, netUptakeFluxes, Y] = mgSimResCollect(resPath, sampNames, exchanges, rDiet, pDiet, infoFilePath, netProduction, netUptake, figForm)

INPUTS:

resPath: char with path of directory where results are saved

sampNames: nx1 cell array with names of individuals in the study

exchanges: cell array with list of all unique exchanges to diet/fecal compartment that were interrogated in simulations

rDiet: number (double) indicating if to simulate a rich diet

pDiet: number (double) indicating if a personalized diet is available and should be simulated

infoFilePath: char with the full path and name of the file with stratification criteria, if available (default: no)

netProduction: cell array containing FVA values for maximal uptake

figForm: char indicating the format of figures

OUTPUTS:

netSecretionFluxes: cell array with computed NMPCs

netUptakeFluxes: cell array with computed uptake potential

Y: classical multidimensional scaling

microbiotaModelSimulator(resPath, exMets, sampNames, dietFilePath, hostPath, hostBiomassRxn, hostBiomassRxnFlux, numWorkers, rDiet, pDiet, computeProfiles, lowerBMBound, upperBMBound, includeHumanMets, adaptMedium)

This function is called from the MgPipe pipeline. Its purpose is to apply different diets (according to the user’s input) to the microbiota models and run simulations computing FVAs on exchange reactions of the microbiota models. The output is saved in multiple .mat objects. Intermediate saving checkpoints are present.

USAGE:

[exchanges, netProduction, netUptake, growthRates, infeasModels] = microbiotaModelSimulator(resPath, exMets, sampNames, dietFilePath, hostPath, hostBiomassRxn, hostBiomassRxnFlux, numWorkers, rDiet, pDiet, computeProfiles, lowerBMBound, upperBMBound, includeHumanMets, adaptMedium)

INPUTS:

resPath: char with path of directory where results are saved

exMets: list of exchanged metabolites present in at least one microbe model that can carry flux

sampNames: cell array with names of individuals in the study

dietFilePath: path to and name of the text file with dietary information. Can also be a list of the sample names with individual diet files.

hostPath: char with path to host model, e.g., Recon3D (default: empty)

hostBiomassRxn: char with name of biomass reaction in host (default: empty)

hostBiomassRxnFlux: double with the desired upper bound on flux through the host biomass reaction (default: 1)

numWorkers: integer indicating the number of cores to use for parallelization

rDiet: boolean indicating if to simulate a rich diet

pDiet: boolean indicating if a personalized diet is available and should be simulated

computeProfiles: boolean defining whether flux variability analysis to compute the metabolic profiles should be performed.

lowerBMBound: Minimal amount of community biomass in mmol/person/day enforced (default=0.4)

upperBMBound: Maximal amount of community biomass in mmol/person/day enforced (default=1)

includeHumanMets: boolean indicating if human-derived metabolites present in the gut should be provided to the models (default: true)

adaptMedium: boolean indicating if the medium should be adapted through the adaptVMHDietToAGORA function or used as is (default=true)

OUTPUTS:

exchanges: cell array with list of all unique exchanges to diet/fecal compartment that were interrogated in simulations

netProduction: cell array containing FVA values for maximal uptake and secretion for setup lumen / diet exchanges

netUptake: cell array containing FVA values for minimal uptake and secretion for setup lumen / diet exchanges

growthRates: array containing values of microbiota models objective function

infeasModels: cell array with names of infeasible microbiota models

normalizeCoverage(abunFilePath, cutoff)

This function normalizes the coverage in a given file with organism coverages such that they sum up to 1 for each sample.

USAGE

[normalizedCoverage,normalizedCoveragePath] = normalizeCoverage(abunFilePath,cutoff)

INPUT

abunFilePath: Path to table with not yet normalized relative coverages

OPTIONAL INPUT

cutoff: Cutoff for normalized coverages that are considered below detection limit; the respective organisms will be removed from the samples (default: 0.0001)

OUTPUTS

normalizedCoverage: Table with normalized coverages

normalizedCoveragePath: Path to csv file with normalized coverages
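The normalization itself can be sketched on a small numeric matrix in plain MATLAB. The coverage values are invented, and the order of operations (normalize, zero out entries below the cutoff, then renormalize) is an assumption; the toolbox function may handle the cutoff differently. The element-wise division relies on implicit expansion, available from MATLAB R2016b.

```matlab
% Hypothetical coverages (rows: organisms, columns: samples).
coverage = [120.00   5;
             60.00  90;
              0.01   5];
cutoff = 0.0001;

% Normalize each sample (column) so that its coverages sum to 1.
normalized = coverage ./ sum(coverage, 1);

% Treat entries below the cutoff as below detection limit.
normalized(normalized < cutoff) = 0;

% Assumption: renormalize so each sample sums to 1 again.
normalized = normalized ./ sum(normalized, 1);

disp(normalized);
```

In the first sample, the third organism falls below the cutoff and is removed, so the remaining two organisms end up with relative coverages of 2/3 and 1/3.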

parsave(fname, data)

Saves a data variable (e.g., model) from a parfor loop. Might not work in R2015b.

USAGE:

parsave(fname, data)

INPUTS:

fname: name of file

data: name of variable

plotMappingInfo(resPath, patOrg, reacPat, reacTab, reacNumber, infoFilePath, figForm, sampNames, organisms)

This function computes and automatically plots information derived from the mapping data, such as metabolic diversity and classical multidimensional scaling of individuals’ reaction repertoires. If the last two arguments are specified, MDS plots will be annotated with sample and organism names.

USAGE:

Y = plotMappingInfo(resPath, patOrg, reacPat, reacTab, reacNumber, infoFilePath, figForm, sampNames, organisms)

INPUTS:

resPath: char with path of directory where results are saved

reac: nx1 cell array with all the unique set of reactions contained in the models

micRea: binary matrix assessing presence of set of unique reactions for each of the microbes

reacSet: matrix with names of reactions of each individual

reacTab: binary matrix with presence/absence of reaction per individual.

reacAbun: matrix with abundance of reaction per individual

reacNumber: number of unique reactions of each individual

infoFilePath: char with the full path and name of the file with stratification criteria, if available (default: no)

figForm: format to use for saving figures

sampNames: nx1 cell array with names of individuals in the study

organisms: nx1 cell array with names of organisms in the study

OUTPUTS:

Y: classical multidimensional scaling of individuals’ reactions repertoire

retrieveModelStats(modelPath, modelList, abunFilePath, numWorkers, infoFilePath)

This function retrieves statistics on the number of reactions and metabolites across microbiome models. If a file with stratification information on individuals is provided, it will also determine if reaction and metabolite numbers are significantly different between groups.

USAGE:

[modelStats,summary,statistics]=retrieveModelStats(modelPath, modelList, numWorkers, infoFilePath)

INPUTS

modelPath: Path to models for which statistics should be retrieved

modelList: Cell array with names of models for which statistics should be retrieved

abunFilePath: char with path and name of file from which to retrieve abundance information

numWorkers: integer indicating the number of cores to use for parallelization

OPTIONAL INPUT:

infoFilePath: char with path to stratification criteria if available

OUTPUT

modelStats: Reaction and metabolite numbers for each model

summary: Table with average, median, minimal, and maximal reactions and metabolites

OPTIONAL OUTPUT:

statistics: If info file with stratification is provided, will determine if there is a significant difference.
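A summary table of this kind can be sketched with built-in functions. The model names and counts are invented, and the table layout (statistics as rows, reactions and metabolites as columns) is an assumption about presentation only; the group comparison for stratified data is not covered here.

```matlab
% Hypothetical per-model counts, as modelStats might report them.
modelList = {'sample1'; 'sample2'; 'sample3'};
numRxns   = [5200; 4800; 6100];
numMets   = [4100; 3900; 4700];

% Average, median, minimum, and maximum of a vector of counts.
stats     = @(x) [mean(x); median(x); min(x); max(x)];
statNames = {'Mean'; 'Median'; 'Min'; 'Max'};

summary = table(stats(numRxns), stats(numMets), ...
    'VariableNames', {'Reactions', 'Metabolites'}, ...
    'RowNames', statNames);

disp(summary);
```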

translateMetagenome2AGORA(MetagenomeAbundancePath, sequencingDepth, reconstructionResource)

Translates organism identifiers in a published metagenomic or 16S rRNA data file with organism abundances (retrieved, e.g., from MetaPhlAn) to AGORA pan-model IDs. This will not catch every case since the format of input files with abundance data varies greatly. Feel free to modify this function and submit a pull request to enable more input files to be translated to AGORA. Moreover, slight spelling variations in taxa across input files may result in taxa not being mapped. Check the unmappedRows output to identify these cases and modify the function accordingly. Pan-models that can be used to create microbiome models in mgPipe can be created with the function createPanModels. Consider running the function updateTaxonomyInfoAGORA to retrieve the most recent taxonomic assignment for the taxa to map. You might need to update your input file based on updated taxonomic classifications.

USAGE:

[translatedAbundances,normalizedAbundances,unmappedRows]=translateMetagenome2AGORA(MetagenomeAbundancePath,sequencingDepth,reconstructionResource)

INPUT:

MetagenomeAbundancePath: String containing the path to csv file with organism abundance data retrieved from 16S rRNA or metagenomic samples (example: ‘SRP065497_taxonomy_abundances_v3.0.tsv’).

OPTIONAL INPUTS:

sequencingDepth: Sequencing depth on the taxonomical level in the input data (e.g., genus, species). Allowed inputs are ‘Species’, ‘Genus’, ‘Family’, ‘Order’, ‘Class’, ‘Phylum’. Default: ‘Species’.

reconstructionResource: Name of the reconstruction resource to map the abundances to. Allowed inputs are ‘AGORA’, ‘AGORA2’. Default: ‘AGORA’

OUTPUTS:

translatedAbundances: Abundances with organism names from the input file translated to AGORA pan-model IDs

normalizedAbundances: Translated abundances normalized so they sum up to 1 for each sample

unmappedRows: Taxa on the selected taxonomical level that could not be mapped to AGORA pan-models

updateTaxonomyInfoAGORA()

This function retrieves the newest taxonomy information for each AGORA strain from NCBI Taxonomy to facilitate mapping taxonomic assignments from metagenomic sequencing data to AGORA. An updated version of the AGORA information table is saved in spreadsheet format.