Mgpipe
- adaptVMHDietToAGORA(VMHDiet, setupUsed, AGORAPath)
Part of the Microbiome Modeling Toolbox. This function adapts a diet generated by the Diet Designer on https://www.vmh.life such that microbiome community models created from the AGORA resource can generate biomass. All metabolites required by at least one AGORA model are added. Note that the adapted diet that is the output of this function is specific to the AGORA resource. It is not guaranteed that other constraint-based models can produce biomass on this diet. Units are given in mmol/day/person.
- USAGE
[adaptedDietConstraints, growthOK] = adaptVMHDietToAGORA (VMHDiet, setupUsed, AGORAPath)
- INPUTS
VMHDiet – Name of text file with VMH exchange reaction IDs and values on lower bounds generated by Diet Designer on https://www.vmh.life (or manually).
setupUsed – Model setup for which the adapted diet will be used. Allowed inputs are AGORA (the single AGORA models), Pairwise (the microbe-microbe models generated by the pairwise modeling module), and Microbiota (the microbe community models generated by MgPipe).
- OPTIONAL INPUTS
AGORAPath – Path to the AGORA model resource. If entered, growth of all single models on the adapted diet will be tested.
- OUTPUT
adaptedDietConstraints – Cell array of exchange reaction IDs, values on lower bounds, and values on upper bounds that can serve as input for the function useDiet.
- OPTIONAL OUTPUT
growthOK – Variable indicating whether all AGORA models could grow on the adapted diet (1 if yes).
- addMicrobeCommunityBiomass(model, microbeNames, abundances)
Adds a community biomass reaction to a model structure with multiple microbes based on their relative abundances. If no abundance values are provided, all n microbes get equal weights (1/n). Assumes a lumen compartment [u] and a fecal secretion compartment [fe]. Creates a community biomass metabolite ‘microbeBiomass’ that is secreted from [u] to [fe] and exchanged from the fecal compartment.
- USAGE
model = addMicrobeCommunityBiomass (model, microbeNames, abundances)
- INPUTS
model – COBRA model structure with n joined microbes with biomass metabolites ‘Microbe_biomass[c]’.
microbeNames – nx1 cell array of n unique strings that represent each microbe in the model.
- OPTIONAL INPUT
abundances – nx1 vector with the relative abundance of each microbe.
- OUTPUT
model – COBRA model structure
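The default weighting can be illustrated with a minimal plain-MATLAB sketch. It does not call the toolbox; the microbe names and the `weights` variable are hypothetical and only show how missing abundances translate into equal weights of 1/n.

```matlab
% Hypothetical microbe names; any n unique strings would do.
microbeNames = {'MicrobeA'; 'MicrobeB'; 'MicrobeC'; 'MicrobeD'};
abundances = [];  % no abundances provided

n = numel(microbeNames);
if isempty(abundances)
    % Equal weights 1/n, as described for the default case.
    weights = ones(n, 1) / n;
else
    % Otherwise use the supplied relative abundances as weights.
    weights = abundances(:);
end

% Print the weight assigned to each microbe.
for i = 1:n
    fprintf('%s: %.2f\n', microbeNames{i}, weights(i));
end
```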
- buildModelStorage(microbeNames, modPath, dietFilePath, adaptMedium, includeHumanMets, numWorkers, pruneModels, biomasses)
This function builds the internal exchange space and the coupling constraints for models to join within mgPipe so they can be merged into microbiome models afterwards. Exchanges that can never carry flux on the given diet are removed to reduce computation time.
- USAGE
[activeExMets,couplingMatrix] = buildModelStorage(microbeNames,modPath,dietFilePath,adaptMedium,includeHumanMets,numWorkers,pruneModels,biomasses)
- INPUTS
microbeNames – list of microbe models included in the microbiome models
modPath – char with path of directory where models are stored
dietFilePath – char with path of directory where the diet is saved
adaptMedium – boolean indicating if the medium should be adapted through the adaptVMHDietToAGORA function or used as is (default: true)
includeHumanMets – boolean indicating if human-derived metabolites present in the gut should be provided to the models (default: true)
numWorkers – integer indicating the number of cores to use for parallelization
pruneModels – boolean indicating whether reactions that do not carry flux on the input diet should be removed from the microbe models. Recommended for large datasets (default: false)
biomasses – Cell array containing names of biomass objective functions of models to join. Needs to be the same length as the length of models in the abundance file.
- OUTPUTS
activeExMets – list of exchanged metabolites present in at least one microbe model that can carry flux
couplingMatrix – matrix containing coupling constraints for each model to join
- createPanModels(agoraPath, panPath, taxonLevel, agoraVersion, numWorkers, builtTaxa)
This function creates pan-models for all unique taxa (e.g., species) included in the AGORA resource. If reconstructions of multiple strains in a given taxon are present, the reactions in these reconstructions will be combined into a pan-reconstruction. The pan-biomass reactions will be built from the average of all biomasses. Futile cycles that result from the newly combined reaction content are removed by setting certain reactions irreversible. These reactions have been determined manually. NOTE: Futile cycle removal has only been tested at the species and genus level. Pan-models at higher taxonomical levels (e.g., family) may contain futile cycles and produce unrealistically high ATP flux. The pan-models can be used as input for mgPipe if taxon abundance data is available at a higher level than strain, e.g., species, genus.
- USAGE
createPanModels (agoraPath, panPath, taxonLevel, agoraVersion, numWorkers, builtTaxa)
- INPUTS
agoraPath – String containing the path to the AGORA reconstructions. Must end with a file separator.
panPath – String containing the path to an empty folder that the created pan-models will be stored in. Must end with a file separator.
taxonLevel – String with desired taxonomical level of the pan-models. Allowed inputs are ‘Species’, ‘Genus’, ‘Family’, ‘Order’, ‘Class’, ‘Phylum’.
agoraVersion – Version of AGORA that will be used (allowed inputs: ‘AGORA’, ‘AGORA2’; alternatively, a path to a custom table with reconstruction information).
- OPTIONAL INPUTS
numWorkers – Number of workers for parallel pool (default: no pool)
builtTaxa – Names of taxa in table that will be built (default: all). Need to be entered as a cell array of strings with names written exactly as in the corresponding column in the table.
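As a rough illustration of averaging biomass reactions across strains, the following plain-MATLAB sketch averages the stoichiometric coefficients of two hypothetical strain biomass reactions. The coefficient values, the variable names, and the choice to treat a metabolite missing from one strain as a zero coefficient are assumptions made for illustration; createPanModels itself may handle these details differently.

```matlab
% Hypothetical biomass coefficients: rows are metabolites, columns are strains.
% A zero means the metabolite is absent from that strain's biomass reaction
% (an assumption of this sketch).
strainBiomass = [-0.5 -0.7;
                 -0.2  0.0;
                  1.0  1.0];

% Coefficient-wise average across strains gives the sketch's pan-biomass.
panBiomass = mean(strainBiomass, 2);
disp(panBiomass);
```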
- createPersonalizedModel(abundance, resPath, model, sampNames, orglist, couplingMatrix, host, hostBiomassRxn)
This function creates personalized models by integrating the given organism abundances into the previously built global setup. Coupling constraints are also added for each organism. All operations are parallelized, and the generated personalized models are saved directly in .mat format.
- USAGE
[createdModels] = createPersonalizedModel (abundance, resPath, model, sampNames, orglist, couplingMatrix, host, hostBiomassRxn)
- INPUTS
abundance – table with abundance information
resPath – char with path of directory where results are saved
model – model in COBRA model structure format
sampNames – cell array with names of individuals in the study
orglist – cell array with names of organisms in the study
couplingMatrix – cell array containing pre-created coupling matrices for each organism to be joined (created by buildModelStorage function)
host – Contains the host model if path to host model was defined. Otherwise empty.
hostBiomassRxn – char with name of biomass reaction in host (default: empty)
- OUTPUT
createdModels – created personalized models
- detectOutput(resPath, objNam)
This function checks the existence of a specific file in the results folder.
- USAGE
mapP = detectOutput (resPath, objNam)
- INPUTS
resPath – char with path of directory where results are saved
objNam – char with name of object to find in the results folder
- OUTPUTS
mapP – double indicating if object was found in the result folder
- extractFullRes(resPath, ID, dietType, sampName, fvaCt, nsCt)
This function is called from the MgPipe pipeline. Its purpose is to retrieve and export, in a comprehensive way, all the results (fluxes) computed during the simulations for a specified diet. Since FVA is computed on diet and fecal exchanges, every metabolite will have four different values for each individual, corresponding to the minimum and maximum of uptake and secretion.
- USAGE
[finRes]= extractFullRes (resPath, ID, dietType, sampName, fvaCt, nsCt)
- INPUTS
resPath – char with path of directory where results are saved
ID – cell array with list of all unique exchanges to the diet/fecal compartment
dietType – char indicating under which diet to extract results: rDiet (rich diet), sDiet (previously specified diet, set by default), and pDiet (personalized), if available
sampName – nx1 cell array with names of individuals in the study
fvaCt – cell array containing FVA values for maximal uptake
nsCt – cell array containing FVA values for minimal uptake and secretion for setup lumen / diet exchanges
- OUTPUTS
finRes – cell array with min and max value of uptake and secretion for each metabolite
- fastSetupCreator(exMets, microbeNames, host)
Creates a microbiota model (min 1 microbe) that can be coupled with a host model. Microbes and host are connected with a lumen compartment [u]; the host can secrete metabolites into body fluids [b]. Diet is simulated as uptake through the compartment [d]; transporters are unidirectional from [d] to [u]. Secretion goes through the fecal compartment [fe]; transporters are unidirectional from [u] to [fe].
Reaction types:
Diet exchange: ‘EX_met[d]’: ‘met[d] <=>’
Diet transporter: ‘DUt_met’: ‘met[d] -> met[u]’
Fecal transporter: ‘UFEt_met’: ‘met[u] -> met[fe]’
Fecal exchanges: ‘EX_met[fe]’: ‘met[fe] <=>’
Microbe uptake/secretion: ‘Microbe_IEX_met[c]tr’: ‘Microbe_met[c] <=> met[u]’
Host uptake/secretion lumen: ‘Host_IEX_met[c]tr’: ‘Host_met[c] <=> met[u]’
Host exchange body fluids: ‘Host_EX_met(e)b’: ‘Host_met[b] <=>’
- INPUTS
exMets – cell array with all unique extracellular metabolites contained in the models
microbeNames – nx1 cell array of n unique strings that represent each microbe model. Reactions and metabolites of each microbe will get the corresponding microbeNames (e.g., ‘Ecoli’) prefix. Reactions will be named ‘Ecoli_RxnAbbr’ and metabolites ‘Ecoli_MetAbbr[c]’.
host – Host COBRA model structure, can be left empty if there is no host model
- OUTPUT
model – COBRA model structure with all models combined
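The reaction naming scheme listed for fastSetupCreator can be made concrete with a small plain-MATLAB sketch that builds the IDs and formulas for one metabolite and one microbe. It does not call the toolbox; the metabolite abbreviation ‘glc_D’ and the variable names are hypothetical, and only the ID and formula patterns are taken from the description.

```matlab
% Hypothetical metabolite and microbe abbreviations.
met = 'glc_D';
microbe = 'Ecoli';

% Reaction IDs (column 1) and formulas (column 2) following the naming
% scheme described for fastSetupCreator.
rxns = {
    sprintf('EX_%s[d]', met), sprintf('%s[d] <=>', met)
    sprintf('DUt_%s', met), sprintf('%s[d] -> %s[u]', met, met)
    sprintf('UFEt_%s', met), sprintf('%s[u] -> %s[fe]', met, met)
    sprintf('EX_%s[fe]', met), sprintf('%s[fe] <=>', met)
    sprintf('%s_IEX_%s[c]tr', microbe, met), sprintf('%s_%s[c] <=> %s[u]', microbe, met, met)
    };

% Print one reaction per line.
for k = 1:size(rxns, 1)
    fprintf('%-24s %s\n', rxns{k, 1}, rxns{k, 2});
end
```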
- getIndividualSizeName(abunFilePath, modPath)
This function automatically detects organisms, names and number of individuals present in the study.
- USAGE
[sampNames, organisms, exMets] = getIndividualSizeName (abunFilePath,modPath)
- INPUTS
abunFilePath – char with path and name of file from which to retrieve information
modPath – char with path of directory where models are stored
- OUTPUTS
sampNames – nx1 cell array with names of individuals in the study
organisms – nx1 cell array with names of organisms in the study
exMets – cell array with all unique extracellular metabolites contained in the models
- getMappingInfo(modPath, organisms, abunFilePath)
This function automatically extracts information from strain abundances in different individuals and combines this information into different tables.
- USAGE
[reac, exMets, micRea, binOrg, patOrg, reacPat, reacNumb, reacSet, reacTab, reacAbun, reacNumber] = getMappingInfo (modPath, organisms, abunFilePath, patNumb)
- INPUTS
organisms – nx1 cell array with names of organisms in the study
modPath – char with path of directory where models are stored
abunFilePath – char with path and name of file from which to retrieve abundance information
patNumb – number of individuals in the study
- OUTPUTS
reac – cell array with all the unique set of reactions contained in the models
exMets – cell array with all unique extracellular metabolites contained in the models
micRea – binary matrix assessing presence of set of unique reactions for each of the microbes
binOrg – binary matrix assessing presence of specific strains in different individuals
reacPat – matrix with number of reactions per individual (organism resolved)
reacSet – matrix with names of reactions of each individual
reacTab – char with names of individuals in the study
reacAbun – binary matrix with presence/absence of reaction per individual: to compare different individuals
reacNumber – number of unique reactions of each individual
- guidedSim(model, rl)
This function is part of the MgPipe pipeline and runs FVAs on a series of selected reactions with different possible FVA functions. Solver is automatically set to ‘cplex’, objective function is maximized, and optPercentage set to 99.99.
- USAGE
[minFlux, maxFlux] = guidedSim (model, rl)
- INPUTS
model – COBRA model structure with n joined microbes with biomass metabolites ‘Microbe_biomass[c]’.
rl – nx1 vector with the reactions of interest.
solver – char with solver name to use.
- OUTPUTS
minFlux – Minimum flux for each reaction
maxFlux – Maximum flux for each reaction
Author: Federico Baldini, 2017-2018
- initMgPipe(modPath, abunFilePath, computeProfiles, varargin)
This function initializes the mgPipe pipeline and sets the optional input variables if not defined.
- USAGE
[init, netSecretionFluxes, netUptakeFluxes, Y, modelStats, summary, statistics, modelsOK] = initMgPipe(modPath, abunFilePath, computeProfiles, varargin)
- INPUTS
modPath – char with path of directory where models are stored
abunFilePath – char with path and name of file from which to retrieve abundance information
computeProfiles – boolean defining whether flux variability analysis to compute the metabolic profiles should be performed.
- OPTIONAL INPUTS
resPath – char with path of directory where results are saved
dietFilePath – char with path of directory where the diet is saved. Can also be a character array with a separate diet for each individual, in that case, size(dietFilePath,1) needs to equal the length of samples, and the first row needs to be sample names and the second row needs to be the respective files with diet information.
infoFilePath – char with path to stratification criteria if available
biomasses – Cell array containing names of biomass objective functions of models to join. Needs to be the same length as the length of models in the abundance file.
hostPath – char with path to host model, e.g., Recon3D (default: empty)
hostBiomassRxn – char with name of biomass reaction in host (default: empty)
hostBiomassRxnFlux – double with the desired upper bound on flux through the host biomass reaction (default: 1)
numWorkers – integer indicating the number of cores to use for parallelization
rDiet – boolean indicating whether to also enable rich diet simulations (default: false)
pDiet – boolean indicating whether to also enable personalized diet simulations (default: false)
lowerBMBound – lower bound on community biomass (default=0.4)
upperBMBound – upper bound on community biomass (default=1)
includeHumanMets – boolean indicating if human-derived metabolites present in the gut should be provided to the models (default: true)
adaptMedium – boolean indicating if the medium should be adapted through the adaptVMHDietToAGORA function or used as is (default=true)
pruneModels – boolean indicating whether reactions that do not carry flux on the input diet should be removed from the microbe models. Recommended for large datasets (default: false)
- OUTPUTS
init – status of initialization
netSecretionFluxes – Net secretion fluxes by microbiome community models
netUptakeFluxes – Net uptake fluxes by microbiome community models
Y – Classical multidimensional scaling
modelStats – Reaction and metabolite numbers for each model
summary – Table with average, median, minimal, and maximal reactions and metabolites
statistics – If info file with stratification is provided, will determine if there is a significant difference.
modelsOK – Boolean indicating if the created microbiome models passed verifyModel. If true, all models passed.
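A call to initMgPipe might look like the following sketch. The paths are hypothetical placeholders, and passing the optional inputs as name-value pairs through varargin is an assumption based on the signature rather than something this page states explicitly. The call is guarded so that the snippet does nothing when the toolbox is not on the MATLAB path.

```matlab
% Hypothetical paths; replace with your own model folder and abundance file.
modPath = fullfile('path', 'to', 'models');
abunFilePath = fullfile('path', 'to', 'normCoverage.csv');
computeProfiles = false;  % skip the flux variability analysis

% Only attempt the call if initMgPipe is available on the MATLAB path.
if exist('initMgPipe', 'file') == 2
    % Assumption: optional inputs are given as name-value pairs.
    [init, netSecretionFluxes, netUptakeFluxes, Y, modelStats, summary, statistics, modelsOK] = ...
        initMgPipe(modPath, abunFilePath, computeProfiles, ...
        'numWorkers', 2, 'lowerBMBound', 0.4, 'upperBMBound', 1);
end
```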
- loadUncModels(modPath, organisms, objre, printLevel)
This function loads and unconstrains metabolic models from a specific folder
- USAGE
models = loadUncModels (modPath, organisms, objre, printLevel)
- INPUTS
organisms – nx1 cell array with names of organisms in the study
modPath – char with path of directory where models are stored
objre – char with reaction name of objective function of organisms
printLevel – Verbose level (default: printLevel = 1)
- OUTPUT
models – nx1 cell array with models of organisms in the study
- makeDummyModel(numMets, numRxns)
Makes an empty model with numMets rows for metabolites and numRxns columns for reactions. Includes all fields that are necessary to join models.
- USAGE
dummy = makeDummyModel (numMets, numRxns)
- INPUTS
numMets – Number of metabolites
numRxns – Number of reactions
- OUTPUT
dummy – Empty COBRA model structure
- mgPipe(modPath, abunFilePath, computeProfiles, resPath, dietFilePath, infoFilePath, biomasses, hostPath, hostBiomassRxn, hostBiomassRxnFlux, figForm, numWorkers, rDiet, pDiet, lowerBMBound, upperBMBound, includeHumanMets, adaptMedium, pruneModels)
mgPipe is a MATLAB-based pipeline that integrates microbial abundances (from metagenomic data) with constraint-based modeling to create personalized models for individuals. The pipeline is divided into three parts:
[PART 1] Analysis of individuals’ specific microbe abundances.
[PART 2] (1) Construction of a global metabolic model (setup) containing all the microbes listed in the study. (2) Building of individuals’ specific models by integrating abundance data retrieved from metagenomics. For each organism, reactions are coupled to the objective function.
[PART 3] Simulations under different diet regimes.
- USAGE
[netSecretionFluxes, netUptakeFluxes, Y, modelStats, summary, statistics, modelsOK] = mgPipe (modPath, abunFilePath, computeProfiles, resPath, dietFilePath, infoFilePath, biomasses, hostPath, hostBiomassRxn, hostBiomassRxnFlux, figForm, numWorkers, rDiet, pDiet, lowerBMBound, upperBMBound, includeHumanMets, adaptMedium, pruneModels)
- INPUTS
modPath – char with path of directory where models are stored
abunFilePath – char with path and name of file from which to retrieve abundance information
computeProfiles – boolean defining whether flux variability analysis to compute the metabolic profiles should be performed.
resPath – char with path of directory where results are saved
dietFilePath – char with path of directory where the diet is saved
infoFilePath – char with path to stratification criteria if available
biomasses – Cell array containing names of biomass objective functions of models to join. Needs to be the same length as the length of models in the abundance file.
hostPath – char with path to host model, e.g., Recon3D (default: empty)
hostBiomassRxn – char with name of biomass reaction in host (default: empty)
hostBiomassRxnFlux – double with the desired flux through the host biomass reaction (default: zero)
figForm – format to use for saving figures
numWorkers – integer indicating the number of cores to use for parallelization
rDiet – boolean indicating whether to also enable rich diet simulations (default: false)
pDiet – boolean indicating whether to also enable personalized diet simulations (default: false)
lowerBMBound – lower bound on community biomass (default=0.4)
upperBMBound – upper bound on community biomass (default=1)
includeHumanMets – boolean indicating if human-derived metabolites present in the gut should be provided to the models (default: true)
adaptMedium – boolean indicating if the medium should be adapted through the adaptVMHDietToAGORA function or used as is (default=true)
pruneModels – boolean indicating whether reactions that do not carry flux on the input diet should be removed from the microbe models. Recommended for large datasets (default: false)
- OUTPUTS
init – status of initialization
netSecretionFluxes – Net secretion fluxes by microbiome community models
netUptakeFluxes – Net uptake fluxes by microbiome community models
Y – Classical multidimensional scaling
modelStats – Reaction and metabolite numbers for each model
summary – Table with average, median, minimal, and maximal reactions and metabolites
statistics – If info file with stratification is provided, will determine if there is a significant difference.
modelsOK – Boolean indicating if the created microbiome models passed verifyModel. If true, all models passed.
AUTHORS
Federico Baldini, 2017-2018
Almut Heinken, 07/20: converted to function
Almut Heinken, 01/21: adapted inputs
- mgSimResCollect(resPath, sampNames, exchanges, rDiet, pDiet, infoFilePath, netProduction, netUptake, figForm)
This function is called from the MgPipe pipeline. Its purpose is to compute net maximal production capabilities (NMPCs) from simulations with different diets on multiple microbiota models. Results are written to .csv files. A PCoA on the NMPCs, which groups the microbiota models of individuals by similar metabolic profile, is also computed and saved.
- USAGE
[netSecretionFluxes, netUptakeFluxes, Y] = mgSimResCollect (resPath, sampNames, exchanges, rDiet, pDiet, infoFilePath, netProduction, netUptake, figForm)
- INPUTS
resPath – char with path of directory where results are saved
sampNames – nx1 cell array with names of individuals in the study
exchanges – cell array with list of all unique exchanges to diet/ fecal compartment that were interrogated in simulations
rDiet – number (double) indicating if to simulate a rich diet
pDiet – number (double) indicating if a personalized diet is available and should be simulated
infoFilePath – char with the full path and name of the file with stratification criteria, if available (default: no)
netProduction – cell array containing FVA values for maximal uptake
figForm – char indicating the format of figures
- OUTPUTS
netSecretionFluxes – cell array with computed NMPCs
netUptakeFluxes – cell array with computed uptake potential
Y – classical multidimensional scaling
- microbiotaModelSimulator(resPath, exMets, sampNames, dietFilePath, hostPath, hostBiomassRxn, hostBiomassRxnFlux, numWorkers, rDiet, pDiet, computeProfiles, lowerBMBound, upperBMBound, includeHumanMets, adaptMedium)
This function is called from the MgPipe pipeline. Its purpose is to apply different diets (according to the user’s input) to the microbiota models and run simulations computing FVAs on exchange reactions of the microbiota models. The output is saved in multiple .mat objects. Intermediate saving checkpoints are present.
- USAGE
[exchanges, netProduction, netUptake, growthRates, infeasModels] = microbiotaModelSimulator (resPath, exMets, sampNames, dietFilePath, hostPath, hostBiomassRxn, hostBiomassRxnFlux, numWorkers, rDiet, pDiet, computeProfiles, lowerBMBound, upperBMBound, includeHumanMets, adaptMedium)
- INPUTS
resPath – char with path of directory where results are saved
exMets – list of exchanged metabolites present in at least one microbe model that can carry flux
sampNames – cell array with names of individuals in the study
dietFilePath – path to and name of the text file with dietary information. Can also be a list of the sample names with individual diet files.
hostPath – char with path to host model, e.g., Recon3D (default: empty)
hostBiomassRxn – char with name of biomass reaction in host (default: empty)
hostBiomassRxnFlux – double with the desired upper bound on flux through the host biomass reaction (default: 1)
numWorkers – integer indicating the number of cores to use for parallelization
rDiet – boolean indicating if to simulate a rich diet
pDiet – boolean indicating if a personalized diet is available and should be simulated
computeProfiles – boolean defining whether flux variability analysis to compute the metabolic profiles should be performed.
lowerBMBound – Minimal amount of community biomass in mmol/person/day enforced (default=0.4)
upperBMBound – Maximal amount of community biomass in mmol/person/day enforced (default=1)
includeHumanMets – boolean indicating if human-derived metabolites present in the gut should be provided to the models (default: true)
adaptMedium – boolean indicating if the medium should be adapted through the adaptVMHDietToAGORA function or used as is (default=true)
- OUTPUTS
exchanges – cell array with list of all unique exchanges to diet/ fecal compartment that were interrogated in simulations
netProduction – cell array containing FVA values for maximal uptake and secretion for setup lumen / diet exchanges
netUptake – cell array containing FVA values for minimal uptake and secretion for setup lumen / diet exchanges
growthRates – array containing values of microbiota models objective function
infeasModels – cell array with names of infeasible microbiota models
- normalizeCoverage(abunFilePath, cutoff)
This function normalizes the coverage in a given file with organism coverages such that they sum up to 1 for each sample.
- USAGE
[normalizedCoverage,normalizedCoveragePath] = normalizeCoverage(abunFilePath,cutoff)
- INPUT
abunFilePath – Path to table with not yet normalized relative coverages
- OPTIONAL INPUT
cutoff – Cutoff for normalized coverages that are considered below detection limit; respective organisms will be removed from the samples (default: 0.0001)
- OUTPUTS
normalizedCoverage – Table with normalized coverages
normalizedCoveragePath – Path to csv file with normalized coverages
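The normalization itself can be sketched in plain MATLAB on a small matrix. This is not the toolbox implementation: the coverage values are hypothetical, and the order of operations (normalize first, then zero out values below the cutoff, without renormalizing) is an assumption made for illustration.

```matlab
% Hypothetical coverage matrix: rows are organisms, columns are samples.
coverage = [8 2; 2 6; 0.0005 2];
cutoff = 0.0001;  % default cutoff stated for normalizeCoverage

% Normalize each sample (column) so that its coverages sum to 1.
normalized = bsxfun(@rdivide, coverage, sum(coverage, 1));

% Treat values below the cutoff as below the detection limit
% (assumption: they are set to zero and the column is not renormalized).
normalized(normalized < cutoff) = 0;

disp(normalized);
```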
- parsave(fname, data)
Saves a data variable (e.g., model) from a parfor loop - might not work in R2015b
- USAGE
parsave (fname, data)
- INPUTS
fname – name of file
data – name of variable
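A helper of this kind is useful because MATLAB does not allow save to be called directly in the body of a parfor loop (it violates the loop's transparency requirement), so the call is wrapped in a function. The following is a minimal sketch of that idiom, not the toolbox code; the helper name saveFromWorker, the temporary folder, and the file names are hypothetical. Local functions in scripts require R2016b or later.

```matlab
% Hypothetical output folder for the sketch.
outDir = tempname;
mkdir(outDir);

% parfor runs serially if no parallel pool is available.
parfor i = 1:3
    data = struct('id', i);
    saveFromWorker(fullfile(outDir, sprintf('result_%d.mat', i)), data);
end

function saveFromWorker(fname, data)
% Hypothetical stand-in for parsave: wrapping save in a function avoids
% the restriction on calling save directly inside a parfor body.
save(fname, 'data');
end
```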
- plotMappingInfo(resPath, patOrg, reacPat, reacTab, reacNumber, infoFilePath, figForm, sampNames, organisms)
This function computes and automatically plots information coming from the mapping data, such as metabolic diversity and classical multidimensional scaling of individuals’ reaction repertoires. If the last two arguments are specified, MDS plots will be annotated with sample and organism names.
- USAGE
Y = plotMappingInfo (resPath, patOrg, reacPat, reacTab, reacNumber, infoFilePath, figForm, sampNames, organisms)
- INPUTS
resPath – char with path of directory where results are saved
reac – nx1 cell array with all the unique set of reactions contained in the models
micRea – binary matrix assessing presence of set of unique reactions for each of the microbes
reacSet – matrix with names of reactions of each individual
reacTab – binary matrix with presence/absence of reaction per individual.
reacAbun – matrix with abundance of reaction per individual
reacNumber – number of unique reactions of each individual
infoFilePath – char with the full path and name of the file with stratification criteria, if available (default: no)
figForm – format to use for saving figures
sampNames – nx1 cell array with names of individuals in the study
organisms – nx1 cell array with names of organisms in the study
- OUTPUTS
Y – classical multidimensional scaling of individuals’ reactions repertoire
- retrieveModelStats(modelPath, modelList, abunFilePath, numWorkers, infoFilePath)
This function retrieves statistics on the number of reactions and metabolites across microbiome models. If a file with stratification information on individuals is provided, it will also determine if reaction and metabolites numbers are significantly different between groups.
- USAGE
[modelStats,summary,statistics]=retrieveModelStats (modelPath, modelList, abunFilePath, numWorkers, infoFilePath)
- INPUTS
modelPath – Path to models for which statistics should be retrieved
modelList – Cell array with names of models for which statistics should be retrieved
abunFilePath – char with path and name of file from which to retrieve abundance information
numWorkers – integer indicating the number of cores to use for parallelization
- OPTIONAL INPUT
infoFilePath – char with path to stratification criteria if available
- OUTPUTS
modelStats – Reaction and metabolite numbers for each model
summary – Table with average, median, minimal, and maximal reactions and metabolites
- OPTIONAL OUTPUT
statistics – If info file with stratification is provided, will determine if there is a significant difference.
- translateMetagenome2AGORA(MetagenomeAbundancePath, sequencingDepth, reconstructionResource)
Translates organism identifiers in a published metagenomic or 16S rRNA data file with organism abundances (retrieved e.g., from MetaPhlAn) to AGORA pan-model IDs. This will not catch every case since the format of input files with abundance data varies greatly. Feel free to modify this function and submit a pull request to enable more input files to be translated to AGORA. Moreover, slight spelling variations in taxa across input files may result in taxa not being mapped. Check the unmappedRows output to identify these cases and modify the function accordingly. Pan-models that can be used to create microbiome models in mgPipe can be created with the function createPanModels. Consider running the function updateTaxonomyInfoAGORA to retrieve the most recent taxonomic assignment for the taxa to map. You might need to update your input file based on updated taxonomic classifications.
- USAGE
[translatedAbundances,normalizedAbundances,unmappedRows]=translateMetagenome2AGORA (MetagenomeAbundancePath,sequencingDepth,reconstructionResource)
- INPUT
MetagenomeAbundancePath – String containing the path to csv file with organism abundance data retrieved from 16S rRNA or metagenomic samples (example: ‘SRP065497_taxonomy_abundances_v3.0.tsv’).
- OPTIONAL INPUTS
sequencingDepth – Sequencing depth on the taxonomical level in the input data (e.g., genus, species). Allowed inputs are ‘Species’, ‘Genus’, ‘Family’, ‘Order’, ‘Class’, ‘Phylum’. Default: ‘Species’.
reconstructionResource – Name of the reconstruction resource to map the abundances to. Allowed inputs are ‘AGORA’, ‘AGORA2’. Default: ‘AGORA’
- OUTPUTS
translatedAbundances – Abundances with organism names from the input file translated to AGORA pan-model IDs
normalizedAbundances – Translated abundances normalized so they sum up to 1 for each sample
unmappedRows – Taxa on the selected taxonomical level that could not be mapped to AGORA pan-models
- updateTaxonomyInfoAGORA()
This function retrieves the newest taxonomy information for each AGORA strain from NCBI Taxonomy to facilitate mapping taxonomic assignments from metagenomic sequencing data to AGORA. An updated version of the AGORA information table is saved in spreadsheet format.