Additional functions
- combineFluxResults(directory1, directory2, resultdirectory, set_regexp)
This function merges and prunes the FBA solutions from two runs of the optimiseRxnMultipleWBMs.m function. Reaction fluxes, FBA statistics, and shadow prices are concatenated. Note that if the sample filenames differ from the standard format, the regular expression needs to be adapted.
- USAGE
combineFluxResults (directory1, directory2, resultdirectory, set_regexp)
INPUTS
directory1 [char array] Directory to flux solutions from the first run
directory2 [char array] Directory to flux solutions from the second run
resultdirectory [char array] Directory to empty folder where the combined fluxes will be saved.
OPTIONAL INPUT
set_regexp [char array] Alternative regular expression to use in case the sample filenames differ in style from the standard optimiseRxnMultipleWBMs.m output.
Authors
Tim Hensen, 2024
modified by Jonas Widder, 10/2024 & 11/2024 (function can now also merge dirs with unequal number of samples + added set_regexp option)
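How set_regexp is applied internally is not documented here, so the following is only a sketch of the general idea: a regular expression selects the solution files and extracts a sample ID from each filename. The filenames and the pattern are hypothetical, and the final call is left commented out because it needs real result directories.

```matlab
% Hypothetical filenames; the real optimiseRxnMultipleWBMs.m naming may differ.
fileNames = {'FBA_sol_sampleA.mat'; 'FBA_sol_sampleB.mat'; 'notes.txt'};

% Hypothetical pattern: capture the sample ID between a fixed prefix and '.mat'.
set_regexp = '^FBA_sol_(\w+)\.mat$';

% One token per matching filename; non-matching names yield an empty cell.
tokens = regexp(fileNames, set_regexp, 'tokens', 'once');
isSample = ~cellfun(@isempty, tokens);
sampleIDs = cellfun(@(t) t{1}, tokens(isSample), 'UniformOutput', false);
disp(sampleIDs)

% With real directories (placeholders here), the call would look like:
% combineFluxResults(directory1, directory2, resultdirectory, set_regexp);
```

Testing a custom pattern on a handful of filenames in this way, before merging, can catch samples that would otherwise be skipped.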
- completeSpeciesFolder(agoraPath, panPath)
Some species in AGORA2 only have one strain. These strains are not moved to the pan model folder, resulting in some species not being captured in the models. By converting the strain reconstructions of species with only a single strain to the species folder, you can solve this problem.
- USAGE
completeSpeciesFolder (strainPath, panSpeciesPath)
- INPUTS
strainPath Path to folder with strain reconstructions
panSpeciesPath Path to folder with pan species models
- convertVMHIDName(metNames, VMHIDs, suggestSimilar)
FOUR FUNCTIONS:
- Retrieve metabolite IDs corresponding to the given metabolite names AND/OR
- Retrieve metabolite names corresponding to given metabolite IDs or some reactions
- Convert metabolite transport reactions, e.g., DM_glc_d[bc] or EX_glc_D[c], to the metabolite name, e.g., D-Glucose
- Suggest similar names for provided metabolite names that are not found in the database
- Inputs
metNames - EITHER – Cell array of metabolite names (strings) or metabolite transport reactions (can be mixed) for which IDs are required or flag indicating to skip step (0).
metIDs - Cell array of metabolite IDs (strings) – required or flag indicating to skip step (0).
suggestSimilar - Flag indicating whether to generate suggestions (1) or not (0)
- Outputs
foundVMHIDs - Cell array of metabolite IDs corresponding to the input names.
foundMetNames - Cell array of metabolite names corresponding to the input IDs
similarMets - Cell array of possible matches for each unfound metabolite name (only when searching for names).
Other requirements: COBRA toolbox installation (and paths set)
EXAMPLE OF USE:
VMHIDs = {'DM_gam[bc]'; 'malttr'};
metNames = {'D-glucose', 'fructose', 'carbon'};
% metNames = false;
% VMHIDs = false;
% suggestSimilar = false;
suggestSimilar = true;
[foundVMHIDs, foundMetNames, similarMets] = convertVMHIDName(metNames, VMHIDs, suggestSimilar);
Author: - Anna Sheehy & Tim Hensen - 18/07/2024
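The reaction-to-metabolite conversion can be pictured as stripping the reaction prefix and the compartment tag from the reaction ID before the name lookup. The snippet below is a hypothetical illustration of that idea using only base MATLAB; it is not taken from convertVMHIDName, which may parse IDs differently.

```matlab
% Hypothetical illustration only; convertVMHIDName may parse IDs differently.
rxnIDs = {'DM_glc_d[bc]'; 'EX_glc_D[c]'; 'malttr'};

% Remove a leading 'DM_' or 'EX_' prefix and a trailing '[compartment]' tag.
% IDs without either part (e.g. 'malttr') are returned unchanged.
metIDs = regexprep(rxnIDs, '^(DM_|EX_)|\[\w+\]$', '');
disp(metIDs)
```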
- dimensionalityReductionAndMultivariateAnalysis(measuresTable, metadataTable, varOfInterest, results_path, varargin)
Dimensionality reduction of high-dimensional measures (e.g. microbiome relative abundances or reaction relative abundances) by RPCA following data preprocessing, OR of beta-diversity measures by PCoA, with the aim to:
1. Find whether there are general differences between groups of a metadata variable of interest (e.g. disease vs ctrl status), in case the variable is categorical.
2. Identify variables from the measures (e.g. microbial taxa, reactions) which contribute the most to the first principal component (PC1) of RPCA and therefore to its explained variance.
3. Perform linear regression of PC1 ~ metadata variable (e.g. Sex, disease vs ctrl status) to find metadata variables which might be important confounders in follow-up analysis, in case they are significantly correlated with PC1 from RPCA/PCoA and explain a lot of its variance.
- INPUTS
measuresTable – [table] Contains high-dimensional measures (e.g. microbiome relative abundances or reaction relative abundances), with columns = samples & rows = measured groups (e.g. taxa/reactions).
metadataTable – [table] Contains metadata information for samples (e.g. sex), with columns = variables (e.g. Sex) & rows = samples.
varOfInterest – [string] Variable (e.g. Sex or disease status) contained in metadata.
results_path – [string] Directory path, where results should be stored (figures & statistical results in spreadsheet format).
varargin
numLoadings – [numeric] Number of PC loadings which shall be displayed in plot of PC strongest feature contributions. Defaults to 15 loadings.
inputDataType – [chars/string] Specify whether data input is of type “abundance” or “betaDiversityMatrix”, which results in alternative processing routes (the input is treated case-insensitively). Defaults to “abundance”.
PCofInterest – [numeric] Principal component/principal coordinate of interest, on which the analysis will be performed. Defaults to PC 1.
- OUTPUTS
In form of tables & plots into dir at results_path location.
Authors
Jonas Widder, 12/2024 & 01/2025
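The three aims can be sketched on toy data. Note the simplifications: this uses ordinary PCA via the singular value decomposition as a stand-in for RPCA, skips the preprocessing, and stores samples in rows (the transpose of the measuresTable layout). All variable names and data are hypothetical.

```matlab
rng(1);                                      % reproducible toy data
nSamples = 20;
nFeatures = 6;
group = [zeros(10, 1); ones(10, 1)];         % hypothetical binary metadata variable
X = randn(nSamples, nFeatures);              % rows = samples, columns = features
X(:, 1) = X(:, 1) + 3 * group;               % feature 1 differs between the groups

Xc = X - mean(X, 1);                         % centre each feature
[U, S, V] = svd(Xc, 'econ');                 % ordinary PCA via SVD (stand-in for RPCA)
scores = U * S;                              % sample coordinates on the PCs
explained = diag(S).^2 / sum(diag(S).^2);    % fraction of variance per PC

% Aim 2: rank features by the absolute value of their loading on PC1.
[~, order] = sort(abs(V(:, 1)), 'descend');

% Aim 3: least-squares fit of PC1 ~ group (intercept and slope).
beta = [ones(nSamples, 1), group] \ scores(:, 1);

fprintf('PC1 explains %.1f%% of the variance\n', 100 * explained(1));
fprintf('Top-loading feature on PC1: %d, slope for group: %.2f\n', order(1), beta(2));
```

A clearly non-zero slope for a metadata variable on PC1 is the kind of signal that would flag it as a potential confounder in follow-up analysis.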
- downloadAGORA2(directory)
Download and unpack AGORA2
INPUT
directory Directory indicating where to download AGORA2
OUTPUT
AGORA2_dir Directory to AGORA2 folder
Author: Tim Hensen, 2024
- filterMetabolitesNotPresentInWBMmodel(metabolitesOfInterest, WBM_compartment)
Filters a table of metabolites for their presence in selected compartment(s) of the unpersonalized Harvey & Harvetta WBM models and returns the present and absent metabolites in separate tables. This process ensures that all metabolites of interest are actually present in the models, so that fluxes can be calculated for them.
- findOptimalCoreCount(modelDir, solver)
This function finds the optimal number of workers for the HM models being investigated.
INPUT
modelPath Path to folder with COBRA models
OPTIONAL INPUT
subSetSize Size of the random subset of models used for testing
OUTPUT
fig Figure showing the average speedup factor for each tested configuration of workers.
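A speedup factor is commonly computed as the runtime with one worker divided by the runtime with n workers. The sketch below assumes that definition and uses made-up timings; how findOptimalCoreCount measures runtimes and picks the optimum is not described here.

```matlab
workers = [1 2 4 8];            % tested worker counts (hypothetical)
elapsed = [120 65 38 30];       % hypothetical average runtimes in seconds

speedup = elapsed(1) ./ elapsed;        % speedup relative to a single worker
efficiency = speedup ./ workers;        % speedup gained per worker

[~, idx] = max(speedup);
bestWorkers = workers(idx);             % configuration with the largest speedup
fprintf('Largest speedup (%.2fx) with %d workers\n', speedup(idx), bestWorkers);
```

The efficiency vector shows the diminishing returns of adding workers, which can matter when cores are shared with other jobs.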
- generatePanAGORA2database()
Create lookup file for checking which reactions and metabolites are present in which AGORA2 strains
OUTPUT lookupFilePath Path to the generated lookup file
Authors: Tim Hensen, 2024
- generatePanDatabase(inputDir)
Create lookup file for checking which reactions and metabolites are present in which AGORA2 models
OUTPUT lookupFilePath Path to the generated lookup file
Authors: Tim Hensen, 2024
- generateStackedBarPlot(input_relAbundances, saveDir)
Generates stacked bar plots from relative abundances of taxa for single or multiple samples.
- INPUTS
input_relAbundances – [table] Contains taxa and their relative abundances for all samples. Requires column ‘Taxon’ and one or more sample columns.
saveDir – [chars/string] Path to the directory where the stacked bar plot should be saved.
- AUTHOR:
Jonas Widder, 12/2024 & 01/2025
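As a sketch of the expected input, the table below has a 'Taxon' column followed by one column per sample. The taxa, sample names, and values are made up, and the plotting lines use base MATLAB only to show the kind of figure meant; the actual call is left commented out because it needs the toolbox on the path and a real saveDir.

```matlab
% Hypothetical input table: one 'Taxon' column plus one column per sample.
Taxon = {'Bacteroides'; 'Prevotella'; 'Other'};
sample1 = [0.5; 0.3; 0.2];
sample2 = [0.2; 0.6; 0.2];
input_relAbundances = table(Taxon, sample1, sample2);

% Base-MATLAB illustration of a stacked bar plot (one bar per sample).
sampleNames = input_relAbundances.Properties.VariableNames(2:end);
M = input_relAbundances{:, sampleNames};      % taxa x samples matrix
fig = figure('Visible', 'off');
bar(M', 'stacked');                           % rows of M' = samples
set(gca, 'XTickLabel', sampleNames);
ylabel('Relative abundance');
legend(input_relAbundances.Taxon, 'Location', 'eastoutside');
close(fig);

% With the toolbox available and saveDir set (placeholder):
% generateStackedBarPlot(input_relAbundances, saveDir);
```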
- getDirectorySize(dirPath)
- ======================================================================================================#
Title: Directory disk use calculator
Author: Wiley Barton
Modified code sources:
- assistance and reference from a generative AI model [ChatGPT](https://chatgpt.com/)
clean-up and improved readability
Last Modified: 2025.01.29
Part of: Persephone Pipeline
- Description:
This function determines the size of a selected directory
- Inputs:
dirPath (char) : Path to the directory whose size is determined
…
- Dependencies:
MATLAB
Docker installed and accessible in the system path
======================================================================================================#
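One way to determine a directory's size in plain MATLAB is to list all files recursively and sum their byte counts. The sketch below does this on a temporary demo directory; it is an assumption about the general approach, not the implementation of getDirectorySize.

```matlab
% Build a small temporary directory with one 100-byte file in a subfolder.
dirPath = tempname;
mkdir(fullfile(dirPath, 'sub'));
fid = fopen(fullfile(dirPath, 'sub', 'demo.bin'), 'w');
fwrite(fid, zeros(1, 100, 'uint8'));
fclose(fid);

% Recursive listing ('**' wildcard requires R2016b or newer), files only.
listing = dir(fullfile(dirPath, '**', '*'));
files = listing(~[listing.isdir]);
totalBytes = sum([files.bytes]);
fprintf('%s uses %d bytes\n', dirPath, totalBytes);

rmdir(dirPath, 's');                % clean up the demo directory
```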
- getPanSpeciesMetProdCapacity
Create lookup file for checking which reactions and metabolites are present in which AGORA2 taxa
OUTPUT lookupFilePath Path to the generated lookup file
Authors: Tim Hensen, 2024
- getVMHID(mets, suggest)
getVMHID - Retrieve metabolite IDs corresponding to the given metabolite names.
- Inputs
mets - Cell array of metabolite names (strings)
suggest - Flag indicating whether to generate suggestions (1) or not (0)
- Outputs
metIDs - Cell array of metabolite IDs corresponding to the input names.
suggestedMets - Cell array of possible matches for each unfound metabolite name.
Example
metaboliteNames = {'glucose', 'fructose'};
[metIDs, suggestedMets] = getVMHID(metaboliteNames, 1);
Other requirements: COBRA toolbox installation and initialisation
Author: - Anna Sheehy - 16/07/2024
- microbiomeMappingStats(rawPath, marsPath, saveDir, metadataPath)
Function for obtaining statistics on AGORA2 mapping
INPUT
rawPath: path to the unfiltered microbiome data
marsPath: path to mapped microbiome data
saveDir: path to folder where the results are saved
- physiologicalConstraintsHMDBbasedTEMP(model, IndividualParameters, ExclList, Type, InputData, Biofluid, setDefault, ExclMet, ExclMetAbbr)
This function applies constraints to the whole-body metabolic model. Metabolite concentrations have to be given in uM and organ weights in g. Please note that reaction-specific constraints, which have been derived from the literature, are applied at the end of the function.
function modelConstraint = physiologicalConstraintsHMDBbased(model,IndividualParameters, ExclList, Type, InputData, Biofluid, setDefault,ExclMet,ExclMetAbbr)
INPUT
- model model structure
- IndividualParameters Structure containing physiological parameters, as generated in standardPhysiolDefaultParameters
- ExclList List of reaction(s) to which no updated bound should be assigned
- Type Input type: either 'xlsx' (default), which loads 'Parsed_hmdbConc.xlsx' by default, or 'direct'. If 'direct', InputData must be provided
- InputData First column corresponds to VMH IDs of metabolites, second column to data points (will be set as lb and ub)
- Biofluid 'all' (default if Type is 'xlsx'). For 'direct': 'bc', 'u', 'csf'
- setDefault If the input data does not contain concentration information for a given metabolite, default concentration ranges will be used to calculate the constraints (default: 1). Note that the default metabolite concentration ranges are specified in IndividualParameters for the different biofluid compartments.
- ExclMet Specify if certain metabolites, and thus their associated reactions, should be excluded from the constraint application (default: 0)
- ExclMetAbbr List of metabolites that should be excluded
OUTPUT
- modelConstraint model structure with updated constraints
Ines Thiele, 2015-2019
- plotAbsentTaxaEffectOnMARScoverage(mars_preprocessedInput, absentTaxa_abundanceMetrics, readCounts, results_path, varargin)
Based on MARS mapping input, this function generates a plot which visualizes how much of an effect the addition of currently unmapped taxa to the microbiome community model would have in terms of read coverage, starting from the most abundant taxa.
- INPUTS
mars_preprocessedInput – [table] MARS output “preprocessed_input” which contains read counts per pre-mapped taxa.
absentTaxa_abundanceMetrics – [table] MARS output listing all unmapped taxa together with summary statistics on their relative abundance across samples (mean relative abundance is of importance for the function).
readCounts – [table] Original data table containing read counts per taxa.
results_path – [string] Directory path, where results should be stored (figure).
numAbsentTaxaToInvestigate – [numerical] Number of unmapped taxa whose effect should be tested for & plotted. Optional, defaults to the full list of all unmapped taxa.
Authors
Jonas Widder, 11/2024 & 01/2025
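The quantity being plotted can be pictured as a cumulative sum: starting from the coverage already achieved by mapped taxa, each unmapped taxon adds its mean relative abundance, most abundant first. The numbers and taxon names below are hypothetical, and the function itself may compute coverage from read counts in a different way.

```matlab
mappedCoverage = 0.70;                         % hypothetical fraction of reads already mapped
absentTaxa = {'taxonA'; 'taxonB'; 'taxonC'};   % hypothetical unmapped taxa
meanRelAbundance = [0.05; 0.15; 0.02];         % their mean relative abundances

% Add unmapped taxa one by one, starting from the most abundant.
[sortedAbund, order] = sort(meanRelAbundance, 'descend');
cumulativeCoverage = mappedCoverage + cumsum(sortedAbund);

for k = 1:numel(order)
    fprintf('After adding %s: %.2f coverage\n', absentTaxa{order(k)}, cumulativeCoverage(k));
end
```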
- runStatisticsOnModerationAnalysisResults(data, metadata, formula, regressionResults, moderationThreshold_usePValue, moderationThreshold, saveDir)
Filters regression results from moderation analysis for significantly correlating metabolite fluxes/bacterial taxa. Then stratifies the filtered flux/rel. abundance data by the moderator and performs a new statistical analysis on the stratified data. Note: The moderator needs to be categorical.
- INPUTS
data – [Table] Processed flux/relative abundances data.
metadata – [Table] Metadata containing ID & pot. additional variables (confounders, moderators)
formula – [String] Regression formula in Wilkinson notation.
regressionResults – [Struct] Structure containing tables for flux & rel. abundances regression results.
moderationThreshold_usePValue – [Boolean] Flag selecting whether the cutoff threshold is applied to the p-value (true) or to the FDR (false). Default = true.
moderationThreshold – [Numerical] Cutoff threshold for maximal FDR value from moderation analysis a metabolite/bacterial taxa needs to pass that it will be included in subsequent analysis of stratified fluxes/taxa. Default = 0.05 (5%).
saveDir – [Character array] Path to working directory.
- OUTPUT
statResults – [Struct] Structure containing tables of regression results for moderator-stratified data of the significant hits from the initial moderation analysis regressions. Will be empty if the regression does not contain flux or relative abundance.
- AUTHOR:
Jonas Widder, 11/2024
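The filter-then-stratify logic can be sketched with base MATLAB tables. Everything here is hypothetical: the column names ('Feature', 'FDR', 'Sex'), the strict "less than" comparison, and the use of a group mean in place of the follow-up regression.

```matlab
% Hypothetical moderation results: one row per metabolite/taxon.
results = table({'metA'; 'metB'; 'metC'}, [0.01; 0.20; 0.04], ...
    'VariableNames', {'Feature', 'FDR'});
moderationThreshold = 0.05;

% Step 1: keep only features passing the cutoff (strict '<' is an assumption).
hits = results.Feature(results.FDR < moderationThreshold);

% Step 2: stratify the data for one hit by a categorical moderator.
data = table([1.0; 2.0; 3.0; 4.0], {'F'; 'M'; 'F'; 'M'}, ...
    'VariableNames', {'metA', 'Sex'});
[g, levels] = findgroups(data.Sex);

% Step 3: run a statistic per stratum (a mean stands in for the regression).
groupMeans = splitapply(@mean, data.metA, g);
disp(table(levels, groupMeans));
```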
- slimDownFBAresults(FBAsolutionDir)
This function prunes FBA solution results obtained in optimiseRxnMultipleWBM.m and saves the slimmed-down solution results in a new folder. The function first creates a new folder and generates paths for the flux results in that folder. Then, only the following data is loaded: 'rxns', 'ID', 'sex', 'f', and 'stat'. If microbiome data was available, 'speciesBIO', 'shadowPriceBIO', and 'relAbundances' are loaded as well. Finally, the solutions are saved to the new paths.
INPUT FBAsolutionDir Character array with path to FBA solutions.
OUTPUT smallFBAsolutionPaths Path to slimmed down FBA results
AUTHOR: Tim Hensen, October 2024
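Loading only selected variables from a .mat file and saving them again can be done with base MATLAB as sketched below. The file names, the demo values, and the extra variable largeField are hypothetical; the kept variable names are the ones listed above. This illustrates the pruning idea and is not the function's own code.

```matlab
% Create a demo solution file with the documented fields plus one large extra.
workDir = tempname;
mkdir(workDir);
fullFile = fullfile(workDir, 'solution_full.mat');
rxns = {'EX_glc_D[u]'};
ID = 'sample1';
sex = 'female';
f = 1.5;
stat = 1;
largeField = rand(100);             % stands in for data that is not needed later
save(fullFile, 'rxns', 'ID', 'sex', 'f', 'stat', 'largeField');

% Load only the variables to keep, then save them to a new, smaller file.
keepVars = {'rxns', 'ID', 'sex', 'f', 'stat'};
slim = load(fullFile, keepVars{:});
slimFile = fullfile(workDir, 'solution_slim.mat');
save(slimFile, '-struct', 'slim');

keptNames = fieldnames(slim);
disp(keptNames');
rmdir(workDir, 's');                % clean up the demo files
```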