Persephone

analyseWBMs(mWBMPath, fluxPath, rxnList, varargin)[source]

analyseWBMs predicts the optimal fluxes for a list of user-defined reactions (rxnList). All predicted are further described in analyseWBMsol.m.

USAGE:

[FBA_results, pathsToFilesForStatistics] = analyseWBMs (mWBMPath, fluxPath, rxnList)

INPUTS: mWBMPath Path (character array) to WBMs fluxPath Path to directory where the results will be stored rxnList Cell array of VMH metabolites to investigate.

Example: rxnList = {‘DM_trp_L[bc], DM_met_L[bc],’Brain_trp_L[c], Heart_met_L[x]}. Note that demand reactions are automatically added if they are not present in the models.

OPTIONAL INPUTS rxnSense Character array containing either ‘max’ or ‘min’

to specify the sense of the objective. Option to specify differently for each objective- to do so provide character array the exact length of rxnList. (OPTIONAL, Default = ‘max’).

numWorkersOptimization Number of workers that will perform FBA in parallel. Note

that more workers does not necessarily make the function faster. It is generally not recommended to set numWorkersOptimization equal to the number of available cores (see: feature(‘numCores’)) as linear solvers can already support multi-core linear optimisation, thus resulting in unnessessary overhead. On computers with 8 cores or less, it is recommended to set numWorkersOptimization to 1. On a computer with 36 cores, an optimal configuration of numWorkersOptimization=6 was found.

saveFullRes Boolean (true/false) indicating if all the complete .v, .y.

, and .w vectors are stored in the result. Default = true. It is recommended to set saveFullRes

paramFluxProcessing Structured array with optional parameters:

.NumericalRounding defines how much the predicted flux values are rounded. A defined value of 1e-6 means that a flux value of 2 + 2.3e-8 is rounded to 2. A flux value of 0 + 1e-15 would be rounded to exactly zero. This rounding factor will also be applied to the shadow price values. If microbiome relative abundance data is provided, the same rounding factor will be applied to the relative abundance data.

Default parameterisation: - paramFluxProcessing.NumericalRounding = 1e-6;

Example: - paramFluxProcessing.NumericalRounding = 1e-6;

paramFluxProcessing.NumericalRounding = 1e-6;

.RxnRemovalCutoff defines the minimal number of samples for which a unique reaction flux could be obtained, before removing the reaction for further analysis. This parameter can be expressed as * fraction: the fraction of samples with unique values, * SD: the standard deviation across samples, and * count: the counted number of unique values. If microbiome relative abundance data is provided, the same removal cutoff factor will be applied to the relative abundance data.

Default parameterisation: - paramFluxProcessing.RxnRemovalCutoff = {‘fraction’,0.1};

Examples: - paramFluxProcessing.RxnRemovalCutoff = {‘fraction’,0.1}; - paramFluxProcessing.RxnRemovalCutoff = {‘SD’,1}; - paramFluxProcessing.RxnRemovalCutoff = {‘count’,30};

paramFluxProcessing.RxnRemovalCutoff = {‘fraction’,0.1};

.RxnEquivalenceThreshold defines the minimal threshold of when functionally identical flux values are predicted, and are thus part of the same linear pathways. The threshold for functional equivalence is expressed as the R2 (r-squared) value after performing a simple linear regression between two reactions.

Default parameterisation: - paramFluxProcessing.RxnEquivalenceThreshold = 0.999;

Example: - paramFluxProcessing.RxnEquivalenceThreshold = 0.999;

paramFluxProcessing.RxnEquivalenceThreshold = 0.999;

.fluxMicrobeCorrelationType defines the method for correlating the predicted fluxes with microbial relative abundances. Note that this metric is not used if mWBMs are not present. The available correlation types are: * regression_r2: the R2 (r-squared) value from pairwised linear regression on the predicted fluxes against microbial relative abundances. * spearman_rho: the correlation coefficient, rho obtained from pairwise Spearman nonparametric correlations between predicted fluxes and microbial relative abundances.

Default parameterisation: - paramFluxProcessing.fluxMicrobeCorrelationMetric = ‘regression_r2’;

Examples: - paramFluxProcessing.fluxMicrobeCorrelationMetric = ‘regression_r2’; - paramFluxProcessing.fluxMicrobeCorrelationMetric = ‘spearman_rho’;

fluxAnalysisPath: Character array with path to directory where all

results will be saved (Default = pwd)

Solver: Validated solvers: ‘cplex_direct’,’ibm_cplex’

‘tomlab_cplex’, ‘gurobi’, ‘mosek’

analyseGF: Boolean indiciating whether or not to investigate

GF models. In the case of personalisation, a germ free iWBM will be created for every sample, this can results in long computation times as there are double the number of models to solve. If you are not interested in solving a germ-free iWBM for each sample, you can set to false. Default is true. When personalisation is skipped, only one germ-free model is made for each sex.

analyseWBMsol(fluxPath, paramFluxProcessing, fluxAnalysisPath, analyseGF)[source]

analyseWBMsol loads the FBA solutions produced in analyseWBMs.m, prepares the FBA results for further analyses, and produces summary statistics into the flux results.

The function contains six parts:
  • PART 1: Loading the flux solutions and corresponding shadow prices

  • PART 2: Converting the microbial biomass shadow prices

associated with each optimised reaction flux to human readable tables and producing statistics on microbial influences on the flux results. - PART 3: Isolate the microbial component of the fluxes by subtracting the germ-free fluxes in a sex specific manner. In this part, descriptive statistics are also obtained on the fluxes across samples and metabolites are removed based on a user defined threshold (see thresholds input) - PART 4: In this part, microbe-flux associations defined by the species biomass shadow prices (PART 2) are further quantified by performing Spearman correlations on the fluxes and microbe relative abundances. - PART 5: Reactions with identical or near identical fluxes across samples (user defined) are grouped and collapsed into a single result. - PART 6: Save all results

USAGE:

processedFluxResPaths = analyseWBMsol (fluxPath,paramFluxProcessing, fluxAnalysisPath)

INPUTS: fluxPath Character array with path to .mat files produced in

optimiseRxnMultipleWBM.m

paramFluxProcessing Structured array with optional parameters:

.numericalRounding defines how much the predicted flux values are rounded. A defined value of 1e-6 means that a flux value of 2 + 2.3e-8 is rounded to 2. A flux value of 0 + 1e-15 would be rounded to exactly zero. This rounding factor will also be applied to the shadow price values. If microbiome relative abundance data is provided, the same rounding factor will be applied to the relative abundance data.

Default parameterisation: - paramFluxProcessing.numericalRounding = 1e-6;

Example: - paramFluxProcessing.numericalRounding = 1e-6;

paramFluxProcessing.numericalRounding = 1e-6;

.rxnRemovalCutoff defines the minimal number of samples for which a unique reaction flux could be obtained, before removing the reaction for further analysis. This parameter can be expressed as * fraction: the fraction of samples with unique values, * SD: the standard deviation across samples, and * count: the counted number of unique values. If microbiome relative abundance data is provided, the same removal cutoff factor will be applied to the relative abundance data.

Default parameterisation: - paramFluxProcessing.rxnRemovalCutoff = {‘fraction’,0.1};

Examples: - paramFluxProcessing.rxnRemovalCutoff = {‘fraction’,0.1}; - paramFluxProcessing.rxnRemovalCutoff = {‘SD’,1}; - paramFluxProcessing.rxnRemovalCutoff = {‘count’,30};

paramFluxProcessing.rxnRemovalCutoff = {‘fraction’,0.1}; RxnEquivalenceThreshold .rxnEquivalenceThreshold defines the minimal threshold of when functionally identical flux values are predicted, and are thus part of the same linear pathways. The threshold for functional equivalence is expressed as the R2 (r-squared) value after performing a simple linear regression between two reactions.

Default parameterisation: - paramFluxProcessing.rxnEquivalenceThreshold = 0.999;

Example: - paramFluxProcessing.rxnEquivalenceThreshold = 0.999;

paramFluxProcessing.rxnEquivalenceThreshold = 0.999;

.fluxMicrobeCorrelationMetric defines the method for correlating the predicted fluxes with microbial relative abundances. Note that this metric is not used if mWBMs are not present. The available correlation types are: * regression_r2: the R2 (r-squared) value from pairwised linear regression on the predicted fluxes against microbial relative abundances. * spearman_rho: the correlation coefficient, rho obtained from pairwise Spearman nonparametric correlations between predicted fluxes and microbial relative abundances.

Default parameterisation: - paramFluxProcessing.fluxMicrobeCorrelationMetric = ‘regression_r2’;

Examples: - paramFluxProcessing.fluxMicrobeCorrelationMetric = ‘regression_r2’; - paramFluxProcessing.fluxMicrobeCorrelationMetric = ‘spearman_rho’;

fluxAnalysisPath

Character array with path to directory where all results will be saved.

AUTHOR:
  • Tim Hensen, July 2024

  • Jonas Widder, 11/2024 (small bugfixes)

configTemplatePersephone[source]

Configuration file %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

createBatchMWBM(mgpipePath, mWBMPath, metadataPath, varargin)[source]

This function creates personalised host-microbiome WBM models (mWBMs) by joining microbiome community models with unpersonalised WBM models in a sex-specific manner. The models are parameterised on a predefined diet.

USAGE:

[modelStats, summaryStats, dietInfo, dietGrowthStats] = createBatchMWBM (mgpipePath, mWBMPath, metadataPath)

INPUTS mgpipePath: Path to microbiome community models created by the

microbiome modelling toolbox.

mWBMPath Path to directory where the HM models are

saved

OPTIONAL INPUTS Diet Diet option: ‘EUAverageDiet’ (default) numWorkersCreation Number of cores used for parallelisation.

Default = 4.

checkFeasibility Flag (true/false) to run the

ensureHMfeasibility.m function and check if the generated models can grow on the diet. Default = true.

wbmDirectory Path to directory with user defined WBMs.

This function supports one user adapted male and one user adapted female WBM or personalised germ free WBMs for each sample with a microbiome community model. Default is empty. If wbmDirectory is empty, Harvey/Harvetta version 1.04c are used.

OUTPUTS modelStats Table with summary statistics on the generated WBMs:

gender, number of reactions, metabolites,constrainsts, and taxa.

summaryStats Table with the mean and SD of the model

statistics in the modelStats variable.

dietInfo Table with diet growth information.

models can grow on the given diet.

dietGrowthStats Table with statistics on

Authors: Tim Hensen, 2023, 2024

createForestPlot(estimates, ci, names, pValues, plotTitle, xTitle, hideLegend)[source]

createForestPlot generates a forest plot to display confidence intervals for estimates.

USAGE:

createForestPlot (estimates, ci, names, pValues, plotTitle, xTitle, hideLegend)

INPUTS:
  • estimates - Vector of estimates (e.g., effect sizes or log fold changes)

  • ci - Matrix of confidence intervals [n x 2] with lower and upper bounds in columns.

  • names - Cell array of labels for each data point.

  • pValues - Vector of p-values corresponding to each estimate.

  • plotTitle - String, title of the plot.

  • xTitle - String, label for the x-axis (typically “Effect Size” or “Log Fold Change”)

  • hideLegend - Logical, if true, hides the legend (default is false)

createMWBM(microbiota_model, WBM_model, Diet, saveDir)[source]

This function creates personalised host-microbiome WBM models by joining microbiome community models with unpersonalised WBM models. The models are parameterised on a predefined diet.

Example: modelHM = createMWBM(microbiota_model, WBM_model, ‘EUAverageDiet’)

INPUTS microbiota model: Microbiome community model created by the

microbiome modelling toolbox.

WBM_model: Harvey or Harvetta whole-body metabolic

model

Diet Diet option: ‘EUAverageDiet’ (default)

Output modelHM: Personalised host-microbiome WBM model

Authors: Tim Hensen, Bronson Weston, 2022

Tim Hensen, expanded WBM parameterisation July 2024 Tim Hensen, improved mWBM annotation November 2024

createVolcanoPlot(estimates, pValues, names, plotTitle, xTitle, yTitle)[source]

createVolcanoPlot generates a volcano plot to visualize the relationship between regression estimates and p-values.

USAGE:

createVolcanoPlot (estimates, pValues, names, plotTitle, xTitle, yTitle)

INPUTS:
  • estimates - [vector] Regression estimates for each data point (e.g., effect sizes or log fold changes)

  • pValues - [vector] p-values corresponding to each estimate.

  • names - [cell array] Labels or names for each data point.

  • plotTitle - [string] Title of the plot.

  • xTitle - [string] Label for the x-axis (e.g., “Effect Size” or “Log Fold Change”)

  • yTitle - [string] Label for the y-axis (e.g., “-log10(p-value)”)

OUTPUTS:

None - This function generates a volcano plot in the current figure.

Example

createVolcanoPlot(estimates, pValues, {‘Metabolite1’, ‘Metabolite2’}, …

‘Volcano Plot’, ‘Log Fold Change’, ‘-log10(p-value)’)

ensureWBMfeasibility(mWBMPath, varargin)[source]

This function checks for the WBMs in the specified mWBMPath to grow on the given diet, finds a diet that makes all WBMs feasible and propagates this diet onto the models. The functions works as follows: 1) Load the WBMs from the directory and test which models are feasible on the diet. 2) If any WBM is not feasible, collect all infeasible WBMs, open all diet reactions, and test if the WBMs can grow on any diet. If one or more WBMs cannot grow when opening all diet reactions, the statistics are collected and the function stops. The user will need to debug the WBMs manually. 3) If all previously infeasible WBMs can grow when all diet reactions are opened,missing diet components are searched using getMissingDietPersephone. If getMissingDietPersephone cannot find a diet after one function call, this function is run again until a hard limit of 10 iterations is reached. If this limit is reached, no feasible diet can be found for all models and the user will need to manually debug the infeasible models. If missing diet components are found in one WBMs, these component are automatically propagated to the other WBMs infeasible on the original diet before searching for more missing diet components 4) If a new diet is found for which all previously infeasible WBMs can grow, the updated diet is propagated to all models in the mWBMPath. 5) The updated diet is stored as an output variable for this function and for each model, its feasibility on the original and updated diet is recorded.

USAGE:

[dietInfo, dietGrowthStats] = ensureHMfeasibility (mWBMPath, Diet)

INPUTS mWBMPath Path to directory where the HM models are saved Diet Diet option: ‘EUAverageDiet’ (default)

OUTPUTS dietInfo Structured variable containing the missing diet

components and the updated diet propagated to all models. This variable is empty if no updated diet could be found.

dietGrowthStats Table indicating which WBMs could initially not grow on

the diet, which WBMs could not grow on any diet, and which WBMs could not grow on an updated diet.

Authors

  • Tim Hensen, 2024

  • Bram Nap, 07-2025 remove parfor functionality. Time save is not

massive and allows the function to be used for other purposes besides Persephone.

fdrBHadjustment(p)[source]
IMPORTANT; This function was partially generated by chatGPT o3-mini.

The code is manually tested and altered with bug fixes. This function performs identical to running q = mafdr(p, ‘BHFDR’, true) using the Matlab bioinformatics toolbox.

FDR_BH Performs Benjamini-Hochberg FDR correction on a vector of p-values.

q = FDR_BH(p) returns the FDR-corrected p-values q for the input vector p.

Input:

p - a vector of p-values (assumed to be in the interval [0, 1])

Output:

q - a vector of FDR corrected p-values, in the same order as p.

The function implements the procedure described in:

Benjamini, Y. and Hochberg, Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing.

Example:

pvals = [0.01, 0.04, 0.03, 0.002, 0.07]; qvals = fdr_bh(pvals);

findPvalCategories(pValues)[source]

This function inputs a list of p values and outputs the variables pValueGroups and colourGroups. A maximum of three groups are defined, FDR<0.05, P<0.05, and P>0.05.

USAGE:

[groupsToPlot, colours] = findPvalCategories (pValues)

INPUT pValues

OUTPUTS groupsToPlot Table that categorises the indices of the p-value list. colours Cell array with numerical encodings for the

category-associated colours.

generateStackedBarPlot_PhylumMARScoverage(input_relAbundances_preMapping, input_relAbundances_postMapping, saveDir, varargin)[source]

Generates stacked bar plots from mean relative abundances of phyla pre- & post-mapping to a model-database in MARS.

INPUTS:
  • input_relAbundances_preMapping – [chars/string] Path to table which contains phyla taxa and their mean relative abundances across samples pre-mapped to a model-database (original dataset). This spreadsheet is a standard output generated in MARS.

  • input_relAbundances_postMapping – [chars/string] Path to table which contains phyla taxa and their mean relative abundances across samples post-mapped to a model-database (filtered dataset). This spreadsheet is a standard output generated in MARS.

  • saveDir – [chars/string] Path to the directory where the stacked bar plot should be saved.

  • mappingDatabase_name – [chars/string] The name of the model-database used for mapping in MARS, which will be displayed in the title of the graph. Defaults to ‘’.

AUTHOR:
  • Jonas Widder, 12/2024 & 01/2025

getMissingDietPersephone(inputModel, missingDietComponents, testInitialFeasibility)[source]

This function determines missing dietary compounds in host-microbiome WBMs that are infeasible (it assumes that the WBMs and the microbiome models individually are feasible). Alternatively, this function can determine missing dietary compounds in microbiome community models This function first tests whether the model is feasible (which can be skipped by setting testInitialFeasibility = 0), if infeasible, all diet exchange reactions that are not already active will be opened and the feasibility will be tested. If infeasible, the function stops - then there is no dietary solution (of course it could be that active diet constraints are limited but they will not be tested with this script). Subsequently, diet exchanges are randomly closed (first in batches of 50, then 10, then 5, then 1). If the inputModel remains feasible this batch as well as all other 0 diet fluxes in the fba solution will be deemed not necessary for feasibility, if infeasible the batch set will be kept and tested for in the next step. when the batch set size is 1, each remaining diet exchange will be tested for individually. While still being slow, this approach is much faster then testing each diet exchange individually. As the diet exchanges are selected randomly for each batch, running the function twice may not result in the same final set of missingDietComponents

INPUT inputModel Host-microbiome or microbiome community model structure missingDietComponents list of diet exchange reactions that are known

to be missing or that have been identified in a previous run of this function

testInitialFeasibility default 1

OUTPUT missingDietComponents list of missingDietComponents. If

missingDietComponents was given as an input then this list is a combination of the input and the newly discovered missingDietComponents

Ines Thiele, Nov 2023 Tim Hensen, September 2025. Added support for microbiome community models.

initPersephone(resultPath, paths)[source]
Initializes the Persephone pipeline by:
  • Checking for MATLAB toolbox dependencies.

  • Processing and validating the metadata file.

  • Cross-referencing metadata with microbiome data (if provided).

  • Setting up directory structures for Persephone results.

USAGE:

[initialised, paths, statToolboxInstalled] = initPersephone (resultPath, metadataPath, readsTablePath)

INPUTS:
  • resultPath - (char)

  • metadataPath - (char) Path to the metadata file (.csv or .xlsx)

  • readsTablePath - (char)

OUTPUTS:
  • initialised - (logical)

  • paths - (struct)

  • statToolboxInstalled - (logical)

  • updatedMetadataPath - (char)

REQUIREMENTS:
  • MATLAB R2020b or newer.

  • Parallel Computing Toolbox (mandatory).

  • Statistics and Machine Learning Toolbox (recommended for analysis).

Notes

  • Metadata must contain a column of sample IDs and a column of sex information.

  • Metadata and reads table will be synchronized to ensure consistent samples across analyses.

  • Any discrepancies in variable naming (e.g., “ID”, “Sex”) are automatically resolved.

  • Processed metadata is saved to an updated file.

Example

resultPath = ‘./results’; metadataPath = ‘./data/metadata.csv’; readsTablePath = ‘./data/readsTable.csv’; [initialised, paths, statToolboxInstalled] = initPersephone(resultPath, metadataPath, readsTablePath);

AUTHOR:
  • Tim Hensen, January 2025

  • Bram Nap, July 2024: Automatic creation of output folders

loadMinimalWBM(modPath)[source]

This function loads the smallest possible combination of WBM model fields needed to perform FBA. Loading the minimal model can decrease loading times by 6X.

Author: Tim Hensen, November 2024

performRegressions(data, metadata, formula, exponentiateLogOdds)[source]

This functions performs linear or logistic regressions on either flux data or microbial relative abundances. The function supports control variables & moderators in the regression formulae.

USAGE:

results = performRegressions (data,metadata,formula)

INPUT data: Table with flux or microbiome data. The first column must be the ID

column.

metadata: Metadata table. The first column must be the ID column. formula: Must contain either “Flux or “relative_abundance” as predictor. exponentiateLogOdds: Logical variable on if a logistic regression estimate should be exponentiately to obtain the odds ratios.

performStatsPersephone(statPath, pathToProcessedFluxes, metadataPath, response, varargin)[source]

The performStatistics function is part of the PSCM toolbox. This function performs regression analyses on processed fluxes and gut microbiome relative abundance data in large cohort studies. This function needs a minimal sample size of 50 samples. This function performs different statistical analyses depending on the user defined inputs.

If the response variable is binary and no confounders are given, wilcoxon tests will be performed for the fluxes and microbiome abundances. If the response variable is binary and confounders are given, multiple logistic regressions will be performed. If the response variable is continuous and no confounders are given, simple linear regressions will be performed. If the response variable is continuous and confounders are given, multiple linear regressions will be performed.

The fluxes are z-transformed before statistical testing. The microbiome relative abundances are log transformed before testing. If a moderator variable is given and a regression is performed, a moderation analysis on the moderator with the predictor will be performed.

USAGE:

performStatistics (statPath, pathToProcessedFluxes, pathToWbmRelAbundances, metadataPath, response, confounders, moderator, microbeCutoff,threshold)

INPUT statPath Path (character array) to working directory pathToProcessedFluxes Path to processed flux data pathToWbmRelAbundances Path to microbial relative abundances metadataPath Path to metadata file response Name of response variables (char or string)

OPTIONAL INPUTS confounders Cell array with the names of confounding variables to

be included. Default = empty.

microbeCutoff Cutoff threshold for the number of samples a

microbe needs to present to be analysed. Default = 0.1. (10%)

persWBM(metadata, varargin)[source]

This function takes a table of physiological parameters for an individual or multiple individuals (listed in inputs) and adjusts the paramteres of a provided WBM or Harvey/Harvetta to create a personalised WBM

The calculation of physiological parameteres will be performed based on the available data e.g., if the user provides Cardiac output, this given value is assigned, if the user provides stroked volume and no CO, CO is calculated based on SV. If neither values are provided, CO will be calculated based on XXX. All details of parameter calculation are detailed in the personalised model output and in the excel file provided by this function.

INPUTS

REQUIRED: metadata Can be a struct (for personalising one model)

or a path to an excel file (for batch personalisation). If it is a structure, it should follow the format of the individualised parameters struct obtained from running: > sex = “male”; OR > sex = “female”; AND > standardPhysiolDefaultParameters;

persWBMmetabolomics(sex, metabolomicParams, varargin)[source]

This function takes a table of metabolomic parameters for an individual or multiple individuals (listed in inputs) and adjusts the paramteres of a provided WBM or Harvey/Harvetta to create a personalised WBM

The calculation of metabolomic parameteres will be performed based on the available data

INPUTS

REQUIRED: metabolomicParamters Can be a cell array or a table or a path to an

excel file. In any case, this should contain a list with a minimum of three columns and option for additional columns for each individual for which a model should be created:

|”compartment” |”blood” |”urine” | |”unit” |”mg/dL” | “mg/dL” | |”Individual1” |180 |50 | |”Individual2” |150 |66 |

Sex must be provided for each model. The compartment must be specified and must be one of the following: csf, u, blParameters that can be personalised include: All of those metabolites which are found in the *for each the default unit used by the model is provided. For x, y, z- unit conversion will be performed for all known common units of measurement.

OPTIONAL: WBM/(s) User can provide a WBM (Whole Body Metabolic Model) or a path to multiple models.

If no model is provided, either Harvey or Harvetta will be loaded from the COBRA toolbox, depending on sex in provided physiological data. If multiple models are provided, the model ID must match the model name provided in the physiological data.

resPath Path on which to store personalised model and

other outputs. Default = current directory

Diet Diet in the form of text file or named .mat

file from the COBRA toolbox (default = EUAverageDiet) ** This function can also be run in isolation but if a user wants to personalise both physiological and metabolomic- this function should alway be run and NOT persWBMmetabolomics in isolation!

OUTPUTS

iWBM Model with updated physiological paramteres

(stored as “persModelName.mat”). All updated paramteres are described in model.IndividualisedParameters

controlWBM This is a WBM with no personalised

adjustments. When a WBM or multiple WBMs have been given as inputs, the controlWBMs are exact copies of those. When Harvey or Harvetta are used, controlWBM is copies of Harvey/Harvetta.

persParameters Excel file with details of the updated

parameter and how it was calculated

author: Anna Sheehy November 2024 Step One: Read in the available data, check all data is valid Define the input parser

readMetadataForPersephone(metadataPath)[source]
DESCRIPTION:

This function loads a metadata file into a MATLAB table. It accounts for: - Preserving the full variable names by disabling automatic truncation. - Capturing variable units if provided in the second row of the file. - Storing the captured units in the ‘VariableUnits’ property of the table.

USAGE:

metadata = readMetadataForPersephone (metadataPath)

INPUTS:

metadataPath – A string specifying the path to the metadata file (in CSV or XLSX format).

OUTPUTS:

metadata – A MATLAB table containing the processed metadata. Variable units, if present in the second row of the file, are stored in the table’s ‘VariableUnits’ property.

Notes

  • If variable names exceed 64 characters, they are truncated, but the true variable names are preserved in the ‘VariableDescriptions’ property.

  • Warnings related to variable name truncation are suppressed to avoid unnecessary console output.

AUTHORS

Tim Hensen, January 2025

runMars(readsTablePath, varargin)[source]

This function processes microbiome taxonomy and read abundance data and maps microbial species on a microbial reconstruction database, such as AGORA2 and APOLLO.

INPUTS:
  • readsTablePath – String; path to the reads abundance file. If taxonomic assignment is not present in this file, provide taxonomy in taxaTablePath.

  • cutoffMars – Numeric; value under which individual taxa relative abundances are considered to be zero. Optional, defaults to 1e-6

  • flagLongSpecies – Boolean; indicates if the genus name is in the name of the species. E.g., if the species name is Prevotella copri, set to false. If the species name is copri set to true. Optional, defaults to true.

  • taxaDelimiter – String; delimiter used to separate taxonomic levels. Optional, defaults to ;

  • removeClade – Boolean; specifies to remove clade name extensions from all taxonomic levels of microbiome taxa. If set to false, MARS might find significantly less models in AGORA2 and APOLLO databases, as clade extensions are not included there. Optional, defaults to true.

  • reconstructionDb – String; defining if AGORA2, APOLLO, a combination of both (full) or a user-defined database should be used as the model database to check presence in. Allowed Input (case-insensitive): “AGORA2”, “APOLLO”, “full_db”, “user_db”. Optional, defaults to full_db.

  • userDbPath – String; The path to the user database if reconstructionDb is set to user_db. Optional, defaults to ‘’.

  • sampleReadCountCutoff – Numeric; value for total read counts per sample under which samples are excluded from analysis. If the reads table is already normalised, set this value to 0 or 0.1 to ensure that samples are not removed.

  • taxaTablePath – String; path to the file where taxonomies are matched to taxonomic unit (OTU/ASV etc.) This requires that the taxonomic unit is present in the reads table as well in order to match the taxonomy with the reads. Ensure that the column with the taxonomic unit has the same header in both the reads table and taxonomy table and that the column with the taxonomies in the taxonomy table is called ‘taxon’. Optional, defaults to ‘’.

  • outputPathMars – String; path to the directory where the output of MARS is stored. Optional, defaults to [pwd, filesep, ‘resultMars’].

  • calculateBrayCurtis – Boolean; Specifies if the Bray-Curtis dissimilarity index is calculated. Caution! Putting this to true can greatly increase the time MARS needs to run. Defaults to false.

  • compoundedDatabase – Boolean, indicates if the database reads are compounded or not. E.g., if the total number of reads for an order o__x is the same as the genus g__x. This means the reads assigned to g__x are in the number of reads of o__x. This requires a different data handling as compared where o__x only has reads that could be assocated up to that level. Defaults to false.

OUTPUTS:
  • The function does not return variables but writes processed results

  • to the specified output directory in the MARS pipeline.

AUTHOR: Tim Hensen, July 2025

Bram Nap, August 2025

runNonparametricTests(data, metadata, predictor, response)[source]

Performs non-parametric statistical tests (Wilcoxon or Kruskal-Wallis) on metabolic data

USAGE:

resultTable = runNonparametricTests (data, metadata, predictor, response)

INPUTS:
  • data – (table) m x n table containing: * m samples * n reactions/taxa with flux or abundance values * First column must contain sample IDs

  • metadata – (table) Sample metadata containing: * Sample IDs in first column * Response variable in separate column

  • predictor – (char) Name of predictor variable (‘Flux’ or ‘relative_abundance’)

  • response – (char) Name of response variable in metadata

OUTPUTS:

resultTable – (table) Statistical results with columns: * name: Reaction/Taxa identifier * groups: Cell array of group names * n: Array of group sizes * statistic: Test statistic (Wilcoxon or Kruskal-Wallis) * p-value: Unadjusted p-value * FDR: Benjamini-Hochberg adjusted p-value * effectSize: Effect size (r for Wilcoxon, η² for Kruskal-Wallis) * confidence: 95% confidence interval (only for binary comparisons)

Example

% Binary comparison (Wilcoxon test) resultTable = runNonparametricTests(fluxData, metadata, ‘Flux’, ‘Disease’)

% Multiple group comparison (Kruskal-Wallis test) resultTable = runNonparametricTests(abundanceData, metadata, ‘relative_abundance’, ‘Treatment’)

Note

  1. Automatically selects appropriate test based on number of groups

  2. Data is automatically normalized before testing

  3. Missing values (NaN) are removed before analysis

  4. Multiple testing correction uses Benjamini-Hochberg FDR

  5. Minimum group size of 3 samples is required for valid testing

runPersephone(configPath)[source]

This pipeline orchestrates the end-to-end process of constructing and analysing human–microbiome whole-body models (WBMs). It integrates sequencing data processing, metagenomic mapping, microbiome model generation, WBM personalisation, host–microbiome model creation, flux balance analysis (FBA), and statistical evaluation.

Updated: 2025.07.01 wbarton Updated: 2025.09.09 asheehy Updated: 2025.09.19 bnap

Usage:

runPersephone (configPath)

Overview of steps: 1. SeqC: Pre-process sequencing data and prepare input for MARS. 2. MARS: Map taxonomy-assigned reads to AGORA2/APOLLO reconstructions

to generate relative abundance files.

  1. MgPipe: Build microbiome community models using relative abundances.

  2. WBM personalisation: Personalise WBMs with physiological and/or

    metabolomic data.

  3. mWBM Creation: Combine WBMs (personalised or base) with microbiome models

    to create host–microbiome WBMs (miWBMs).

  4. FBA: Perform flux balance analysis on mWBMs using user-specified

    reaction objectives, followed by flux post-processing.

  5. Statistics: Run statistical analyses (e.g., mixed-effects linear

    regression) on flux and abundance results, accounting for metadata-specified predictors and confounders.

Inputs %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

configPathPath to a .mat file which should contain all of the

necessary inputs needed to run specific or all sections of this pipeline. A template of this config file can be found at: https://github.com/opencobra/cobratoolbox/tree/master/src/analysis/persephone or you can fill in an online version and download at: https://vmh2.life/persephone

Outputs %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

All results will be saved to your computer in the directories you can choose in the config file. The output ‘progress’ will also be produced and saved to your workspace. It will detail all sections that have been run and how long they took.

%%%%%%%%%%%%%%%%%%% PERSEPHONE INITIALISATION %%%%%%%%%%%%%%%%%%%%%%%%% Start timer

runSeqC(repoPathSeqC, outputPathSeqC, fileIDSeqC, procKeepSeqC, maxMemSeqC, maxCpuSeqC, maxProcSeqC, debugSeqC, runApptainer)[source]
======================================================================================================#

Title: SeqC as Flux Pipeline MATLAB Wrapper Author: Wiley Barton Modified code sources:

matlab script structure: Tim Hensen, runMars.m, 2025.01 assistance and reference from a generative AI model [ChatGPT](https://chatgpt.com/)

clean-up and improved readability

Last Modified: 2025.10.02 - wbarton Part of: Persephone Pipeline

Description:

This function builds and runs the SeqC Docker image from a MATLAB environment. It ensures necessary databases exist and coordinates execution with the MARS pipeline.

Inputs:
  • repoPathSeqC (char) : Path to the SeqC repository

  • outputPathSeqC (char) : Path for SeqC output

  • fileIDSeqC (char) : Unique identifier for file processing

  • procKeepSeqC (logical) : Keep all files (true/false)

  • maxMemSeqC (int) : Maximum memory allocation for SeqC

  • maxCpuSeqC (int) : Maximum CPU allocation for SeqC

  • maxProcSeqC (int) : Maximum processes for SeqC

  • debugSeqC (logical) : Enable debug mode (true/false)

  • runApptainer (logical) : Enable apptainer wrapping (true/false)

Dependencies:
  • MATLAB

  • Docker installed and accessible in the system path

Optional:
  • Apptainer (formerly Singularity) version 1.3.4

======================================================================================================#

Determine Operating System

validatePersephoneInputs(paths, resultPath)[source]

Validates the inputs provided in the paths structure for the Persephone pipeline. This function ensures that all required fields, flags, and parameters are present, appropriately formatted, and meet the expected constraints. Validation is performed across multiple pipeline components, including MARS, mgPipe, WBM personalization, HM creation, FBA, and statistics.

USAGE:

validated = validatePersephoneInput (paths)

INPUT:

paths Structure containing all required input fields for the Persephone – pipeline. Fields include: - General flags and paths (e.g., flagSeqC, resultPath) - Component-specific flags and parameters (e.g., flagMars, marsRepoPath, flagMgPipe, metadataPath, etc.)

OUTPUT:

validated Logical scalar indicating whether the validation succeeded – (true) or failed (false).

Notes

  • Ensure all required flags and fields in the paths structure are defined prior to invoking this function. Undefined or mismatched fields will result in validation failure.

  • This function is designed to be modular and extendable for future additions to the Persephone pipeline.

AUTHOR:

Tim Hensen, January 2025

See also: validateattributes, validatestring, COBRA Toolbox