massBalance

checkBalance(model, element, printLevel, fileName, missingFormulaeBool)

Checks whether a set of reactions is elementally balanced. Note that exchange reactions are not elementally balanced.

USAGE:

[dE ,E, missingFormulaeBool] = checkBalance(model, element, printLevel, fileName, missingFormulaeBool)

INPUTS:

model: COBRA model structure:

  • .S - Stoichiometric matrix

  • .metForumlas - Metabolite formulas

element: Abbreviation of element e.g. C or Mg

OPTIONAL INPUTS:

printLevel: {-1, (0), 1} where:

-1 = print out missing formulae to a file; 0 = silent; 1 = print out missing formulae to screen reactions to screen

fileName: name of the file missingFormulaeBool: boolean variable for missing formulae

OUTPUTS:
dE: n x 1 vector of net change in elements per reaction bal = E*S

If isnan(dE(n)) then the reaction involves a metabolite without a formula.

E: m x 1 vector with number of elements in each metabolite missingFormulaeBool: boolean variable for missing formulae

checkFormulaValidty(model)

Assesses whether metabolites in model are likely to be documented in databases.

USAGE:

[dbBool, noDocMets] = checkFormulaValidty(model)

INPUT:

model: Model structure array

OUTPUTS:
dbBool: A boolean vector where the number of rows is equal to the number of

metabolites in model. Contains a logical 1 in rows for metabolites that are likely to be documented in databases and a logical 0 elsewhere.

noDocMets: A cell array containing a list of all metabolites in model that

are not likely to be documented in any database.

checkMassChargeBalance(model, printLevel, modelName)

Tests for a list of reactions if these reactions are mass-balanced by adding all elements on left hand side and comparing them with the sums of elements on the right hand side of the reaction. A reaction is considered elementally imbalanced if any of the molecular species involved is missing a chemical formula.

USAGE:

[massImbalance, imBalancedMass, imBalancedCharge, imBalancedRxnBool, elements, missingFormulaeBool, balancedMetBool] = checkMassChargeBalance(model, printLevel, modelName)

INPUT:

model: COBRA model structure:

  • .S - m x n stoichiometric matrix

  • .metForumlas - m x 1 cell array of metabolite formulas

  • .metCharges - m x 1 double array of charges

  • model.SIntRxnBool - Boolean of reactions heuristically though to be mass balanced. (optional)

OPTIONAL INPUTS:

printLevel: {-1, (0), 1} where:

-1 = print out diagnostics on problem reactions to a file, 0 = silent, 1 = print elements as they are checked (display progress), 2 = also print out diagnostics on problem reactions to screen

modelName: name of the file

OUTPUTS:
massImbalance: nRxn x nElement matrix with mass imbalance

for each element checked. 0 if balanced.

imBalancedMass: nRxn x 1 cell with mass imbalance

e.g. -3 H means three hydrogens disappear in the reaction.

imBalancedCharge: nRxn x 1 vector with charge imbalance, empty if no imbalanced reactions imBalancedRxnBool: boolean vector indicating imbalanced reactions (including exchange reactions!) elements: nElement x 1 cell array of element abbreviations checked missingFormulaeBool: nMet x 1 boolean vector indicating metabolites without formulae balancedMetBool: boolean vector indicating metabolites exclusively involved in balanced reactions

OPTIONAL OUTPUT FILES: modelName_mass_imbalanced_reactions.txt provides a human readable summary for each mass imbalanced reaction.

For each reaction it contains: j the index of the column of the stoichiometric matrix corresponding to the reaction rxns{j} the abbreviation of the reaction imbalance the number of each element that is unbalanced equation the reaction equation

For each metabolite involved in a reaction it contains i the index of the row of the stoichiometric matrix corresponding to the metabolite mets{i} the abbreviation of the reaction S(i,j) the stoichiometric coefficient metFormulas{i} the chemical formula of the metabolite

Example #Rxn;rxnAbbr;imbalance;equation 32;2AMACSULT;3 O, 1 S;2amac[c] + nadph[c] + paps[c] -> nadp[c] + Lcyst[c] + pap[c]

58 2amac[c] -1 C3H5NO2 60 nadph[c] -1 C21H26N7O17P3 61 nadp[c] 1 C21H25N7O17P3 62 paps[c] -1 C10H11N5O10P2 63 Lcyst[c] 1 C3H6NO5S 64 pap[c] 1 C10H11N5O10P2

computeElementalMatrix(model, metList, warnings, generalFormula)

Computes elemental matrix

USAGE:

[Ematrix, element] = computeElementalMatrix(model, metList, warnings, generalFormula)

INPUT:

model: COBRA model structure (must define .mets and .metFormulas)

OPTIONAL INPUTS:
metList: Cell array of which metabolites to search for

(Default = all metabolites in model)

warnings: Display warnings if there are errors with the

formula. (Default = true)

generalFormula: * (default) false to return composition for [C N O H P other] only.
  • true to support formulae with brackets, decimal places and any chemical elements including undefined groups (e.g., ‘([H2O]2(CuSO4))2Generic_element0.5’).

OUTPUT:
Ematrix: m x 6 matrix of order [C N O H P other] if genericFormula = 0

m x e matrix if genericFormula = 1 given e elements in the chemical formulae

elements: cell array of elements corresponding to the columns of Ematrix

computeMetFormulae(model, varargin)

Compute the chemical formulas for metabolites without formulas using a set of metabolites with known formulae by minimizing the overall inconsistency in elemental balance. They are combined with the conserved moiety vectors identified from the left null space of S-matrix to return the general formulae. Known formulae for all external mets (including sink/demand) suffice to infer the formulae for all metabolites. More mets with known formulae will reveal more inconsistency in mass balance. This is particularly useful for checking the molecular weight of the biomass produced in the model, which should be curated to have MW = 1000 g/mol for prediction fidelity. Metabolites that may have adjustable stoichiometries (e.g., H+, H2O) can be supplied to fill up inconsistency.

USAGE:
Find unknown chemical formulae for all metabolites:

[model, metFormulae, elements, metEle, rxnBal, S_fill, solInfo, LP] = computeMetFormulae(model, ‘parameter’, value, … )

Find the min/max possible MW of a particular metabolite of interest:

[mwRange, metFormulae, elements, metEle, rxnBal, S_fill, solInfo, LP] = computeMetFormulae(model, ‘metMwRange’, met, …)

Also support direct input in the following order:

[…] = computeMetFormulae(model, knownMets, balancedRxns, fillMets, calcCMs, nameCMs, deadCMs, metMwRange, LPparams, …)

INPUT:

model: COBRA model

OPTIONAL INPUTS (support name-value argument input):
knownMets: Known metabolites (cell array of strings or vector of met IDs)

[default all mets with nonempty .metFormulas]

balancedRxns: The set of reactions for inferring formulae (cell array of strings or vector of rxn IDs)

[default all non-exchange reactions]

fillMets: The chemical formulas for compounds for freely filling the imbalance, e.g. {‘HCharge1’, ‘H2O’}.

‘none’ not to have any. [default ‘HCharge1’ if charge info exists else ‘H’]

calcCMs: Calculate conserved moieties from the left null space of S. Options:
  • ‘efmtool’: Use EFMtool (most comprehensive, recommanded, but computational

    cost may be high if there are many deadend mets)

  • ‘null’: Use the rational basis computed by Matlab

  • N: Directly supply the matrix \(N\) for conserved moieties

    (from rational basis or the set of extreme rays)

  • false: Not to find conserved moieties and return minimal formulae

[default ‘efmtool’ if ‘CalculateFluxModes.m’ is in path, else switch to ‘null’]

nameCMs: Name the identified conserved moieties or not [default 0]
  • 0: The program assigns default names for conserved moieties (Conserve_a, Conserve_b, …)

  • 1: Name true conserved moieties interactively (exclude dead end mets).

  • 2: Name all interactively (including dead end)

  • cell array of names for the conserved moieties, corresponding to the columns in \(N\) (N must be supplied as calcCMs to use this option)

deadCMs: true to include dead end metabolites when finding conserved moieties (default true) metMwRange: the metabolite of interest (or its ID) for which the range of possible molecular weights is determined

(if supplied, inputs ‘fillMets’, … , ‘deadCMs’ are all not used)

LPparams: Additional parameters for solveCobraLP. See solveCobraLP for details

OUTPUTS:

model: COBRA model with updated formulas and charges (1st output if ‘metMwRange’ is not called) mwRange: range for the MW of the metabolite of interest (1st output if ‘metMwRange’ is called) metFormulae: Formulae for unknown mets (#unknown mets x 2 cell array) if ‘metMwRange’ is not called

Cell array of two chemical formulae for the min/max MW of the metabolite of interest if ‘metMwRange’ is called

elements: Elements corresponding to the row of rxnBal, the coloumn of metEle metEle: Chemical formulas in matrix (#metKnown x #elements) rxnBal: Elemental balance of rxns (#elements x #rxns) S_fill: Adjustment of the S-matrix by ‘metFill’ (#metFill x #rxns in the input) solInfo: Info for the solutions for each of the optimization problems solved:

  • metUnknown: mets whose formulae are being solved for (#met_unknown x 1 cell)

  • metFill: metabolite formulae used to automatically fill inconsistency

  • rxns: reactions with elemental balance imposed

  • elements: the chemical elements present in the model’s formulae.

  • eleConnect: connected components partitioning solInfo.elements (#elements x #components logical matrix).

    Elements in the same component mean that they are connected by some ‘metFill’ and are optimized in the same round.

  • metEleUnknown: the formulae found for unknown metabolites in different steps (#met_unknown x #elements)

  • S_fill: solution for S_fill in each step

  • solEachEle: structure of solutions, one for each connected componenets of elements

  • feasTol: tolerance used to determine solution feasibility

  • stat: minIncon/minFill/minForm/mixed/infeasible stating where the final solution metEleUnknwon

    is obtained from. Ideally minForm if no numerical issue on feasibility. (minIncon/minMw/maxMw/minMw & maxMw if calling with option ‘metMwRange’)

  • N: the minimal conserved moiety vectors

  • cmFormulae: the chemical formulae for each conserved moiety in N

  • cmUnknown: logical vector indicating which columns in N represent moieties with unknown formulae.

    These are usually cofactors cycled in the network.

  • cmDeadend: logical vector indicating which columns in N represent moieties in deadend metabolites.

    These are usually isolated conversion steps between dead end metabolites.

LP: LP problem structure for solveCobraLP (#components x 1)

elementalMatrixToFormulae(Ematrix, elements, dMax)

Convert the elemental composition matrix into chemical formulae

USAGE:

metForm = elementalMatrixToFormulae(elements, Ematrix, dMax)

INPUTS:

Ematrix: elemental composition (M x E matrix) for M metabolites and E elements elements: cell array of elements corresponding to the columns of Ematrix

OPTIONAL INPUT:

dMax: the maximum number of decimal places for the stoichiometry (default 12)

Siu Hung Joshua Chan Nov 2016

findSExRxnInd(model, nRealMet, printLevel)

Returns a model with boolean vectors indicating internal vs external (exchange/demand/sink) reactions. Finds the reactions in the model which export/import from the model boundary

e.g. Exchange reactions, Demand reactions, Sink reactions

USAGE:

model = findSExRxnInd(model, nRealMet, printLevel)

model.SIntRxnBool Boolean of reactions heuristically though to be mass balanced.

INPUT:

model: structure with:

  • S - m x n stoichiometric matrix

OPTIONAL INPUT:
model: structure with:
  • model.biomassRxnAbbr - abbreviation of biomass reaction

nRealMet: specified in case extra rows in S which dont correspond to metabolties printLevel: verbose level

OUTPUT:

model: structure with:

  • .SIntRxnBool - Boolean of reactions heuristically though to be mass balanced.

  • .SIntMetBool - Boolean of metabolites heuristically though to be involved in mass balanced reactions.

  • .SOnlyIntMetBool - Boolean of metabolites heuristically though only to be involved in mass balanced reactions.

  • .SExMetBool - Boolean of metabolites heuristically though to be involved in mass imbalanced reactions.

  • .SOnlyExMetBool - Boolean of metabolites heuristically though only to be involved in mass imbalanced reactions.

  • .biomassBool - Boolean of biomass reaction

  • .DMRxnBool - Boolean of demand reactions. Prefix DM_ (optional field)

  • .SinkRxnBool - Boolean of sink reactions. Prefix sink_ (optional field)

  • .ExchRxnBool - Boolean of exchange reactions. Prefix EX_ or Exch_ or Ex_ (optional field)

getElementalComposition(formulae, elements, chargeInFormula)

Get the complete elemental composition matrix. It supports formulae with generic elements, parentheses and decimal places

USAGE:

[Ematrix, elements] = getElementalComposition(formulae, elements, chargeInFormula)

INPUT:
formulae: cell array of strings of chemical formulae. Can contain any generic elements starting

with a capital letter followed by lowercase letters or ‘_’, followed by a non-negative number. Also support ‘()’, ‘[]’, ‘{}’. E.g. {‘H2O’; ‘[H2O]2(CuSO4)Generic_element0.5’}

OPTIONAL INPUTS:

elements: elements from previous call to preserve the order (default {}) chargeInFormula: true to accept formulae containing the generic element ‘Charge’ representing the charges,

followed by a real number, e.g., ‘HCharge1’, ‘SO4Charge-2’ (default false).

OUTPUTS:

Ematrix: elemental composition matrix (#formulae x #elements) elements: cell array of elements corresponding to the columns of Ematrix

E.g., [Ematrix, elements] = getElementalComposition({‘H2O’; ‘[H2O]2(CuSO4)Generic_element0.5’}) would return:

elements = {‘H’, ‘O’, ‘Cu’, ‘S’, ‘Generic_element’} Ematrix = [ 2, 1, 0, 0, 0;

4, 6, 1, 1, 0.5]

Siu Hung Joshua Chan May 2017

ispolymer(formula)

Checks to see if a formula corresponds to a polymer

USAGE:

res = ispolymer(formula)

INPUT:

formula: formula to be checked

numAtomsOfElementInFormula(formula, element, printLevel)

Returns the number of atoms of a single element in a formula

USAGE:

N = numAtomsOfElementInFormula(formula, element, printLevel)

INPUTS:

formula: formula in format Element then NumberOfAtoms with no spaces element: Abbreviation of element e.g. C or Mg printLevel: verbose level

OUTPUT:

N: number of atoms of this element in the formula provided

pHbalanceProtons(model, massImbalance, printLevel, fileName)

Mass balance protons for each reaction by adjusting the hydrogen ion stoichiometric coefficient.

For each non transport reaction the proton stoichiometric coefficient for each reaction is given by the difference between the sum of the average number of protons bound by substrate reactants and the average number of protons bound by product reactants

i.e. proton S_ij = sum(substrates(S_ij*aveHbound)) - sum(products(S_ij*aveHbound))

For each transport reaction, model transport across the membrane as three reactions, one where non-proton reactants convert into non-proton metabolites involved in one compartment,then in the other compartment, non-proton metabolites convert into non-proton reactants. In between is a reconstruction transport reaction which must be elementally balanced for H to begin with. We assume that the transporter involved has affinity only for a particular species of metabolite defined by the reconstrution.

USAGE:

model = pHbalanceProtons(model, massImbalance, printLevel, fileName)

INPUT:

model: structure with fields:

  • model.S - stoichiometric matrix

  • model.mets - metabolites

  • model.SIntRxnBool - Boolean of internal reactions

  • model.aveHbound - average number of bound hydrogen ions

  • model.metFormulas - average number of bound hydrogen ions

  • model.metCharges - charge of metabolite species from reconstruction

OPTIONAL INPUTS:
massImbalance: nRxn x nElement matrix where massImbalance(i,j) is imbalance of

reaction i for element j. massImbalance(i,j) == 0 if balanced for that element. The first column is assumed to correspond to H balancing.

printLevel: {(0), -2, -1, 0, 1, 2, 3) print more to file or more to the command window fileName: name of the file to print out to

OUTPUT:

model: structure containing:

  • .S - Stoichiometric matrix balanced for protons where each row. corresponds to a reactant at specific pH.

  • .Srecon - Stoichiometric matrix of the reconstruction

regulariseModelFormulae(model)

Update the molecular formulae to make sure that they are consistent with model metadata, and expressed in Hill notation, with the exception that all R groups are replaced with the letter A