ChemoInformatics

atomicallyResolveModel(model, param)

1. Takes a standard COBRA model as input, annotates the metabolites with metabolite identifiers (e.g. InChIKey), compares the identifiers, and saves the most consistent ones in MDL MOL format, representing the structure of each metabolite. It also saves the InChI, SMILES and an image of each metabolite structure (if ChemAxon cxcalc and OpenBabel are installed).

2. The MDL MOL files serve as the basis for creating the MDL RXN files that represent each metabolic reaction. An MDL RXN file is only created if there is an MDL MOL file for every metabolite in the reaction. If Java is installed, it also atom maps each metabolic reaction with the Reaction Decoder Tool and returns an MDL RXN file representing the atom mapping of each reaction.

3. If atom mappings have been generated, it builds a MATLAB digraph object representing an atom transition multigraph corresponding to the metabolic network (metabolic reaction hypergraph) from reaction stoichiometry and atom mappings.

The multigraph nature is due to possible duplicate atom transitions, where the same pair of atoms are involved in the same atom transition in different reactions.

The directed nature is due to atom transitions where the same pair of atoms is involved in transitions of opposite orientation, corresponding to reactions in different directions.

Note that A = incidence(dATM) returns an a x t incidence matrix of the directed atom transition multigraph, where a is the number of atoms and t is the number of directed atom transitions. Each atom transition inherits the orientation of its corresponding reaction.
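The multigraph and directed properties described above can be seen on a very small graph. The following is a minimal sketch using only built-in MATLAB graph functions. The two atoms and the reaction labels R1, R2 and R3 are hypothetical and are not produced by the toolbox.

```matlab
% Two atoms (nodes 1 and 2) and three atom transition instances:
% R1 and R2 both move atom 1 to atom 2 (duplicate transitions),
% R3 moves atom 2 back to atom 1 (opposite orientation).
EndNodes = [1 2; 1 2; 2 1];
Rxn = {'R1'; 'R2'; 'R3'};          % hypothetical reaction labels
EdgeTable = table(EndNodes, Rxn);
dATM = digraph(EdgeTable);

% Duplicate edges make the graph a multigraph.
isMulti = ismultigraph(dATM);

% a x t incidence matrix: -1 at the tail atom, +1 at the head atom.
A = full(incidence(dATM));
disp(A)
```

Here a = 2 and t = 3, so A has one row per atom and one column per atom transition instance.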

A stoichiometric matrix may be decomposed into a set of atom transitions with the following atomic decomposition:

N = \left(VV^{T}\right)^{-1} V A E

VV^{T} is a diagonal matrix, where each diagonal entry is the number of atoms in each metabolite, so V*V^{T}*N = V*A*E

With respect to the input, N is the subset of model.S corresponding to atom mapped reactions.

With respect to the output:

V := M2Ai

E := Ti2R

A := incidence(dATM)

so we have the atomic decomposition M2Ai*M2Ai'*N = M2Ai*A*Ti2R
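The decomposition can be checked by hand on a toy network. The sketch below assumes a single hypothetical reaction R1: A -> B in which each metabolite has two atoms. The matrices are written out manually for illustration and are not generated by the toolbox.

```matlab
% Toy network: one reaction R1: A -> B, each metabolite with two atoms.
N = [-1; 1];                    % m x n stoichiometric matrix (m = 2, n = 1)

% Atom transitions A_1 -> B_1 and A_2 -> B_2.
% Nodes are numbered 1..4 in the order A_1, A_2, B_1, B_2.
tail = [1 2];
head = [3 4];
dATM = digraph(tail, head);

A = full(incidence(dATM));      % a x t incidence matrix (a = 4, t = 2)
M2Ai = [1 1 0 0;                % m x a: metabolite A owns atoms 1 and 2,
        0 0 1 1];               %        metabolite B owns atoms 3 and 4
Ti2R = [1; 1];                  % t x n: both transitions belong to R1

% Both sides of M2Ai*M2Ai'*N = M2Ai*A*Ti2R
lhs = M2Ai * M2Ai' * N;
rhs = M2Ai * A * Ti2R;

% Recover N; M2Ai*M2Ai' is diagonal with the atom count of each metabolite.
Nrec = (M2Ai * M2Ai') \ (M2Ai * A * Ti2R);
```

In this example M2Ai*M2Ai' is diag([2 2]), so solving against it simply rescales each row by the number of atoms in the corresponding metabolite.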

INPUTS:
model: COBRA model with following fields:
  • .S - The m x n stoichiometric matrix for the metabolic network.

  • .rxns - An n x 1 array of reaction identifiers.

  • .mets - An m x 1 array of metabolite identifiers.

OPTIONAL INPUTS:
model: COBRA model with following fields:
  • .metFormulas - An m x 1 array of metabolite chemical formulas.

  • .metinchi - An m x 1 array of metabolite InChI strings.

  • .metsmiles - An m x 1 array of metabolite SMILES strings.

  • .metKEGG - An m x 1 array of metabolite KEGG identifiers.

  • .metHMDB - An m x 1 array of metabolite HMDB identifiers.

  • .metPubChem - An m x 1 array of metabolite PubChem identifiers.

  • .metCHEBI - An m x 1 array of metabolite ChEBI identifiers.

param: A structure containing all the arguments for the function:

  • .printlevel: Verbose level

  • .debug: Logical value used to determine whether or not the results of different points in the function will be saved for debugging (default: empty).

  • .resultsDir: directory where the results should be saved (default: current directory)

    resultsDir/atomMapping containing the RXN files with atom mappings

molecular structure file options
  • .metaboliteIDcrossMapping: True to cross map metabolite IDs using https://github.com/opencobra/ctf/metaboliteIDcrossMapping.txt

  • .replace: True if the new ID should replace an existing ID (default: FALSE).

  • .standardisationApproach: String containing the type of standardisation of molecule MOL files (default: 'explicitH' if OpenBabel is installed, otherwise 'basic'):

    'explicitH' Normal chemical graphs; 'implicitH' Hydrogen suppressed chemical graph; 'basic' No standardisation, just update the MOL file header.

  • .keepMolComparison: Logical value for comparing MDL MOL files from various sources (default: FALSE)

  • .onlyUnmapped: Logical value; if TRUE, only unmapped MDL RXN files are created (default: FALSE).

  • .adjustToModelpH: Logical value used to determine whether a molecule's pH must be adjusted in accordance with the COBRA model (if TRUE, requires MarvinSuite).

  • .dirsToCompare: Cell(s) with the path(s) to the directory(ies) of an existing database (default: empty).

  • .dirNames: Cell(s) with the name of the directory(ies) (default: empty).

atom mapping options
  • .atomMapping: True to atom map reactions (default: TRUE)

  • .bonds: True to calculate numbers of bonds broken and formed (default: TRUE)

  • .replaceExistingAtomMappings: True to recompute existing atom mapping data, even if atom mapping already exists (default: FALSE).

atom transition multigraph options
  • .directed - transition split into two oppositely directed edges for reversible reactions, default: false

  • .param.sanityChecks - perform checks on creation of atom transition multigraph, default: true.

OUTPUTS:
modelOut: A new model with the following additional fields:

  • .comparison:

  • .standardisation:

  • .bondsBF: Number of bonds broken and formed in each reaction, if param.bonds = true.

  • .bondsE: Estimated bond enthalpies for each metabolic reaction, if param.bonds = true.

arm: An atomically resolved model as a MATLAB structure with the following fields:

arm.MRH: Standard COBRA model (Directed Metabolic Hypergraph), with additional fields:

  • arm.MRH.metAtomMappedBool: m x 1 boolean vector indicating atom mapped metabolites

  • arm.MRH.rxnAtomMappedBool: n x 1 boolean vector indicating atom mapped reactions

  • arm.MRH.metRXNBool: m x 1 boolean vector indicating metabolites represented in RXN files

  • arm.MRH.rxnRXNBool: n x 1 boolean vector indicating reactions represented in RXN files

arm.dATM: Directed atom transition multigraph (dATM) obtained from buildAtomTransitionMultigraph.m

dATM: Directed atom transition multigraph as a MATLAB digraph structure with the following tables:

  • .Nodes — Table of node information, with p rows, one for each atom.

  • .Nodes.Atom - unique alphanumeric id for each atom by concatenation of the metabolite, atom and element

  • .Nodes.AtomIndex - unique numeric id for each atom in atom transition multigraph

  • .Nodes.Met - metabolite containing each atom

  • .Nodes.AtomNumber - unique numeric id for each atom in an atom mapping

  • .Nodes.Element - atomic element of each atom

  • .EdgeTable — Table of edge information, with q rows, one for each atom transition instance.

  • .EdgeTable.EndNodes - two-column cell array of character vectors that defines the graph edges

  • .EdgeTable.Trans - unique alphanumeric id for each atom transition instance by concatenation of the reaction, head and tail atoms

  • .EdgeTable.TansIndex - unique numeric id for each atom transition instance

  • .EdgeTable.Rxn - reaction corresponding to each atom transition

  • .EdgeTable.HeadAtomIndex - head Nodes.AtomIndex

  • .EdgeTable.TailAtomIndex - tail Nodes.AtomIndex

arm.M2Ai: m x a matrix mapping each mapped metabolite to one or more atoms in the directed atom transition multigraph

arm.Ti2R: t x n matrix mapping one or more directed atom transition instances to each mapped reaction

The internal stoichiometric matrix may be decomposed into

N = (M2Ai*M2Ai')^(-1)*M2Ai*Ti*Ti2R;

where Ti = incidence(dATM) is the incidence matrix of the directed atom transition multigraph.

report: A report of the database generation process
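As a rough illustration of how the options above fit together, the sketch below fills a param structure using only field names documented in this section. The values are illustrative assumptions, not recommended settings. The call itself is only described in a comment, because the order of the output arguments is not stated here.

```matlab
% Assemble the param structure for atomicallyResolveModel.
% All values are illustrative choices, not recommended settings.
param = struct();
param.printlevel = 1;                       % verbosity level
param.resultsDir = pwd;                     % documented default: current directory
param.standardisationApproach = 'basic';    % 'basic' does not need OpenBabel
param.keepMolComparison = false;
param.adjustToModelpH = false;              % true would require MarvinSuite
param.atomMapping = true;                   % atom mapping needs Java
param.bonds = true;
param.replaceExistingAtomMappings = false;
param.directed = false;

% With the COBRA Toolbox on the path and a model loaded, the function would
% then be called with the two inputs (model, param).
disp(param)
```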

checkChemoinformaticDependencies()

check chemoinformatic dependencies

deleteProtons(model)

Function to delete the protons in the metabolic network

USAGE:

newModel = deleteProtons(model)

INPUTS:

model: COBRA model.

OUTPUTS:
newModel: COBRA model without protons or proton transport reactions.

Example

newModel = deleteProtons(model)
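The effect of removing protons can be pictured on a toy stoichiometric matrix. The snippet below is only a sketch of the idea, not the toolbox implementation. The metabolite naming pattern h[compartment], the variable names and the rule for dropping emptied reactions are all assumptions made for illustration.

```matlab
% Toy model: R1: a[c] -> b[c] + h[c], R2: h[c] -> h[e] (proton transport).
mets = {'a[c]'; 'b[c]'; 'h[c]'; 'h[e]'};
rxns = {'R1'; 'R2'};
S = [-1  0;
      1  0;
      1 -1;
      0  1];

% Assumption: protons are the metabolites named h[<compartment>].
protonBool = ~cellfun(@isempty, regexp(mets, '^h\[\w+\]$', 'once'));

% Remove the proton rows, then any reaction left without metabolites.
S(protonBool, :) = [];
mets(protonBool) = [];
emptyRxnBool = ~any(S, 1)';
S(:, emptyRxnBool) = [];
rxns(emptyRxnBool) = [];
```

After these steps only R1 remains, now reading a[c] -> b[c], and the proton transport reaction R2 has been dropped.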

editChemicalFormula(metFormula, addOrRemove)

Removes non-chemical characters from the chemical formula and replaces them with an R group. Produces a chemical formula from a string of atoms. Removes or adds atoms in the formula.

USAGE:

newFormula = editChemicalFormula(metFormula, addOrRemove)

INPUTS:

metFormula: An n x 1 array of metabolite Recon formulas

OPTIONAL INPUTS:
addOrRemove: A struct array containing:

  • .elements - element to edit

  • .times - vector indicating the number of times the element will be deleted (negative) or added (positive)

OUTPUTS:

newFormula: A chemical formula for a metabolite

generateChemicalDatabase(model, options)

This function compares the metabolite identifiers in the model and saves the identifiers with the best score in MDL MOL format, and/or as InChI, SMILES and JPEG images if cxcalc and OpenBabel are installed. The resulting MDL MOL files serve as the basis for creating the MDL RXN files that represent each metabolic reaction. An MDL RXN file can only be written if there is an MDL MOL file for every metabolite in the reaction. If Java is installed, it also atom maps the metabolic reactions that have an MDL RXN file.

USAGE:

[info, modelOut] = generateChemicalDatabase(model, options)

INPUTS:

model: COBRA model with following fields:

  • .S - The m x n stoichiometric matrix for the metabolic network.

  • .rxns - An n x 1 array of reaction identifiers.

  • .mets - An m x 1 array of metabolite identifiers.

  • .metFormulas - An m x 1 array of metabolite chemical formulas.

  • .metinchi - An m x 1 array of metabolite InChI strings.

  • .metsmiles - An m x 1 array of metabolite SMILES strings.

  • .metKEGG - An m x 1 array of metabolite KEGG identifiers.

  • .metHMDB - An m x 1 array of metabolite HMDB identifiers.

  • .metPubChem - An m x 1 array of metabolite PubChem identifiers.

  • .metCHEBI - An m x 1 array of metabolite ChEBI identifiers.

options: A structure containing all the arguments for the function:

  • .resultsDir: The path to the directory containing the RXN files with atom mappings (default: current directory)

  • .printlevel: Verbose level

  • .standardisationApproach: String containing the type of standardisation for the molecules (default: 'explicitH' if OpenBabel is installed, otherwise 'basic'):

    'explicitH' Normal chemical graphs; 'implicitH' Hydrogen suppressed chemical graph; 'basic' Update the header.

  • .keepMolComparison: Logical value for comparing MDL MOL files from various sources (default: FALSE)

  • .atomMapping: Logical value to decide on atom mapping. If false, it will generate only unmapped MDL RXN files (default: TRUE).

  • .bonds: Logical value to decide on computing bond enthalpy and bonds broken and formed (default: TRUE; only used if atomMapping = 1).

  • .adjustToModelpH: Logical value used to determine whether a molecule's pH must be adjusted in accordance with the COBRA model (if TRUE, requires MarvinSuite).

  • .dirsToCompare: Cell(s) with the path(s) to the directory(ies) of an existing database (default: empty).

  • .dirNames: Cell(s) with the name of the directory(ies) (default: empty).

  • .debug: Logical value used to determine whether or not the results of different points in the function will be saved for debugging (default: empty).

OUTPUTS:
modelOut: A new model with the following additional fields:

  • .metAtomMappedBool: m x 1 boolean vector indicating atom mapped metabolites

  • .rxnAtomMappedBool: n x 1 boolean vector indicating atom mapped reactions

  • .comparison:

  • .standardisation:

  • .bondsBF: Number of bonds broken and formed in each reaction, if options.bonds = true.

  • .bondsE: Estimated bond enthalpies for each metabolic reaction, if options.bonds = true.

report: Struct array containing a diary of the database generation process

*.molCollectionReport: Struct array containing information on the metabolite structures from each of the model's sources.
  -.metList: List of the metabolites in the model.
  -.sources: Sources from which the metabolite structures were obtained.
  -.structuresObtained: Number of metabolites with a structure.
  -.structuresObtainedPerSource: A Boolean matrix (mets x sources) indicating whether or not a structure was obtained.
  -.databaseCoverage: Table showing the coverage per source.
  -.idsToCheck: IDs for which a metabolite structure was not obtained.

*.sourcesComparison: Struct array containing information on the comparison of metabolite structures.
  -.mets: List of the metabolites in the model with a structure.
  -.sources: Sources from which the metabolite structures were obtained.
  -.comparisonMatrix: Matrix (mets x sources) with the comparison score.
  -.chargeOkBool: Boolean vector indicating whether the metabolite's formula matches the formula of the source with the highest score.
  -.metFormula: String array with the formulas of the metabolites.
  -.met_"metID": Comparison tables for each metabolite.
  -.comparisonTable: Table summarising the highest scoring sources per metabolite.

*.adjustedpHTable: Table indicating whether or not the highest scoring metabolite required pH adjustment and identifying metabolites for which the pH could not be adjusted.

*.standardisationReport: Table with InChIKeys, InChIs and SMILES for the highest scoring metabolites.

*.reactionsReport: Struct array containing information about the atom-mapped reactions.
  -.rxnInDatabase: Cell array containing the reaction IDs of the MDL RXN files written.
  -.mappedRxns: Cell array containing the reaction IDs of the atom-mapped reactions.
  -.balancedReactions: Cell array containing the reaction IDs of the balanced atom-mapped reactions.
  -.unbalancedReactions: Cell array containing the reaction IDs of the unbalanced atom-mapped reactions.
  -.rxnMissing: Cell array containing the reaction IDs of the reactions that could not be written due to missing metabolite structures.
  -.metInDatabase: Cell array containing the metabolite IDs in the metabolite database.
  -.metsAllwaysInBalancedRxns: Cell array containing the IDs of the metabolites that are always in balanced reactions.
  -.metsSometimesInUnbalancedRxns: Cell array containing the IDs of the metabolites that are occasionally in unbalanced reactions.
  -.metsAllwaysInUnbalancedRxns: Cell array containing the IDs of the metabolites that are always in unbalanced reactions.
  -.metsNotUsed: Cell array containing the IDs of the metabolites that could not be integrated in reactions because another structure was missing.
  -.missingMets: Cell array containing the IDs of the missing metabolites.
  -.table: Table containing information about the atom-mapped reactions.

*.bondsData: A table containing the bonds broken and formed, the enthalpy change, and the substrate mass per atom-mapped reaction.
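A possible way to set up a call to generateChemicalDatabase is sketched below. The option values are illustrative assumptions. The call follows the USAGE line above and is guarded so that it only runs when the toolbox function is on the path and a variable named model (assumed to hold a COBRA model) exists in the workspace.

```matlab
% Options for generateChemicalDatabase; values are illustrative assumptions.
options = struct();
options.resultsDir = pwd;                    % documented default: current directory
options.printlevel = 1;
options.standardisationApproach = 'basic';   % 'basic' does not need OpenBabel
options.keepMolComparison = true;
options.atomMapping = false;                 % generate only unmapped MDL RXN files
options.bonds = false;                       % bond data is only computed with atom mapping
options.adjustToModelpH = false;             % true would require MarvinSuite

% Run only if the toolbox function and a model are available.
if exist('generateChemicalDatabase', 'file') == 2 && exist('model', 'var') == 1
    [info, modelOut] = generateChemicalDatabase(model, options);
    disp(info)
end
```

Switching atomMapping off keeps the example independent of Java, at the cost of producing only unmapped MDL RXN files.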

openBabelConverter(origFormat, outputFormat, saveFileDir)

This function converts between chemoinformatic formats using OpenBabel, which must be installed. The formats that can be converted are MDL MOL, SMILES, InChI, InChIKey, MDL RXN, reaction SMILES and rInChI.

USAGE:

newStructure = openBabelConverter(origFormat, outputFormat, saveFileDir)

INPUT:
origFormat: Original chemoinformatic format. Chemical tables such as MDL MOL or MDL RXN must be provided as files.

outputFormat: The format to convert to. Formats supported: smiles, mol, inchi, inchikey, rxn and rinchi.

OPTIONAL INPUTS:
saveFileDir: String with the directory where the new format will be saved. If empty, the new format is not saved.

Example

Example 1 (MDL MOL to InChI):
origFormat = [pwd filesep 'alanine.mol'];
outputFormat = 'inchi';
newFormat = openBabelConverter(origFormat, outputFormat);

Example 2 (InChI to SMILES):
origFormat = 'InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1';
outputFormat = 'smiles';
newFormat = openBabelConverter(origFormat, outputFormat);

Example 3 (SMILES to mol):
origFormat = 'C[C@@H](C(=O)O)N';
outputFormat = 'mol';
newFormat = openBabelConverter(origFormat, outputFormat);