Chemoinformatics
- atomicallyResolveModel(model, param)
1. Takes a standard COBRA model as input, annotates the metabolites with metabolite identifiers (e.g. InChIKey), compares the identifiers, and saves the most consistent identifiers in MDL MOL format, representing the structure of each metabolite. If ChemAxon cxcalc and openBabel are installed, the InChI, SMILES and an image of each metabolite structure are also saved.
2. The MDL MOL files serve as the basis for creating the MDL RXN files that represent each metabolic reaction. An MDL RXN file can only be written if there is an MDL MOL file for each metabolite in the reaction. If Java is installed, it also atom maps each metabolic reaction with the Reaction Decoder Tool and returns an MDL RXN file representing the atom mapping of each reaction.
3. If atom mappings have been generated, it builds a MATLAB digraph object representing an atom transition multigraph corresponding to the metabolic network (metabolic reaction hypergraph) from reaction stoichiometry and atom mappings.
The multigraph nature is due to possible duplicate atom transitions, where the same pair of atoms is involved in the same atom transition in different reactions.
The directed nature is due to possible atom transitions of opposite orientation, where the same pair of atoms is involved in atom transitions corresponding to reactions in different directions.
Note that A = incidence(dATM) returns an a x t incidence matrix of the directed atom transition multigraph, where a is the number of atoms and t is the number of directed atom transitions. Each atom transition inherits the orientation of its corresponding reaction.
A stoichiometric matrix may be decomposed into a set of atom transitions with the following atomic decomposition:
N = \left(VV^{T}\right)^{-1} V A E
VV^{T} is a diagonal matrix, where each diagonal entry is the number of atoms in the corresponding metabolite, so V*V^{T}*N = V*A*E.
With respect to the input, N is the subset of model.S corresponding to atom mapped reactions.
- With respect to the output, V := M2Ai, E := Ti2R and A := incidence(dATM),
so we have the atomic decomposition M2Ai*M2Ai'*N = M2Ai*A*Ti2R
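The atomic decomposition can be checked by hand on a very small network. The following is a minimal sketch in plain MATLAB, assuming a hypothetical single reaction X -> Y in which each metabolite has two atoms; the matrices are written out manually and are not produced by atomicallyResolveModel.

```matlab
% Toy network (hypothetical): one reaction R1: X -> Y, two atoms per metabolite.
N = [-1; 1];                  % m x n stoichiometric matrix (m = 2, n = 1)

% V plays the role of M2Ai: m x a matrix mapping metabolites to atoms (a = 4)
V = [1 1 0 0;                 % X contains atoms 1 and 2
     0 0 1 1];                % Y contains atoms 3 and 4

% Directed atom transition multigraph: atom 1 -> atom 3 and atom 2 -> atom 4
dATM = digraph([1 2], [3 4]);

% a x t incidence matrix (t = 2): -1 at the source atom, +1 at the target atom
A = full(incidence(dATM));

% E plays the role of Ti2R: t x n matrix mapping atom transitions to reactions
E = [1; 1];

% V*V' is diagonal; its diagonal holds the number of atoms per metabolite
atomsPerMet = diag(V*V');

% Check V*V'*N = V*A*E
decompositionHolds = isequal(V*V'*N, V*A*E);

% Equivalent form N = (V*V')^(-1)*V*A*E, solved without an explicit inverse
Nrec = (V*V') \ (V*A*E);
```

Using the backslash operator instead of inv() is a conventional MATLAB choice; since V*V' is diagonal, either would give the same result here.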
- INPUTS
model – COBRA model with following fields:
.S - The m x n stoichiometric matrix for the metabolic network.
.rxns - An n x 1 array of reaction identifiers.
.mets - An m x 1 array of metabolite identifiers.
- OPTIONAL INPUTS
model – COBRA model with following fields:
.metFormulas - An m x 1 array of metabolite chemical formulas.
.metinchi - An m x 1 array of metabolite InChI identifiers.
.metsmiles - An m x 1 array of metabolite SMILES identifiers.
.metKEGG - An m x 1 array of metabolite KEGG identifiers.
.metHMDB - An m x 1 array of metabolite HMDB identifiers.
.metPubChem - An m x 1 array of metabolite PubChem identifiers.
.metCHEBI - An m x 1 array of metabolite ChEBI identifiers.
param – A structure containing all the arguments for the function:
.printlevel: Verbose level
.debug: Logical value used to determine whether or not the results of different points in the function will be saved for debugging (default: empty).
- .resultsDir: Directory where the results should be saved (default: current directory).
The subdirectory resultsDir/atomMapping contains the RXN files with atom mappings.
- molecular structure file options
.metaboliteIDcrossMapping: True to cross map metabolite IDs using https://github.com/opencobra/ctf/metaboliteIDcrossMapping.txt
.replace: True if the new ID should replace an existing ID (default: FALSE).
- .standardisationApproach: String containing the type of standardisation of molecule MOL files
(default: ‘explicitH’ if openBabel is installed, otherwise default: ‘basic’):
‘explicitH’ Normal chemical graphs; ‘implicitH’ Hydrogen suppressed chemical graph; ‘basic’ No standardisation, just update the MOL file header.
.keepMolComparison: Logical value for comparing MDL MOL files from various sources (default: FALSE).
.onlyUnmapped: Logical value; if TRUE, only unmapped MDL RXN files are created (default: FALSE).
.adjustToModelpH: Logical value used to determine whether a molecule’s pH must be adjusted in accordance with the COBRA model. If TRUE, MarvinSuite is required.
.dirsToCompare: Cell array with the path(s) to the directory(ies) of an existing database (default: empty).
.dirNames: Cell array with the name(s) of the directory(ies) (default: empty).
- atom mapping options
.atomMapping: True to atom map reactions (default: TRUE).
.bonds: True to calculate numbers of bonds broken and formed (default: TRUE).
.replaceExistingAtomMappings: True to recompute existing atom mapping data, even if atom mapping already exists (default: FALSE).
- atom transition multigraph options
.directed: If true, each atom transition of a reversible reaction is split into two oppositely directed edges (default: FALSE).
.sanityChecks: If true, perform checks on creation of the atom transition multigraph (default: TRUE).
- OUTPUTS
modelOut – A new model with the following additional fields:
.comparison:
.standardisation:
.bondsBF: Number of bonds broken and formed in each reaction, if param.bonds = true.
.bondsE: Estimated bond enthalpies for each metabolic reaction, if param.bonds = true.
arm: An atomically resolved model as a MATLAB structure with the following fields:
arm.MRH: Standard COBRA model (Directed Metabolic Hypergraph), with additional fields:
arm.MRH.metAtomMappedBool: m x 1 boolean vector indicating atom mapped metabolites
arm.MRH.rxnAtomMappedBool: n x 1 boolean vector indicating atom mapped reactions
arm.MRH.metRXNBool: m x 1 boolean vector indicating metabolites represented in RXN files
arm.MRH.rxnRXNBool: n x 1 boolean vector indicating reactions represented in RXN files
arm.dATM: Directed atom transition multigraph (dATM) obtained from buildAtomTransitionMultigraph.m
dATM: Directed atom transition multigraph as a MATLAB digraph structure with the following tables:
.Nodes — Table of node information, with p rows, one for each atom.
.Nodes.Atom - unique alphanumeric id for each atom by concatenation of the metabolite, atom and element
.Nodes.AtomIndex - unique numeric id for each atom in atom transition multigraph
.Nodes.Met - metabolite containing each atom
.Nodes.AtomNumber - unique numeric id for each atom in an atom mapping
.Nodes.Element - atomic element of each atom
.EdgeTable — Table of edge information, with q rows, one for each atom transition instance.
.EdgeTable.EndNodes - two-column cell array of character vectors that defines the graph edges
.EdgeTable.Trans - unique alphanumeric id for each atom transition instance by concatenation of the reaction, head and tail atoms
.EdgeTable.TransIndex - unique numeric id for each atom transition instance
.EdgeTable.Rxn - reaction corresponding to each atom transition
.EdgeTable.HeadAtomIndex - head Nodes.AtomIndex
.EdgeTable.TailAtomIndex - tail Nodes.AtomIndex
arm.M2Ai: m x a matrix mapping each mapped metabolite to one or more atoms in the directed atom transition multigraph
arm.Ti2R: t x n matrix mapping one or more directed atom transition instances to each mapped reaction
The internal stoichiometric matrix may be decomposed into
N = (M2Ai*M2Ai')^(-1)*M2Ai*Ti*Ti2R;
where Ti = incidence(dATM) is the incidence matrix of the directed atom transition multigraph.
report: A report of the database generation process
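As a rough illustration of how the parameters above fit together, the sketch below builds a param structure and calls the function. The field names are those documented above; the chosen values, the output order [modelOut, arm, report] and the workspace variable model are assumptions, and the call is guarded so that it only runs when the COBRA Toolbox is on the path and a model is loaded.

```matlab
% Parameter structure using the documented field names (values are examples)
param = struct();
param.printlevel = 1;                          % verbosity level
param.resultsDir = pwd;                        % where results are saved
param.standardisationApproach = 'explicitH';   % requires openBabel
param.atomMapping = true;                      % requires Java
param.bonds = true;                            % bonds broken and formed
param.directed = false;                        % do not split reversible transitions
param.sanityChecks = true;                     % check the multigraph on creation

% Assumption: outputs are returned in the order they are documented above
if exist('atomicallyResolveModel', 'file') == 2 && exist('model', 'var') == 1
    [modelOut, arm, report] = atomicallyResolveModel(model, param);

    % Summarise the atomically resolved model using documented fields
    fprintf('%d atom mapped reactions\n', nnz(arm.MRH.rxnAtomMappedBool));
    fprintf('%d atoms, %d atom transitions\n', ...
        numnodes(arm.dATM), numedges(arm.dATM));
end
```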
- deleteProtons(model)
Function to delete the protons in the metabolic network
- USAGE
newModel = deleteProtons (model)
- INPUTS
model – COBRA model.
- OUTPUTS
newModel – COBRA model without protons or proton transport reactions.
Example
newModel = deleteProtons(model)
- editChemicalFormula(metFormula, addOrRemove)
Removes non-chemical characters from the chemical formula and replaces them with an R group. Produces a chemical formula from a string of atoms. Removes or adds atoms in the formula.
- USAGE
newFormula = editChemicalFormula (metFormula, addOrRemove)
- INPUTS
metFormula – An n x 1 array of metabolite Recon formulas
- OPTIONAL INPUTS
addOrRemove – A struct array containing:
.elements - Element(s) to edit.
.times - Vector indicating the number of times each element will be deleted (negative) or added (positive).
- OUTPUTS
newFormula – A chemical formula for a metabolite
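A call might look like the sketch below. The data types of addOrRemove.elements and addOrRemove.times are not specified above, so the cell array of element symbols and the scalar count used here are assumptions, as is the example formula. The call is guarded so that it only runs when the COBRA Toolbox is on the path.

```matlab
% Hypothetical request: remove one hydrogen atom from the formula
addOrRemove = struct();
addOrRemove.elements = {'H'};   % assumption: cell array of element symbols
addOrRemove.times = -1;         % negative deletes, positive adds

metFormula = 'C3H7NO2';         % example input formula (alanine)

if exist('editChemicalFormula', 'file') == 2
    newFormula = editChemicalFormula(metFormula, addOrRemove);
    disp(newFormula)
end
```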
- generateChemicalDatabase(model, options)
This function compares the metabolite identifiers in the model and saves the identifiers with the best score in MDL MOL format. If cxcalc and openBabel are installed, the InChI, SMILES and JPEG images are also saved. The resulting MDL MOL files serve as the basis for creating the MDL RXN files that represent each metabolic reaction; an MDL RXN file can only be written if there is an MDL MOL file for each metabolite in the reaction. If Java is installed, it also atom maps the metabolic reactions that have an MDL RXN file.
- USAGE
[info, modelOut] = generateChemicalDatabase (model, options)
- INPUTS
model – COBRA model with following fields:
.S - The m x n stoichiometric matrix for the metabolic network.
.rxns - An n x 1 array of reaction identifiers.
.mets - An m x 1 array of metabolite identifiers.
.metFormulas - An m x 1 array of metabolite chemical formulas.
.metinchi - An m x 1 array of metabolite InChI identifiers.
.metsmiles - An m x 1 array of metabolite SMILES identifiers.
.metKEGG - An m x 1 array of metabolite KEGG identifiers.
.metHMDB - An m x 1 array of metabolite HMDB identifiers.
.metPubChem - An m x 1 array of metabolite PubChem identifiers.
.metCHEBI - An m x 1 array of metabolite ChEBI identifiers.
options – A structure containing all the arguments for the function:
- .resultsDir: The path to the directory containing the RXN files
with atom mappings (default: current directory)
.printlevel: Verbose level
- .standardisationApproach: String containing the type of standardisation
for the molecules (default: ‘explicitH’ if openBabel is installed, otherwise default: ‘basic’):
‘explicitH’ Normal chemical graphs; ‘implicitH’ Hydrogen suppressed chemical graph; ‘basic’ Update the header.
- .keepMolComparison: Logical value for comparing MDL MOL files from various sources (default: FALSE).
.atomMapping: Logical value to decide on atom mapping. If FALSE, only unmapped MDL RXN files are generated (default: TRUE).
.bonds: Logical value to decide on computing bond enthalpies and bonds broken and formed (default: TRUE). Only used if atomMapping is TRUE.
- .adjustToModelpH: Logical value used to determine whether a molecule’s pH must be adjusted in accordance with the COBRA model. If TRUE, MarvinSuite is required.
- .dirsToCompare: Cell(s) with the path to directory to an
existing database (default: empty).
.dirNames: Cell(s) with the name of the directory(ies) (default: empty).
- .debug: Logical value used to determine whether or not the results
of different points in the function will be saved for debugging (default: empty).
- OUTPUTS
modelOut – A new model with the following additional fields:
.metAtomMappedBool: m x 1 boolean vector indicating atom mapped metabolites
.rxnAtomMappedBool: n x 1 boolean vector indicating atom mapped reactions
.comparison:
.standardisation:
.bondsBF: Number of bonds broken and formed in each reaction, if options.bonds = true.
.bondsE: Estimated bond enthalpies for each metabolic reaction, if options.bonds = true.
report: Struct array containing a diary of the database generation process
*.molCollectionReport: Struct array containing information on the metabolite structures from each of the model’s sources.
-.metList: List of the metabolites in the model.
-.sources: Sources from which the metabolite structures were obtained.
-.structuresObtained: Number of metabolites with a structure.
-.structuresObtainedPerSource: A Boolean matrix (mets x sources) indicating whether or not a structure was obtained.
-.databaseCoverage: Table showing the coverage per source.
-.idsToCheck: IDs from which the metabolite structure wasn’t obtained.
*.sourcesComparison: Struct array containing information on the comparison of metabolite structures.
-.mets: List of the metabolites in the model with a structure.
-.sources: Sources from which the metabolite structures were obtained.
-.comparisonMatrix: Matrix (mets x sources) with the comparison score.
-.chargeOkBool: Boolean vector indicating whether the metabolite’s formula matches the formula of the source with the highest score.
-.metFormula: String array with the formulas of the metabolites.
-.met_”metID”: Comparison tables for each metabolite.
-.comparisonTable: Table summarising the highest scoring sources per metabolite.
*.adjustedpHTable: Table indicating whether or not the highest scoring metabolite required pH adjustment and identifying metabolites for which the pH could not be adjusted.
*.standardisationReport: Table with InChIKeys, InChIs and SMILES for the highest scoring metabolites.
*.reactionsReport: Struct array containing information about the atom-mapped reactions.
-.rxnInDatabase: Cell array containing the reaction IDs of the MDL RXN files written.
-.mappedRxns: Cell array containing the reaction IDs of the atom-mapped reactions.
-.balancedReactions: Cell array containing the reaction IDs of the balanced atom-mapped reactions.
-.unbalancedReactions: Cell array containing the reaction IDs of the unbalanced atom-mapped reactions.
-.rxnMissing: Cell array containing the reaction IDs of the reactions that could not be written due to missing metabolite structures.
-.metInDatabase: Cell array containing the metabolite IDs in the metabolite database.
-.metsAllwaysInBalancedRxns: Cell array containing the IDs of the metabolites that are always in balanced reactions.
-.metsSometimesInUnbalancedRxns: Cell array containing the IDs of the metabolites that are occasionally in unbalanced reactions.
-.metsAllwaysInUnbalancedRxns: Cell array containing the IDs of the metabolites that are always in unbalanced reactions.
-.metsNotUsed: Cell array containing the IDs of the metabolites that could not be integrated in reactions because another structure was missing.
-.missingMets: Cell array containing the IDs of the missing metabolites.
-.table: Table containing information about the atom-mapped reactions.
*.bondsData: A table containing the bonds broken and formed, the enthalpy change, and the substrate mass per atom-mapped reaction.
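The options above can be combined as in the following sketch, which configures a run that avoids the optional external tools. The field names come from the list above; the chosen values and the workspace variable model are assumptions, and the first output is treated as the report structure only because the USAGE line names it info. The call is guarded so that it only runs when the COBRA Toolbox is on the path and a model is loaded.

```matlab
% Options using the documented field names (values are examples)
options = struct();
options.resultsDir = pwd;                      % where results are written
options.printlevel = 1;                        % verbosity level
options.standardisationApproach = 'basic';     % only update the MOL header
options.keepMolComparison = false;
options.atomMapping = false;                   % write unmapped MDL RXN files only
options.bonds = false;                         % only relevant with atom mapping
options.adjustToModelpH = false;               % TRUE would require MarvinSuite
options.debug = false;

if exist('generateChemicalDatabase', 'file') == 2 && exist('model', 'var') == 1
    [info, modelOut] = generateChemicalDatabase(model, options);

    % List whichever report sections were produced for this run
    disp(fieldnames(info))
end
```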
- openBabelConverter(origFormat, outputFormat, saveFileDir)
This function converts between chemoinformatic formats using OpenBabel, which must be installed. The supported formats are MDL MOL, SMILES, InChI, InChIKey, MDL RXN, reaction SMILES and rInChI.
- USAGE
newStructure = openBabelConverter (origFormat, outputFormat, saveFileDir)
- INPUT
origFormat – Original chemoinformatic format. Chemical tables such as MDL MOL or MDL RXN must be provided as files
outputFormat – The format to convert to. Formats supported: smiles, mol, inchi, inchikey, rxn and rinchi.
- OPTIONAL INPUTS
saveFileDir – String with the directory where the new format will be saved. If empty, the format is not saved.
Example
Example 1 (MDL MOL to InChI):
origFormat = [pwd filesep 'alanine.mol'];
outputFormat = 'inchi';
newFormat = openBabelConverter(origFormat, outputFormat);
Example 2 (InChI to SMILES):
origFormat = 'InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1';
outputFormat = 'smiles';
newFormat = openBabelConverter(origFormat, outputFormat);
Example 3 (SMILES to mol):
origFormat = 'C[C@@H](C(=O)O)N';
outputFormat = 'mol';
newFormat = openBabelConverter(origFormat, outputFormat);