Analysemetstruct

acceptIDsSuggested(metabolite_structure, IDsSuggested, annotationSource)[source]

function [metabolite_structure,IDsAdded] = acceptIDsSuggested(metabolite_structure,IDsSuggested,annotationSource)

This function accepts the suggested IDs provided in the list IDsSuggested and adds them to the metabolite_structure. Each row in IDsSuggested that is to be accepted must have an entry in the 6th column specifying that the ID in the 2nd column is accepted for the metabolite in the 1st column. At this stage the annotation/curation level is raised to curated, so every row and suggested ID must be evaluated carefully before it is accepted.

INPUT
    metabolite_structure    metabolite structure
    IDsSuggested            list of suggested IDs; each row to be accepted must have 'accepted' in the 6th column
    annotationSource        source of annotation, e.g. 'curator (name)'

OUTPUT
    metabolite_structure    updated metabolite structure
    IDsAdded                list of added IDs

Ines Thiele, 01/2021
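
The acceptance rule described above can be illustrated with a small sketch. It does not call acceptIDsSuggested itself; it only shows how the rows of an IDsSuggested-style cell array could be screened for the 'accepted' marker in the 6th column. The metabolite names, ID values, and the empty columns 3 to 5 are placeholders chosen for illustration, not values taken from the toolbox.

```matlab
% Hypothetical IDsSuggested-style list (all values are placeholders):
% column 1 = metabolite, column 2 = suggested ID,
% columns 3-5 = left empty here (their meaning is not documented above),
% column 6 = acceptance marker.
IDsSuggested = {
    'met_a', 'ID_001', '', '', '', 'accepted'
    'met_b', 'ID_002', '', '', '', ''
    'met_c', 'ID_003', '', '', '', 'accepted'
    };

% Only rows carrying 'accepted' in the 6th column are meant to be taken over.
isAccepted = strcmp(IDsSuggested(:, 6), 'accepted');
rowsToAccept = IDsSuggested(isAccepted, :);

% Print the metabolite / ID pairs that would be accepted.
for i = 1:size(rowsToAccept, 1)
    fprintf('%s <- %s\n', rowsToAccept{i, 1}, rowsToAccept{i, 2});
end
```

Because accepted IDs are raised to the curated level, reviewing a filtered view like rowsToAccept before calling the function is one way to double-check which rows are marked.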

addAnnotations(metabolite_structure, RAW, annotationSource, annotationType, annotationVerification)[source]

function [metabolite_structure] = addAnnotations(metabolite_structure,RAW,annotationSource,annotationType,annotationVerification)

This function adds annotations (fields) to the metabolite_structure. It is generally used to populate the metabolite_structure with new metabolites from an xlsx sheet (RAW).

INPUT
    metabolite_structure      metabolite structure
    RAW                       data read in using the function xlsread, e.g.,
                              [NUM,TXT,RAW]=xlsread('MetaboliteTranslationTable.xlsx');
                              Note that the xlsx sheet must have certain headers to be read in correctly.
    annotationSource          annotation source (to track where the information came from, e.g., 'Recon3D'). If not specified, 'unknown' will be added.
    annotationType            type of annotation, e.g., 'automatic' (default), 'manual'
    annotationVerification    verification of annotation, e.g., 'not verified' (default), 'verified by curator', 'verified based on inchiKeys'

OUTPUT
    metabolite_structure      updated metabolite structure

Ines Thiele 2020/2021

addInfoFromMolFiles(metabolite_structure, folderName, startSearch, endSearch)[source]

This function creates inchiStrings, smiles, and inchiKeys from provided mol files, in the case that these fields are empty (NaN) in the structure.

INPUT
    metabolite_structure    metabolite structure
    folderName              name of the folder that contains the mol structures
    startSearch             where the search should start in the metabolite structure. Must be numeric (optional; by default, all metabolites in the structure are searched)
    endSearch               where the search should end in the metabolite structure. Must be numeric (optional; by default, all metabolites in the structure are searched)

OUTPUT
    metabolite_structure    updated metabolite structure

Ines Thiele, 09/2021
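
The optional startSearch/endSearch range and the "empty (NaN)" condition can be sketched as follows. This is an illustration only, not the toolbox code: it assumes that the metabolite structure holds one field per metabolite and that the numeric bounds index into that list of fields. The metabolite names are placeholders.

```matlab
% Hypothetical structure: one field per metabolite (an assumption for illustration).
metabolite_structure = struct();
metabolite_structure.met_a = struct('inchiString', NaN);
metabolite_structure.met_b = struct('inchiString', 'InChI=1S/H2O/h1H2');
metabolite_structure.met_c = struct('inchiString', NaN);

mets = fieldnames(metabolite_structure);

% Leave a bound empty to fall back to the default (search all metabolites).
startSearch = 2;
endSearch = [];

if isempty(startSearch)
    startSearch = 1;
end
if isempty(endSearch)
    endSearch = numel(mets);
end

% Collect the metabolites within the range whose inchiString is still NaN.
toFill = {};
for i = startSearch:endSearch
    value = metabolite_structure.(mets{i}).inchiString;
    if isnumeric(value) && all(isnan(value))
        toFill{end + 1} = mets{i}; %#ok<AGROW>
    end
end
disp(toFill);
```

With startSearch = 2, met_a is skipped even though its field is NaN, which shows how a restricted range limits which entries get processed.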

addMetFormulaCharge(metabolite_structure, startSearch, endSearch)[source]

This function uses getInchiString2ChargedFormula.m to calculate charge and neutral formula.

INPUT
    metabolite_structure    metabolite structure
    startSearch             where the search should start in the metabolite structure. Must be numeric (optional; by default, all metabolites in the structure are searched)
    endSearch               where the search should end in the metabolite structure. Must be numeric (optional; by default, all metabolites in the structure are searched)

OUTPUT
    metabolite_structure    updated metabolite structure

Ines Thiele 09/21

assignMetaboliteIDs[source]

Read in the various files.

check4DuplicatesInList(list)[source]

This function checks for duplicate entries in a list.

INPUT
    list              list of, e.g., metabolite abbreviations

OUTPUT
    listDuplicates    list of duplicated entries. The second (or later) occurrence of each duplicate is provided.

Ines Thiele, 09/2021
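
The idea of reporting the second and later occurrences can be sketched with MATLAB's built-in unique. This is not the implementation of check4DuplicatesInList, only a minimal illustration of the described behaviour; the abbreviations in the list are example values.

```matlab
% Example list of abbreviations containing repeated entries.
list = {'glc_D'; 'atp'; 'glc_D'; 'h2o'; 'atp'; 'glc_D'};

% With 'stable', unique returns the index of the first occurrence of each entry.
[~, firstIdx] = unique(list, 'stable');

% Every position that is not a first occurrence is a repeated entry.
isRepeat = true(numel(list), 1);
isRepeat(firstIdx) = false;
listDuplicates = list(isRepeat);

disp(listDuplicates);
```

In this sketch an entry that occurs three times appears twice in listDuplicates, once for each occurrence after the first.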

checkAbbrExists(list, metab_rBioNet_online, rxn_rBioNet_online, metabolite_structure_rBioNet)[source]

This function checks whether the abbreviations in the list already exist in the VMH or in the most recent rBioNetDB, either as reaction or as metabolite abbreviations.

INPUT
    list                            list of abbreviations (either metabolites or reactions). Alternatively, a metabolite structure can be given as input, in which case more fields are compared.
    metab_rBioNet_online
    rxn_rBioNet_online
    metabolite_structure_rBioNet

OUTPUT
    VMH_existance                   lists whether the abbreviation exists in the VMH (online), as a reaction (2nd entry) or as a metabolite (3rd entry)
    rBioNet_existance               lists whether the abbreviation exists in rBioNet (as deposited in the COBRA Toolbox online), as a reaction (2nd entry) or as a metabolite (3rd entry)

Ines Thiele 09/2021

checkLinkValidity(metabolite_structure, startSearch, endSearch)[source]

The aim of this script is to take each of the IDs collected and test whether

createMetaboliteIDs[source]

create metabolite database

getIDfromMetStructure(metabolite_structure, idName)[source]

INPUT
    metabolite_structure    structure containing metabolite-related information and IDs
    idName                  name of the ID to be retrieved, as used in the metabolite structure (e.g., 'pubChemId')

OUTPUT
    VMH2IDmappingAll        mapping of all VMH metabolites present in the metabolite structure (including NaNs)
    VMH2IDmappingPresent    mapping of all VMH metabolites present in the metabolite structure (excluding NaNs)
    VMH2IDmappingMissing    abbreviations of the metabolites whose ID is NaN in the metabolite structure

IT, Aug 2020
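
The three outputs can be pictured with a short sketch. It is not the toolbox implementation; it assumes that the metabolite structure holds one field per metabolite and that a missing ID is stored as NaN, as the output descriptions suggest. The metabolite names and ID values are placeholders, and the variable names mappingAll, mappingPresent, and mappingMissing are chosen here for illustration.

```matlab
% Hypothetical structure: one field per metabolite, each with a pubChemId field.
metabolite_structure = struct();
metabolite_structure.met_a = struct('pubChemId', '111');
metabolite_structure.met_b = struct('pubChemId', NaN);
metabolite_structure.met_c = struct('pubChemId', '333');

idName = 'pubChemId';
mets = fieldnames(metabolite_structure);

% Build a two-column mapping (metabolite, ID) and flag NaN entries.
mappingAll = cell(numel(mets), 2);
isMissing = false(numel(mets), 1);
for i = 1:numel(mets)
    value = metabolite_structure.(mets{i}).(idName);
    mappingAll(i, :) = {mets{i}, value};
    isMissing(i) = isnumeric(value) && all(isnan(value));
end

mappingPresent = mappingAll(~isMissing, :);  % entries that have an ID
mappingMissing = mappingAll(isMissing, 1);   % metabolites without an ID

disp(mappingPresent);
disp(mappingMissing);
```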

getStatsMetStruct(metabolite_structure)[source]

INPUT metabolite_structure Metabolite structure

OUTPUT
    IDs        list of ID names
    IDcount    count per ID
    Table      table listing IDs per reaction

Ines Thiele, 09/2021

linkComparison[source]

create a table that lists the resources

list2MetaboliteStructure(fileName, molFileDirectory, metList, fileNameOutput, metabolite_structure_rBioNet, customMetAbbrList)[source]

This function reads in an xlsx file and converts it into a metabolite_structure. The minimum requirement is that the VMH IDs are present in one column of the table.

INPUT
    fileName                name of the xlsx file
    molFileDirectory        directory containing the mol files obtained from ctf; new mol files will be added here
    metList
    fileNameOutput

OUTPUT
    metabolite_structure    metabolite structure containing the metabolites with VMH IDs listed in the xlsx file
    rBioNet_existance       array indicating whether the query abbreviation already exists in rBioNet (online and the growing internal database) (col 1: assigned initial VMHId, col 2: ID exists as reaction abbreviation, col 3: ID exists as metabolite abbreviation, col 4: VMHId; if not empty, this abbreviation was used in the metabolite structure instead of the one given in col 1)
    VMH_existance           array indicating whether the query abbreviation already exists in the VMH (online) (col 1: assigned initial VMHId, col 2: ID exists as reaction abbreviation, col 3: ID exists as metabolite abbreviation)

Ines Thiele, 09/2021

verifyInchiString(metabolite_structure)[source]

function [metabolite_structure] = verifyInchiString(metabolite_structure)

This function verifies whether the inchiString and the formula/charge match for the entries in the metabolite_structure. If the inchiString is neutral but the chargedFormula is not, only a note is added to the inchiString_source. If the inchiString does not match or represents a different charge (not neutral and not overlapping with the metabolite charge), the inchiString is removed from the metabolite_structure and added to an IDsSuggested list.

Ines Thiele 2020/2021