Connect2resources

VMH2Metabolon(metabolite_structure)

Reads in the Metabolon-to-VMH mapping, which was done partly manually and cross-checked from two sides independently. Currently, 400 metabolites are mapped. Information missing in the current rBioNet flat files is substituted with this information. The Metabolon ID (CHEM_ID in the input file) is also read in.

INPUT metabolite_structure metabolite structure

OUTPUT metabolite_structure Updated metabolite structure

Ines Thiele, 09/2021

VMH2Seed(metabolite_structure)

Reads in the Metabolon-to-VMH mapping, which was done partly manually and cross-checked from two sides independently. Currently, 400 metabolites are mapped. Information missing in the current rBioNet flat files is substituted with this information. The Metabolon ID (CHEM_ID in the input file) is also read in.

INPUT metabolite_structure metabolite structure

OUTPUT metabolite_structure Updated metabolite structure

Ines Thiele, 09/2021

assignAGORAReconPresence(metabolite_structure, reaction)

This function flags whether a metabolite occurs in AGORA_X and ReconX.

INPUT metabolite_structure metabolite structure

reaction set to true (1) if the input is a reaction (default: false (0))

OUTPUT metabolite_structure Updated metabolite structure

Ines Thiele, 09/2021

assignClassyFire(metabolite_structure, startSearch, endSearch)

Gets the metabolite classification from ClassyFire.

INPUT metabolite_structure metabolite structure

startSearch specifies where the search should start in the metabolite structure. Must be numeric (optional, default: all metabolites in the structure will be searched)

endSearch specifies where the search should end in the metabolite structure. Must be numeric (optional, default: all metabolites in the structure will be searched)

OUTPUT metabolite_structure updated metabolite structure

Ines Thiele, 09/2021

convertOld2NewHMDB(HMDBId)

This function converts old-style HMDB IDs to the new style. Old style: ‘HMDB06525’; new style: ‘HMDB0006525’ (7 digits). The old ID is padded with zeros to obtain the new ID.

INPUT HMDBId HMDB id

OUTPUT HMDBId_new new style HMDB id

Ines Thiele 03/2022
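The zero-padding rule can be illustrated with a few lines of MATLAB. This is a minimal sketch of the conversion described above, not the source of convertOld2NewHMDB; the variable names and the error handling are assumptions made for the example.

```matlab
% Sketch: pad an old-style HMDB ID to the new 7-digit style.
% Names and error handling are illustrative, not taken from the toolbox.
HMDBId = 'HMDB06525';

% Capture the digits that follow the 'HMDB' prefix.
tok = regexp(HMDBId, '^HMDB(\d+)$', 'tokens', 'once');
if isempty(tok)
    error('Not an HMDB ID: %s', HMDBId);
end

% Left-pad the number with zeros to 7 digits. IDs that already
% have 7 digits come out unchanged.
HMDBId_new = sprintf('HMDB%07d', str2double(tok{1}));

disp(HMDBId_new)   % HMDB0006525
```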

getAnnoFromHMDB: The hmdb field exists as a field in metabolite_structure.

getCas2CTD(metabolite_structure): The input file was obtained from http://ctdbase.org/reports/CTD_chemicals.csv.gz. The first column contains the CTD ID, the third column the CAS number.

getCas2Echa(metabolite_structure)

The input file was downloaded from https://echa.europa.eu/documents/10162/13629/ec_inventory_en.xlsx

INPUT metabolite_structure metabolite structure

OUTPUT metabolite_structure Updated metabolite structure

Ines Thiele, 2020-2021

In the input file, the first column contains the echa_id and the 4th column the CAS registry number.

getIDsFromBIGG: This m file annotates the metabolite structure with IDs from BiGG using an offline file. Ines Thiele 2020/2021

getIDsfromFiehnLab(metabolite_structure, sourceId, targetId, startSearch, endSearch): Connects to the Fiehn lab service (associated paper: https://academic.oup.com/bioinformatics/article/26/20/2647/194184). The query URL has the form from / to / query term, e.g., http://cts.fiehnlab.ucdavis.edu/service/convert/kegg/inchikey/C00234
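As a rough illustration of the from / to / query term pattern, the example URL can be assembled as follows. This is a sketch only: the variable queryId and the doQuery switch are hypothetical, and because the response layout is not described here, the optional lookup merely displays whatever the service returns.

```matlab
% Sketch: assemble a conversion URL of the form base/from/to/query term.
baseUrl  = 'http://cts.fiehnlab.ucdavis.edu/service/convert';
sourceId = 'kegg';       % identifier type to convert from
targetId = 'inchikey';   % identifier type to convert to
queryId  = 'C00234';     % query term (hypothetical variable name)

url = strjoin({baseUrl, sourceId, targetId, queryId}, '/');
disp(url)

% Optional online lookup, disabled by default because it needs network
% access and the service may be unavailable. The response is only
% displayed, not parsed.
doQuery = false;
if doQuery
    try
        response = webread(url);
        disp(response)
    catch err
        warning('Query failed: %s', err.message);
    end
end
```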

getIds2VMH(metabolite_structure): Maps the Seed metabolites file obtained from https://www.pnas.org/highwire/filestream/616377/field_highwire_adjunct_files/0/pnas.1401329111.sd01.xlsx (PMID 24927599). When retrieving BiGG IDs, the script checks whether the IDs are still valid by testing the weblink. Only valid BiGG IDs are added.

getInchiStringFromHMDB(HMDBID)

This function retrieves the inchiString from HMDB (online) for a given HMDB ID.

INPUT HMDBID Human metabolome database (HMDB) ID

OUTPUT inchiString Retrieved inchiString

Ines Thiele, 09/2021

getMetIdsFromInchiKeys(metabolite_structure, inchiKeyCheck, inchiStringCheck, inchiKeyAltCheck, metList): This function connects to UniChem and grabs available IDs for metabolites that have InChI strings.

getMetIdsFromUniChem(metabolite_structure, startSearch, endSearch, vmhIdCheck, cheBIIdCheck, drugBankCheck, pubChemIdCheck, keggIdCheck, hmdbCheck, inchiKeyCheck, inchiStringCheck, inchiKeyAltCheck): This function connects to UniChem and grabs available IDs for metabolites that have InChI strings.

getRxnFromKegg(metabolite_structure, metabolite_structure_rBioNet, metsField): Gets reactions from KEGG.

getSeed2Kegg(metabolite_structure)

This function parses the file ftp://ftp.kbase.us/assets/KBase_Reference_Data/Biochemistry/compounds.xls. The first column contains the Seed ID, the 5th column the KEGG ID. This file is provided in /data/ as ‘compounds.xlsx’.

INPUT metabolite_structure metabolite structure

OUTPUT metabolite_structure Updated metabolite structure

Ines Thiele, 2020-2021

parseBiggID4VMH(metabolite_structure, startSearch, endSearch, grebMoreIDs): The problem is that, by chance, BiGG and VMH could have the same ID for different metabolites. No additional checks are done right now, which is dangerous (hence, more IDs are not grabbed by default).

parseBridgeDb(metabolite_structure, startSearch, endSearch)

function [metabolite_structure,IDsAdded,IdsMismatch] = parseBridgeDb(metabolite_structure)

This function takes existing database-dependent identifiers and searches BridgeDB (https://bridgedb.github.io/) via its webservice for other database identifiers. These are added to the metabolite structure if the metabolite does not yet have the respective identifier. If the metabolite already has such an identifier but it does not match the one from BridgeDB, the mismatch is listed in ‘IdsMismatch’.

INPUT metabolite_structure metabolite structure

OUTPUT metabolite_structure updated metabolite structure

IDsAdded list of IDs added from BridgeDB

IdsMismatch list of mismatching IDs between VMH and BridgeDB

Ines Thiele October 2020
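The add-if-missing, report-if-different rule described for parseBridgeDb can be sketched as below. This is not the toolbox implementation: the field names and all ID values are placeholders, and the retrieved struct stands in for whatever the BridgeDB webservice would return.

```matlab
% Sketch with placeholder data; field names and IDs are hypothetical.
met       = struct('keggId', 'C00001', 'cheBIId', '',      'hmdb', 'HMDB0000001');
retrieved = struct('keggId', 'C00001', 'cheBIId', '12345', 'hmdb', 'HMDB0000002');

IDsAdded    = {};   % rows: {field, added ID}
IdsMismatch = {};   % rows: {field, existing ID, retrieved ID}

fields = fieldnames(retrieved);
for i = 1:numel(fields)
    f = fields{i};
    if ~isfield(met, f) || isempty(met.(f))
        % The metabolite lacks this identifier: add it.
        met.(f) = retrieved.(f);
        IDsAdded = [IDsAdded; {f, retrieved.(f)}];
    elseif ~strcmp(met.(f), retrieved.(f))
        % The identifier exists but differs: report it, do not overwrite.
        IdsMismatch = [IdsMismatch; {f, met.(f), retrieved.(f)}];
    end
end

disp(IDsAdded)
disp(IdsMismatch)
```

Keeping the existing value on a mismatch mirrors the description, which only says that mismatches are listed.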

parseCHOmineWebpage(metabolite_structure, startSearch, endSearch): Tries to guess the CHOmine abbreviation based on the VMH ID.

parseChemIDPlusWebpage(metabolite_structure, startSearch, endSearch): Uses UNII IDs to parse.

parseDBCollection(metabolite_structure, startSearch, endSearch)

This function takes substantial time. Also note that order matters; hence, some resources are parsed twice.

INPUT metabolite_structure metabolite structure

startSearch specifies where the search should start in the metabolite structure. Must be numeric (optional, default: all metabolites in the structure will be searched)

endSearch specifies where the search should end in the metabolite structure. Must be numeric (optional, default: all metabolites in the structure will be searched)

OUTPUT metabolite_structure Updated metabolite structure

Ines Thiele, 09/2021
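Because a full run takes substantial time, startSearch and endSearch can restrict the work to a slice of the structure. The sketch below shows one way such a window might be resolved. It assumes that metabolite_structure holds one field per metabolite and that empty values mean "use the default"; neither assumption is confirmed by this page, and the field names are made up.

```matlab
% Assumption: one field per metabolite (field names are placeholders).
metabolite_structure = struct('VMH_a', struct(), 'VMH_b', struct(), 'VMH_c', struct());
F = fieldnames(metabolite_structure);

% Leave both empty to cover every metabolite in the structure.
startSearch = 2;
endSearch   = [];

% Fall back to the full range when a bound is not given.
if isempty(startSearch), startSearch = 1; end
if isempty(endSearch),   endSearch = numel(F); end
if ~isnumeric(startSearch) || ~isnumeric(endSearch)
    error('startSearch and endSearch must be numeric.');
end

% Visit only the metabolites inside the window.
processed = {};
for i = startSearch:min(endSearch, numel(F))
    processed{end+1} = F{i};
end

disp(processed)
```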

parseEPA4VMH(metabolite_structure, startSearch, endSearch): Searches EPA CompTox using casRegistry or inchiKey.

parseFDAsisWebpage(metabolite_structure, startSearch, endSearch): Uses UNII IDs to parse.

parseMetaNetXWebpage(metabolite_structure, startSearch, endSearch)

function [metabolite_structure,IDsAdded,IDsSuggested] = parseMetaNetXWebpage(metabolite_structure)

This function first retrieves MetaNetX IDs based on existing IDs in the metabolite_structure (defined in queryFields). A MetaNetX ID is only added to the metabolite_structure (and to IDsAdded) if the MetaNetX inchiKey and the metabolite_structure inchiKey agree; otherwise, it is added to IDsSuggested.

The function then takes all MetaNetX IDs and retrieves further IDs to be added to the metabolite_structure. To do so, it first verifies each MetaNetX ID in the metabolite_structure by comparing the inchiKey in the metabolite_structure with the one from the MetaNetX ID. If they do not agree, the function tries to find the right MetaNetX ID based on the inchiKey in the metabolite structure. If unsuccessful, the MetaNetX ID is removed from the metabolite_structure and added to the IDsSuggested list. Further IDs are only retrieved for verified MetaNetX IDs.

INPUT metabolite_structure metabolite structure

startSearch specifies where the search should start in the metabolite structure. Must be numeric (optional, default: all metabolites in the structure will be searched)

endSearch specifies where the search should end in the metabolite structure. Must be numeric (optional, default: all metabolites in the structure will be searched)

OUTPUT metabolite_structure updated metabolite structure

IDsAdded list of added IDs

IDsSuggested list of suggested IDs

Ines Thiele 2020/2021
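The inchiKey agreement check that decides between IDsAdded and IDsSuggested can be sketched as follows. The field name metanetx, the candidate IDs, and the inchiKey strings are all placeholders, and the candidates array stands in for results that would normally come from MetaNetX.

```matlab
% Placeholder metabolite and candidate hits (all values are made up).
met = struct('inchiKey', 'AAAAAAAAAAAAAA-BBBBBBBBBB-N', 'metanetx', '');
candidates = struct( ...
    'id',       {'MNXM_first', 'MNXM_second'}, ...
    'inchiKey', {'AAAAAAAAAAAAAA-BBBBBBBBBB-N', 'CCCCCCCCCCCCCC-DDDDDDDDDD-N'});

IDsAdded     = {};
IDsSuggested = {};

for k = 1:numel(candidates)
    if strcmp(met.inchiKey, candidates(k).inchiKey)
        % inchiKeys agree: accept the ID and record it as added.
        met.metanetx = candidates(k).id;
        IDsAdded{end+1} = candidates(k).id;
    else
        % inchiKeys disagree: keep the ID only as a suggestion.
        IDsSuggested{end+1} = candidates(k).id;
    end
end

disp(IDsAdded)
disp(IDsSuggested)
```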

parseVMH4IDs(metabolite_structure, startSearch, endSearch)

INPUT metabolite_structure metabolite structure

startSearch specifies where the search should start in the metabolite structure. Must be numeric (optional, default: all metabolites in the structure will be searched)

endSearch specifies where the search should end in the metabolite structure. Must be numeric (optional, default: all metabolites in the structure will be searched)

OUTPUT metabolite_structure Updated metabolite structure

Ines Thiele, 09/2021

parseWikipediaWebpage(metabolite_structure, startSearch, endSearch)

This function searches Wikipedia for identifiers. It either uses the Wikipedia IDs provided in the metabolite structure or tries to find perfect hits based on a metabolite name search.

INPUT metabolite_structure metabolite structure

startSearch specifies where the search should start in the metabolite structure. Must be numeric (optional, default: all metabolites in the structure will be searched)

endSearch specifies where the search should end in the metabolite structure. Must be numeric (optional, default: all metabolites in the structure will be searched)

OUTPUT metabolite_structure updated metabolite structure

Ines Thiele, 09/2021

queryExposomeExplorer(metabolite_structure)

The function searches for metabolite names, e.g., http://exposome-explorer.iarc.fr/search?utf8=%E2%9C%93&query=2-aminophenol+sulfate&button=

INPUT metabolite_structure metabolite structure

startSearch specifies where the search should start in the metabolite structure. Must be numeric (optional, default: all metabolites in the structure will be searched)

endSearch specifies where the search should end in the metabolite structure. Must be numeric (optional, default: all metabolites in the structure will be searched)

OUTPUT metabolite_structure updated metabolite structure

Ines Thiele, 09/2021
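The example search URL for queryExposomeExplorer can be reproduced from a metabolite name with a short sketch. As a simplifying assumption, only spaces are encoded here (as '+'); names containing other reserved characters would need full URL encoding. The variable names are hypothetical.

```matlab
% Sketch: build a name-search URL for Exposome-Explorer.
metName = '2-aminophenol sulfate';

% Simplification: replace spaces with '+' and leave all other
% characters untouched.
query = strrep(metName, ' ', '+');

% Concatenate instead of using sprintf, because the fixed part of the
% URL contains literal '%' characters.
url = ['http://exposome-explorer.iarc.fr/search?utf8=%E2%9C%93&query=', query, '&button='];

disp(url)
```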

queryLipidMaps(metabolite_structure, startSearch, endSearch)

https://www.lipidmaps.org/search/quicksearch.php?Name=2-methyl-dodecanedioic+acid

retrievePotHitsHMDB(met)

This function connects to HMDB and searches for the metabolite name. The first 10 hits are examined, and the metabolite name is searched for in the traditional name, IUPAC name, synonyms, and common name. If one or more hits are found, the HMDB IDs are returned.

INPUT met Metabolite name

OUTPUT hmdb One or more HMDB IDs. If empty, no HMDB ID could be found.

multipleHits This variable indicates whether there are multiple hits.

Ines Thiele, 09/2021

searchMultipleUnknownMetOnline(metabolite_structure, metabolite_structure_rBioNet, metab_rBioNet_online, rxn_rBioNet_online, startSearch, endSearch)

INPUT metabolite_structure metabolite structure

startSearch specifies where the search should start in the metabolite structure. Must be numeric (optional, default: all metabolites in the structure will be searched)

endSearch specifies where the search should end in the metabolite structure. Must be numeric (optional, default: all metabolites in the structure will be searched)

OUTPUT metabolite_structure updated metabolite structure

Ines Thiele, 2020-2021

searchUnknownMetOnline(met, VMHId, metabolite_structure_rBioNet, metab_rBioNet_online, rxn_rBioNet_online)

This function searches HMDB by name and returns a metabolite structure and the HMDB ID if the name appears in the common name, IUPAC name, synonyms, or traditional name.

INPUT met metabolite name (try to spell it correctly)

OUTPUT metabolite_structure metabolite structure

Ines Thiele, 09/2021