Integration

adaptDraftModelID(modelIDIn)

This function converts the name of the draft reconstruction entered into the pipeline into a suitable ID for the curated reconstruction.

USAGE: microbeID = adaptDraftModelID(modelIDIn)

INPUT modelIDIn name of the KBase draft reconstruction file to refine

OUTPUT microbeID name of the resulting refined reconstruction

AUTHOR: Almut Heinken, 06/2020.

checkInputData(inputData, strainInformation)

Part of the DEMETER pipeline. This function checks the input data files for duplicate strains and for strains that are not present in the reconstruction resource, and removes them. It also adds strains from the reconstruction resource that are not yet present in the input file.

USAGE

[checkedData,addedStrains,removedStrains,duplicateStrains] = checkInputData(inputData,strainInformation)

INPUTS
inputData: Table with experimental data to check
strainInformation: Table listing all strains to reconstruct

OUTPUTS
checkedData: Corrected table with experimental data
addedStrains: Strains that were missing from inputData
removedStrains: Strains that were in inputData but are not present in strainInformation
duplicateStrains: Strains that were in inputData more than once

AUTHOR: Almut Heinken, 06/2020
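The bookkeeping described above reduces to set operations on strain names. The Python sketch below is a hypothetical analogue, not the DEMETER code (which works on MATLAB tables): `check_strains` and its list-based inputs are illustrative, and keeping the first occurrence of a duplicated strain is an assumption.

```python
from typing import List, Tuple


def check_strains(
    input_strains: List[str], all_strains: List[str]
) -> Tuple[List[str], List[str], List[str], List[str]]:
    """Hypothetical analogue of checkInputData that works on strain names only.

    Returns (checked, added, removed, duplicates).
    """
    known = set(all_strains)
    seen = set()
    checked: List[str] = []
    removed: List[str] = []
    duplicates: List[str] = []
    for strain in input_strains:
        if strain in seen:
            # Assumption: keep the first occurrence and report later ones as duplicates.
            if strain not in duplicates:
                duplicates.append(strain)
            continue
        seen.add(strain)
        if strain not in known:
            # In the input data but absent from the strain information table.
            removed.append(strain)
        else:
            checked.append(strain)
    # Strains in the reconstruction resource that the input data did not cover.
    added = [strain for strain in all_strains if strain not in seen]
    checked.extend(added)
    return checked, added, removed, duplicates
```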

createRBioNetDBFromVMHDB(varargin)

This function creates input suitable for rBioNet from the VMH metabolite and reaction database and builds a new database settings file. This allows the VMH database to be used to modify or build reconstructions.

USAGE

createRBioNetDBFromVMHDB('rBioNetDBFolder', rBioNetDBFolder)

OPTIONAL INPUT

rBioNetDBFolder Path where the created database and the database settings file are saved. Default: current path

gapfillRefinedGenomeReactions(genomeAnnotation)

Part of the DEMETER pipeline. Adds the reactions needed to connect pathways that were introduced based on comparative genomic analyses.

USAGE

gapfilledGenomeAnnotation = gapfillRefinedGenomeReactions(genomeAnnotation)

INPUT genomeAnnotation Text file with genome annotations that had been retrieved through the function writeReactionsFromPubSeedSpreadsheets

OUTPUT gapfilledGenomeAnnotation Text file with genome annotations that have been gap-filled to enable flux through reactions added based on genome annotations

getCurationStatus(infoFilePath, inputDataFolder, getComparativeGenomics)

This function determines, for each organism in a reconstruction resource, whether the strain was refined based on experimental data and/or comparative genomic data. The curation status is saved as a file called curationStatus.txt in inputDataFolder.

For experimental data:
2: the reconstruction was refined against experimental data available for the strain
1: published studies were available for the strain, but no suitable data were found or all findings were negative
0: no experimental data were available

For comparative genomic data:
2: genome annotations were refined for all subsystems
1: genome annotations were refined for some subsystems
0: no comparative genomic refinement was performed

USAGE

curationStatus = getCurationStatus(infoFilePath,inputDataFolder,getComparativeGenomics)

INPUTS
infoFilePath: File with information on reconstructions to refine
inputDataFolder: Folder with experimental data and database files
getComparativeGenomics: Boolean indicating whether PubSEED spreadsheets with information on the reconstructed strains are available

OUTPUT curationStatus: Table with curation status of each model
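The two 0/1/2 scales can be written as small decision functions. This is a hypothetical Python sketch of the rules as stated above; the boolean and count inputs (`refined_with_data`, `studies_found`, `refined_subsystems`, `total_subsystems`) are assumptions about what getCurationStatus derives from its input files, not its actual variables.

```python
def experimental_status(refined_with_data: bool, studies_found: bool) -> int:
    """Experimental-data code following the rules described for getCurationStatus."""
    if refined_with_data:
        return 2  # refined against experimental data for the strain
    if studies_found:
        return 1  # studies found, but no suitable data or only negative findings
    return 0  # no experimental data available


def genomic_status(refined_subsystems: int, total_subsystems: int) -> int:
    """Comparative-genomics code following the rules described for getCurationStatus."""
    if total_subsystems > 0 and refined_subsystems == total_subsystems:
        return 2  # annotations refined for all subsystems
    if refined_subsystems > 0:
        return 1  # only some subsystems refined
    return 0  # no comparative genomic refinement
```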

getUnannotatedReactionsFromPubSeedSpreadsheets(infoFilePath, inputDataFolder, spreadsheetFolder)

Part of the DEMETER pipeline. Prepares the input file for refinement based on comparative genomic analyses. Retrieves all reactions that comparative genomics did not find in the respective organism so that they can be removed from the draft reconstructions.

USAGE

unannotatedRxns = getUnannotatedReactionsFromPubSeedSpreadsheets(infoFilePath,inputDataFolder,spreadsheetFolder)

INPUTS
infoFilePath File with information on reconstructions to refine
inputDataFolder Folder to save propagated data to (default: folder in current path called "InputData")
spreadsheetFolder Folder with comparative genomics data retrieved from PubSEED in spreadsheet format, if available. For an example of the required format, see cobratoolbox/papers/2021_demeter/exampleSpreadsheets.
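Conceptually, the reactions to remove are a set difference per strain. A hypothetical Python sketch, assuming the spreadsheets have already been reduced to the set of reactions they cover and, per strain, the reactions its annotations support; `unannotated_reactions` is an illustrative name, and the real function reads this information from files.

```python
from typing import Dict, List, Set


def unannotated_reactions(
    spreadsheet_reactions: Set[str], found_per_strain: Dict[str, Set[str]]
) -> Dict[str, List[str]]:
    """For each strain, list covered reactions that comparative genomics did not find.

    spreadsheet_reactions: reactions the spreadsheets can confirm or reject (assumed input).
    found_per_strain: reactions supported by each strain's annotations (assumed input).
    """
    return {
        strain: sorted(spreadsheet_reactions - found)
        for strain, found in found_per_strain.items()
    }
```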

loadVMHDatabase()

This function loads the database with reactions and metabolites in Virtual Metabolic Human (https://www.vmh.life/) nomenclature.

USAGE

database=loadVMHDatabase

OUTPUT database Structure with reaction and metabolite database

mapKBaseToVMHReactions(translatedRxns)

Part of the DEMETER pipeline. This function aids in translating reactions from KBase to VMH nomenclature. It requires running the function propagateKBaseMetTranslationToRxns beforehand to translate metabolite IDs, so that the translated reactions can then be matched to reactions that already exist in the VMH (Virtual Metabolic Human) database.

USAGE

[sameReactions,similarReactions] = mapKBaseToVMHReactions(translatedRxns)

INPUTS translatedRxns: Table with untranslated KBase reactions but translated metabolite IDs

OUTPUTS
sameReactions: Table with translated KBase reactions that already exist in the VMH database, with the corresponding IDs
similarReactions: Table with translated KBase reactions for which a VMH reaction with the same formula exists, but which is irreversible in VMH and reversible in KBase (or vice versa)
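Once metabolite IDs are translated, "same" and "similar" can be decided by comparing stoichiometry and reversibility. A hypothetical Python sketch, assuming COBRA-style formula strings with ` + ` between terms, `->` for irreversible and `<=>` for reversible reactions; it does not treat a reaction written in the opposite direction as a match, and `parse_formula` and `compare_reactions` are illustrative names, not DEMETER functions.

```python
from typing import Dict, FrozenSet, List, Tuple

Side = FrozenSet[Tuple[float, str]]


def parse_formula(formula: str) -> Tuple[Side, Side, bool]:
    """Split 'a[c] + 2 b[c] <=> c[c]' into (left side, right side, reversible)."""
    reversible = "<=>" in formula
    left, right = formula.split("<=>" if reversible else "->")

    def parse_side(text: str) -> Side:
        terms = []
        for term in text.split(" + "):
            parts = term.split()
            if not parts:
                continue  # empty side, e.g. an exchange reaction
            if len(parts) == 2:
                terms.append((float(parts[0]), parts[1]))
            else:
                terms.append((1.0, parts[0]))  # omitted coefficient means 1
        return frozenset(terms)

    return parse_side(left), parse_side(right), reversible


def compare_reactions(
    kbase: Dict[str, str], vmh: Dict[str, str]
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Return (same, similar) lists of (KBase ID, VMH ID) pairs."""
    vmh_parsed = {rid: parse_formula(f) for rid, f in vmh.items()}
    same: List[Tuple[str, str]] = []
    similar: List[Tuple[str, str]] = []
    for kid, formula in kbase.items():
        k_left, k_right, k_rev = parse_formula(formula)
        for vid, (v_left, v_right, v_rev) in vmh_parsed.items():
            if (k_left, k_right) != (v_left, v_right):
                continue
            # Identical stoichiometry: "same" if reversibility agrees, else "similar".
            (same if k_rev == v_rev else similar).append((kid, vid))
    return same, similar
```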

mapMediumData2AGORA(strainGrowth, inputMedia)

This function extracts the experimentally determined growth of species on multiple media and converts the in silico minimal medium that each strain can grow on into input suitable for DEMETER. The input data were retrieved from Tramontano et al., Nat Microbiol 2018 (PMID: 29556107).

USAGE

mappedMedia = mapMediumData2AGORA(strainGrowth,inputMedia)

INPUTS
strainGrowth Growth of strains on the different media reported by Tramontano et al.
inputMedia Growth media tested by Tramontano et al., converted into a computationally readable format

OUTPUT mappedMedia Data from Tramontano et al. converted into a format that can be added to GrowthRequirementsTable

parseNCBItaxonomy(NCBI)

Retrieves the taxonomic lineage from NCBI for a given NCBI Taxonomy ID.

INPUT NCBI NCBI Taxonomy ID (e.g., 511145)

OUTPUT taxonomy Structure containing the taxonomic lineage

AUTHOR: Stefania Magnusdottir, 11/2017
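One way to retrieve a lineage is NCBI's E-utilities `efetch` service for the taxonomy database, whose XML lists the ancestors of a taxon under `LineageEx`. The Python sketch below assumes that service and response layout; it is not how parseNCBItaxonomy is implemented, it returns a rank-to-name dictionary rather than a MATLAB structure, and it skips entries whose rank is "no rank".

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Dict

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def fetch_taxonomy_xml(taxid: int) -> str:
    """Download the taxonomy record for one NCBI Taxonomy ID (needs network access)."""
    with urllib.request.urlopen(f"{EFETCH_URL}?db=taxonomy&id={taxid}") as response:
        return response.read().decode("utf-8")


def parse_lineage(xml_text: str) -> Dict[str, str]:
    """Map rank -> scientific name for the first Taxon record and its ancestors."""
    root = ET.fromstring(xml_text)
    taxon = root.find("Taxon")
    lineage: Dict[str, str] = {}
    if taxon is None:
        return lineage
    # Ancestors are listed under LineageEx; the queried taxon itself is not.
    entries = list(taxon.iterfind("LineageEx/Taxon")) + [taxon]
    for entry in entries:
        rank = entry.findtext("Rank")
        name = entry.findtext("ScientificName")
        if rank and name and rank != "no rank":
            lineage[rank] = name
    return lineage
```

Usage, with network access: `parse_lineage(fetch_taxonomy_xml(511145))`.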

prepareInputData(infoFilePath, varargin)

This function propagates the experimental data collected for AGORA2 (https://www.biorxiv.org/content/10.1101/2020.11.09.375451v1) to newly reconstructed strains and, if available, reads comparative genomic data in PubSEED spreadsheet format. The propagated data should be checked manually afterwards.

USAGE

[adaptedInfoFilePath,inputDataFolder] = prepareInputData(infoFilePath,varargin)

REQUIRED INPUT
infoFilePath File with information on reconstructions to refine

OPTIONAL INPUTS
inputDataFolder Folder to save propagated data to (default: folder in current path called "InputData")
spreadsheetFolder Folder with comparative genomics data retrieved from PubSEED in spreadsheet format, if available. For an example of the required format, see cobratoolbox/papers/2021_demeter/exampleSpreadsheets.

OUTPUTS
adaptedInfoFilePath Path to the file with taxonomic information, supplemented with Gram staining information
inputDataFolder Folder to which the propagated data were saved (default: folder in current path called "InputData")

printRefinedModelIDs(draftFolder)

This function prints the adapted names of the draft reconstructions entered into DEMETER. These names must appear in the MicrobeID column of the file with taxonomic information exactly as written in the output refinedModelIDs.

USAGE: refinedModelIDs = printRefinedModelIDs(draftFolder)

INPUT draftFolder Folder with draft COBRA models generated by the KBase pipeline to analyze

OUTPUT refinedModelIDs Names of the refined models generated by DEMETER

AUTHOR: Almut Heinken, 01/2021.

propagateKBaseMetTranslationToRxns(toTranslatePath)

This function replaces already translated metabolites in reactions in KBase/ModelSEED nomenclature that have not yet been translated. It creates output suitable for the ReconstructionTool interface in rBioNet, which can be used to check whether the reactions already exist in the VMH database.

USAGE

[translatedRxns] = propagateKBaseMetTranslationToRxns(toTranslatePath)

INPUT

toTranslatePath String containing the path to an xlsx, csv, or txt file with reaction IDs in KBase/ModelSEED nomenclature to translate (e.g., rxn00001)

OUTPUTS

translatedRxns Table of reactions in which already translated metabolite IDs have been replaced; it can serve as input for rBioNet to check whether the reactions already exist in the VMH database.
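The substitution step amounts to replacing every ModelSEED compound ID that already has a VMH translation. A hypothetical Python sketch, assuming formulas contain IDs of the form `cpd` plus five digits followed by a compartment tag such as `[c0]`; compartment tags are left untouched, and the mapping in the usage lines is an example, not DEMETER's translation table.

```python
import re
from typing import Dict

# ModelSEED compound IDs: "cpd" followed by exactly five digits (e.g. cpd00001).
CPD_ID = re.compile(r"cpd\d{5}(?!\d)")


def propagate_translations(formula: str, translated: Dict[str, str]) -> str:
    """Replace compound IDs that already have a VMH translation; leave the rest as-is."""
    return CPD_ID.sub(lambda match: translated.get(match.group(0), match.group(0)), formula)


if __name__ == "__main__":
    # Example mapping (assumed): only water and H+ have been translated so far.
    mapping = {"cpd00001": "h2o", "cpd00067": "h"}
    print(propagate_translations(
        "cpd00002[c0] + cpd00001[c0] -> cpd00008[c0] + cpd00009[c0] + cpd00067[c0]", mapping))
    # cpd00002[c0] + h2o[c0] -> cpd00008[c0] + cpd00009[c0] + h[c0]
```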

readInputTableForPipeline(tablePath)

Reads tables needed as input data for DEMETER, such as text files, into a format suitable for the pipeline. The necessary inputs depend on the MATLAB version.

USAGE

formattedTable = readInputTableForPipeline(tablePath)

INPUT tablePath Path to the file with the table to read, in text or table format

OUTPUT formattedTable Table in cell array format
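For delimited text files, a list of rows is the Python counterpart of a MATLAB cell array. A hypothetical sketch: it assumes tab-delimited UTF-8 text, does not handle spreadsheet formats, and does not reproduce the MATLAB version-dependent behavior mentioned above; `read_table` and `read_table_text` are illustrative names.

```python
import csv
import io
from typing import Iterable, List


def _rows(lines: Iterable[str], delimiter: str) -> List[List[str]]:
    """Parse delimited lines into rows, dropping lines that are entirely blank."""
    reader = csv.reader(lines, delimiter=delimiter)
    return [row for row in reader if any(cell.strip() for cell in row)]


def read_table_text(text: str, delimiter: str = "\t") -> List[List[str]]:
    """Parse delimited text held in memory; the header row comes first."""
    return _rows(io.StringIO(text), delimiter)


def read_table(path: str, delimiter: str = "\t") -> List[List[str]]:
    """Read a delimited text file into a list of rows (a cell-array analogue)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return _rows(handle, delimiter)
```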

translateKBaseToVMHMets(toTranslatePath)

This function translates metabolites that have not yet been translated from KBase/ModelSEED nomenclature to VMH nomenclature based on their names and InChI keys. The resulting translations should be verified through manual inspection.

USAGE

[translatedMets] = translateKBaseToVMHMets(toTranslatePath)

INPUT

toTranslatePath String containing the path to an xlsx, csv, or txt file with metabolite IDs in KBase/ModelSEED nomenclature to translate (e.g., cpd00001)

OUTPUTS

translatedMets Table with KBase metabolite IDs that could be matched to VMH metabolite IDs
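A name/InChIKey translation can be sketched as a two-stage lookup: first by InChIKey, then by a normalized name. In this hypothetical Python sketch, the tuple layout `(id, name, inchikey)` and the normalization rule (lower case, alphanumerics only) are assumptions; as recommended above, any matches it produces would still need manual verification.

```python
from typing import Dict, List, Optional, Tuple

Entry = Tuple[str, str, Optional[str]]  # (metabolite ID, name, InChIKey or None)


def normalize_name(name: str) -> str:
    """Lower-case and keep only alphanumerics, so 'D-Glucose' matches 'D glucose'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def translate_mets(kbase: List[Entry], vmh: List[Entry]) -> Dict[str, str]:
    """Map KBase metabolite IDs to VMH IDs via InChIKey first, then normalized name."""
    by_key = {key: vid for vid, _, key in vmh if key}
    by_name = {normalize_name(name): vid for vid, name, _ in vmh}
    matches: Dict[str, str] = {}
    for kid, name, key in kbase:
        if key and key in by_key:
            matches[kid] = by_key[key]  # structure-based match takes priority
        elif normalize_name(name) in by_name:
            matches[kid] = by_name[normalize_name(name)]
    return matches
```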

writeReactionsFromPubSeedSpreadsheets(infoFilePath, inputDataFolder, spreadsheetFolder)

Prepares the input file for the comparative genomics part of the pipeline by writing reaction spreadsheets from InReactions and PubSEED spreadsheets.

USAGE

writeReactionsFromPubSeedSpreadsheets(infoFilePath,inputDataFolder,spreadsheetFolder)

INPUTS
infoFilePath File with information on reconstructions to refine
inputDataFolder Folder to save propagated data to (default: folder in current path called "InputData")
spreadsheetFolder Folder with comparative genomics data retrieved from PubSEED in spreadsheet format, if available. For an example of the required format, see cobratoolbox/papers/2021_demeter/exampleSpreadsheets.