Integration
- adaptDraftModelID(modelIDIn)
This function converts the name of a draft reconstruction entered into the pipeline into a suitable curated reconstruction ID.
USAGE: microbeID = adaptDraftModelID(modelIDIn)
INPUT modelIDIn name of the KBase draft reconstruction file to refine
OUTPUT microbeID name of the resulting refined reconstruction
AUTHOR: Almut Heinken, 06/2020.
- checkInputData(inputData, strainInformation)
Part of the DEMETER pipeline. This function checks for duplicate and removed strains in the input data files and removes them. Also adds strains in the reconstruction resource that were not yet present in the input file.
- USAGE
[checkedData,addedStrains,removedStrains,duplicateStrains] = checkInputData (inputData,strainInformation)
INPUTS
inputData: Table with experimental data to check
strainInformation: Table with list of all strains to reconstruct
OUTPUTS
checkedData: Corrected table with experimental data to check
addedStrains: Strains that were missing from inputData
removedStrains: Strains that were in inputData but are not present in strainInformation
duplicateStrains: Strains that were in inputData more than once
AUTHOR: Almut Heinken, 06/2020
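The bookkeeping described for checkInputData can be illustrated with a minimal sketch in base MATLAB. This is not the toolbox implementation: the strain names are invented, and plain cell arrays stand in for the inputData and strainInformation tables.

```matlab
% Toy data (all strain names are made up for illustration).
inputStrains = {'Strain_A'; 'Strain_B'; 'Strain_B'; 'Strain_X'};
knownStrains = {'Strain_A'; 'Strain_B'; 'Strain_C'};

% Strains listed more than once in the input data
[uniqueStrains, ~, idx] = unique(inputStrains, 'stable');
counts = accumarray(idx, 1);
duplicateStrains = uniqueStrains(counts > 1);

% Strains in the input data that are absent from the strain list
removedStrains = setdiff(uniqueStrains, knownStrains, 'stable');

% Strains in the strain list that are missing from the input data
addedStrains = setdiff(knownStrains, uniqueStrains, 'stable');

% Corrected list: duplicates collapsed, removed strains dropped,
% missing strains appended
checkedStrains = [setdiff(uniqueStrains, removedStrains, 'stable'); addedStrains];
```

The 'stable' option keeps the original row order, which makes the corrected list easier to compare against the input by eye.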
- createRBioNetDBFromVMHDB(varargin)
This function creates an input fit for rBioNet from the VMH metabolite and reaction database and builds a new database settings file. It enables using the VMH database to modify or build reconstructions.
- USAGE
createRBioNetDBFromVMHDB ('rBioNetDBFolder',rBioNetDBFolder)
- OPTIONAL INPUT
rBioNetDBFolder Path where the created database and database settings file are saved. Default: current path
- gapfillRefinedGenomeReactions(genomeAnnotation)
Part of the DEMETER pipeline. Adds reactions needed to connect pathways introduced based on comparative genomic analyses.
- USAGE
gapfilledGenomeAnnotation = gapfillRefinedGenomeReactions (genomeAnnotation)
INPUT genomeAnnotation Text file with genome annotations that had been retrieved through the function writeReactionsFromPubSeedSpreadsheets
OUTPUT gapfilledGenomeAnnotation Text file with genome annotations that have been gap-filled to enable flux through reactions added based on genome annotations
- getCurationStatus(infoFilePath, inputDataFolder, getComparativeGenomics)
This function retrieves, for each organism in a reconstruction resource, whether the strain was refined based on experimental data and/or comparative genomic data. For experimental data, 2 indicates that the reconstruction was refined against available experimental data for the strain, 1 indicates that published studies were available for the strain but no suitable data was found or all findings were negative, and 0 indicates that no experimental data was available. For comparative genomic data, 2 indicates that genome annotations were refined for all subsystems, 1 indicates that certain subsystems were refined, and 0 indicates that no comparative genomic refinement was performed. The curation status information is saved to a file called curationStatus.txt in the inputDataFolder.
- USAGE
curationStatus = getCurationStatus (infoFilePath,inputDataFolder,getComparativeGenomics)
INPUTS
infoFilePath: File with information on reconstructions to refine
inputDataFolder: Folder with experimental data and database files to
getComparativeGenomics: Boolean indicating whether PubSEED spreadsheets with information on the reconstructed strains are available
OUTPUT
curationStatus: Table with curation status of each model
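The 0/1/2 scores described above can be turned into readable labels with a short lookup. The following is a minimal sketch in base MATLAB. The table layout and the column names (ModelID, Experimental, ComparativeGenomics) are assumptions made for illustration and may differ from the actual contents of curationStatus.txt.

```matlab
% Hypothetical curation status table; column names are assumptions.
curationStatus = table({'Strain_A'; 'Strain_B'; 'Strain_C'}, [2; 1; 0], [2; 0; 1], ...
    'VariableNames', {'ModelID', 'Experimental', 'ComparativeGenomics'});

% Labels for scores 0, 1, and 2 (cell index = score + 1)
experimentalLabels = {'no experimental data available'; ...
    'studies available, but no suitable data or only negative findings'; ...
    'refined against experimental data'};
genomicLabels = {'no comparative genomic refinement'; ...
    'certain subsystems refined'; ...
    'all subsystems refined'};

% Print one readable line per strain
for i = 1:height(curationStatus)
    fprintf('%s: %s; %s\n', curationStatus.ModelID{i}, ...
        experimentalLabels{curationStatus.Experimental(i) + 1}, ...
        genomicLabels{curationStatus.ComparativeGenomics(i) + 1});
end

% Strains with the highest score for both data types
fullyRefined = curationStatus.ModelID(curationStatus.Experimental == 2 & ...
    curationStatus.ComparativeGenomics == 2);
```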
- getUnannotatedReactionsFromPubSeedSpreadsheets(infoFilePath, inputDataFolder, spreadsheetFolder)
Part of the DEMETER pipeline. Prepares input file for the refinement based on comparative genomics analyses. Gets all the reactions that were not found in the respective organism through comparative genomics to remove them from the draft reconstructions.
- USAGE
unannotatedRxns = getUnannotatedReactionsFromPubSeedSpreadsheets (infoFilePath,inputDataFolder,spreadsheetFolder)
INPUTS
infoFilePath File with information on reconstructions to refine
inputDataFolder Folder to save propagated data to (default: folder in current path called “InputData”)
spreadsheetFolder Folder with comparative genomics data retrieved from PubSEED in spreadsheet format if available. For an example of the required format, see cobratoolbox/papers/2021_demeter/exampleSpreadsheets.
- loadVMHDatabase()
This function loads the database with reactions and metabolites in Virtual Metabolic Human (https://www.vmh.life/) nomenclature.
- USAGE
database=loadVMHDatabase
OUTPUT database Structure with reaction and metabolite database
- mapKBaseToVMHReactions(translatedRxns)
Part of the DEMETER pipeline. This function aids in translating reactions from KBase to VMH nomenclature. It requires running the function propagateKBaseMetTranslationToRxns beforehand to translate metabolite IDs, which then allows matching translated reactions to reactions that already exist in the VMH (Virtual Metabolic Human) database.
- USAGE
[sameReactions,similarReactions] = mapKBaseToVMHReactions (translatedRxns)
INPUT
translatedRxns: Table with untranslated KBase reactions but translated metabolite IDs
OUTPUTS
sameReactions: Table with translated KBase reactions that already exist in the VMH database with corresponding IDs
similarReactions: Table with translated KBase reactions for which a reaction with the same formula but irreversible in VMH and reversible in KBase (or vice versa) exists
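The distinction between sameReactions and similarReactions can be illustrated with a toy comparison of formula strings. This is a simplified sketch in base MATLAB and not the toolbox's matching logic: the reaction IDs and formulas are invented, and it assumes that '->' marks irreversible and '<=>' marks reversible reactions and that metabolites appear in the same order in both formulas.

```matlab
% Toy data (IDs and formulas are invented for this sketch).
translatedFormulas = {'a[c] + b[c] -> c[c]'; 'd[c] <=> e[c]'; 'f[c] -> g[c]'};
vmhIDs      = {'RXN1'; 'RXN2'};
vmhFormulas = {'a[c] + b[c] -> c[c]'; 'd[c] -> e[c]'};

% Remove the reversibility information and whitespace so that formulas
% differing only in reversibility compare as equal
stripDirection = @(f) strrep(strrep(f, '<=>', '->'), ' ', '');

sameReactions    = cell(0, 2);  % columns: translated formula, matching VMH ID
similarReactions = cell(0, 2);
for i = 1:numel(translatedFormulas)
    exact = find(strcmp(translatedFormulas{i}, vmhFormulas), 1);
    loose = find(strcmp(stripDirection(translatedFormulas{i}), ...
        stripDirection(vmhFormulas)), 1);
    if ~isempty(exact)
        sameReactions(end+1, :) = {translatedFormulas{i}, vmhIDs{exact}};
    elseif ~isempty(loose)
        similarReactions(end+1, :) = {translatedFormulas{i}, vmhIDs{loose}};
    end
end
```

In this toy case the first formula matches exactly, the second matches only once reversibility is ignored, and the third has no counterpart and would need to be added as a new reaction.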
- mapMediumData2AGORA(strainGrowth, inputMedia)
This function extracts experimentally determined growth of species on multiple media and converts the in silico minimal medium the strain can grow on into an input fit for DEMETER. The input data was retrieved from Tramontano et al., Nat Microbiol 2019 (PMID: 29556107).
- USAGE
mappedMedia = mapMediumData2AGORA(strainGrowth,inputMedia)
INPUTS
strainGrowth Growth of strains on the different media reported by Tramontano et al.
inputMedia Growth media tested by Tramontano et al. converted into a computationally readable format
OUTPUT
mappedMedia Data from Tramontano et al. converted into a format that can be added to GrowthRequirementsTable
- parseNCBItaxonomy(NCBI)
Retrieves the taxonomic lineage from NCBI using the NCBI ID.
INPUT NCBI NCBI ID (e.g., 511145)
OUTPUT taxonomy structure containing the taxonomic lineage
AUTHOR: Stefania Magnusdottir, Nov 2017
- prepareInputData(infoFilePath, varargin)
This function propagates available experimental data that was collected for AGORA2 (https://www.biorxiv.org/content/10.1101/2020.11.09.375451v1) to newly reconstructed strains and reads information from comparative genomic data in PubSEED spreadsheet format if available. It is recommended to check the propagated data manually afterwards.
- USAGE
[adaptedInfoFilePath,inputDataFolder] = prepareInputData (infoFilePath,varargin)
REQUIRED INPUT
infoFilePath File with information on reconstructions to refine
OPTIONAL INPUTS
inputDataFolder Folder to save propagated data to (default: folder in current path called “InputData”)
spreadsheetFolder Folder with comparative genomics data retrieved from PubSEED in spreadsheet format if available. For an example of the required format, see cobratoolbox/papers/2021_demeter/exampleSpreadsheets.
OUTPUTS
adaptedInfoFilePath Path to file with taxonomic information adapted with gram staining information
inputDataFolder Folder to save propagated data to (default: folder in current path called “InputData”)
- printRefinedModelIDs(draftFolder)
This function prints the adapted names of the draft reconstructions entered into DEMETER. The adapted names need to be present in the MicrobeID column of the file with taxonomic information exactly as written in the output refinedModelIDs.
USAGE: refinedModelIDs = printRefinedModelIDs(draftFolder)
INPUT draftFolder Folder with draft COBRA models generated by KBase pipeline to analyze
OUTPUT refinedModelIDs Names of the refined models generated by DEMETER
AUTHOR: Almut Heinken, 01/2021.
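Because the adapted names must appear in the MicrobeID column exactly as written, a quick membership check can catch mismatches early. The following is a minimal sketch in base MATLAB. The strain names and the stand-in table are hypothetical, and only the MicrobeID column name comes from the description above.

```matlab
% Hypothetical stand-ins for the output of printRefinedModelIDs and for
% the file with taxonomic information (names are invented).
refinedModelIDs = {'Strain_A'; 'Strain_B'; 'Strain_C'};
infoFile = table({'Strain_A'; 'Strain_C'; 'Strain_D'}, ...
    'VariableNames', {'MicrobeID'});

% ismember compares character vectors exactly (case-sensitive), which
% matches the requirement that names agree as written
isListed = ismember(refinedModelIDs, infoFile.MicrobeID);
missingIDs = refinedModelIDs(~isListed);

if ~isempty(missingIDs)
    fprintf('Missing from MicrobeID column: %s\n', strjoin(missingIDs', ', '));
end
```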
- propagateKBaseMetTranslationToRxns(toTranslatePath)
For reactions in KBase/ModelSEED nomenclature that are not yet translated, this function replaces the metabolite IDs that have already been translated. The function creates an output fit for the ReconstructionTool interface in rBioNet that can be used to check if the reactions already exist in the VMH database.
- USAGE
[translatedRxns]=propagateKBaseMetTranslationToRxns (toTranslatePath)
- INPUT
toTranslatePath String containing the path to xlsx, csv, or txt file with reaction IDs in KBase/ModelSEED nomenclature to translate (e.g., rxn00001)
- OUTPUTS
translatedRxns Table with reactions with already translated metabolite IDs replaced that can serve as input for rBioNet to check if the reactions already exist in the VMH database.
- readInputTableForPipeline(tablePath)
Reads tables, such as text files, that are needed as input data for DEMETER in a format fit for the pipeline. The necessary inputs depend on the version of MATLAB.
- USAGE
formattedTable = readInputTableForPipeline (tablePath)
INPUT tablePath Path to file with the table to read in text or table format
OUTPUT formattedTable Table in cell array format
- translateKBaseToVMHMets(toTranslatePath)
This function translates metabolites in KBase/ModelSEED nomenclature that are not yet translated to VMH nomenclature based on names/InChI keys. It is recommended that the resulting translation be verified through manual inspection.
- USAGE
[translatedMets]=translateKBaseToVMHMets (toTranslatePath)
- INPUT
toTranslatePath String containing the path to xlsx, csv, or txt file with metabolite IDs in KBase/ModelSEED nomenclature to translate (e.g., cpd00001)
- OUTPUTS
translatedMets Table with KBase metabolite IDs that could be matched to VMH metabolite IDs
- writeReactionsFromPubSeedSpreadsheets(infoFilePath, inputDataFolder, spreadsheetFolder)
Prepares the input file for the comparative genomics part. Writes reaction spreadsheets from InReactions and PubSEED spreadsheets.
- USAGE
writeReactionsFromPubSeedSpreadsheets (infoFilePath,inputDataFolder,spreadsheetFolder)
INPUTS
infoFilePath File with information on reconstructions to refine
inputDataFolder Folder to save propagated data to (default: folder in current path called “InputData”)
spreadsheetFolder Folder with comparative genomics data retrieved from PubSEED in spreadsheet format if available. For an example of the required format, see cobratoolbox/papers/2021_demeter/exampleSpreadsheets.