CoMR (Comprehensive Mitochondrial proteome Reconstruction) reference databases, benchmarking data, and container for mitochondrial proteome reconstruction
https://doi.org/10.17044/SCILIFELAB.31361839
This item contains reference databases, benchmarking resources, and a
reproducibility container associated with CoMR (Comprehensive Mitochondrial proteome Reconstruction), an integrative workflow for reconstructing mitochondrial proteomes from eukaryotic protein sequence data.
Mitochondrial proteome reconstruction often relies heavily on prediction of mitochondrial targeting signals (MTSs), but MTS predictors are trained mainly on model organisms and may perform poorly in phylogenetically divergent lineages or in organisms with atypical or reduced targeting sequences. CoMR addresses this by integrating complementary evidence sources (targeting prediction, curated homology searches, large-scale similarity searches, profile HMM detection, and automated phylogenetic analysis) within a unified scoring framework.
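To make the idea of a unified score concrete, the toy sketch below combines several evidence types into a single number with a weighted sum. The `Evidence` fields, the weights, and the weighted-sum form are all hypothetical illustrations chosen for this example; they are not CoMR's actual scoring scheme.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical per-protein evidence (not CoMR's actual schema)."""
    targeting_prob: float   # MTS probability in [0, 1] from a targeting predictor
    curated_homolog: bool   # hit in a curated mitochondrial reference set
    similarity_hit: bool    # hit in a large-scale similarity search
    hmm_hit: bool           # significant profile-HMM match
    phylo_support: bool     # phylogeny groups the protein with mitochondrial orthologs


# Hypothetical weights, for illustration only.
WEIGHTS = {
    "targeting_prob": 1.0,
    "curated_homolog": 2.0,
    "similarity_hit": 1.0,
    "hmm_hit": 1.5,
    "phylo_support": 2.0,
}


def combined_score(ev: Evidence) -> float:
    """Weighted sum over evidence fields; booleans count as 0 or 1."""
    return sum(w * float(getattr(ev, name)) for name, w in WEIGHTS.items())


if __name__ == "__main__":
    weak_mts = Evidence(0.1, False, False, True, True)
    strong_mts_only = Evidence(0.9, False, False, False, False)
    print(combined_score(weak_mts), combined_score(strong_mts_only))
```

With these illustrative weights, a protein with a weak targeting signal (0.1) but HMM and phylogenetic support scores about 3.6, well above a protein supported only by a strong targeting prediction (0.9). This mirrors the motivation stated above: non-targeting evidence can rescue proteins whose MTS is atypical or absent.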
The workflow is implemented as a modular **Snakemake-based pipeline** and is
distributed in containerized form to support reproducible execution across
computing environments.
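A minimal sketch of launching a containerized Snakemake run from Python follows. It assumes that `snakemake` is available inside `CoMR.sif` and that a Snakefile sits in the bound working directory; the helper name, the `snakefile` default, and the paths are hypothetical, and the deposit's README remains the authoritative guide to invocation. `apptainer exec`, `--bind`, and `--pwd` are standard Apptainer options; `--snakefile` and `--cores` are standard Snakemake options.

```python
import shlex
from pathlib import Path


def build_comr_command(image: Path, workdir: Path, cores: int = 4,
                       snakefile: str = "Snakefile") -> list[str]:
    """Build an `apptainer exec` command that runs Snakemake inside the image.

    Hypothetical helper: assumes snakemake is installed in the container and
    that `snakefile` resolves relative to `workdir`.
    """
    return [
        "apptainer", "exec",
        "--bind", f"{workdir}:{workdir}",  # expose the host run directory at the same path
        "--pwd", str(workdir),             # start the workflow in that directory
        str(image),
        "snakemake", "--snakefile", snakefile, "--cores", str(cores),
    ]


if __name__ == "__main__":
    cmd = build_comr_command(Path("CoMR.sif"), Path("/data/comr_run"), cores=8)
    # Inspect the command first; subprocess.run(cmd, check=True) would execute it.
    print(shlex.join(cmd))
```

On systems that still provide the older `singularity` command instead of `apptainer`, the first list element can be swapped; the `exec`, `--bind`, and `--pwd` options are the same.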
The files deposited here support inspection, reuse, and reproducibility of that workflow. They include: (1) the CoMR databases, comprising FASTA databases, preformatted BLAST resources, orthogroup alignment archives, and HMM profile archives; (2) a benchmarking collection with filtered FASTA and DIAMOND databases, benchmark proteomes, benchmarking scripts, summary tables, figures, and benchmarking outputs; and (3) a Singularity/Apptainer container image (CoMR.sif) for running CoMR in a controlled computational environment.
The benchmarking material corresponds to the analyses described in the paper for the model yeast *Saccharomyces cerevisiae* and the divergent anaerobic protist *Paratrimastix pyriformis*. In the manuscript, CoMR achieved strong
discriminatory performance in yeast (ROC-AUC 0.92), exceeding standalone
TargetP2 prediction (ROC-AUC 0.72), and maintained robust performance in
*P. pyriformis* (ROC-AUC 0.86), where precision-recall analysis also supported
improved recovery of mitochondrion-related organelle (MRO) proteins relative to
TargetP2. The benchmarking resources in this deposit include the processed data, scripts, figures, and output archives underlying those comparisons.
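Both metrics reported above can be computed from per-protein scores and reference labels. The sketch below assumes the scores come from a CoMR output table and the labels (1 = mitochondrial or MRO protein, 0 = not) from a reference annotation; the function names are hypothetical and this is not the deposited benchmarking script. ROC-AUC is computed as the probability that a randomly chosen positive outranks a randomly chosen negative (ties count one half), and average precision summarizes the precision-recall curve as the recall-weighted sum of precisions, AP = Σ (R_n − R_{n−1}) P_n.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC-AUC via pairwise comparison (Mann-Whitney form); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Step-wise average precision over distinct score thresholds."""
    total_pos = sum(1 for y in labels if y)
    if total_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    tp = fp = 0
    prev_recall = ap = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all items tied at this score so they share one threshold.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap
```

The pairwise loop is quadratic and meant only for clarity; for full proteomes, scikit-learn's `roc_auc_score` and `average_precision_score` compute the same quantities efficiently.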
The deposited reference resources include the **CoMR Subtractive Mitochondrial
Database (SMD)**, supporting HMM resources, and benchmarking-modified database
versions generated for performance evaluation, with taxonomic exclusion to reduce circularity. The benchmarking directory also documents how the filtered databases and orthogroup alignments were produced, and how the benchmarking tables, ROC curves, and precision-recall summaries were derived from CoMR output tables.
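Taxonomic exclusion can be illustrated with a small FASTA filter. This sketch assumes UniProt-style headers that carry an `OX=<taxid>` field and a caller-supplied set of taxon IDs to drop (to exclude a whole clade, that set would need to contain every descendant taxon); the function names are hypothetical and this is not the procedure documented in the deposit. Records without an `OX=` field are kept, a design choice worth revisiting for other header formats.

```python
import re
from typing import Iterable, Iterator

# Matches the UniProt-style organism taxonomy field, e.g. "OX=9606".
OX_RE = re.compile(r"\bOX=(\d+)")

Record = tuple[str, str]  # (header without ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line.strip())
    if header is not None:
        yield header, "".join(seq)


def exclude_taxa(records: Iterable[Record], excluded_taxids: set[int]) -> Iterator[Record]:
    """Drop records whose OX= taxid is in the exclusion set; keep the rest."""
    for header, seq in records:
        match = OX_RE.search(header)
        if match and int(match.group(1)) in excluded_taxids:
            continue
        yield header, seq


def write_fasta(records: Iterable[Record], width: int = 60) -> str:
    """Serialize records to FASTA text with fixed-width sequence lines."""
    out = []
    for header, seq in records:
        out.append(f">{header}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in out)
```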
The accompanying README and MANIFEST files provide a self-contained guide to the files and an inventory of the distributed content.
Citation and access
Creator/principal investigator:
Research principal:
Citation:
Administrative information
Funding
Funder:
- European Research Council
Subject area and keywords
Standard for Swedish Classification of Research Subjects 2025:
Keywords:
- Bioinformatic methods development
- Sequence analysis
Relations
Is supplement to:
Is cited by:
Metadata
