
CoMR (Comprehensive Mitochondrial proteome Reconstruction) reference databases, benchmarking data, and container for mitochondrial proteome reconstruction

https://doi.org/10.17044/SCILIFELAB.31361839
This item contains reference databases, benchmarking resources, and a reproducibility container associated with CoMR (Comprehensive Mitochondrial proteome Reconstruction), an integrative workflow for reconstructing mitochondrial proteomes from eukaryotic protein sequence data. Mitochondrial proteome reconstruction often relies heavily on prediction of mitochondrial targeting signals (MTSs), but MTS predictors are trained mainly on model organisms and may perform poorly in phylogenetically divergent lineages or in organisms with atypical or reduced targeting sequences. CoMR was developed to address this by integrating complementary evidence sources within a unified scoring framework: targeting prediction, curated homology searches, large-scale similarity searches, profile HMM detection, and automated phylogenetic analysis. The workflow is implemented as a modular **Snakemake-based pipeline** and is distributed in containerized form to support reproducible execution across computing environments.

The files deposited here support inspection, reuse, and reproducibility of that workflow. They include: (1) CoMR databases with FASTA databases, preformatted BLAST resources, orthogroup alignment archives, and HMM profile archives; (2) a benchmarking collection with filtered FASTA and DIAMOND databases, benchmark proteomes, benchmarking scripts, summary tables, figures, and benchmarking outputs; and (3) a Singularity/Apptainer container image (CoMR.sif) for running CoMR in a controlled computational environment.

The benchmarking material corresponds to the analyses described in the paper for the model yeast *Saccharomyces cerevisiae* and the divergent anaerobic protist *Paratrimastix pyriformis*. In the manuscript, CoMR achieved strong discriminatory performance in yeast (ROC-AUC 0.92), exceeding standalone TargetP2 prediction (ROC-AUC 0.72), and maintained robust performance in *P. pyriformis* (ROC-AUC 0.86), where precision-recall analysis also supported improved recovery of mitochondrial-related organelle proteins relative to TargetP2. The benchmarking resources in this deposit include the processed data, scripts, figures, and output archives underlying those comparisons.

The deposited reference resources include the **CoMR Subtractive Mitochondrial Database (SMD)**, supporting HMM resources, and benchmarking-modified database versions built for performance evaluation, with taxonomic exclusion applied to reduce circularity. The benchmarking directory also documents how the filtered databases and orthogroup alignments were built, and how the benchmarking tables, ROC curves, and precision-recall summaries were derived from CoMR output tables. The accompanying README and MANIFEST files provide a self-contained guide to the files and an inventory of the distributed content.
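For readers unfamiliar with the ROC-AUC values quoted above, the following is a minimal sketch of how such a value can be computed from per-protein scores and known localization labels. It is not taken from the deposited benchmarking scripts: the function name `roc_auc`, the example scores, and the labels are all hypothetical, and the sketch uses the standard rank-based (Mann-Whitney) formulation rather than whatever method the authors applied.

```python
def roc_auc(scores, labels):
    """ROC-AUC via the Mann-Whitney U statistic.

    scores: numeric scores, higher meaning "more likely mitochondrial"
    labels: 1 for known mitochondrial proteins, 0 for all others
    Tied scores receive their average rank.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the end of the block of tied scores starting at i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Hypothetical scores and labels, for illustration only.
example_scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
example_labels = [1, 1, 0, 1, 0, 0]
print(round(roc_auc(example_scores, example_labels), 3))  # 0.889
```

A value of 1.0 means every positive outscores every negative, and 0.5 corresponds to random ranking, which gives a scale for reading the reported 0.92, 0.72, and 0.86.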

