<codeBook xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xsi:schemaLocation="ddi:codebook:2_5 http://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/codebook.xsd" xmlns="ddi:codebook:2_5">
  <docDscr>
    <citation>
      <titlStmt>
        <titl xml:lang="sv"></titl>
        <parTitl xml:lang="en">CoMR (Comprehensive Mitochondrial proteome Reconstruction) reference databases, benchmarking data, and container for mitochondrial proteome reconstruction</parTitl>
        <IDNo agency="SND">doi-10-17044-scilifelab-31361839-0</IDNo>
        <IDNo agency="DOI">https://doi.org/10.17044/SCILIFELAB.31361839</IDNo>
      </titlStmt>
      <prodStmt>
        <producer xml:lang="en" abbr="SND">Swedish National Data Service</producer>
        <producer xml:lang="sv" abbr="SND">Svensk nationell datatjänst</producer>
      </prodStmt>
      <holdings URI="https://doi.org/10.17044/SCILIFELAB.31361839">Landing page</holdings>
    </citation>
  </docDscr>
  <stdyDscr>
    <citation>
      <titlStmt>
        <titl xml:lang="sv"></titl>
        <parTitl xml:lang="en">CoMR (Comprehensive Mitochondrial proteome Reconstruction) reference databases, benchmarking data, and container for mitochondrial proteome reconstruction</parTitl>
        <IDNo agency="SND">doi-10-17044-scilifelab-31361839-0</IDNo>
        <IDNo agency="DOI">https://doi.org/10.17044/SCILIFELAB.31361839</IDNo>
      </titlStmt>
      <rspStmt>
        <AuthEnty xml:lang="en" affiliation="Science for Life Laboratory">Stairs, Courtney</AuthEnty>
      </rspStmt>
      <prodStmt />
      <distStmt>
        <distrbtr xml:lang="en" abbr="SND" URI="https://snd.se">Swedish National Data Service</distrbtr>
        <distrbtr xml:lang="sv" abbr="SND" URI="https://snd.se">Svensk nationell datatjänst</distrbtr>
        <distDate xml:lang="en" date="2026-04-21" />
      </distStmt>
      <verStmt>
        <version elementVersion="0" elementVersionDate="2026-04-21" />
      </verStmt>
      <holdings URI="https://doi.org/10.17044/SCILIFELAB.31361839">Landing page</holdings>
    </citation>
    <stdyInfo>
      <subject />
      <abstract xml:lang="en" contentType="abstract">This item contains reference databases, benchmarking resources, and a reproducibility container associated with CoMR (Comprehensive Mitochondrial proteome Reconstruction), an integrative workflow for reconstructing mitochondrial proteomes from eukaryotic protein sequence data.

Mitochondrial proteome reconstruction often relies heavily on prediction of mitochondrial targeting signals (MTSs), but MTS predictors are mainly trained on model organisms and may perform poorly in phylogenetically divergent lineages or in organisms with atypical or reduced targeting sequences. CoMR was developed to address this by integrating complementary evidence sources within a unified scoring framework, including targeting prediction, curated homology searches, large-scale similarity searches, profile HMM detection, and automated phylogenetic analysis. The workflow is implemented as a modular **Snakemake-based pipeline** and is distributed in containerized form to support reproducible execution across computing environments.

The files deposited here support inspection, reuse, and reproducibility of that workflow. They include: (1) CoMR databases with FASTA databases, preformatted BLAST resources, orthogroup alignment archives, and HMM profile archives; (2) a benchmarking collection with filtered FASTA and DIAMOND databases, benchmark proteomes, benchmarking scripts, summary tables, figures, and benchmarking outputs; and (3) a Singularity/Apptainer container image (CoMR.sif) for running CoMR in a controlled computational environment.

The benchmarking material corresponds to the analyses described in the paper for the model yeast *Saccharomyces cerevisiae* and the divergent anaerobic protist *Paratrimastix pyriformis*. In the manuscript, CoMR achieved strong discriminatory performance in yeast (ROC-AUC 0.92), exceeding standalone TargetP2 prediction (ROC-AUC 0.72), and maintained robust performance in *P. pyriformis* (ROC-AUC 0.86), where precision-recall analysis also supported improved recovery of mitochondrial-related organelle proteins relative to TargetP2. The benchmarking resources in this deposit include the processed data, scripts, figures, and output archives underlying those comparisons.

The deposited reference resources include the **CoMR Subtractive Mitochondrial Database (SMD)**, supporting HMM resources, and benchmarking-modified database versions generated for performance evaluation with taxonomic exclusion to reduce circularity. The benchmarking directory also documents how filtered databases and orthogroup alignments were generated, and how benchmarking tables, ROC curves, and precision-recall summaries were derived from CoMR output tables.

The accompanying README and MANIFEST files provide a self-contained guide to the files and an inventory of the distributed content.</abstract>
      <sumDscr />
    </stdyInfo>
    <method>
      <dataColl />
    </method>
    <dataAccs>
      <useStmt>
        <restrctn xml:lang="en">Access to data through an external actor.</restrctn>
        <restrctn xml:lang="sv">Åtkomst till data via extern aktör.</restrctn>
      </useStmt>
    </dataAccs>
    <othrStdyMat />
  </stdyDscr>
</codeBook>