<codeBook xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xsi:schemaLocation="ddi:codebook:2_5 http://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/codebook.xsd" xmlns="ddi:codebook:2_5">
  <docDscr>
    <citation>
      <titlStmt>
        <titl xml:lang="sv"></titl>
        <parTitl xml:lang="en">Supplemental data from the genome assembly and annotation of the Clouded Apollo Butterfly (Parnassius mnemosyne)</parTitl>
        <IDNo agency="SND">doi-10-17044-scilifelab-25908748-0</IDNo>
        <IDNo agency="DOI">https://doi.org/10.17044/SCILIFELAB.25908748</IDNo>
      </titlStmt>
      <prodStmt>
        <producer xml:lang="en" abbr="SND">Swedish National Data Service</producer>
        <producer xml:lang="sv" abbr="SND">Svensk nationell datatjänst</producer>
      </prodStmt>
      <holdings URI="https://doi.org/10.17044/SCILIFELAB.25908748">Landing page</holdings>
    </citation>
  </docDscr>
  <stdyDscr>
    <citation>
      <titlStmt>
        <titl xml:lang="sv"></titl>
        <parTitl xml:lang="en">Supplemental data from the genome assembly and annotation of the Clouded Apollo Butterfly (Parnassius mnemosyne)</parTitl>
        <IDNo agency="SND">doi-10-17044-scilifelab-25908748-0</IDNo>
        <IDNo agency="DOI">https://doi.org/10.17044/SCILIFELAB.25908748</IDNo>
      </titlStmt>
      <rspStmt />
      <prodStmt>
        <grantNo xml:lang="en" agency="Swedish Research Council">2019-04791_VR</grantNo>
      </prodStmt>
      <distStmt>
        <distrbtr xml:lang="en" abbr="SND" URI="https://snd.se">Swedish National Data Service</distrbtr>
        <distrbtr xml:lang="sv" abbr="SND" URI="https://snd.se">Svensk nationell datatjänst</distrbtr>
        <distDate xml:lang="en" date="2024-06-26" />
      </distStmt>
      <verStmt>
        <version elementVersion="0" elementVersionDate="2024-06-26" />
      </verStmt>
      <holdings URI="https://doi.org/10.17044/SCILIFELAB.25908748">Landing page</holdings>
    </citation>
    <stdyInfo>
      <subject />
      <abstract xml:lang="en" contentType="abstract">This dataset contains supplementary data from the genome sequencing of the Clouded Apollo Butterfly (Parnassius mnemosyne), published in:

Höglund, J., Dias, G., Olsen, R. A., Soares, A., Bunikis, I., Talla, V., &amp; Backström, N. (2024). A Chromosome-Level Genome Assembly and Annotation for the Clouded Apollo Butterfly (Parnassius mnemosyne): A Species of Global Conservation Concern. Genome Biology and Evolution, 16(2), evae031. https://doi.org/10.1093/gbe/evae031

Previous data from the project has been deposited at the European Nucleotide Archive (ENA) in the umbrella project PRJEB76269 (https://www.ebi.ac.uk/ena/browser/view/PRJEB76269) .

The data contained in this archive at SciLifeLab Data Repository describe the genome assembly (ENA accession: GCA_963668995.1 (https://www.ebi.ac.uk/ena/browser/view/GCA_963668995.1) ), and the mitochondrial genome assembly (ENA accession: OZ075093.1 (https://www.ebi.ac.uk/ena/browser/view/OZ075093.1) ).

Below follows a brief description of each file. The information on the methods used to generate the files was adapted from Höglund et al. 2024.

- pmne_functional_edit1.gff.gz
contains the functional annotation (protein coding genes) of the primary genome assembly (GCA_963668995.1 (https://www.ebi.ac.uk/ena/browser/view/GCA_963668995.1) ). This is the original file that was submitted to ENA. A derived version of the file is available from NCBI; the NCBI version was generated from the EMBL records of each annotated gene and differs in that it for instance use a different naming scheme for the seqid column and the locus tags. The NCBI version is available at this link (https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/963/668/995/GCA_963668995.1_Parnassius_mnemosyne_n_2023_11/GCA_963668995.1_Parnassius_mnemosyne_n_2023_11_genomic.gff.gz) .

The genes were predicted using BRAKER (v3.03), GALBA (v1.0.6), and GeneMarkS-T (v5.1). The resulting gene models were combined and filtered using TSEBRA (version: long_reads branch commit 1f2614). The combined gene model was functionally annotated by the NBIS nextflow pipeline v2.0.0 (https://github.com/NBISweden).

- pmne_Illumina_RNAseq_StringTie_sorted-transcripts_match.gff.gz
contains a transcript assembly of the Illumina RNAseq reads (ENA accession: ERX11559451 (https://www.ebi.ac.uk/ena/browser/view/ERX11559451) ). The reads were aligned to the genome with HiSat2 (v2.1.0) and then assembled with StringTie (v2.2.1).

- pmne_mtdna.gff.gz
contains the functional annotation of the mitochondrial genome assembly (ENA accession: OZ075093.1 (https://www.ebi.ac.uk/ena/browser/view/OZ075093.1) ). This is the original file that was submitted to ENA. The annotation was generated using MitoFinder (v1.4.1).

- pmne_ncRNAs.gff.gz
contains the annotation of putative non-coding RNA (ncRNA) genes. The prediction was done with Infernal (v1.1.4) and the Rfam (v14.1) covariance models.

- pmne_tRNAs_and_pseudogenes.gff.gz
contains the annotation of putative tRNA genes and pseudogenes. The prediction was done with tRNAscan-SE (v2.0.12).

- pmne_PacBio_isoseq.sorted.bam
contains the PacBio IsoSeq transcripts (ENA accession: ERX11559436 (https://www.ebi.ac.uk/ena/browser/view/ERX11559436) ) aligned to the primary genome assembly.

- pmne_repeat_library.fa.gz
contains the nucleotide sequences of the prediced repeats in fasta format. The prediction was done with RepeatModeler2 (v2.0.2a).

Available variablesFor a description of the column headers of the files, please see the following links to the documentation of the different file formats.

The GFF3 format (.gff) is described here: https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md

The BAM format (.bam) is a compressed version of the SAM format, both of which are described here: https://samtools.github.io/hts-specs/SAMv1.pdf

The fasta (.fa) format is described here: https://www.ncbi.nlm.nih.gov/genbank/fastaformat/

ContactFor questions about this dataset, please contact:
jacob.hoglund@ebc.uu.se
niclas.backstrom@ebc.uu.se</abstract>
      <sumDscr />
    </stdyInfo>
    <method>
      <dataColl />
    </method>
    <dataAccs>
      <useStmt>
        <restrctn xml:lang="en">Access to data through an external actor. </restrctn>
        <restrctn xml:lang="sv">Åtkomst till data via extern aktör. </restrctn>
      </useStmt>
    </dataAccs>
    <othrStdyMat />
  </stdyDscr>
</codeBook>