Supplemental data from the genome assembly and annotation of the Clouded Apollo Butterfly (Parnassius mnemosyne)
https://doi.org/10.17044/SCILIFELAB.25908748
This dataset contains supplementary data from the genome sequencing of the Clouded Apollo Butterfly (Parnassius mnemosyne), published in:
Höglund, J., Dias, G., Olsen, R. A., Soares, A., Bunikis, I., Talla, V., & Backström, N. (2024). A Chromosome-Level Genome Assembly and Annotation for the Clouded Apollo Butterfly (Parnassius mnemosyne): A Species of Global Conservation Concern. Genome Biology and Evolution, 16(2), evae031. https://doi.org/10.1093/gbe/evae031
Previous data from the project has been deposited at the European Nucleotide Archive (ENA) in the umbrella project PRJEB76269 (https://www.ebi.ac.uk/ena/browser/view/PRJEB76269).
The data contained in this archive at SciLifeLab Data Repository describe the genome assembly (ENA accession: GCA_963668995.1 (https://www.ebi.ac.uk/ena/browser/view/GCA_963668995.1)), and the mitochondrial genome assembly (ENA accession: OZ075093.1 (https://www.ebi.ac.uk/ena/browser/view/OZ075093.1)).
Below follows a brief description of each file. The information on the methods used to generate the files was adapted from Höglund et al. 2024.
- pmne_functional_edit1.gff.gz
contains the functional annotation (protein coding genes) of the primary genome assembly (GCA_963668995.1 (https://www.ebi.ac.uk/ena/browser/view/GCA_963668995.1)). This is the original file that was submitted to ENA. A derived version of the file is available from NCBI; the NCBI version was generated from the EMBL records of each annotated gene and differs, for instance, in that it uses a different naming scheme for the seqid column and the locus tags. The NCBI version is available at https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/963/668/995/GCA_963668995.1_Parnassius_mnemosyne_n_2023_11/GCA_963668995.1_Parnassius_mnemosyne_n_2023_11_genomic.gff.gz
The genes were predicted using BRAKER (v3.03), GALBA (v1.0.6), and GeneMarkS-T (v5.1). The resulting gene models were combined and filtered using TSEBRA (version: long_reads branch commit 1f2614). The combined gene model was functionally annotated by the NBIS Nextflow pipeline v2.0.0 (https://github.com/NBISweden).
- pmne_Illumina_RNAseq_StringTie_sorted-transcripts_match.gff.gz
contains a transcript assembly of the Illumina RNAseq reads (ENA accession: ERX11559451 (https://www.ebi.ac.uk/ena/browser/view/ERX11559451)). The reads were aligned to the genome with HISAT2 (v2.1.0) and then assembled with StringTie (v2.2.1).
- pmne_mtdna.gff.gz
contains the functional annotation of the mitochondrial genome assembly (ENA accession: OZ075093.1 (https://www.ebi.ac.uk/ena/browser/view/OZ075093.1)). This is the original file that was submitted to ENA. The annotation was generated using MitoFinder (v1.4.1).
- pmne_ncRNAs.gff.gz
contains the annotation of putative non-coding RNA (ncRNA) genes. The prediction was done with Infernal (v1.1.4) and the Rfam (v14.1) covariance models.
- pmne_tRNAs_and_pseudogenes.gff.gz
contains the annotation of putative tRNA genes and pseudogenes. The prediction was done with tRNAscan-SE (v2.0.12).
- pmne_PacBio_isoseq.sorted.bam
contains the PacBio IsoSeq transcripts (ENA accession: ERX11559436 (https://www.ebi.ac.uk/ena/browser/view/ERX11559436)) aligned to the primary genome assembly.
- pmne_repeat_library.fa.gz
contains the nucleotide sequences of the predicted repeats in FASTA format. The prediction was done with RepeatModeler2 (v2.0.2a).
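As an illustration of how the repeat library might be inspected after download, the following is a minimal Python sketch that reads a gzip-compressed FASTA file and reports the number of sequences and their lengths. The function names read_fasta and summarize_fasta are hypothetical helpers written for this example and are not part of the dataset or of any tool named above; the sketch uses only the Python standard library and assumes the file is plain gzip-compressed text.

```python
import gzip


def read_fasta(path):
    """Yield (header, sequence) tuples from a gzip-compressed FASTA file.

    Hypothetical helper for illustration; the header is returned without
    the leading '>' and multi-line sequences are joined into one string.
    """
    header, chunks = None, []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize_fasta(path):
    """Return the sequence count, total length, and longest sequence length."""
    lengths = [len(seq) for _, seq in read_fasta(path)]
    return {
        "n_sequences": len(lengths),
        "total_bp": sum(lengths),
        "longest_bp": max(lengths, default=0),
    }
```

Assuming the file has been downloaded to the working directory, summarize_fasta("pmne_repeat_library.fa.gz") would return a small dictionary with these three values.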
Available variables
For a description of the column headers of the files, please see the following links to the documentation of the different file formats.
The GFF3 format (.gff) is described here: https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md
The BAM format (.bam) is a compressed version of the SAM format, both of which are described here: https://samtools.github.io/hts-specs/SAMv1.pdf
The FASTA (.fa) format is described here: https://www.ncbi.nlm.nih.gov/genbank/fastaformat
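To make the GFF3 column layout concrete, the following is a minimal Python sketch that reads a gzip-compressed GFF3 file into dictionaries keyed by the nine column names defined in the GFF3 specification, and counts features by type. The names read_gff3, parse_attributes, and count_feature_types are hypothetical helpers written for this example; the sketch uses only the standard library and does not decode percent-encoded attribute values.

```python
import gzip
from collections import Counter

# The nine tab-separated columns defined by the GFF3 specification.
GFF3_COLUMNS = [
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
]


def parse_attributes(field):
    """Split the ninth column (semicolon-separated key=value pairs) into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def read_gff3(path):
    """Yield one dict per feature line of a gzip-compressed GFF3 file."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # an optional embedded sequence section follows
            if line.startswith("#") or not line.strip():
                continue  # skip directives, comments, and blank lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 9:
                continue  # skip malformed lines rather than guessing
            record = dict(zip(GFF3_COLUMNS, fields))
            record["start"] = int(record["start"])  # 1-based, inclusive
            record["end"] = int(record["end"])
            record["attributes"] = parse_attributes(record["attributes"])
            yield record


def count_feature_types(path):
    """Count how many features of each type (column 3) the file contains."""
    return Counter(record["type"] for record in read_gff3(path))
```

Assuming one of the annotation files has been downloaded to the working directory, count_feature_types("pmne_functional_edit1.gff.gz") would return a Counter of feature types. For the BAM file, a dedicated tool or library such as samtools is the usual choice, since BAM is a binary format.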
Contact
For questions about this dataset, please contact:
jacob.hoglund@ebc.uu.se
niclas.backstrom@ebc.uu.se
Uppsala universitet