Researchdata.se

1. Comparative population transcriptomics in krill: reference transcriptomes (FASTA, GFF, TSV files)

https://doi.org/10.17044/SCILIFELAB.22722361
This item holds one main gzipped tar archive that contains 20 nested tar archives, each of which contains reference transcriptomes and associated metadata for one species of krill (20 species in total).

Archive: krill.transcriptomes.tar.gz

Contents of the main archive (FILE,TAG,SPECIES,SIZE):
- earm.transcriptomes.tar,earm,Euphausia similis var. armata,491.6M
- ecry.transcriptomes.tar,ecry,Euphausia crystallorophias,89.7M
- edin.transcriptomes.tar,edin,Euphausia distinguenda,496.5M
- efri.transcriptomes.tar,efri,Euphausia frigida,345.2M
- elam.transcriptomes.tar,elam,Euphausia lamelligera,515.9M
- elos.transcriptomes.tar,elos,Euphausia longirostris,234M
- emuc.transcriptomes.tar,emuc,Euphausia mucronata,360.4M
- epac.transcriptomes.tar,epac,Euphausia pacifica,357.1M
- erec.transcriptomes.tar,erec,Euphausia recurva,114.8M
- esim.transcriptomes.tar,esim,Euphausia similis,417.9M
- espi.transcriptomes.tar,espi,Euphausia spinifera,425.1M
- esup.transcriptomes.tar,esup,Euphausia superba,520.6M
- etri.transcriptomes.tar,etri,Euphausia triacantha,396M
- eval.transcriptomes.tar,eval,Euphausia vallentini,635.1M
- mnor.transcriptomes.tar,mnor,Meganyctiphanes norvegica,469M
- nmeg.transcriptomes.tar,nmeg,Nematoscelis megalops,429M
- tine.transcriptomes.tar,tine,Thysanoessa inermis,594.6M
- tlon.transcriptomes.tar,tlon,Thysanoessa longicaudata,328.8M
- tmac.transcriptomes.tar,tmac,Thysanoessa macrura,253.4M
- trac.transcriptomes.tar,trac,Thysanoessa raschii,231.2M

Contents of nested archives:

Each nested tar archive contains the following set of files (each filename is prefixed with the species "TAG" from the list above):

- TAG.trinity.fasta
  The full Trinity transcriptome, including non-coding transcripts and alternative isoforms.
- TAG.trinity.longest_isoforms.fasta.renamed.list.tsv
  A TSV table to translate between the original Trinity transcript sequence names (field 3) and the names used throughout the analyses (field 2). This table contains the longest isoforms, i.e. the transcripts that remain after removing redundant shorter isoforms.
  - field 1: number
  - field 2: species-specific transcript sequence names used in analyses. The sequence names follow the format "TAG_NUMBER" for non-coding transcripts and "TAG_NUMBER_OTHER_NUMBER" for coding transcripts (the last number indicates which reading frame TransDecoder selected as the best).
  - field 3: original Trinity transcript sequence names
- TAG.trinity.longest_isoforms.coding.fasta
  The filtered transcriptome, including only the longest isoform of each coding transcript.
- TAG.trinity.longest_isoforms.coding.fasta.transdecoder.gff3
  A GFF coordinate file that specifies where features such as CDS and UTRs start and stop along the coding transcripts.
- TAG.trinity.longest_isoforms.fasta.transdecoder.cds.fasta
  The CDS of the open reading frame of each coding transcript, as specified by the TAG.trinity.longest_isoforms.coding.fasta.transdecoder.gff3 GFF file and the TAG.trinity.longest_isoforms.coding.fasta file.
- TAG.trinity.longest_isoforms.fasta.transdecoder.pep.fasta
  The corresponding peptide sequence encoded by each CDS.

The GFF files follow the GFF3 standard: https://www.ensembl.org/info/website/upload/gff3.html
The FASTA files follow the FASTA standard: https://www.ncbi.nlm.nih.gov/genbank/fastaformat

Note: Compared to the files used in analyses, these files have been edited to reflect the species names and abbreviations used in publication figures.
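As an illustration of how the rename table might be used, the following minimal Python sketch maps original Trinity names (field 3) to the names used in the analyses (field 2) and separates coding from non-coding transcripts by the naming format described above. It rests on several assumptions that the description does not confirm: that the TSV file has no header row, that fields are tab-separated in the order given, and that the species tag itself contains no underscore (true for the 20 tags listed). The function names and the example rows are hypothetical and are not taken from the real files.

```python
import csv
import io


def read_rename_table(handle):
    """Map original Trinity names (field 3) to analysis names (field 2)."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        analysis_name, trinity_name = row[1], row[2]
        mapping[trinity_name] = analysis_name
    return mapping


def is_coding(analysis_name):
    """True for "TAG_NUMBER_OTHER_NUMBER" names, False for "TAG_NUMBER" names."""
    # Assumption: the TAG has no underscore, so a non-coding name splits
    # into exactly two parts and a coding name into more than two.
    return len(analysis_name.split("_")) > 2


# Hypothetical rows for illustration only; the real tables may look different.
example = io.StringIO(
    "1\tesup_1\tTRINITY_DN100_c0_g1_i1\n"
    "2\tesup_2_x_1\tTRINITY_DN101_c0_g1_i2\n"
)
table = read_rename_table(example)
coding = {name for name in table.values() if is_coding(name)}
```

For a real file, the same function could be given an open handle, for example `open("esup.trinity.longest_isoforms.fasta.renamed.list.tsv", newline="")` after extracting the nested archive for that species.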


scilifelabuu