<codeBook xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xsi:schemaLocation="ddi:codebook:2_5 http://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/codebook.xsd" xmlns="ddi:codebook:2_5">
  <docDscr>
    <citation>
      <titlStmt>
        <titl xml:lang="sv"></titl>
        <parTitl xml:lang="en">3. Comparative population transcriptomics in krill: orthogroups (FASTA, TSV files)</parTitl>
        <IDNo agency="SND">doi-10-17044-scilifelab-24039510-0</IDNo>
        <IDNo agency="DOI">https://doi.org/10.17044/SCILIFELAB.24039510</IDNo>
      </titlStmt>
      <prodStmt>
        <producer xml:lang="en" abbr="SND">Swedish National Data Service</producer>
        <producer xml:lang="sv" abbr="SND">Svensk nationell datatjänst</producer>
      </prodStmt>
      <holdings URI="https://doi.org/10.17044/SCILIFELAB.24039510">Landing page</holdings>
    </citation>
  </docDscr>
  <stdyDscr>
    <citation>
      <titlStmt>
        <titl xml:lang="sv"></titl>
        <parTitl xml:lang="en">3. Comparative population transcriptomics in krill: orthogroups (FASTA, TSV files)</parTitl>
        <IDNo agency="SND">doi-10-17044-scilifelab-24039510-0</IDNo>
        <IDNo agency="DOI">https://doi.org/10.17044/SCILIFELAB.24039510</IDNo>
      </titlStmt>
      <rspStmt>
        <AuthEnty xml:lang="en" affiliation="Science for Life Laboratory">Wallberg, Andreas</AuthEnty>
      </rspStmt>
      <prodStmt />
      <distStmt>
        <distrbtr xml:lang="en" abbr="SND" URI="https://snd.se">Swedish National Data Service</distrbtr>
        <distrbtr xml:lang="sv" abbr="SND" URI="https://snd.se">Svensk nationell datatjänst</distrbtr>
        <distDate xml:lang="en" date="2023-10-19" />
      </distStmt>
      <verStmt>
        <version elementVersion="0" elementVersionDate="2023-10-19" />
      </verStmt>
      <holdings URI="https://doi.org/10.17044/SCILIFELAB.24039510">Landing page</holdings>
    </citation>
    <stdyInfo>
      <subject />
      <abstract xml:lang="en" contentType="abstract">This item contains a gzipped archive with ~13,000 orthogroups used to study molecular evolution in this project.

Archive:

krill.orthogroups.tar.gz

Contents of archive:

- krill.proteinortho.tsv - the primary output table from Proteinortho. It describes which protein sequences from which species belong to the same orthogroup. The format follows the standard output of the program.
- krill.proteinortho.tsv.seqs.csv - a processed table that also contains the actual sequences, line by line (see below).
- the alignments directory, which contains all orthogroups (OGs) as unaligned and aligned files in FASTA format (see below).

Format of the krill.proteinortho.tsv.seqs.csv table

The fields are:

- NR = orthogroup number
- ORTHO_GROUP = orthogroup ID
- N_SPECIES = the number of species
- N_GENES = the number of genes/sequences in this orthogroup
- N_MATCHING[o] = number of sequences matching outgroup species for this orthogroup
- N_NON_MATCHING = number of sequences matching ingroup species for this orthogroup
- HEADER = the name of this particular sequence
- SEQ = the protein sequence

Contents of the alignments directory

Each orthogroup is represented by up to four FASTA files:

- OG*.cds.ginsi.fasta.orig = the original, unaligned and unfiltered sequences
- OG*.cds.ginsi.fasta = the aligned and filtered sequences
- OG*.cds.ginsi.fasta.without_cold_euphausia.fasta = the aligned and filtered sequences after removing cold-associated Euphausia species
- OG*.cds.ginsi.fasta.without_cold_thysanoessa.fasta = the aligned and filtered sequences after removing cold-associated Thysanoessa species</abstract>
      <sumDscr />
    </stdyInfo>
    <method>
      <dataColl />
    </method>
    <dataAccs>
      <useStmt>
        <restrctn xml:lang="en">Access to data through an external actor. </restrctn>
        <restrctn xml:lang="sv">Åtkomst till data via extern aktör. </restrctn>
      </useStmt>
    </dataAccs>
    <othrStdyMat />
  </stdyDscr>
</codeBook>