<codeBook xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xsi:schemaLocation="ddi:codebook:2_5 http://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/codebook.xsd" xmlns="ddi:codebook:2_5">
  <docDscr>
    <citation>
      <titlStmt>
        <titl xml:lang="sv"></titl>
        <parTitl xml:lang="en">7. Ecological genomics of the Northern krill: Genome-scale comparisons of adaptive divergence</parTitl>
        <IDNo agency="SND">doi-10-17044-scilifelab-22817410-0</IDNo>
        <IDNo agency="DOI">https://doi.org/10.17044/SCILIFELAB.22817410</IDNo>
      </titlStmt>
      <prodStmt>
        <producer xml:lang="en" abbr="SND">Swedish National Data Service</producer>
        <producer xml:lang="sv" abbr="SND">Svensk nationell datatjänst</producer>
      </prodStmt>
      <holdings URI="https://doi.org/10.17044/SCILIFELAB.22817410">Landing page</holdings>
    </citation>
  </docDscr>
  <stdyDscr>
    <citation>
      <titlStmt>
        <titl xml:lang="sv"></titl>
        <parTitl xml:lang="en">7. Ecological genomics of the Northern krill: Genome-scale comparisons of adaptive divergence</parTitl>
        <IDNo agency="SND">doi-10-17044-scilifelab-22817410-0</IDNo>
        <IDNo agency="DOI">https://doi.org/10.17044/SCILIFELAB.22817410</IDNo>
      </titlStmt>
      <rspStmt>
        <AuthEnty xml:lang="en" affiliation="Science for Life Laboratory">Wallberg, Andreas</AuthEnty>
        <AuthEnty xml:lang="en" affiliation="Science for Life Laboratory">Unneberg, Per</AuthEnty>
      </rspStmt>
      <prodStmt>
        <grantNo xml:lang="en" agency="Swedish Research Council for Environment Agricultural Sciences and Spatial Planning">2017-00413_Formas</grantNo>
      </prodStmt>
      <distStmt>
        <distrbtr xml:lang="en" abbr="SND" URI="https://snd.se">Swedish National Data Service</distrbtr>
        <distrbtr xml:lang="sv" abbr="SND" URI="https://snd.se">Svensk nationell datatjänst</distrbtr>
        <distDate xml:lang="en" date="2024-03-27" />
      </distStmt>
      <verStmt>
        <version elementVersion="0" elementVersionDate="2024-03-27" />
      </verStmt>
      <holdings URI="https://doi.org/10.17044/SCILIFELAB.22817410">Landing page</holdings>
    </citation>
    <stdyInfo>
      <subject />
      <abstract xml:lang="en" contentType="abstract">This item holds multiple tar archives with genome-scale comparisons of divergence between Northern krill populations, including estimated allele frequencies and divergence (e.g. FST), and extended haplotype signatures (XP-nSL estimates). Many analyses were performed in "chunks" (160 in total across both gene-rich and gene-poor sequences), which are described in a previous item.

Population definitions

Population definitions are the same as described in a different item:


  - "at vs. me" = Atlantic Ocean samples (n=67) vs. the Mediterranean (i.e. Barcelona) samples (n=7).

  - "we vs. ea" = South-West North Atlantic Ocean (n=20) vs. North-East North Atlantic Ocean (n=47). In files using this contrast, sometimes the label "wa" is used instead of "we" for the South-West North Atlantic Ocean samples.



Contents:


  - allele_freqs_fst.gene_rich_sequences.at_vs_me.tar, contains per-SNP estimates of allele frequencies and FST between "at" and "me" groups along gene-rich sequences.

  - allele_freqs_fst.gene_rich_sequences.we_vs_ea.tar, as above but between "we" and "ea" groups.

  - allele_freqs_fst.gene_poor_sequences.at_vs_me.tar, contains per-SNP estimates of allele frequencies and FST between "at" and "me" groups along gene-poor sequences.

  - allele_freqs_fst.gene_poor_sequences.we_vs_ea.tar, as above but for "we" and "ea" groups.

  - allele_freqs_fst.merged_sequences.at_vs_me.csv.gz, contains per-SNP estimates of allele frequencies and FST between "at" and "me" merged into a single TSV file.

  - allele_freqs_fst.merged_sequences.we_vs_ea.csv.gz, as above but for "we" and "ea".

  - allele_freqs_fst.gene_rich_sequences_windows.at_vs_me.tar.gz, contains per-window estimates of FST between "at" and "me" groups along gene-rich sequences.

  - allele_freqs_fst.gene_rich_sequences_windows.we_vs_ea.tar.gz, as above but for "we" and "ea" groups.

  - allele_freqs_fst.gene_poor_sequences_windows.at_vs_me.tar.gz, contains per-window estimates of FST between "at" and "me" groups along gene-poor sequences.

  - allele_freqs_fst.gene_poor_sequences_windows.we_vs_ea.tar.gz, as above but for "we" and "ea" groups.

  - selscan_xpnsl.gene_rich_sequences.tar.gz, contains per-SNP cross-population XP-nSL statistics for gene-rich sequences.

  - selscan_xpnsl.gene_poor_sequences.tar.gz, contains per-SNP cross-population XP-nSL statistics for gene-poor sequences.

  - selscan_xpnsl.gene_rich_sequences_windows.tar.gz, contains per-window cross-population XP-nSL statistics for gene-rich sequences.

  - selscan_xpnsl.gene_poor_sequences_windows.tar.gz, as above but for gene-poor sequences.

  - fst_vs_xpnsl.per_snp.at_vs_me.csv.gz, contains per-SNP FST, genomic region and XP-nSL values in a single file for the "at vs. me" contrast.

  - fst_vs_xpnsl.per_snp.we_vs_ea.csv.gz, contains per-SNP FST, genomic region and XP-nSL values in a single file for the "we vs. ea" contrast.

  - fst_vs_xpnsl_vs_diversity_vs_regions.merged_sequences.at_vs_me.tsv.tar.gz, integrates window-based statistics into a single file for the "at vs. me" contrast.

  - fst_vs_xpnsl_vs_diversity_vs_regions.merged_sequences.we_vs_ea.tsv.tar.gz, as above but for the "we vs. ea" contrast.



allele_freqs_fst.gene_(rich|poor)_sequences.(at_vs_me|we_vs_ea).tar

The TSV files in these archives contain per-SNP estimates of allele frequencies and FST, along with SNP annotations. There are nine main fields/columns with overlapping/redundant information to accommodate flexible parsing. Large fields have nested subfields that are separated by "|" (first level) and by ":" or "," (second level).


  - name of sequence (e.g. "seq_s_1")

  - position of SNP (e.g. "448878")

  - reference allele (e.g. "A")

  - alternate allele (e.g. "G")

  - main column with the FST value, allele frequencies and other data for each population (described below)

  - type of SNP (e.g. intron, synonymous, missense, intergenic, ...) and label of associated gene (e.g. missense|REF_STRG_1_4_XLOC_012878)

  - FST tag and value (e.g. fst|0.0653)

  - region, type of SNP and gene label (e.g. region|missense|REF_STRG_1_4_XLOC_012878)

  - gene annotation derived from EnTAP annotations and Drosophila homologs, which are described below. Uses comma-separated sub-fields.
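
As an illustration of the nine-field layout listed above, the following minimal Python sketch splits one line into its main fields. It is not part of the dataset: the column names are hypothetical labels, the example line is assembled from the per-field examples given in this description, the value used for field 9 is a placeholder, and the sketch assumes tab-separated lines without a header.

```python
# Sketch only: the column names are hypothetical labels, not headers from the files.
COLUMNS = ["sequence", "position", "ref", "alt", "population_data",
           "snp_type_gene", "fst_tag", "region_tag", "gene_annotation"]

def parse_snp_line(line):
    """Split one tab-separated line into the nine main fields."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(COLUMNS):
        raise ValueError("expected 9 tab-separated fields")
    record = dict(zip(COLUMNS, values))
    record["position"] = int(record["position"])
    # Field 7 has the form "fst|value".
    record["fst"] = float(record["fst_tag"].split("|")[1])
    # Field 6 has the form "type|gene label"; partition tolerates a missing label.
    snp_type, _, gene = record["snp_type_gene"].partition("|")
    record["snp_type"] = snp_type
    record["gene"] = gene
    return record

# Example line assembled from the per-field examples; field 9 is a placeholder.
example_line = "\t".join([
    "seq_s_1", "448878", "A", "G",
    "at/me:0.0653:148:1.0000:1.0000:1.0000|at,134,133.0000,1.0000,0.9925,0.0075|me,14,13.0000,1.0000,0.9286,0.0714",
    "missense|REF_STRG_1_4_XLOC_012878",
    "fst|0.0653",
    "region|missense|REF_STRG_1_4_XLOC_012878",
    "placeholder_annotation",
])
record = parse_snp_line(example_line)
print(record["sequence"], record["position"], record["snp_type"], record["gene"], record["fst"])
```

The redundant fields 6 to 8 mean that FST and the SNP type can be read without parsing the larger field 5.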



Subfields in field 5:

Example:

at/me:0.0653:148:1.0000:1.0000:1.0000|at,134,133.0000,1.0000,0.9925,0.0075|me,14,13.0000,1.0000,0.9286,0.0714

This field splits into three major subfields on "|": one about the pairwise comparison and two with metadata about each population.
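
The split described above can be sketched in Python as follows. This is a minimal illustration, not code shipped with the dataset: the function name and dictionary keys are hypothetical, and only the values that this description names (contrast, FST, sample size) are converted, while the remaining values are kept as raw strings.

```python
# Sketch only: function and key names are hypothetical, not part of the dataset.
def parse_field5(field):
    """Split field 5 on "|" into the contrast subfield and the population subfields."""
    contrast_part, *population_parts = field.split("|")
    contrast_values = contrast_part.split(":")
    parsed = {
        "contrast": contrast_values[0],          # e.g. "at/me"
        "fst": float(contrast_values[1]),        # FST of the SNP
        "sample_size": int(contrast_values[2]),  # e.g. 148
        "other": contrast_values[3:],            # remaining values, left as raw strings
        "populations": {},
    }
    for part in population_parts:
        # Population subfields are comma-separated and start with the population label.
        label, *values = part.split(",")
        parsed["populations"][label] = values    # raw strings, not interpreted here
    return parsed

example = "at/me:0.0653:148:1.0000:1.0000:1.0000|at,134,133.0000,1.0000,0.9925,0.0075|me,14,13.0000,1.0000,0.9286,0.0714"
result = parse_field5(example)
print(result["contrast"], result["fst"], result["sample_size"])
print(sorted(result["populations"]))
```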

1st subfield (at/me:0.0653:148:1.0000:1.0000:1.0000)


  - name of contrast (at/me)

  - FST of SNP (0.0653)

  - Sample size (148)

  - Proportion of observed data given overall sample size (1.0000),</abstract>
      <sumDscr />
    </stdyInfo>
    <method>
      <dataColl />
    </method>
    <dataAccs>
      <useStmt>
        <restrctn xml:lang="en">Access to data through an external actor. </restrctn>
        <restrctn xml:lang="sv">Åtkomst till data via extern aktör. </restrctn>
      </useStmt>
    </dataAccs>
    <othrStdyMat />
  </stdyDscr>
</codeBook>