<codeBook xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xsi:schemaLocation="ddi:codebook:2_5 http://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/codebook.xsd" xmlns="ddi:codebook:2_5">
  <docDscr>
    <citation>
      <titlStmt>
        <titl xml:lang="sv"></titl>
        <parTitl xml:lang="en">8. Ecological genomics of the Northern krill: Recombination rates and demographic history</parTitl>
        <IDNo agency="SND">doi-10-17044-scilifelab-22825277-0</IDNo>
        <IDNo agency="DOI">https://doi.org/10.17044/SCILIFELAB.22825277</IDNo>
      </titlStmt>
      <prodStmt>
        <producer xml:lang="en" abbr="SND">Swedish National Data Service</producer>
        <producer xml:lang="sv" abbr="SND">Svensk nationell datatjänst</producer>
      </prodStmt>
      <holdings URI="https://doi.org/10.17044/SCILIFELAB.22825277">Landing page</holdings>
    </citation>
  </docDscr>
  <stdyDscr>
    <citation>
      <titlStmt>
        <titl xml:lang="sv"></titl>
        <parTitl xml:lang="en">8. Ecological genomics of the Northern krill: Recombination rates and demographic history</parTitl>
        <IDNo agency="SND">doi-10-17044-scilifelab-22825277-0</IDNo>
        <IDNo agency="DOI">https://doi.org/10.17044/SCILIFELAB.22825277</IDNo>
      </titlStmt>
      <rspStmt>
        <AuthEnty xml:lang="en" affiliation="Science for Life Laboratory">Wallberg, Andreas</AuthEnty>
        <AuthEnty xml:lang="en" affiliation="Science for Life Laboratory">Unneberg, Per</AuthEnty>
      </rspStmt>
      <prodStmt>
        <grantNo xml:lang="en" agency="Swedish Research Council for Environment Agricultural Sciences and Spatial Planning">2017-00413_Formas</grantNo>
      </prodStmt>
      <distStmt>
        <distrbtr xml:lang="en" abbr="SND" URI="https://snd.se">Swedish National Data Service</distrbtr>
        <distrbtr xml:lang="sv" abbr="SND" URI="https://snd.se">Svensk nationell datatjänst</distrbtr>
        <distDate xml:lang="en" date="2024-03-27" />
      </distStmt>
      <verStmt>
        <version elementVersion="0" elementVersionDate="2024-03-27" />
      </verStmt>
      <holdings URI="https://doi.org/10.17044/SCILIFELAB.22825277">Landing page</holdings>
    </citation>
    <stdyInfo>
      <subject />
      <abstract xml:lang="en" contentType="abstract">This item contains archives of data and results used to assess recombination rates (iSMC), demographic history (PSMC, MSMC) and haplotype ages (GEVA) using coalescent methods.

Population definitions

Population definitions are the same as described in a different item:


  - "at vs. me" = Atlantic Ocean samples (n=67) vs. the Mediterranean (i.e. Barcelona) samples (n=7).

  - "we vs. ea" = South-West North Atlantic Ocean (n=20) vs. North-East North Atlantic Ocean (n=47). In files using this contrast, sometimes the label "wa" is used instead of "we" for the South-West North Atlantic Ocean samples.



Contents:


  - psmc_dataset.psmcfa.gz, datasets for PSMC-analyses containing signatures of heterozygosity in the reference specimen that were converted from VCF into the fasta-like PSMCFA format.

  - msmc_datasets.tar.gz, datasets for MSMC-analyses containing signatures of heterozygosity in the reference specimen that were converted from VCF into TSV.

  - ismc_dataset.tar.gz, the VCF dataset and accessory files for iSMC-analyses used to infer recombination rates.

  - geva_datasets.candidates.at_vs_me.tar.gz, the re-coded VCF and binary format datasets as well as analysis output for the 660 candidate gene loci analyzed for "at" and "me" populations in the "at vs. me" contrast.

  - geva_datasets.candidates.we_vs_ea.tar.gz, the re-coded VCF and binary format datasets as well as analysis output for the 34 candidate gene loci analyzed for "we" and "ea" populations in the "we vs. ea" contrast.

  - geva_results.candidates.at_vs_me.tar.gz, the resulting age estimates of minor alleles in the "at vs. me" contrast.

  - geva_results.candidates.we_vs_ea.tar.gz, the resulting age estimates of minor alleles in the "we vs. ea" contrast.



psmc_dataset.psmcfa.gz

A FASTA-like file that encodes the distribution of heterozygous genotypes across 4,911 sequences in the diploid reference specimen at 10 bp window resolution. Character states are:


  - N=a window with only inaccessible sites (i.e. missing data)

  - T=a window with accessible data

  - K=a window with accessible data and at least one heterozygous genotype



This format is further documented on the site of the original tool: https://github.com/lh3/psmc
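
The character states above can be tallied per sequence with a short script. The following is a minimal Python sketch, not part of the deposited data; the function names and the in-memory example records are illustrative assumptions, and it assumes each record starts with a FASTA-style header line.

```python
from collections import Counter


def count_window_states(lines):
    """Return a dict mapping each sequence name to a Counter of window states."""
    per_sequence = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: start a new record (assumed FASTA-style header).
            current = line[1:].strip()
            per_sequence[current] = Counter()
        elif current is not None:
            # Body line: each character is one 10 bp window (N, T or K).
            per_sequence[current].update(line)
    return per_sequence


def heterozygous_fraction(counts):
    """Fraction of accessible windows (T or K) that hold a heterozygous genotype (K)."""
    accessible = counts["T"] + counts["K"]
    return counts["K"] / accessible if accessible else 0.0


# Illustrative records only, not taken from the archive.
example = [">seq_s_1", "TTKTN", "NKTT"]
states = count_window_states(example)
fraction = heterozygous_fraction(states["seq_s_1"])
```

To apply the same functions to the real file, the lines could be read with gzip.open("psmc_dataset.psmcfa.gz", "rt"), assuming the archive has been downloaded to the working directory.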

msmc_datasets.tar.gz

This archive contains one TSV file per sequence (n=5,176) that specifies the distribution of heterozygous genotypes. Each file contains four fields. Example: seq_s_1	2039	171	TC


  - name of sequence

  - position of the heterozygous genotype

  - number of accessible sites since the last heterozygous genotype

  - the heterozygous genotype (a string with only two alleles in this case, when analysing a single individual)



This format is further documented on the site of the original tool: https://github.com/stschiff/msmc-tools/blob/master/msmc-tutorial/guide.md
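
As an illustration of the four fields, the example line above can be split into named values. This is a minimal Python sketch under the assumption that fields are tab-separated as described; the class and field names are hypothetical and are not defined by MSMC.

```python
from typing import NamedTuple


class MsmcRecord(NamedTuple):
    """One heterozygous site; the field names are illustrative, not defined by MSMC."""
    sequence: str          # name of sequence
    position: int          # position of the heterozygous genotype
    accessible_sites: int  # accessible sites since the last heterozygous genotype
    alleles: str           # the heterozygous genotype, e.g. "TC"


def parse_msmc_line(line):
    """Split one tab-separated line into its four fields."""
    sequence, position, accessible_sites, alleles = line.rstrip("\n").split("\t")
    return MsmcRecord(sequence, int(position), int(accessible_sites), alleles)


# The example line given in the description above.
record = parse_msmc_line("seq_s_1\t2039\t171\tTC")
```

Here record.position holds the integer 2039 and record.alleles holds the string "TC".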

ismc_dataset.tar.gz

This archive contains several files:


  - 1.merged_contigs.vcf = specifies the distribution of heterozygous genotypes in VCF format

  - 1.merged_contigs.tab = specifies the lengths of sequences (TSV format)

  - 1.merged_contigs.bpp = the program control file with run-time parameters (TXT)

  - 1.merged_contigs.fasta = specifies accessible and inaccessible sites ("N") in FASTA format

  - 1.merged_contigs.out_estimates.txt = the summary results of the analysis (TXT)



geva_datasets.candidates.at_vs_me.tar.gz and geva_datasets.candidates.we_vs_ea.tar.gz

These archives hold data and results from analysing variant ages at each of the candidate gene loci with divergent haplotypes in the two contrasts (660 loci for "at vs. me" and 34 loci for "we vs. ea"). For each locus, the files comprise:


  - Two recoded VCF files. In the first file, the minor allele in one of the two populations (e.g. "at") was taken to represent the derived allele and coded as the ALT allele. In the second file, the minor allele in the other group (e.g. "me") was taken to represent the derived allele and coded as the ALT allele.

  - Intermediate data files generated by GEVA by processing the VCF files (*.bin, *.marker.txt, *.sample.txt), including a log and err file.

  - Results files (*.pairs.txt.gz and *.sites.txt). The "*.sites.txt" files contain allele age estimates under the mutation clock (M), recombination clock (R), and joint clock (J) models. The format of these files is described on the site of the original tool: https://github.com/pkalbers/geva



geva_results.candidates.at_vs_me.tar.gz and geva_results.candidates.we_vs_ea.tar.gz

These archives contain four TSV files each. For each population (e.g. "at") there are two files. One collects the minor allele age estimates under all three models and the other only those under the joint model.</abstract>
      <sumDscr />
    </stdyInfo>
    <method>
      <dataColl />
    </method>
    <dataAccs>
      <useStmt>
        <restrctn xml:lang="en">Access to data through an external actor. </restrctn>
        <restrctn xml:lang="sv">Åtkomst till data via extern aktör. </restrctn>
      </useStmt>
    </dataAccs>
    <othrStdyMat />
  </stdyDscr>
</codeBook>