<codeBook xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xsi:schemaLocation="ddi:codebook:2_5 http://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/codebook.xsd" xmlns="ddi:codebook:2_5">
  <docDscr>
    <citation>
      <titlStmt>
        <titl xml:lang="sv"></titl>
        <parTitl xml:lang="en">SH information from UNITE databases</parTitl>
        <IDNo agency="SND">doi-10-17044-scilifelab-19411403-0</IDNo>
        <IDNo agency="DOI">https://doi.org/10.17044/SCILIFELAB.19411403</IDNo>
      </titlStmt>
      <prodStmt>
        <producer xml:lang="en" abbr="SND">Swedish National Data Service</producer>
        <producer xml:lang="sv" abbr="SND">Svensk nationell datatjänst</producer>
      </prodStmt>
      <holdings URI="https://doi.org/10.17044/SCILIFELAB.19411403">Landing page</holdings>
    </citation>
  </docDscr>
  <stdyDscr>
    <citation>
      <titlStmt>
        <titl xml:lang="sv"></titl>
        <parTitl xml:lang="en">SH information from UNITE databases</parTitl>
        <IDNo agency="SND">doi-10-17044-scilifelab-19411403-0</IDNo>
        <IDNo agency="DOI">https://doi.org/10.17044/SCILIFELAB.19411403</IDNo>
      </titlStmt>
      <rspStmt>
        <AuthEnty xml:lang="en" affiliation="Science for Life Laboratory">Tångrot, Jeanette</AuthEnty>
      </rspStmt>
      <prodStmt>
        <grantNo xml:lang="en" agency="Swedish Research Council">2019-00242_VR</grantNo>
      </prodStmt>
      <distStmt>
        <distrbtr xml:lang="en" abbr="SND" URI="https://snd.se">Swedish National Data Service</distrbtr>
        <distrbtr xml:lang="sv" abbr="SND" URI="https://snd.se">Svensk nationell datatjänst</distrbtr>
        <distDate xml:lang="en" date="2024-11-19" />
      </distStmt>
      <verStmt>
        <version elementVersion="0" elementVersionDate="2024-11-19" />
      </verStmt>
      <holdings URI="https://doi.org/10.17044/SCILIFELAB.19411403">Landing page</holdings>
    </citation>
    <stdyInfo>
      <subject />
      <abstract xml:lang="en" contentType="abstract">The data in this repository is the result of querying the PlutoF API (https://plutof.docs.apiary.io; Abarenkov et al. 2010) with every sequence name in the UNITE general FASTA release (https://doi.org/10.15156/BIO/2959332; Abarenkov et al. 2024) to find the species hypothesis (SH) at level 1.5, version 10, for each sequence. The result is a sequence-to-SH matching file (*.seq2sh.tsv). For each SH, the complete taxonomy is then retrieved through the PlutoF API and stored in the *.SHs.tax files.

Files are available for UNITE version 10.0: sh_general_release_dynamic_04.04.2024.seq2sh.tsv.bz2, containing sequence-to-SH matchings, and sh_general_release_dynamic_04.04.2024.SHs.tax.bz2, containing SH taxonomies. Corresponding files are also available for the all-eukaryotes version of the UNITE database (https://doi.org/10.15156/BIO/2959334; Abarenkov et al. 2024b). All files are tab-separated text files compressed with bzip2.

Assignment of species hypotheses to ITS amplicons using this data and the UNITE general FASTA release is available as an optional argument to the nf-core/ampliseq Nextflow workflow from version 2.3.2: `--addsh` together with `--dada_ref_taxonomy unite-fungi` or `--dada_ref_taxonomy unite-alleuk` (https://nf-co.re/ampliseq; Straub et al. 2020).

Generation of files
After the UNITE general FASTA release was downloaded and extracted, each sequence name in the FASTA file was used as a query to PlutoF to find which SH at level 1.5 in release 10 the sequence belongs to. These results make up the *.seq2sh.tsv files with sequence-to-SH matchings. Each SH was subsequently used as a query to PlutoF to extract its complete taxonomy, which is stored in the *.SHs.tax files.
Two Python scripts for automatic querying and generation of the files can be found in the `scripts` folder of the GitHub repository: https://github.com/biodiversitydata-se/unite-shinfo. See the accompanying README file for usage information.</abstract>
      <sumDscr />
    </stdyInfo>
    <method>
      <dataColl />
    </method>
    <dataAccs>
      <useStmt>
        <restrctn xml:lang="en">Access to data through an external actor. </restrctn>
        <restrctn xml:lang="sv">Åtkomst till data via extern aktör. </restrctn>
      </useStmt>
    </dataAccs>
    <othrStdyMat />
  </stdyDscr>
</codeBook>