<ddi:DDIInstance xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="ddi:instance:3_3 http://ddialliance.org/Specification/DDI-Lifecycle/3.3/XMLSchema/instance.xsd" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:ddi="ddi:instance:3_3" xmlns:r="ddi:reusable:3_3" xmlns:s="ddi:studyunit:3_3" xmlns:d="ddi:datacollection:3_3" xmlns:a="ddi:archive:3_3" xmlns:c="ddi:conceptualcomponent:3_3" xmlns:cm="ddi:comparative:3_3" xmlns:g="ddi:group:3_3" xmlns:l="ddi:logicalproduct:3_3" xmlns:p="ddi:physicaldataproduct:3_3" xmlns:pi="ddi:physicalinstance:3_3" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:xml="http://www.w3.org/XML/1998/namespace" isMaintainable="true" scopeOfUniqueness="Agency">
  <r:URN>urn:ddi:se.researchdata:doi-10-17044-scilifelab-28211678:0</r:URN>
  <r:Agency>SND</r:Agency>
  <r:ID>doi-10-17044-scilifelab-28211678</r:ID>
  <r:Version>0</r:Version>
  <g:ResourcePackage>
    <r:URN>urn:ddi:se.researchdata:doi-10-17044-scilifelab-28211678.ResourcePackage:2.0</r:URN>
    <r:OtherMaterialScheme>
      <r:URN>urn:ddi:se.researchdata:doi-10-17044-scilifelab-28211678.OtherMaterialScheme:2.0</r:URN>
    </r:OtherMaterialScheme>
    <a:OrganizationScheme>
      <r:URN>urn:ddi:se.researchdata:doi-10-17044-scilifelab-28211678.OrganizationScheme-0:2.0</r:URN>
      <a:Individual>
        <r:URN>urn:ddi:se.researchdata:doi-10-17044-scilifelab-28211678.Individual-0:2.0</r:URN>
        <r:UserAttributePair>
          <r:AttributeKey>affiliation</r:AttributeKey>
          <r:AttributeValue>Science for Life Laboratory</r:AttributeValue>
        </r:UserAttributePair>
        <a:IndividualIdentification>
          <a:IndividualName>
            <a:FirstGiven>Daniel</a:FirstGiven>
            <a:LastFamily>Lundin</a:LastFamily>
            <a:FullName>
              <r:String>Daniel Lundin</r:String>
            </a:FullName>
          </a:IndividualName>
          <a:ResearcherID>
            <a:TypeOfID>ORCID</a:TypeOfID>
            <a:ResearcherIdentification>0000-0002-8779-6464</a:ResearcherIdentification>
          </a:ResearcherID>
        </a:IndividualIdentification>
      </a:Individual>
    </a:OrganizationScheme>
  </g:ResourcePackage>
  <s:StudyUnit>
    <r:URN>urn:ddi:se.researchdata:doi-10-17044-scilifelab-28211678.StudyUnit:2.0</r:URN>
    <r:UserID typeOfUserID="datasetIdentifier">doi-10-17044-scilifelab-28211678</r:UserID>
    <r:Citation>
      <r:Title>
        <r:String xml:lang="en">nf-core/metatdenovo taxonomy</r:String>
      </r:Title>
      <r:Creator>
        <r:CreatorReference>
          <r:URN>urn:ddi:se.researchdata:doi-10-17044-scilifelab-28211678.Individual-0:2.0</r:URN>
          <r:TypeOfObject>Individual</r:TypeOfObject>
        </r:CreatorReference>
      </r:Creator>
      <r:Publisher>
        <r:PublisherName>
          <r:String xml:lang="sv">Linnéuniversitetet</r:String>
          <r:String xml:lang="en">Linnaeus University</r:String>
        </r:PublisherName>
      </r:Publisher>
      <r:PublicationDate>
        <r:SimpleDate>2025-02-24</r:SimpleDate>
      </r:PublicationDate>
      <r:InternationalIdentifier>
        <r:IdentifierContent>10.17044/SCILIFELAB.28211678</r:IdentifierContent>
        <r:ManagingAgency controlledVocabularyAgencyName="DOI">DOI</r:ManagingAgency>
      </r:InternationalIdentifier>
    </r:Citation>
    <r:Abstract>
      <r:Content xml:lang="en">The data in this repository can be used to assign taxonomy to sequences with Diamond [Buchfink et al. 2015], particularly using the --diamond_dbs parameter in nf-core/metatdenovo (https://nf-co.re/metatdenovo), release 1.1 or later.

Currently, the available data represent species-representative genomes from the Genome Taxonomy Database (GTDB), release R09-RS220 [Parks et al. 2018].

File preparation
All species-representative genomes from GTDB were downloaded from the National Center for Biotechnology Information (NCBI) and annotated with Prokka [v. 1.14.6; Seemann 2014], and the sequences of all resulting proteins make up this dataset. The taxonomy dump files (in NCBI taxonomy dump format) were created from the GTDB metadata with TaxonKit [v. 0.18.0; Shen and Ren 2021], and the Diamond database was built with Diamond [v. 2.1.10; Buchfink et al. 2015] in "taxonomy mode", i.e. using the taxonomy dump created with TaxonKit. (See below for the commands used.)

File descriptions
There are five files:

- gtdb-r220.faa.gz: Fasta file with protein sequences. Not used by nf-core/metatdenovo but can be used to create the Diamond database below.
- gtdb-r220.taxonomy.dmnd: Diamond database with taxonomy information.
- gtdb-r220.names.dmp: Taxonomy dump file.
- gtdb-r220.nodes.dmp: Nodes dump file.
- gtdb-r220.seqid2taxid.tsv.gz: Mapping from protein accession to taxon.
The Diamond database and taxonomy dump files can be used with nf-core/metatdenovo (version &gt;=1.1) by providing a CSV file like the one below to the --diamond_dbs parameter. (Although Nextflow can use HTTPS URLs for paths, it is usually better to download the very large files and keep local copies.)

db,dmnd_path,taxdump_names,taxdump_nodes,ranks,parse_with_taxdump
gtdb,gtdb_r220_repr.dmnd,gtdb_taxdump/names.dmp,gtdb_taxdump/nodes.dmp,domain;phylum;class;order;genus;species;strain,

Commands used to prepare taxonomy dump files and the Diamond database
- Taxonomy dump: cut -f 1,19-20 *metadata.tsv | grep -v 'accession' | awk 'BEGIN { FS="\t" } { if ( $2 == "t" ) { print $1 "\t" $3 } }' | taxonkit create-taxdump --gtdb -O .
- Diamond database: gunzip -c gtdb-r220.faa.gz | sed '/^&gt;/s/ .*//' | diamond makedb --taxonmap gtdb-r220.seqid2taxid.tsv.gz --taxonnames gtdb-r220.names.dmp --taxonnodes gtdb-r220.nodes.dmp --db gtdb-r220.taxonomy.dmnd --no-parse-seqids
Revision history
20250211 First version</r:Content>
    </r:Abstract>
    <r:Coverage>
      <r:TopicalCoverage>
        <r:URN>urn:ddi:se.researchdata:doi-10-17044-scilifelab-28211678.TopicalCoverage:2.0</r:URN>
        <r:Subject xml:lang="en" controlledVocabularyID="3" controlledVocabularyName="Standard för svensk indelning av forskningsämnen 2025">Medical and Health Sciences</r:Subject>
        <r:Subject xml:lang="sv" controlledVocabularyID="3" controlledVocabularyName="Standard för svensk indelning av forskningsämnen 2025">Medicin och hälsovetenskap</r:Subject>
        <r:Subject xml:lang="en" controlledVocabularyID="106" controlledVocabularyName="Standard för svensk indelning av forskningsämnen 2025">Biological Sciences</r:Subject>
        <r:Subject xml:lang="sv" controlledVocabularyID="106" controlledVocabularyName="Standard för svensk indelning av forskningsämnen 2025">Biologi</r:Subject>
        <r:Subject xml:lang="en" controlledVocabularyID="10611" controlledVocabularyName="Standard för svensk indelning av forskningsämnen 2025">Ecology</r:Subject>
        <r:Subject xml:lang="sv" controlledVocabularyID="10611" controlledVocabularyName="Standard för svensk indelning av forskningsämnen 2025">Ekologi</r:Subject>
        <r:Subject xml:lang="en" controlledVocabularyID="10609" controlledVocabularyName="Standard för svensk indelning av forskningsämnen 2025">Genetics and Genomics</r:Subject>
        <r:Subject xml:lang="sv" controlledVocabularyID="10609" controlledVocabularyName="Standard för svensk indelning av forskningsämnen 2025">Genetik och genomik</r:Subject>
      </r:TopicalCoverage>
      <r:SpatialCoverage />
    </r:Coverage>
    <a:Archive>
      <r:URN>urn:ddi:se.researchdata:doi-10-17044-scilifelab-28211678.Archive:2.0</r:URN>
      <a:ArchiveSpecific>
        <a:Item>
          <a:Access>
            <r:URN>urn:ddi:se.researchdata:doi-10-17044-scilifelab-28211678.Archive-ArchiveSpecificType-AccessType:2.0</r:URN>
            <a:TypeOfAccess controlledVocabularyName="info:eu-repo-Access-Terms vocabulary"></a:TypeOfAccess>
          </a:Access>
          <a:DataFileQuantity>0</a:DataFileQuantity>
        </a:Item>
      </a:ArchiveSpecific>
    </a:Archive>
  </s:StudyUnit>
</ddi:DDIInstance>