Gene annotation of Blastobotrys mokoenaii, Blastobotrys illinoisensis, and Blastobotrys malaysiensis
This dataset contains the gene annotation data for three species of Blastobotrys yeasts: B. mokoenaii, B. illinoisensis, and B. malaysiensis. The genome assemblies for B. mokoenaii (NRRL Y-27120) and B. malaysiensis (NRRL Y-6417) were publicly available from the National Center for Biotechnology Information (NCBI) under accessions GCA_003705765.3 and GCA_030558815.1, respectively. The genome assembly for B. illinoisensis (NRRL YB-1343) was generated by SciLifeLab's National Genomics Infrastructure (NGI) using PacBio long-read data and deposited in the European Nucleotide Archive (ENA) under accession GCA_965113335.1.

File description

- bmokoenaii_annotation.gff: gene models predicted for B. mokoenaii (GCA_003705765.3).
- billinoisensis_annotation.gff: gene models predicted for B. illinoisensis (GCA_965113335.1).
- bmalaysiensis_annotation.gff: gene models predicted for B. malaysiensis (GCA_030558815.1).

Gene annotation methods

Repeat Masking

Prior to annotation, a repeat library was built for each species using RepeatModeler2 v2.0.2, and the genomes were soft-masked using RepeatMasker v4.1.5.

    $ RepeatModeler -database ${DB} -engine ncbi -pa 16
    $ RepeatMasker -dir . -gff -u -no_is -xsmall -e ncbi -lib ${LIBRARY} -pa 16 genome.fasta

Structural Annotation

Structural annotation was performed on the soft-masked genomes using Braker3 v3.0.3, with all fungal proteins from OrthoDB v11 supplied as external evidence (available at https://bioinf.uni-greifswald.de/bioinf/partitioned_odb11).

    $ braker.pl --genome="$genome" \
        --prot_seq=${protein} --workingdir=${PWD} \
        --gff3 --threads=16 --verbosity=3 \
        --nocleanup --species=${i}

Functional Annotation

The predicted genes were functionally annotated using the National Bioinformatics Infrastructure Sweden (NBIS) functional_annotation nextflow pipeline v2.0.0 (https://github.com/NBISweden/pipelines-nextflow). Briefly, this pipeline performs similarity searches between the annotated proteins and the UniProtKB/Swiss-Prot database (downloaded in 2023-12) using the Basic Local Alignment Search Tool (BLAST). It then uses InterProScan to query the proteins against the InterPro v59-91 databases and merges the results using AGAT v1.2.0.

tRNAs and rRNAs

Transfer RNA (tRNA) and ribosomal RNA (rRNA) genes were annotated using tRNAscan-SE v2.0.12 and barrnap v0.9, respectively. Other ncRNAs, such as SRP RNA, RNase P RNA, and spliceosomal ncRNAs, were not predicted.

    $ tRNAscan-SE -E --gff ${output}_trnas.gff --thread 16 ${genome}.fasta
    $ barrnap --kingdom euk --threads 6 ${genome}.fasta > ${output}_rrna.gff

Annotation integration

Finally, the functionally annotated protein-coding genes, tRNAs, and rRNAs were combined into a single GFF file using AGAT v1.2.0.

    $ agat_sp_complement_annotations.pl --ref ${protein_coding} --add ${trna} --add ${rrna} --out full_annotation.gff
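
As a quick sanity check on a combined annotation file, the feature types it contains can be tallied from column 3 of the GFF. The sketch below is not part of the workflow described here; the function name count_gff_features is a hypothetical helper introduced only for illustration, and it assumes a tab-delimited, nine-column GFF3 file.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not part of the published workflow):
# tally how many records of each feature type a GFF3 file contains.
count_gff_features() {
    # Skip comment/directive lines (starting with '#') and any line that does
    # not have the nine tab-separated GFF3 columns, then count column 3 values.
    awk -F'\t' '!/^#/ && NF >= 9 { n[$3]++ }
                END { for (t in n) printf "%s\t%d\n", t, n[t] }' "$1" \
        | LC_ALL=C sort
}
```

Running `count_gff_features full_annotation.gff` would print one line per feature type (for example gene, mRNA, tRNA, rRNA) with its count, which makes it easy to confirm that the tRNA and rRNA records were carried over into the merged file.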
