<codeBook xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xsi:schemaLocation="ddi:codebook:2_5 http://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/codebook.xsd" xmlns="ddi:codebook:2_5">
  <docDscr>
    <citation>
      <titlStmt>
        <titl xml:lang="sv"></titl>
        <parTitl xml:lang="en">SMHI IFCB Plankton Image Reference Library</parTitl>
        <IDNo agency="SND">doi-10-17044-scilifelab-25883455-0</IDNo>
        <IDNo agency="DOI">https://doi.org/10.17044/SCILIFELAB.25883455</IDNo>
      </titlStmt>
      <prodStmt>
        <producer xml:lang="en" abbr="SND">Swedish National Data Service</producer>
        <producer xml:lang="sv" abbr="SND">Svensk nationell datatjänst</producer>
      </prodStmt>
      <holdings URI="https://doi.org/10.17044/SCILIFELAB.25883455">Landing page</holdings>
    </citation>
  </docDscr>
  <stdyDscr>
    <citation>
      <titlStmt>
        <titl xml:lang="sv"></titl>
        <parTitl xml:lang="en">SMHI IFCB Plankton Image Reference Library</parTitl>
        <IDNo agency="SND">doi-10-17044-scilifelab-25883455-0</IDNo>
        <IDNo agency="DOI">https://doi.org/10.17044/SCILIFELAB.25883455</IDNo>
      </titlStmt>
      <rspStmt>
        <AuthEnty xml:lang="en" affiliation="Science for Life Laboratory">Torstensson, Anders</AuthEnty>
        <AuthEnty xml:lang="en" affiliation="Science for Life Laboratory">Karlson, Bengt</AuthEnty>
      </rspStmt>
      <prodStmt />
      <distStmt>
        <distrbtr xml:lang="en" abbr="SND" URI="https://snd.se">Swedish National Data Service</distrbtr>
        <distrbtr xml:lang="sv" abbr="SND" URI="https://snd.se">Svensk nationell datatjänst</distrbtr>
        <distDate xml:lang="en" date="2024-05-31" />
      </distStmt>
      <verStmt>
        <version elementVersion="0" elementVersionDate="2024-05-31" />
      </verStmt>
      <holdings URI="https://doi.org/10.17044/SCILIFELAB.25883455">Landing page</holdings>
    </citation>
    <stdyInfo>
      <subject />
      <abstract xml:lang="en" contentType="abstract">This repository contains plankton images manually annotated by phytoplankton experts at the Swedish Meteorological and Hydrological Institute (SMHI). The images were captured with an Imaging FlowCytobot (IFCB; McLane Research Laboratories, https://mclanelabs.com/imaging-flowcytobot/) at different locations and in different seasons in the Skagerrak, Kattegat, and Baltic Proper. They can be used to train automatic image classifiers to identify plankton species.

From version 6 onward, the images have been consolidated into a single dataset, combining three previously separate sources: RV Svea (Baltic Proper, 2022–2026), RV Svea (Skagerrak–Kattegat, 2022–2026), and Tångesund (2016). Previous versions are still accessible in this repository.

The dataset consists of two ZIP archives. The first, annotated_images, contains .png images organized into class-specific subfolders, along with accompanying .tsv files that store image-level and class metadata. The second, matlab_files, includes raw data files (.roi, .hdr, .adc) as well as .mat files intended for developing a random forest image classifier using MATLAB code from the ifcb-analysis repository.

The images in this dataset undergo continuous quality control and new images are added regularly, so the dataset is updated periodically. If you find any mislabeled images, please contact the authors.

Version history

- Version 6 (2026-03-31): 86,232 annotated images. The three datasets from the previous versions have been merged into a single dataset.
- Version 5 (2025-12-19): 82,123 annotated images.
- Version 4 (2024-11-04): 76,032 annotated images. Corrected class names to better match WoRMS, and continued quality control of images in the Tångesund dataset.
- Version 3 (2024-08-05): 72,086 annotated images. Added iRfcb dataset for user and unit testing.
- Version 2 (2024-06-03): 71,525 annotated images. Updated class names and corrected manual files in the Tångesund dataset. Continued quality control of images in the Tångesund dataset.
- Version 1 (2024-05-31): 65,435 annotated images.</abstract>
      <sumDscr />
    </stdyInfo>
    <method>
      <dataColl />
    </method>
    <dataAccs>
      <useStmt>
        <restrctn xml:lang="en">Access to data through an external actor. </restrctn>
        <restrctn xml:lang="sv">Åtkomst till data via extern aktör. </restrctn>
      </useStmt>
    </dataAccs>
    <othrStdyMat />
  </stdyDscr>
</codeBook>