Dataset with four years of condition monitoring technical language annotations from paper machine industries in northern Sweden

Karl Löwenmark; Fredrik Sandin; Marcus Liwicki; Stephan Schnabel

doi:10.5878/hafd-ms27

https://doi.org/10.5878/hafd-ms27

This dataset consists of four years of technical language annotations from two paper machines in northern Sweden, structured as a pandas DataFrame. The same data is also available as a semicolon-separated .csv file. The data consists of two columns: the first column holds the annotation note contents, and the second holds the annotation titles. Each row corresponds to one annotation with its title. The annotations are in Swedish and have been processed so that all mentions of personal information are replaced with the string ‘egennamn’, meaning “personal name” in Swedish.

Data can be accessed in Python with:

    import pandas as pd
    annotations_df = pd.read_pickle("Technical_Language_Annotations.pkl")
    annotation_contents = annotations_df['noteComment']
    annotation_titles = annotations_df['title']
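The description also mentions a semicolon-separated .csv copy of the data. The sketch below shows one way such a file could be read with pandas and how rows containing the ‘egennamn’ placeholder might be counted. It makes several assumptions: the .csv file name is not given here, so a small in-memory sample stands in for it; the sample rows are invented for illustration and are not taken from the dataset; and the header names are assumed to match the 'noteComment' and 'title' columns of the pickled DataFrame.

```python
import io

import pandas as pd

# Invented two-row sample standing in for the real .csv file.
# These rows are illustrative only and do NOT come from the dataset.
# Assumption: the file has a header row with the same column names
# as the pickled DataFrame ('noteComment', 'title').
sample_csv = io.StringIO(
    "noteComment;title\n"
    "Lager bytt av egennamn;Lagerbyte\n"
    "Hög vibration på torkcylinder;Vibration\n"
)

# sep=";" matches the semicolon-separated layout described for the .csv file.
annotations_df = pd.read_csv(sample_csv, sep=";")

# Flag notes that contain the pseudonymisation placeholder 'egennamn'.
has_placeholder = annotations_df["noteComment"].str.contains("egennamn", regex=False)
n_with_placeholder = int(has_placeholder.sum())

print(annotations_df.shape)
print(n_with_placeholder)
```

To apply this to the actual file, the `sample_csv` object would be replaced by the path to the downloaded .csv; if the file turns out to have no header row, passing `header=None` and `names=["noteComment", "title"]` to `pd.read_csv` would be one way to adapt it.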

Documentation files

dataset_description.pdf
73.1 KiB

Citation and access

Data access level:

Access to data is restricted

Creator/Principal investigator(s):

Karl Löwenmark
Fredrik Sandin
Marcus Liwicki
Stephan Schnabel

Research principal:

Luleå University of Technology

Principal's reference number:

2019-02533

Data contains personal data:

Yes

Type of personal data:

Signed annotations are preserved in the raw data. As a result, the dataset contains pseudonymised personal data.

Citation:

License:

Creative Commons Attribution 4.0 International (CC BY 4.0)

Language:

Swedish

Corpus

Foreseen use:

NLP application, Human use

Text part

Linguality:

Monolingual

Language:

Swedish (swe)
Technical language (jargon)

Modality:

Written Language

Size:

Entries: 2385

Expressions: 1613

Annotation:

Entity Mentions
Automatic
Other

Original source:

https://doi.org/10.5878/z34p-qj52

Link to other media:

Method and outcome

Time period(s) investigated:

2018 - 2022

Data format/data structure:

Geographic coverage

Geographic location:

Sweden

Geographic description:

Northern Sweden

Administrative information

Responsible department/unit:

Department of Computer Science, Electrical and Space Engineering

Contributor(s):

Peter Wikström - SCA Munksund
Håkan Sirkka - Smurfit Kappa
Pär-Erik Martinsson - Luleå University of Technology - Department of Computer Science, Electrical and Space Engineering
Kjell Lundberg - Smurfit Kappa
Per-Erik Larsson - SKF (Sweden)

Funding

Funding agency:

VINNOVA

Award number:

2019-02533

Award title:

Kunskapsintegrering för klassificering av maskinskador

Funding information:

https://www.vinnova.se/p/kunskapsintegrering-for-klassificering-av-maskinskador/

Topic and keywords

Swedish Standard Classification of Research Subjects 2025:

Natural language processing

Keywords:

Relations

Homepage:

Related research data:

Publications

Citation:

Löwenmark, K., Taal, C., Nivre, J., Liwicki, M., & Sandin, F. (2022). Processing of Condition Monitoring Annotations with BERT and Technical Language Substitution: A Case Study. In Proceedings of the 7th European Conference of the Prognostics and Health Management Society 2022 (pp. 306–314).

DOI:
10.36001/phme.2022.v7i1.3356
URN:
urn:nbn:se:ltu:diva-95407

SwePub:

oai:DiVA.org:ltu-95407

Metadata

Version 1

Luleå University of Technology
