Dataset with four years of condition monitoring technical language annotations from paper machine industries in northern Sweden
https://doi.org/10.5878/hafd-ms27
This dataset consists of four years of technical language annotations from two paper machines in northern Sweden, structured as a pandas DataFrame. The same data is also available as a semicolon-separated .csv file. The data has two columns: the first contains the annotation note contents, and the second contains the annotation titles. The annotations are in Swedish and have been processed so that all mentions of personal information are replaced with the string ‘egennamn’, meaning “personal name” in Swedish. Each row corresponds to one annotation and its title.
Data can be accessed in Python with:
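import pandas as pd
# Load the pickled DataFrame (one row per annotation)
annotations_df = pd.read_pickle("Technical_Language_Annotations.pkl")
# First column: annotation note contents; second column: annotation titles
annotation_contents = annotations_df['noteComment']
annotation_titles = annotations_df['title']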
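The semicolon-separated .csv variant can be read without pandas as well. The following is a minimal sketch using only the Python standard library. It assumes that the .csv file has a header row with the same column names as the dataframe ('noteComment' and 'title'), which is not stated in this record. The helper names load_annotations and count_placeholder_rows are hypothetical, and the two sample rows are made up for illustration and are not taken from the dataset. The sketch also counts how many notes contain the placeholder string ‘egennamn’.

```python
import csv
import io

PLACEHOLDER = "egennamn"  # string the dataset uses in place of personal information


def load_annotations(source):
    """Read semicolon-separated rows into a list of dicts keyed by column name.

    Assumption: the file starts with a header row naming the columns
    'noteComment' and 'title'.
    """
    return list(csv.DictReader(source, delimiter=";"))


def count_placeholder_rows(rows, column="noteComment"):
    """Count rows whose text in `column` contains the placeholder (case-insensitive)."""
    return sum(1 for row in rows if PLACEHOLDER in (row[column] or "").lower())


# Made-up toy rows for illustration only; they are not taken from the dataset.
sample_csv = io.StringIO(
    "noteComment;title\n"
    "Bytt lager enligt egennamn;Lagerbyte\n"
    "Kontrollerad, inget fel;Kontroll\n"
)
rows = load_annotations(sample_csv)
print(len(rows), count_placeholder_rows(rows))
```

To use the sketch on the real data, pass an open file handle for the .csv file instead of the in-memory sample (opened with newline="" as the csv module recommends; the file encoding is not stated here and would need to be checked). With pandas, pd.read_csv with sep=";" should give a comparable result.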
Citation and access
Data access level:
Creator/Principal investigator(s):
Research principal:
Principal's reference number:
- 2019-02533
Data contains personal data:
Yes
Type of personal data:
Signed annotations are preserved in the raw data. As a result, the dataset contains pseudonymised personal data.
Citation:
Language:
Corpus
Foreseen use:
NLP application, Human use
Text part
Linguality:
Monolingual
Language:
Swedish (swe)
Technical language (jargon)
Modality:
Written Language
Size:
Entries: 2385
Expressions: 1613
Annotation:
Entity Mentions
Automatic
Other
Original source:
Link to other media:
Method and outcome
Time period(s) investigated:
Geographic coverage
Geographic location:
Geographic description:
Northern Sweden
Administrative information
Responsible department/unit:
Department of Computer Science, Electrical and Space Engineering
Contributor(s):
- Peter Wikström - SCA Munksund
- Håkan Sirkka - Smurfit Kappa
- Pär-Erik Martinsson - Luleå University of Technology - Department of Computer Science, Electrical and Space Engineering
- Kjell Lundberg - Smurfit Kappa
- Per-Erik Larsson - SKF (Sweden)
- Smurfit Kappa
Funding
Funding agency:
Award number:
2019-02533
Award title:
Kunskapsintegrering för klassificering av maskinskador (Knowledge integration for classification of machine damage)
Funding information:
https://www.vinnova.se/p/kunskapsintegrering-for-klassificering-av-maskinskador/
Topic and keywords
Standard for Swedish classification of research subjects 2025:
Relations
Homepage:
Related research data:
Publications
Citation:
Löwenmark, K., Taal, C., Nivre, J., Liwicki, M., & Sandin, F. (2022). Processing of Condition Monitoring Annotations with BERT and Technical Language Substitution: A Case Study. In Proceedings of the 7th European Conference of the Prognostics and Health Management Society 2022 (pp. 306–314).
Metadata
Version 1

Luleå University of Technology