Dataset with condition monitoring vibration data annotated with technical language, from paper machine industries in northern Sweden
https://doi.org/10.5878/hxc0-bd07
Labelled industry datasets are among the most valuable assets in prognostics and health management (PHM) research. However, creating them is both difficult and expensive, so publicly available industry datasets are rare, and labelled ones rarer still.
Recent studies have shown that industry annotations can be used to train artificial intelligence models directly on industry data (https://doi.org/10.36001/ijphm.2022.v13i2.3137, https://doi.org/10.36001/phmconf.2023.v15i1.3507). However, while many industry datasets also contain text descriptions or logbooks in the form of annotations and maintenance work orders, few, if any, are publicly available.
Therefore, we release a dataset consisting of annotated signal data from two large (80 m × 10 m × 10 m) paper machines at a kraftliner production company in northern Sweden. The data consists of 21 090 pairs of signals and annotations from one year of production. The annotations are written in Swedish by on-site Swedish experts, and the signals consist primarily of accelerometer vibration measurements from the two machines.
The dataset is structured as a Pandas dataframe and serialized as both a pickle (.pkl) file and a JSON (.json) file. The first column (‘id’) contains the sample IDs; the second column (‘Spectra’) contains the fast Fourier transformed and envelope-transformed vibration signals; the third column (‘Notes’) contains the associated annotations, mapped so that each annotation is associated with all signals recorded from ten days before the annotation date up to the annotation date; and the fourth column (‘Embeddings’) contains pre-computed embeddings from Swedish SentenceBERT. Each row corresponds to one vibration measurement sample; the data does not indicate which sensor or machine part a measurement comes from.
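The column layout described above can be checked right after loading. The following is a minimal sketch, not the authors' own tooling: the helper names (`load_dataset`, `embedding_matrix`, `make_toy_frame`) and the file name `toy_dataset.json` are hypothetical, the toy values are made up, and the JSON file is assumed to use pandas' default orientation. It also assumes that every entry in ‘Embeddings’ is a numeric sequence of the same length.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Column names as listed in the dataset description.
EXPECTED_COLUMNS = ["id", "Spectra", "Notes", "Embeddings"]


def load_dataset(path):
    """Load the dataframe from a .pkl or .json file, chosen by file extension."""
    path = Path(path)
    if path.suffix == ".pkl":
        # Unpickling can run arbitrary code: only open files from a trusted source.
        df = pd.read_pickle(path)
    elif path.suffix == ".json":
        # Assumes pandas' default JSON orientation; the real file may need `orient=`.
        df = pd.read_json(path)
    else:
        raise ValueError(f"Unsupported file type: {path.suffix!r}")
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    return df


def embedding_matrix(df):
    """Stack the per-row embeddings into one (n_samples, dim) float array.

    Assumes every entry in 'Embeddings' is a numeric sequence of equal length.
    """
    return np.stack([np.asarray(e, dtype=float) for e in df["Embeddings"]])


def make_toy_frame():
    """Build a two-row stand-in with made-up values (not real data)."""
    return pd.DataFrame(
        {
            "id": [0, 1],
            "Spectra": [[0.1, 0.4, 0.2], [0.3, 0.1, 0.5]],
            "Notes": ["placeholder note A", "placeholder note B"],
            "Embeddings": [[0.0, 1.0, 0.5], [1.0, 0.0, 0.5]],
        }
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        toy_path = Path(tmp) / "toy_dataset.json"  # hypothetical file name
        make_toy_frame().to_json(toy_path)
        df = load_dataset(toy_path)
    print(df[["id", "Notes"]])
    print(embedding_matrix(df).shape)
```

With the real files, `load_dataset` would be pointed at the downloaded .pkl or .json path instead of the toy file. The JSON route avoids unpickling, at the cost of the ‘Spectra’ and ‘Embeddings’ entries coming back as plain Python lists rather than arrays.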
Citation and access
Data access level:
Creator/Principal investigator(s):
Research principal:
Principal's reference number:
- 2019-02533
Data contains personal data:
Yes
Type of personal data:
Signed annotations are preserved in the raw data. As a result, the dataset contains pseudonymised personal data.
Citation:
License:
Data collection - Recording
Mode of collection:
Recording
Description of the mode of collection:
Vibration data collected through accelerometers (SKF IMx-system with CMSS sensors)
Data collector:
- Luleå University of Technology
Source of the data:
- Physical objects
Instrument
Name:
SKF CMSS 2200
Description of the instrument:
https://www.skf.com/ph/productinfo/productid-CMSS%202200
Name:
SKF CMSS 2207
Description of the instrument:
https://www.skf.com/ph/productinfo/productid-CMSS%202207
Geographic coverage
Geographic location:
Administrative information
Responsible department/unit:
Department of Computer Science, Electrical and Space Engineering
Other research principals:
Contributor(s):
- Peter Wikström - SCA Munksund
- Kjell Lundberg - Smurfit Kappa
- Per-Erik Larsson - SKF (Sweden)
- Pär-Erik Martinsson - Luleå University of Technology - Department of Computer Science, Electrical and Space Engineering
- Håkan Sirkka - Smurfit Kappa
- Smurfit Kappa
Funding
Funding agency:
Award number:
2019-02533_Vinnova
Award title:
Kunskapsintegrering för klassificering av maskinskador
Funding information:
Knowledge integration for fault severity estimation
Funding agency:
- Luleå University of Technology
Topic and keywords
Standard för svensk indelning av forskningsämnen 2025 (Standard for Swedish classification of research subjects 2025):
Relations
Homepage:
Related research data:
Publications
Citation:
Löwenmark, K. (2023). Technical Language Supervision for Intelligent Fault Diagnosis [Licentiate thesis]. Luleå University of Technology.
ISBN:
9789180482547
Citation:
Löwenmark, K., Taal, C., Nivre, J., Liwicki, M., & Sandin, F. (2022). Processing of Condition Monitoring Annotations with BERT and Technical Language Substitution: A Case Study. In Proceedings of the 7th European Conference of the Prognostics and Health Management Society 2022 (pp. 306–314).
Citation:
Löwenmark, K., Taal, C., Vurgaft, A., Liwicki, M., Nivre, J., & Sandin, F. (2023). Labelling of annotated condition monitoring data through technical language processing.
Citation:
Löwenmark, K., Taal, C., Schnabel, S., Liwicki, M., & Sandin, F. (2022). Technical Language Supervision for Intelligent Fault Diagnosis in Process Industry. In International Journal of Prognostics and Health Management (Vol. 13, Issue 2).
Versions
Version:
2
Metadata corrected:
Updated level of accessibility to restricted access
Published:

Luleå University of Technology