Skip to main content
Researchdata.se

Dataset with condition monitoring vibration data annotated with technical language, from paper machine industries in northern Sweden

Annotated condition monitoring data for technical language processing and supervision
https://doi.org/10.5878/hxc0-bd07

Labelled industry datasets are one of the most valuable assets in prognostics and health management (PHM) research. However, creating labelled industry datasets is both difficult and expensive, making publicly available industry datasets rare at best, in particular labelled datasets. Recent studies have showcased that industry annotations can be used to train artificial intelligence models directly on industry data ( https://doi.org/10.36001/ijphm.2022.v13i2.3137Opens in a new tab , https://doi.org/10.36001/phmconf.2023.v15i1.3507Opens in a new tab ), but while many industry datasets also contain text descriptions or logbooks in the form of annotations and maintenance work orders, few, if any, are publicly available. Therefore, we release a dataset consisting with annotated signal data from two large (80mx10mx10m) paper machines, from a Kraftliner production company in northern Sweden. The data consists of 21 090 pairs of signals and annotations from one year of production. The annotations are written in Swedish, by on-site Swedish experts, and the signals consist primarily of accelerometer vibration measurements from the two machines. The dataset is structured as a Pandas dataframe and serialized as a pickle (.pkl) file and a JSON (.json) file. The first column (‘id’) is the ID of the samples; the second column (‘Spectra’) are the fast Fourier transform and envelope-transformed vibration signals; the third column (‘Notes’) are the associated annotations, mapped so that each annotation is associated with all signals from ten days before the annotation date, up to the annotation date; and finally the fourth column (‘Embeddings’) are pre-computed embeddings using Swedish SentenceBERT. Each row corresponds to a vibration measurement sample, though there is no distinction in this data between which sensor or machine part each measurement is from.

Citation and access

Method and outcome

Data collection Recording

Geographic coverage

Administrative information

Topic and keywords

Relations

Publications

Contact

Metadata

Versions

doris
Luleå University of Technology