Thesis Data Science: Investigation of the anomaly detection performance of a feedback-based data selection in an automotive context
- Subject: Data Science/Big Data
- Type: Master thesis
- Date: from 02 / 2026
- Tutor:
Context
Modern vehicles generate very large amounts of data while driving (up to 2.5 GB/s). Since storing and transmitting all of this data is extremely expensive, intelligent data selection methods are increasingly used to reduce redundant data while preserving relevant rare content (corner cases).
One promising approach is feedback data collection¹. This involves continuously building a model of the data already stored; the model scores new incoming data by its novelty and decides whether it should be stored.
In addition to diversity and redundancy reduction, the recognition of rare events, such as anomalies, out-of-distribution (OOD) situations or corner cases, plays a central role. In high-dimensional systems, however, the definition of "anomaly" is not trivial. Anomalies can be categorized as weak/strong and trivial/non-trivial. The aim of this thesis is to investigate whether and how feedback data collection can be used specifically for anomaly detection and what limitations arise in the process.
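The feedback loop described above can be illustrated with a minimal sketch. It is not the method from the cited work: it assumes, purely for illustration, that the "model" is the set of stored samples, that novelty is the Euclidean distance to the nearest stored sample, and that a fixed threshold decides storage. The names `FeedbackSelector`, `novelty`, `observe`, `run_demo` and the threshold value are hypothetical.

```python
import math
from typing import List, Sequence


class FeedbackSelector:
    """Toy feedback-based selector; the 'model' is simply the stored samples."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold  # hypothetical novelty cut-off (assumption)
        self.stored: List[Sequence[float]] = []

    def novelty(self, sample: Sequence[float]) -> float:
        """Distance to the nearest stored sample (infinite if nothing is stored)."""
        if not self.stored:
            return math.inf
        return min(math.dist(sample, s) for s in self.stored)

    def observe(self, sample: Sequence[float]) -> bool:
        """Score the sample against the stored data; store it if novel enough."""
        is_novel = self.novelty(sample) > self.threshold
        if is_novel:
            # Feedback step: every stored sample changes future novelty scores.
            self.stored.append(sample)
        return is_novel


def run_demo() -> List[bool]:
    """Feed a small synthetic stream and report which samples were stored."""
    selector = FeedbackSelector(threshold=1.0)
    stream = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 5.0)]
    return [selector.observe(x) for x in stream]


if __name__ == "__main__":
    print(run_demo())  # [True, False, False, True, False]
```

In this toy stream the distant point (5.0, 5.0) is stored, but its near-repeat (5.1, 5.0) is rejected, because the stored data now covers that region. If the novelty score were reused as an anomaly score, a recurring anomaly would therefore stop looking anomalous after its first occurrence, which is one example of the kind of limitation the thesis is meant to examine.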
Tasks
Familiarization with feedback data collection for vehicle data
Familiarization with the state of the art of anomaly and OOD detection in a high-dimensional context
Derivation of an evaluation concept including suitable metrics
Investigation of how feedback data collection behaves as an anomaly detector
Derivation of recommendations on whether and how feedback systems are suitable for robust anomaly detection in vehicle data
Requirements
You work independently and in a structured manner, and you are motivated and committed
Knowledge of Python
You have a very good command of written and spoken German and English
Knowledge of machine learning / statistics, ideally in streaming algorithms and anomaly detection / distribution estimation
