Thesis Data Science: Robustness analysis of a feedback-based online data selection for live use in the vehicle
- Subject: Data Science/Big Data
- Type: Master's thesis
- Date: from 02/2026
- Tutor:
Context
Modern vehicles generate very large amounts of data while driving (up to 2.5 GB/s). Storing, transmitting and processing all of this data is costly and often does not scale. At the same time, the data is essential for data-driven methods such as machine learning. Because vehicle data typically follows a long-tail distribution, most data points are redundant, while rare situations (corner cases, anomalies) occur only seldom. Feedback-driven data collection addresses this problem: a model built from the data collected so far rates the novelty of each new data point, so that diverse and informative data is stored and redundancy is avoided. The approach currently in use assumes that the input data stream is Gaussian distributed. Real driving data often violates this assumption (e.g. temporal redundancy, non-stationary distributions, multi-modality). This thesis will therefore systematically investigate and improve the robustness of the feedback mechanism.
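The feedback loop described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each data point is a fixed-length feature vector, that the model is a single Gaussian estimated online from the stored samples, and that novelty is the Mahalanobis distance to that Gaussian. The class name `GaussianNoveltySelector`, its parameters (`threshold`, `warmup`, `reg`) and the synthetic stream are hypothetical and are not taken from the framework used in the thesis.

```python
import numpy as np


class GaussianNoveltySelector:
    """Hypothetical feedback-based selector (illustrative assumption, not the
    thesis framework): a sample is stored only if it looks novel under a
    Gaussian fitted to the samples stored so far."""

    def __init__(self, dim, threshold=3.0, warmup=10, reg=1e-6):
        if warmup < 2:
            raise ValueError("warmup must be at least 2 to estimate a covariance")
        self.dim = dim
        self.threshold = threshold  # Mahalanobis distance above which a sample is novel
        self.warmup = warmup        # samples stored unconditionally to seed the model
        self.reg = reg              # small ridge term that keeps the covariance invertible
        self.n = 0                  # number of stored samples
        self.mean = np.zeros(dim)
        self.m2 = np.zeros((dim, dim))  # running sum of outer products of deviations

    def _update(self, x):
        # Welford-style online update of the mean and the scatter matrix.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += np.outer(delta, x - self.mean)

    def novelty(self, x):
        # Mahalanobis distance of x to the Gaussian fitted to the stored samples.
        x = np.asarray(x, dtype=float)
        cov = self.m2 / (self.n - 1) + self.reg * np.eye(self.dim)
        diff = x - self.mean
        return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

    def offer(self, x):
        # Feedback step: store x (and update the model) only if it is novel.
        x = np.asarray(x, dtype=float)
        if self.n < self.warmup or self.novelty(x) > self.threshold:
            self._update(x)
            return True
        return False


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    selector = GaussianNoveltySelector(dim=2)
    stream = rng.normal(size=(1000, 2))  # synthetic stand-in for a vehicle data stream
    kept = sum(selector.offer(x) for x in stream)
    print(f"stored {kept} of {len(stream)} samples")
```

Because the model in this sketch is updated only with stored samples, a single extreme sample inflates the estimated covariance, so an identical second sample may no longer be rated as novel. The sketch also relies on one Gaussian, which is the kind of assumption that multi-modal or non-stationary streams violate.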
Tasks
Familiarization with feedback-based data collection for vehicle data
Review of robustness approaches and assessment of their transferability to feedback-based data collection
Integration of a method into a feedback-based data collection framework
Comparison of methods for feedback-based data collection
Prerequisites
You work independently and in a structured manner, and you are motivated and committed.
You have Python knowledge.
You have a very good command of written and spoken German and English.
You have knowledge of machine learning / statistics, ideally of streaming algorithms and anomaly detection / distribution estimation.
