Thesis Data Science: Incremental, non-parametric estimation of data distributions for feedback data collection
- Subject: Data Science/Big Data
- Type: Master thesis
- Date: from 02/2026
- Tutor:
Context
Modern vehicles generate very large amounts of data while driving (up to 2.5 GB/s). However, storing, transferring and processing all of this data is extremely expensive in practice. At the same time, the data is essential for data-driven methods such as machine learning. A central problem is that vehicle data often follows long-tail distributions: much of the data is redundant, while rare events (anomalies, corner cases) occur only sporadically. A promising approach is therefore feedback data collection, in which the data already collected is continuously evaluated so that new data can be scored for its novelty while the vehicle is driving. The existing approach relies on parametric estimation with Gaussian distributions and scores novelty via the Mahalanobis distance, which can be computed analytically. This thesis will investigate non-parametric methods as an alternative.
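To make the parametric baseline concrete, the following is a minimal sketch of how a Gaussian could be estimated incrementally and used to score novelty with the Mahalanobis distance. It is an illustration under stated assumptions, not the framework's actual implementation: the class name `RunningGaussian`, the Welford-style update, the small ridge term, and the synthetic two-dimensional data are all hypothetical choices made for this example.

```python
import numpy as np


class RunningGaussian:
    """Mean and covariance tracked one sample at a time (Welford-style)."""

    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self.m2 = np.zeros((dim, dim))  # running sum of deviation outer products

    def update(self, x):
        x = np.asarray(x, dtype=float)
        self.n += 1
        delta = x - self.mean            # deviation from the old mean
        self.mean += delta / self.n
        self.m2 += np.outer(delta, x - self.mean)  # uses the updated mean

    def covariance(self, ridge=1e-6):
        # Sample covariance (needs n >= 2); the ridge is an assumption that
        # keeps the matrix invertible for nearly degenerate data.
        return self.m2 / (self.n - 1) + ridge * np.eye(len(self.mean))

    def mahalanobis(self, x):
        diff = np.asarray(x, dtype=float) - self.mean
        return float(np.sqrt(diff @ np.linalg.solve(self.covariance(), diff)))


# Synthetic stand-in for already collected feature vectors (hypothetical data).
rng = np.random.default_rng(0)
data = rng.normal(size=(500, 2))

model = RunningGaussian(dim=2)
for sample in data:
    model.update(sample)

typical_score = model.mahalanobis([0.0, 0.0])  # close to the bulk of the data
rare_score = model.mahalanobis([6.0, 6.0])     # far out in the tail
```

A sample with a larger score lies further from the data seen so far and would be a stronger candidate for collection. The score is only meaningful to the extent that the data is roughly Gaussian, which is the limitation that motivates looking at non-parametric alternatives.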
Tasks
Familiarization with feedback data collection for vehicle data
Research and selection of suitable non-parametric methods for density estimation
Integration of a method into a feedback data collection framework
Comparison of non-parametric vs. parametric methods for feedback data collection
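As an illustration of what a non-parametric candidate might look like, the sketch below scores novelty with a Gaussian kernel density estimate over a bounded window of recent samples. This is only one possible method among those the thesis would survey. The class name `WindowedKDE`, the fixed bandwidth, the window size, and the synthetic bimodal data are assumptions made for this example.

```python
from collections import deque

import numpy as np


class WindowedKDE:
    """Gaussian kernel density estimate over the most recent samples."""

    def __init__(self, bandwidth=0.5, max_samples=1000):
        self.bandwidth = bandwidth                # fixed bandwidth (assumption)
        self.samples = deque(maxlen=max_samples)  # oldest samples drop out

    def update(self, x):
        self.samples.append(np.asarray(x, dtype=float))

    def log_density(self, x):
        pts = np.stack(list(self.samples))
        dim = pts.shape[1]
        sq_dist = np.sum((pts - np.asarray(x, dtype=float)) ** 2, axis=1)
        log_kernel = (
            -0.5 * sq_dist / self.bandwidth**2
            - 0.5 * dim * np.log(2.0 * np.pi)
            - dim * np.log(self.bandwidth)
        )
        # Log-mean-exp keeps the average of tiny kernel values stable.
        peak = log_kernel.max()
        return float(peak + np.log(np.mean(np.exp(log_kernel - peak))))

    def novelty(self, x):
        # Lower estimated density means higher novelty.
        return -self.log_density(x)


# Hypothetical bimodal data: two clusters with an empty region between them.
rng = np.random.default_rng(1)
cluster_a = rng.normal(loc=-3.0, scale=0.5, size=(400, 2))
cluster_b = rng.normal(loc=3.0, scale=0.5, size=(400, 2))

kde = WindowedKDE(bandwidth=0.5, max_samples=1000)
for sample in np.vstack([cluster_a, cluster_b]):
    kde.update(sample)

mode_novelty = kde.novelty([3.0, 3.0])  # inside a dense cluster
gap_novelty = kde.novelty([0.0, 0.0])   # between the clusters, no data nearby
```

In this example the point between the two clusters is scored as more novel than a point inside a cluster, even though it coincides with the overall mean of the data. A single Gaussian fitted to the same data would place its centre there and give that point the lowest possible Mahalanobis distance. The cost is that every query touches all stored samples, so memory and runtime per query grow with the window size.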
Prerequisites
You work independently and in a structured manner, and you are motivated and committed.
Python knowledge
You have a very good command of written and spoken German and English.
Knowledge of machine learning / statistics, ideally in streaming algorithms and anomaly detection / distribution estimation
