Real-time speech analysis for research and medicine

Man surrounded by symbols of an eye, an ear, a loudspeaker, a question mark and speech bubbles.

Context

Audio and speech signals carry a wide range of information about a person's inner state. Beyond the content of speech, acoustic characteristics such as timbre, intonation, pauses and volume reflect emotional reactions, stress, cognitive load and psychological strain. Such audio-based markers are becoming increasingly important in psychological research because they provide insights into emotional processes and mental states objectively, continuously and without interfering with natural communication.

To exploit this potential, robust and modular AI methods are needed that uniquely identify speakers, reliably extract emotion- and stress-related features from audio data and effectively filter out interfering noise. In this thesis, you will develop and evaluate building blocks for a flexible audio analysis pipeline that addresses precisely these tasks. You will compare modern AI methods, implement your own prototype modules and help make speech signals accessible as a valuable data source for psychological research, working towards a data-driven, objective analysis of human emotion and stress.

Tasks

We will determine the focus together. Possible tasks include:
You will conduct a literature review on current methods in speaker verification, emotion recognition or audio analysis.
You will develop and adapt AI methods for individual modules of the pipeline.
You will integrate and test various components in an overall system.
You will support the collection, processing and analysis of audio recordings.
You will design and carry out evaluation studies.
You will assess technical feasibility and make recommendations for future implementation steps and possible user studies.

Requirements

You are studying electrical engineering, computer science, data science, mechanical engineering or a related field, or have comparable knowledge.
You are interested in audio analysis, AI, signal processing and/or human-machine interaction.
You have good programming skills (Python) and experience with AI frameworks (PyTorch, TensorFlow or similar).
You enjoy working independently, work in a structured way and show initiative.
You have very good written and spoken skills in German and English.