Qualification Type: | PhD |
---|---|
Location: | Nottingham |
Funding for: | UK Students |
Funding amount: | Annual tax-free stipend based on the UKRI rate (currently £20,780), Home tuition fee, and £3,000 p.a. Research Training Support Grant. |
Hours: | Full Time |
Placed On: | 7th April 2025 |
Closes: | 5th May 2025 |
Our project:
Using facial and vocal tract dynamics to improve speech comprehension in noise
Around 18 million individuals in the UK are estimated to have hearing loss, including over half of the population aged 55 or more. Hearing aids are the most common intervention for hearing loss; however, one in five people who should wear hearing aids do not, and a failure to comprehend speech in noisy situations is one of the most common complaints of hearing aid users. Difficulties with communication negatively impact quality of life and can lead to social isolation, depression and problems with maintaining employment.
Clearly, there is a growing need to make hearing aids more attractive, and one route to achieve this is to enable users to better understand speech in noisy situations. Facial speech offers a previously untapped source of information that is immune to auditory noise. Importantly, auditory noise includes sounds that are spectrally similar to the target voice, such as competing voices, which are particularly challenging for the noise reduction algorithms currently employed in hearing aids. With multimodal hearing aids, which capture a speaker's face using a video camera, already in development, it is now vital that we establish how to use facial speech information to augment hearing aid function.
What you will do
This PhD project offers the opportunity to explore how the face and voice are linked to a common source (the vocal tract) with the aim of predicting speech from the face alone or combined with noisy audio speech. You will work with a state-of-the-art multimodal speech dataset, recorded in Nottingham, in which facial video and voice audio have been recorded simultaneously with real-time magnetic resonance imaging of the vocal tract (for examples, see https://doi.org/10.1073/pnas.2006192117). You will use a variety of analytical methods, including principal component analysis and machine learning, to model the covariation of the face, voice and vocal tract during speech.
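By way of illustration only, the sketch below shows one way such a covariation analysis might be set up: each modality is reduced to principal components and a simple linear model maps face components to vocal tract components. The data, feature dimensions, and model choice here are hypothetical placeholders, not part of the project specification.

```python
# Minimal sketch: face-to-vocal-tract covariation via PCA + linear regression.
# All data below are synthetic stand-ins; real analyses would use tracked facial
# landmarks, audio features and rtMRI-derived vocal tract outlines.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

n_frames = 2000       # video/MRI frames (hypothetical)
n_face_dims = 120     # e.g. flattened facial landmark coordinates (hypothetical)
n_tract_dims = 80     # e.g. flattened vocal tract outline coordinates (hypothetical)

# Synthetic shared articulation signal driving both modalities
latent = rng.standard_normal((n_frames, 10))
face = latent @ rng.standard_normal((10, n_face_dims)) + 0.5 * rng.standard_normal((n_frames, n_face_dims))
tract = latent @ rng.standard_normal((10, n_tract_dims)) + 0.5 * rng.standard_normal((n_frames, n_tract_dims))

# Reduce each modality to its principal components
face_pcs = PCA(n_components=10).fit_transform(face)
tract_pcs = PCA(n_components=10).fit_transform(tract)

# Predict vocal tract PCs from face PCs with a simple linear (ridge) model
X_tr, X_te, y_tr, y_te = train_test_split(face_pcs, tract_pcs, test_size=0.2, random_state=0)
model = Ridge(alpha=1.0).fit(X_tr, y_tr)
print(f"Held-out R^2 (face -> vocal tract PCs): {model.score(X_te, y_te):.2f}")
```

In practice the linear mapping could be replaced by richer machine learning models, and the same framework extends to predicting audio speech features from facial dynamics.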
Who would be suitable for this project?
This project would equally suit a computational student with an interest in applied research or a psychology/neuroscience student with an interest in developing skills in programming, sophisticated analysis and AI. You should have a degree in Psychology, Neuroscience, Computer Science, Maths, Physics or a related area. You should have experience of programming and a strong interest in speech and machine learning.
Supervisors: Dr Chris Scholes (School of Psychology), Dr Joy Egede (School of Computer Science), Prof Alan Johnston (School of Psychology).
For further details and to arrange an interview, please contact Dr Chris Scholes (School of Psychology).
Start date: 1st October 2025