

Using speech synthesis to explain automatic speaker recognition: a new application of synthetic speech

Research output: Contribution to conference - Without ISBN/ISSN - Conference paper (peer-reviewed)

Published
Publication date: 20/08/2023
Number of pages: 5
Pages: 4723-4727
Original language: English
Event: Interspeech 2023 - Convention Centre, Dublin, Ireland
Duration: 20/08/2023 - 24/08/2023
Conference number: 24th
https://interspeech2023.org/


Abstract

Some speech synthesis systems make use of zero-shot adaptation to generate speech based on a target speaker. These systems produce speaker embeddings in the same way that speaker embeddings (often called 'x-vectors') are produced in automatic speaker recognition systems. This commonality between the two technologies could lower barriers that constrain the use of automatic speaker recognition systems in forensic speech analysis casework. A key barrier to the use of automatic speaker recognition in the forensic context is the issue of explainability, including what information about the voice a system uses in order to arrive at conclusions. This paper sets out a new approach that could be used to effectively communicate this type of information to audiences in the legal setting. Specifically, it is proposed that exposing listeners to synthetic speech produced by a zero-shot adaptation system could illustrate what aspects of the voice an automatic speaker recognition system captures.
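To make the comparison step concrete, the following is a minimal, illustrative sketch of how an automatic speaker recognition system scores two speaker embeddings (x-vectors). The vectors, their dimensionality, and the decision threshold here are all toy assumptions for illustration; real systems derive embeddings of a few hundred dimensions from a trained neural network and calibrate scores against reference populations.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy x-vectors (real systems use embeddings of ~192-512 dimensions
# produced by a trained speaker-encoder network)
enrolled = [0.9, 0.1, 0.4, 0.3]   # embedding of the known speaker
probe    = [0.8, 0.2, 0.5, 0.3]   # embedding of the questioned sample

score = cosine_similarity(enrolled, probe)
THRESHOLD = 0.75  # hypothetical decision threshold, not a calibrated value
same_speaker = score > THRESHOLD
print(f"score={score:.3f}, same_speaker={same_speaker}")
```

The same kind of embedding, when fed to a zero-shot adaptation synthesis system, conditions the generated voice — which is what makes synthetic speech a candidate tool for illustrating what information the embedding captures.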