Final published version
Research output: Contribution to conference - Without ISBN/ISSN › Conference paper › peer-review
Research output: Contribution to conference - Without ISBN/ISSN › Conference paper › peer-review
}
TY - CONF
T1 - Using speech synthesis to explain automatic speaker recognition
T2 - Interspeech 2023
AU - Brown, Georgina
AU - Kirchhübel, Christin
AU - Cuthbert, Ramiz
N1 - Conference code: 24th
PY - 2023/8/20
Y1 - 2023/8/20
N2 - Some speech synthesis systems make use of zero-shot adaptation to generate speech based on a target speaker. These systems produce speaker embeddings in the same way that speaker embeddings (often called 'x-vectors') are produced in automatic speaker recognition systems. This commonality between the two technologies could lower barriers that constrain the use of automatic speaker recognition systems in forensic speech analysis casework. A key barrier to the use of automatic speaker recognition in the forensic context is the issue of explainability, including what information about the voice a system uses in order to arrive at conclusions. This paper sets out a new approach that could be used to effectively communicate this type of information to audiences in the legal setting. Specifically, it is proposed that exposing listeners to synthetic speech produced by a zero-shot adaptation system could illustrate what aspects of the voice an automatic speaker recognition system captures.
AB - Some speech synthesis systems make use of zero-shot adaptation to generate speech based on a target speaker. These systems produce speaker embeddings in the same way that speaker embeddings (often called 'x-vectors') are produced in automatic speaker recognition systems. This commonality between the two technologies could lower barriers that constrain the use of automatic speaker recognition systems in forensic speech analysis casework. A key barrier to the use of automatic speaker recognition in the forensic context is the issue of explainability, including what information about the voice a system uses in order to arrive at conclusions. This paper sets out a new approach that could be used to effectively communicate this type of information to audiences in the legal setting. Specifically, it is proposed that exposing listeners to synthetic speech produced by a zero-shot adaptation system could illustrate what aspects of the voice an automatic speaker recognition system captures.
U2 - 10.21437/Interspeech.2023-1013
DO - 10.21437/Interspeech.2023-1013
M3 - Conference paper
SP - 4723
EP - 4727
Y2 - 20 August 2023 through 24 August 2023
ER -