
Using speech synthesis to explain automatic speaker recognition: a new application of synthetic speech

Research output: Contribution to conference - Without ISBN/ISSN › Conference paper › peer-review

Published

Standard

Using speech synthesis to explain automatic speaker recognition: a new application of synthetic speech. / Brown, Georgina; Kirchhübel, Christin; Cuthbert, Ramiz.
2023. 4723-4727. Paper presented at Interspeech 2023, Dublin, Ireland.


Harvard

Brown, G, Kirchhübel, C & Cuthbert, R 2023, 'Using speech synthesis to explain automatic speaker recognition: a new application of synthetic speech', Paper presented at Interspeech 2023, Dublin, Ireland, 20/08/23 - 24/08/23, pp. 4723-4727. https://doi.org/10.21437/Interspeech.2023-1013

APA

Brown, G., Kirchhübel, C., & Cuthbert, R. (2023). Using speech synthesis to explain automatic speaker recognition: a new application of synthetic speech. 4723-4727. Paper presented at Interspeech 2023, Dublin, Ireland. https://doi.org/10.21437/Interspeech.2023-1013

Vancouver

Brown G, Kirchhübel C, Cuthbert R. Using speech synthesis to explain automatic speaker recognition: a new application of synthetic speech. 2023. Paper presented at Interspeech 2023, Dublin, Ireland. doi: 10.21437/Interspeech.2023-1013

Author

Brown, Georgina; Kirchhübel, Christin; Cuthbert, Ramiz. / Using speech synthesis to explain automatic speaker recognition: a new application of synthetic speech. Paper presented at Interspeech 2023, Dublin, Ireland. 5 p.

Bibtex

@conference{0520bda5e6e844b9a860f78be12026c9,
title = "Using speech synthesis to explain automatic speaker recognition: a new application of synthetic speech",
abstract = "Some speech synthesis systems make use of zero-shot adaptation to generate speech based on a target speaker. These systems produce speaker embeddings in the same way that speaker embeddings (often called 'x-vectors') are produced in automatic speaker recognition systems. This commonality between the two technologies could lower barriers that constrain the use of automatic speaker recognition systems in forensic speech analysis casework. A key barrier to the use of automatic speaker recognition in the forensic context is the issue of explainability, including what information about the voice a system uses in order to arrive at conclusions. This paper sets out a new approach that could be used to effectively communicate this type of information to audiences in the legal setting. Specifically, it is proposed that exposing listeners to synthetic speech produced by a zero-shot adaptation system could illustrate what aspects of the voice an automatic speaker recognition system captures.",
author = "Georgina Brown and Christin Kirchh{\"u}bel and Ramiz Cuthbert",
year = "2023",
month = aug,
day = "20",
doi = "10.21437/Interspeech.2023-1013",
language = "English",
pages = "4723--4727",
note = "Interspeech 2023 ; Conference date: 20-08-2023 Through 24-08-2023",
url = "https://interspeech2023.org/",

}

RIS

TY - CONF

T1 - Using speech synthesis to explain automatic speaker recognition

T2 - Interspeech 2023

AU - Brown, Georgina

AU - Kirchhübel, Christin

AU - Cuthbert, Ramiz

N1 - Conference code: 24th

PY - 2023/8/20

Y1 - 2023/8/20

N2 - Some speech synthesis systems make use of zero-shot adaptation to generate speech based on a target speaker. These systems produce speaker embeddings in the same way that speaker embeddings (often called 'x-vectors') are produced in automatic speaker recognition systems. This commonality between the two technologies could lower barriers that constrain the use of automatic speaker recognition systems in forensic speech analysis casework. A key barrier to the use of automatic speaker recognition in the forensic context is the issue of explainability, including what information about the voice a system uses in order to arrive at conclusions. This paper sets out a new approach that could be used to effectively communicate this type of information to audiences in the legal setting. Specifically, it is proposed that exposing listeners to synthetic speech produced by a zero-shot adaptation system could illustrate what aspects of the voice an automatic speaker recognition system captures.

AB - Some speech synthesis systems make use of zero-shot adaptation to generate speech based on a target speaker. These systems produce speaker embeddings in the same way that speaker embeddings (often called 'x-vectors') are produced in automatic speaker recognition systems. This commonality between the two technologies could lower barriers that constrain the use of automatic speaker recognition systems in forensic speech analysis casework. A key barrier to the use of automatic speaker recognition in the forensic context is the issue of explainability, including what information about the voice a system uses in order to arrive at conclusions. This paper sets out a new approach that could be used to effectively communicate this type of information to audiences in the legal setting. Specifically, it is proposed that exposing listeners to synthetic speech produced by a zero-shot adaptation system could illustrate what aspects of the voice an automatic speaker recognition system captures.

U2 - 10.21437/Interspeech.2023-1013

DO - 10.21437/Interspeech.2023-1013

M3 - Conference paper

SP - 4723

EP - 4727

Y2 - 20 August 2023 through 24 August 2023

ER -
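The abstract rests on the fact that both technologies reduce a voice to a fixed-length speaker embedding (an "x-vector") and compare voices in that embedding space, typically by cosine similarity. A minimal sketch of such a comparison is below; the four-dimensional vectors are invented toy values (real systems extract embeddings of a few hundred dimensions with a neural network), not output from the paper's systems:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two fixed-length speaker embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "x-vectors" (hypothetical values for illustration only).
same_speaker_a = [0.9, 0.1, 0.3, 0.2]
same_speaker_b = [0.8, 0.2, 0.3, 0.1]
different_speaker = [-0.5, 0.9, -0.2, 0.7]

print(cosine_similarity(same_speaker_a, same_speaker_b))    # close to 1
print(cosine_similarity(same_speaker_a, different_speaker)) # low / negative
```

In a zero-shot adaptation synthesiser, an embedding like these conditions the generated speech; in a speaker recognition system, a similarity score like this feeds the same-speaker/different-speaker decision. That shared representation is what the paper proposes to exploit for explanation.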