Final published version
Research output: Contribution to Journal/Magazine › Journal article › peer-review
}
TY - JOUR
T1 - Smiling in the Face and Voice of Avatars and Robots
T2 - Evidence for a 'Smiling McGurk Effect'
AU - Torre, Ilaria
AU - Holk, Simon
AU - Yadollahi, Elmira
AU - Leite, Iolanda
AU - McDonnell, Rachel
AU - Harte, Naomi
PY - 2024/5/31
Y1 - 2024/5/31
N2 - Multisensory integration influences emotional perception, as the McGurk effect demonstrates for the communication between humans. Human physiology implicitly links the production of visual features with other modes like the audio channel: Face muscles responsible for a smiling face also stretch the vocal cords that result in a characteristic smiling voice. For artificial agents capable of multimodal expression, this linkage is modeled explicitly. In our studies, we observe the influence of visual and audio channels on the perception of the agents’ emotional expression. We created videos of virtual characters and social robots either with matching or mismatching emotional expressions in the audio and visual channels. In two online studies, we measured the agents’ perceived valence and arousal. Our results consistently lend support to the ‘emotional McGurk effect’ hypothesis, according to which face transmits valence information, and voice transmits arousal. When dealing with dynamic virtual characters, visual information is enough to convey both valence and arousal, and thus audio expressivity need not be congruent. When dealing with robots with fixed facial expressions, however, both visual and audio information need to be present to convey the intended expression.
AB - Multisensory integration influences emotional perception, as the McGurk effect demonstrates for the communication between humans. Human physiology implicitly links the production of visual features with other modes like the audio channel: Face muscles responsible for a smiling face also stretch the vocal cords that result in a characteristic smiling voice. For artificial agents capable of multimodal expression, this linkage is modeled explicitly. In our studies, we observe the influence of visual and audio channels on the perception of the agents’ emotional expression. We created videos of virtual characters and social robots either with matching or mismatching emotional expressions in the audio and visual channels. In two online studies, we measured the agents’ perceived valence and arousal. Our results consistently lend support to the ‘emotional McGurk effect’ hypothesis, according to which face transmits valence information, and voice transmits arousal. When dealing with dynamic virtual characters, visual information is enough to convey both valence and arousal, and thus audio expressivity need not be congruent. When dealing with robots with fixed facial expressions, however, both visual and audio information need to be present to convey the intended expression.
U2 - 10.1109/TAFFC.2022.3213269
DO - 10.1109/TAFFC.2022.3213269
M3 - Journal article
VL - 15
SP - 393
EP - 404
JO - IEEE Transactions on Affective Computing
JF - IEEE Transactions on Affective Computing
IS - 2
ER -