Integrating Gaze and Speech for Enabling Implicit Interactions

Computing and Communications

Text available via DOI:

https://doi.org/10.1145/3491102.3502134
Final published version

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Anam Ahmad Khan
Joshua Newn
James Bailey
Eduardo Velloso

Publication date	29/04/2022
Host publication	CHI '22: Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems
Place of Publication	New York
Publisher	ACM
Pages	1-14
Number of pages	14
ISBN (electronic)	9781450391573
Original language	English

Abstract

Gaze and speech are rich contextual sources of information that, when combined, can result in effective and rich multimodal interactions. This paper proposes a machine learning-based pipeline that leverages and combines users’ natural gaze activity, the semantic knowledge from their vocal utterances, and the synchronicity between gaze and speech data to facilitate users’ interaction. We evaluated our proposed approach on an existing dataset, which involved 32 participants recording voice notes while reading an academic paper. Using a Logistic Regression classifier, we demonstrate that our proposed multimodal approach maps voice notes to the correct text passages with an average F1-Score of 0.90. Our proposed pipeline motivates the design of multimodal interfaces that combine natural gaze and speech patterns to enable robust interactions.
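
The abstract does not give implementation details, so the following is only an illustrative sketch of the general idea, not the authors' pipeline. It assumes three hand-made features per candidate passage (gaze dwell before the note, word overlap between note and passage, and gaze-speech time overlap), combines them with a logistic function using made-up weights rather than trained ones, and shows how an F1 score is computed. All names (`Passage`, `features`, `best_passage`, the 10-second window, the weights and bias) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Passage:
    text: str
    gaze_intervals: list  # (start_s, end_s) spans when gaze rested on this passage


def tokens(s):
    """Lowercased word set with surrounding punctuation stripped."""
    return {w.strip(".,;:!?").lower() for w in s.split() if w.strip(".,;:!?")}


def jaccard(a, b):
    """Crude stand-in for semantic similarity (assumption, not from the paper)."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def overlap_seconds(intervals, start, end):
    """Total time the given intervals spend inside [start, end]."""
    return sum(max(0.0, min(e, end) - max(s, start)) for s, e in intervals)


def features(passage, note_text, note_start, note_end, window=10.0):
    """Return [gaze, semantic, synchronicity] features for one passage."""
    gaze = overlap_seconds(passage.gaze_intervals, note_start - window, note_start) / window
    semantic = jaccard(passage.text, note_text)
    duration = note_end - note_start
    sync = overlap_seconds(passage.gaze_intervals, note_start, note_end) / duration if duration > 0 else 0.0
    return [gaze, semantic, sync]


def logistic_score(x, weights, bias):
    """Logistic regression decision function: sigmoid(bias + w . x)."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def best_passage(passages, note_text, note_start, note_end,
                 weights=(2.0, 4.0, 2.0), bias=-3.0):
    """Score every passage and return (index of best, all scores).

    The weights and bias are illustrative constants; a real system
    would learn them from labelled data.
    """
    scores = [logistic_score(features(p, note_text, note_start, note_end), weights, bias)
              for p in passages]
    return max(range(len(passages)), key=scores.__getitem__), scores


def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0
```

Calling `best_passage(passages, note_text, note_start, note_end)` with a list of `Passage` objects returns the index of the highest-scoring passage together with the per-passage scores. In practice the weights would be fitted on labelled voice-note and passage pairs and the word-overlap feature replaced with a stronger semantic measure.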
