
Integrating Gaze and Speech for Enabling Implicit Interactions

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Standard

Integrating Gaze and Speech for Enabling Implicit Interactions. / Khan, Anam Ahmad; Newn, Joshua; Bailey, James et al.
CHI '22: Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems. New York: ACM, 2022. pp. 1-14, Article 349.

Harvard

Khan, AA, Newn, J, Bailey, J & Velloso, E 2022, Integrating Gaze and Speech for Enabling Implicit Interactions. in CHI '22: Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems., 349, ACM, New York, pp. 1-14. https://doi.org/10.1145/3491102.3502134

APA

Khan, A. A., Newn, J., Bailey, J., & Velloso, E. (2022). Integrating Gaze and Speech for Enabling Implicit Interactions. In CHI '22: Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (pp. 1-14). Article 349. ACM. https://doi.org/10.1145/3491102.3502134

Vancouver

Khan AA, Newn J, Bailey J, Velloso E. Integrating Gaze and Speech for Enabling Implicit Interactions. In CHI '22: Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems. New York: ACM. 2022. p. 1-14. Article 349. doi: 10.1145/3491102.3502134

Author

Khan, Anam Ahmad ; Newn, Joshua ; Bailey, James et al. / Integrating Gaze and Speech for Enabling Implicit Interactions. CHI '22: Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems. New York : ACM, 2022. pp. 1-14

Bibtex

@inproceedings{0b229bb5491448c8a1f6f690b42b8e8d,
title = "Integrating Gaze and Speech for Enabling Implicit Interactions",
abstract = "Gaze and speech are rich contextual sources of information that, when combined, can result in effective and rich multimodal interactions. This paper proposes a machine learning-based pipeline that leverages and combines users{\textquoteright} natural gaze activity, the semantic knowledge from their vocal utterances and the synchronicity between gaze and speech data to facilitate users{\textquoteright} interaction. We evaluated our proposed approach on an existing dataset, which involved 32 participants recording voice notes while reading an academic paper. Using a Logistic Regression classifier, we demonstrate that our proposed multimodal approach maps voice notes with accurate text passages with an average F1-Score of 0.90. Our proposed pipeline motivates the design of multimodal interfaces that combines natural gaze and speech patterns to enable robust interactions.",
author = "Khan, {Anam Ahmad} and Joshua Newn and James Bailey and Eduardo Velloso",
year = "2022",
month = apr,
day = "29",
doi = "10.1145/3491102.3502134",
language = "English",
pages = "1--14",
booktitle = "CHI '22: Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems",
publisher = "ACM",

}

RIS

TY - GEN

T1 - Integrating Gaze and Speech for Enabling Implicit Interactions

AU - Khan, Anam Ahmad

AU - Newn, Joshua

AU - Bailey, James

AU - Velloso, Eduardo

PY - 2022/4/29

Y1 - 2022/4/29

N2 - Gaze and speech are rich contextual sources of information that, when combined, can result in effective and rich multimodal interactions. This paper proposes a machine learning-based pipeline that leverages and combines users’ natural gaze activity, the semantic knowledge from their vocal utterances and the synchronicity between gaze and speech data to facilitate users’ interaction. We evaluated our proposed approach on an existing dataset, which involved 32 participants recording voice notes while reading an academic paper. Using a Logistic Regression classifier, we demonstrate that our proposed multimodal approach maps voice notes with accurate text passages with an average F1-Score of 0.90. Our proposed pipeline motivates the design of multimodal interfaces that combine natural gaze and speech patterns to enable robust interactions.

AB - Gaze and speech are rich contextual sources of information that, when combined, can result in effective and rich multimodal interactions. This paper proposes a machine learning-based pipeline that leverages and combines users’ natural gaze activity, the semantic knowledge from their vocal utterances and the synchronicity between gaze and speech data to facilitate users’ interaction. We evaluated our proposed approach on an existing dataset, which involved 32 participants recording voice notes while reading an academic paper. Using a Logistic Regression classifier, we demonstrate that our proposed multimodal approach maps voice notes with accurate text passages with an average F1-Score of 0.90. Our proposed pipeline motivates the design of multimodal interfaces that combine natural gaze and speech patterns to enable robust interactions.

U2 - 10.1145/3491102.3502134

DO - 10.1145/3491102.3502134

M3 - Conference contribution/Paper

SP - 1

EP - 14

BT - CHI '22: Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems

PB - ACM

CY - New York

ER -
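
Illustrative sketch

The abstract describes a machine learning-based pipeline that fuses natural gaze activity, the semantics of spoken utterances, and gaze-speech synchronicity, and uses a Logistic Regression classifier to match voice notes to text passages. The Python sketch below illustrates only that general idea: the feature columns, synthetic data, labels, and scaling step are illustrative assumptions, not the authors' features, dataset, or implementation.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data: each row is one (voice note, text passage) candidate pair.
#   col 0: gaze dwell time on the passage while the note was recorded (gaze activity)
#   col 1: semantic similarity between the note transcript and the passage (speech semantics)
#   col 2: temporal overlap between the gaze dwell and the utterance (gaze-speech synchronicity)
n_pairs = 500
X = rng.random((n_pairs, 3))

# Placeholder labels: 1 if the note is assumed to refer to the passage, 0 otherwise.
# Real labels would come from annotated recordings such as the 32-participant
# reading study mentioned in the abstract.
y = (0.5 * X[:, 0] + 0.3 * X[:, 1] + 0.2 * X[:, 2]
     + 0.1 * rng.standard_normal(n_pairs) > 0.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Scale the fused features and fit a logistic regression classifier
# (the classifier named in the abstract; the scaling step is an assumption).
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X_train, y_train)

print("F1 on held-out pairs:", round(f1_score(y_test, clf.predict(X_test)), 2))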