TY - JOUR
T1 - Multimodal Biometric Human Recognition for Perceptual Human-Computer Interaction
AU - Jiang, Richard M.
AU - Sadka, Abdul H.
AU - Crookes, Danny
PY - 2010/11
Y1 - 2010/11
N2 - In this paper, a novel video-based multimodal biometric verification scheme using subspace-based low-level feature fusion of face and speech is developed for specific speaker recognition in perceptual human-computer interaction (HCI). In the proposed scheme, the human face is tracked and the face pose is estimated to weight the detected face-like regions in successive frames, where ill-posed faces and false-positive detections are assigned lower credit to enhance accuracy. In the audio modality, mel-frequency cepstral coefficients are extracted for voice-based biometric verification. In the fusion step, features from both modalities are projected into a nonlinear Laplacian Eigenmap subspace for multimodal speaker recognition and combined at a low level. The proposed approach is tested on a video database of ten human subjects, and the results show that it attains better accuracy than both conventional multimodal fusion using latent semantic analysis and the single-modality verifications. A MATLAB experiment demonstrates the potential of the proposed scheme to attain real-time performance for perceptual HCI applications.
AB - In this paper, a novel video-based multimodal biometric verification scheme using subspace-based low-level feature fusion of face and speech is developed for specific speaker recognition in perceptual human-computer interaction (HCI). In the proposed scheme, the human face is tracked and the face pose is estimated to weight the detected face-like regions in successive frames, where ill-posed faces and false-positive detections are assigned lower credit to enhance accuracy. In the audio modality, mel-frequency cepstral coefficients are extracted for voice-based biometric verification. In the fusion step, features from both modalities are projected into a nonlinear Laplacian Eigenmap subspace for multimodal speaker recognition and combined at a low level. The proposed approach is tested on a video database of ten human subjects, and the results show that it attains better accuracy than both conventional multimodal fusion using latent semantic analysis and the single-modality verifications. A MATLAB experiment demonstrates the potential of the proposed scheme to attain real-time performance for perceptual HCI applications.
KW - Laplacian Eigenmap
KW - low-level feature fusion
KW - multimodal biometrics
KW - perceptual human-computer interaction (HCI)
KW - speaker recognition
U2 - 10.1109/TSMCC.2010.2050476
DO - 10.1109/TSMCC.2010.2050476
M3 - Journal article
VL - 40
SP - 676
EP - 681
JO - IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews
JF - IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews
SN - 1094-6977
IS - 6
ER -