Virtual characters are a potentially valuable tool for creating stimuli for research on the perception of emotion. We conducted an audio-visual experiment to investigate how effectively our stimuli convey the intended emotion. We combined dynamic virtual faces with pre-recorded (Burkhardt et al., 2005, Interspeech'2005, 1517–1520) and synthesized speech to create audio-visual stimuli covering all possible combinations of face and voice emotion. Each voice and face stimulus aimed to express one of seven emotional categories. Participants judged the prevalent emotion of each stimulus. For the pre-recorded voice, the vocalized emotion influenced participants' emotion judgments more than the facial expression did. For the synthesized voice, however, the facial expression influenced judgments more than the vocalized emotion did. Participants labeled the stimuli fairly accurately (>76%) when the face and voice expressed the same emotion, but they were less accurate overall when the voice was synthesized. We further analyzed the difference between the emotional categories in each stimulus and found that the valence distance between the emotions of the face and voice significantly affected emotion judgments for both natural and synthesized voices. This experimental design provides a method for improving the emotional expression of virtual characters.