This paper investigates how object tracking performance is affected by the fusion quality of videos from visible (VIZ) and infrared (IR) surveillance cameras, as compared to tracking in single-modality videos. The videos have been fused using simple averaging and various multiresolution techniques. Tracking has been accomplished by means of a particle filter using colour and edge cues. The highest tracking accuracy has been obtained in the IR sequences, whereas the VIZ video was affected by many artifacts and showed the worst tracking performance. Among the fused videos, the complex wavelet and averaging techniques offered the best tracking performance, comparable to that of IR. Thus, of all the video sources investigated, the fused videos, which contain complementary contextual information from both single-modality input videos, are the best source for further analysis by a human observer or a computer program.
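As an illustration of the simplest fusion scheme mentioned above, pixel-wise averaging can be sketched as follows. This is a minimal sketch and not the implementation used in this work: it assumes grey-level frames that are already spatially registered and of equal size, uses equal weights for both modalities, and the names `Frame` and `fuse_average` are hypothetical.

```python
from typing import List

# Assumption: a frame is a grey-level image stored as rows of pixel intensities.
Frame = List[List[float]]


def fuse_average(viz: Frame, ir: Frame) -> Frame:
    """Return the pixel-wise mean of two registered frames of identical size."""
    if len(viz) != len(ir) or any(len(a) != len(b) for a, b in zip(viz, ir)):
        raise ValueError("frames must have identical dimensions")
    # Each fused pixel is the equally weighted mean of the VIZ and IR pixels.
    return [
        [(v + i) / 2.0 for v, i in zip(row_viz, row_ir)]
        for row_viz, row_ir in zip(viz, ir)
    ]


# Toy 2x2 frames standing in for one visible and one infrared video frame.
viz_frame = [[10.0, 200.0], [30.0, 40.0]]
ir_frame = [[50.0, 100.0], [30.0, 240.0]]
fused_frame = fuse_average(viz_frame, ir_frame)
```

Averaging keeps content from both inputs but halves the contrast of any feature present in only one modality, which is one motivation for considering multiresolution fusion techniques as well.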