A new approach for autonomously detecting and tracking a moving object in video captured by a moving camera (possibly mounted on an unmanned vehicle, UxV) is proposed in this paper. It combines the recently introduced recursive density estimation (RDE) approach with the well-known scale-invariant feature transform (SIFT). The new approach builds a model of the background using RDE in video sequences captured by a moving camera; RDE has proven robust in many videos with a moving background even in the absence of image registration (pixel position alignment). The output of RDE is a cluster of foreground pixels that can be associated with the object of interest. Once the moving object is detected, the foreground pixels are enclosed in a rectangular region of interest (ROI), and the approximate size and location of this rectangle are passed to the tracking algorithm. The tracker uses the rectangular search area to detect and match SIFT keypoints across successive video frames. If and when tracking fails, the RDE algorithm is restarted to re-detect the moving object. The proposed algorithm requires no human involvement and operates in real time. The tracking stage is also computationally efficient because only a small ROI is processed in each frame. In future work we aim to replace SIFT with speeded-up robust features (SURF) for higher tracking accuracy and faster processing. Additionally, the case of multiple objects can be addressed by clustering in the spatial domain and is a subject of current research.
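The per-pixel foreground test behind RDE can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes Angelov-style recursive updates of the mean and mean squared intensity, a Cauchy-type density that equals 1 when a pixel matches its history and decays toward 0 as it deviates, and an illustrative foreground threshold of 0.1 that is not taken from the paper.

```python
class RDEPixel:
    """Tracks one pixel's recursive density over the frame stream.

    Assumed recursive statistics (per Angelov-style RDE):
        mu_k  = ((k-1)/k) * mu_{k-1}  + (1/k) * x_k        (mean intensity)
        chi_k = ((k-1)/k) * chi_{k-1} + (1/k) * x_k**2     (mean squared intensity)
    Density of the current sample x_k:
        D_k = 1 / (1 + (x_k - mu_k)**2 + chi_k - mu_k**2)
    """

    def __init__(self):
        self.k = 0        # number of frames seen so far
        self.mu = 0.0     # recursive mean of pixel intensity
        self.chi = 0.0    # recursive mean of squared intensity

    def update(self, x):
        """Fold intensity x into the statistics and return its density."""
        self.k += 1
        w = 1.0 / self.k
        self.mu = (1.0 - w) * self.mu + w * x
        self.chi = (1.0 - w) * self.chi + w * x * x
        return 1.0 / (1.0 + (x - self.mu) ** 2 + self.chi - self.mu ** 2)


def is_foreground(density, threshold=0.1):
    """Flag a pixel as foreground when its density falls below threshold.

    The 0.1 threshold is illustrative; in practice it would be tuned or
    derived from the density statistics of the whole frame.
    """
    return density < threshold
```

For example, a pixel that sees a constant background intensity keeps a density of exactly 1, while a sudden jump in intensity (a moving object passing through) drives the density close to 0 and trips the foreground flag; clustering such flagged pixels yields the ROI handed to the tracker.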