Depth sensors open up new possibilities for human action recognition by providing 3D human skeleton data and depth images of the scene. Analysis of human actions based on 3D skeleton data has become popular recently, due to its robustness and view-invariant representation. However, the skeleton alone is insufficient to distinguish actions which involve human-object interactions. In this paper, we propose a deep model which efficiently models human-object interactions and intra-class variations under viewpoint changes. First, a human body-part model is introduced to transfer the depth appearances of body-parts to a shared view-invariant space. Second, an end-to-end learning framework is proposed which is able to effectively combine the view-invariant body-part representation from skeletal and depth images, and learn the relations between the human body-parts and the environmental objects, the interactions between different human body-parts, and the temporal structure of human actions. We have evaluated the performance of our proposed model against 15 existing techniques on two large benchmark human action recognition datasets, NTU RGB+D and UWA3DII. The experimental results show that our technique provides a significant improvement over state-of-the-art methods.