Behavioural phenotyping of Drosophila is an important tool in biological and medical research for identifying genetic, pathological or psychological effects on animal behaviour. Automated behavioural phenotyping from videos has long been a desired capability, since it can eliminate the lengthy and tedious manual work involved in behavioural analysis. In this paper, we introduce deep learning into this challenging topic and propose a new 2D+3D hybrid CNN framework for social behavioural phenotyping of Drosophila. In the proposed multitask learning framework, action detection and localization of Drosophila are carried out jointly with action classification. A given video is divided into clips of fixed length. Each clip is fed into the system, and a 2D CNN is applied to extract features at the frame level. Features extracted from adjacent frames are then concatenated and fed into a 3D CNN with a spatial region proposal layer for classification. In such a 2D+3D hybrid framework, Drosophila detection at the frame level enables action analysis over different durations instead of a fixed period. We tested our framework with different base layers and classification architectures, and validated the proposed 3D CNN based social behavioural phenotyping framework under various models, detectors and classifiers.
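To make the data flow described above concrete, the following is a minimal shape-level sketch in plain Python, not the implementation used in this work. It only traces how a video could be cut into fixed-length clips, how a single 2D convolution would change each frame's spatial size, and how per-frame feature maps could be stacked along a time axis to form the input of a 3D CNN. All function names, the choice of non-overlapping clips, the dropping of leftover frames, and the kernel, stride and padding values are illustrative assumptions; the region proposal layer and the classifier are not modelled.

```python
from typing import List, Tuple


def split_into_clips(num_frames: int, clip_len: int) -> List[Tuple[int, int]]:
    """Return [start, end) frame ranges of non-overlapping fixed-length clips.

    Assumption: trailing frames that do not fill a whole clip are dropped.
    """
    return [(s, s + clip_len)
            for s in range(0, num_frames - clip_len + 1, clip_len)]


def conv_out_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Standard output-size formula for one convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def frame_feature_shape(height: int, width: int, out_channels: int,
                        kernel: int = 3, stride: int = 2,
                        padding: int = 1) -> Tuple[int, int, int]:
    """Shape (C, H, W) of the feature map one 2D conv layer yields per frame.

    The layer hyperparameters are hypothetical defaults, chosen for illustration.
    """
    return (out_channels,
            conv_out_size(height, kernel, stride, padding),
            conv_out_size(width, kernel, stride, padding))


def stack_clip_features(frame_shape: Tuple[int, int, int],
                        clip_len: int) -> Tuple[int, int, int, int]:
    """Stack per-frame (C, H, W) maps along time into a (C, T, H, W) volume."""
    channels, height, width = frame_shape
    return (channels, clip_len, height, width)


if __name__ == "__main__":
    clips = split_into_clips(num_frames=10, clip_len=4)
    per_frame = frame_feature_shape(height=64, width=64, out_channels=16)
    volume = stack_clip_features(per_frame, clip_len=4)
    print(clips, per_frame, volume)
```

In this sketch the temporal axis appears only after the per-frame 2D stage, which mirrors the ordering described above: frame-level features come first, and spatio-temporal processing operates on the stacked volume.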