Human detection has received great attention during the past few decades, which is yet still a challenging problem. In this paper, we focus on the problem of 3-D human detection, i.e., finding the human bodies and determining their 3-D coordinates in complex 3-D space using depth data only. Since the traditional sliding-window-based approaches for target localization are time-consuming and the recent deep-learning-based object detectors generate too many region proposals, we propose to utilize the candidate head-top locating stage to efficiently and quickly find the plausible head-top locations. In the second stage, we propose a Depth map, Multiorder depth template, and Height difference map representation encoding three channels of information for each candidate region to utilize the neural network pretrained on large-scale well-annotated datasets to classify the candidate regions. We evaluate our method on four publicly available challenging datasets. Extensive experimental results demonstrate that the proposed method is superior to the state-of-the-art methods while achieving real-time performance