This paper presents a novel method for estimating 3D human body pose from stereo image sequences based on top-down learning. Human body pose is represented by a linear combination of prototypes of 2D depth images and their corresponding 3D body models, where each 3D body model is described by the positions of a predetermined set of joints. Given a 2D depth image, we estimate the optimal coefficients for a linear combination of the prototype 2D depth images by solving a least-squares minimization problem. The 3D body model of the input depth image is then obtained by applying the estimated coefficients to the corresponding 3D body models of the prototypes. In the learning stage, the proposed method builds a hierarchy by recursively classifying the training data into several clusters using silhouette images and depth images. Likewise, in the estimation stage, the proposed method hierarchically estimates 3D human body pose from a silhouette image and a depth image. The experimental results show that our method can estimate 3D human body pose efficiently and effectively.
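
The linear-combination step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `estimate_pose`, the array layouts, and the synthetic data are assumptions made for illustration, and the sketch omits the hierarchical clustering as well as any preprocessing (such as mean subtraction or regularization) that the full method may apply. Each column of `depth_prototypes` is assumed to hold one vectorized prototype depth image, and the matching column of `pose_prototypes` holds the stacked 3D joint positions of that prototype.

```python
import numpy as np


def estimate_pose(depth_prototypes, pose_prototypes, depth_image):
    """Estimate 3D joint positions from a depth image (illustrative sketch).

    depth_prototypes: (n_pixels, m) array, one vectorized depth image per column.
    pose_prototypes:  (3 * n_joints, m) array, matching 3D joint positions.
    depth_image:      (n_pixels,) vectorized input depth image.
    """
    # Least-squares coefficients: argmin_a || depth_image - depth_prototypes @ a ||^2
    coeffs, *_ = np.linalg.lstsq(depth_prototypes, depth_image, rcond=None)
    # Apply the same coefficients to the corresponding 3D body models.
    pose = pose_prototypes @ coeffs
    return pose, coeffs


# Synthetic check (hypothetical data): the input is an exact mixture of prototypes.
rng = np.random.default_rng(0)
n_pixels, n_joints, m = 64, 5, 4
D = rng.normal(size=(n_pixels, m))        # prototype depth images
S = rng.normal(size=(3 * n_joints, m))    # prototype 3D body models
true_coeffs = np.array([0.5, 0.2, 0.2, 0.1])
d = D @ true_coeffs                       # input depth image

pose, coeffs = estimate_pose(D, S, d)
```

In this noise-free setting the recovered coefficients match `true_coeffs` up to numerical precision, so `pose` equals the same mixture of the prototype 3D models. With real depth images the fit is only approximate, which is presumably one reason the method first narrows the prototype set through hierarchical clustering.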