Despite the recent success of 3D human reconstruction methods, recovering the accurate and smooth 3D human motion from video is still challenging. Designing a temporal model in the encoding stage is not sufficient enough to settle the trade-off problem between the per-frame accuracy and the motion smoothness. To address this problem, we approach some of the fundamental problems of 3D reconstruction tasks, simultaneously predicting 3D pose and 3D motion dynamics. First, we utilize the power of uncertainty to address the problem of multiple 3D configurations resulting in the same 2D projections. Second, we confirmed that dividing the body into local regions shows outstanding results for estimating 3D motion dynamics. In this paper, we propose (i) an encoder that makes two different estimations: a static feature that presents 2D pose feature as distribution and a dynamic feature that includes optical flow information and (ii) a decoder that divides the body into five different local regions to estimate the 3D motion dynamics of each region. We demonstrate how our method recovers the accurate and smooth motion and achieves the state-of-the-art results for both constrained and in-the-wild videos.