This paper presents a low area and energy efficient hardware accelerator for the deep convolutional neural networks (CNNs). Based on the multiply-accumulate (MAC) based architecture, three design techniques are proposed to reduce the hardware cost of the convolutional computations. First, to reduce the computational bit-width of convolutions, an adaptive bit-width reduction scheme is proposed based on differential input method. The bit-width reduction approach can reduce the 37 % of operation bit-width with almost ignorable CNN accuracy degradation. Second, it has been found that adapting bi-directional filtering window in CNN accelerator can considerably reduce the energy for data movement with much smaller number of memory accesses. To expedite the bi-directional filtering operations, we also propose a bidirectional first-input-first-output (bi-FIFO). With SRAM bit-cell layout manner, the proposed bi-FIFO facilitates fast data re-distribution with area and energy efficiency. To verify the effectiveness of the proposed techniques, the AlexNet accelerator has been designed. The numerical results show that the proposed adaptive bit-width reduction scheme achieves 25.9% and 47.3% of area and energy savings, respectively. The bi-FIFO based accelerator also achieves 33 % improved processing time.