In multi-hop wireless networks, the number of supportable VoIP calls can be surprisingly small because of increased spatial interference. Voice frame aggregation can mitigate this interference. In this paper, we depart from traditional approaches that perform aggregation at the voice source and propose a technique called Self-Controlled Frame Aggregation (SCFA) that runs at the wireless routers. The core idea of SCFA is to let congestion itself control the degree of aggregation. Unlike existing frame aggregation approaches, SCFA does not incur a fixed delay cost, since it aggregates only when needed and only as much as needed. We use an 802.11-based multi-hop network as an example to show the impact of SCFA, since many emerging multi-hop networks are built on 802.11 technology. The results show that SCFA on an 802.11-based multi-hop network can approximately double the number of supportable calls, or extend the hop distance threefold for a given number of calls.
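
To make the idea of congestion-controlled aggregation concrete, the following is a minimal sketch of one plausible queue-based reading of it, not the authors' implementation. It assumes that a router keeps a queue of voice frames per next hop and, whenever it obtains a transmission opportunity, sends everything that has accumulated as one aggregate. The class name `AggregatingQueue`, the method names, and the 1500-byte payload cap are all hypothetical and are not taken from the paper.

```python
from collections import deque

# Assumed upper bound on the aggregated payload; hypothetical, not from the paper.
MAX_PAYLOAD_BYTES = 1500


class AggregatingQueue:
    """Hypothetical per-next-hop voice queue at a wireless router."""

    def __init__(self, max_payload=MAX_PAYLOAD_BYTES):
        self.max_payload = max_payload
        self.frames = deque()

    def enqueue(self, frame: bytes) -> None:
        """Store a voice frame that arrived while the channel was busy or idle."""
        self.frames.append(frame)

    def dequeue_aggregate(self) -> list:
        """Called when the router wins a transmission opportunity.

        Returns every queued frame that fits under the payload cap. No timer
        holds frames back, so an uncongested router sends frames one by one,
        while a congested router sends larger aggregates.
        """
        batch = []
        size = 0
        # Always take the head frame; keep adding frames while they still fit.
        while self.frames and (
            not batch or size + len(self.frames[0]) <= self.max_payload
        ):
            frame = self.frames.popleft()
            batch.append(frame)
            size += len(frame)
        return batch
```

Under these assumptions, the degree of aggregation simply equals the backlog at transmission time: a single queued frame is sent alone with no added waiting, and five frames queued behind a busy channel leave together in one transmission.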