Global tractography estimates brain connectivity by determining the configuration of signal-generating fiber segments that best describes the measured diffusion-weighted data, promising greater robustness to imaging noise than local greedy methods. However, global tractography is computationally demanding, and its run times are often prohibitive for clinical applications. We present here a reformulation of the global tractography algorithm for fast parallel implementation, amenable to acceleration using multicore CPUs and general-purpose GPUs. Our method is motivated by the key observation that each fiber segment is affected only by a limited spatial neighborhood. That is, a fiber segment is influenced only by the fiber segments that are (or can potentially be) connected to either of its ends and by the diffusion-weighted signal in its proximity. This observation makes it possible to parallelize the Markov chain Monte Carlo (MCMC) algorithm used in global tractography so that independent fiber segments can be updated concurrently. Experiments show that the proposed algorithm significantly speeds up global tractography while maintaining or improving tractography performance.
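
The locality observation can be illustrated with a minimal sketch, which is not the implementation described here but one common way to exploit a bounded interaction range. It assumes a hypothetical interaction radius `radius` beyond which two segments do not influence each other, bins segment centers into cubic cells of that side length, and colors cells by the parity of their indices. Any two segments in different cells of the same color are then more than `radius` apart, so those cells can be updated concurrently. The names `build_schedule`, `sweep`, and `update_cell` are illustrative assumptions, and proposals that move a segment into another cell would need extra handling that is omitted.

```python
import math
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_schedule(positions, radius):
    """Group segment indices by cell, and cells by parity color.

    positions: sequence of (x, y, z) segment centers.
    radius: assumed interaction range (hypothetical parameter).
    Returns {color: {cell: [segment indices]}} with up to 8 colors.
    """
    schedule = defaultdict(lambda: defaultdict(list))
    for idx, p in enumerate(positions):
        # Cubic cell of side `radius` containing this segment.
        cell = tuple(math.floor(c / radius) for c in p)
        # Same-color cells differ by at least 2 along some axis,
        # so their segments are more than `radius` apart.
        color = tuple(i % 2 for i in cell)
        schedule[color][cell].append(idx)
    return schedule


def sweep(schedule, update_cell, max_workers=4):
    """Run one pass: colors in sequence, cells of a color concurrently.

    update_cell: caller-supplied function (assumed) that performs the
    MCMC proposals for the list of segment indices in one cell.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for color in sorted(schedule):
            # Wait for all cells of this color before the next color.
            list(pool.map(update_cell, schedule[color].values()))
```

Within a cell the segments are still visited sequentially, so the available parallelism in this sketch is the number of occupied cells per color. Threads are used only to keep the example short; a GPU kernel or process pool would follow the same color-by-color schedule.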