In this paper, we present a new class of entropy-regularized Markov decision processes (MDPs), referred to as Tsallis MDPs, which generalize the well-known maximum entropy reinforcement learning (RL) framework by introducing an additional real-valued parameter called the entropic index. By controlling the entropic index, our theoretical results enable us to derive and analyze different types of optimal policies with interesting properties related to their stochasticity. To handle complex, model-free problems such as learning a controller for a soft mobile robot, we propose a Tsallis actor-critic (TAC) method. We first observe that different RL problems have different desirable entropic indices, and that using a proper entropic index yields superior performance compared to state-of-the-art actor-critic methods. To avoid an exhaustive search over the entropic index, we propose a quick-and-dirty curriculum method that gradually increases the entropic index, which we refer to as TAC with Curricula (TAC2). TAC2 shows performance comparable to TAC with the optimal entropic index. Finally, we apply TAC2 to learn a controller for a soft mobile robot, where TAC2 outperforms existing actor-critic methods in terms of both convergence speed and utility.
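To make the role of the entropic index concrete, the sketch below computes the standard Tsallis entropy of a discrete policy through the q-logarithm, S_q(π) = E_{a∼π}[-ln_q π(a)] with ln_q(x) = (x^(q-1) - 1)/(q - 1). At q = 1 this reduces to the Shannon entropy used in maximum entropy RL, and at q = 2 it becomes the quadratic form 1 - Σ π(a)². The abstract does not spell out the exact regularizer or curriculum, so treat this as an illustration under assumptions rather than the paper's implementation. In particular, `entropic_index_schedule` is a hypothetical linear schedule: the abstract says only that the index is "gradually increased", and the linear shape and the endpoints `q_start=1.0`, `q_end=2.0` are our own choices.

```python
import numpy as np


def q_log(x, q):
    """q-logarithm: ln_q(x) = (x**(q-1) - 1) / (q - 1); equals ln(x) at q = 1."""
    x = np.asarray(x, dtype=float)
    if np.isclose(q, 1.0):
        return np.log(x)
    return (x ** (q - 1.0) - 1.0) / (q - 1.0)


def tsallis_entropy(pi, q):
    """Tsallis entropy S_q(pi) = E_{a~pi}[-ln_q pi(a)] of a discrete policy pi."""
    pi = np.asarray(pi, dtype=float)
    support = pi > 0  # drop zero-probability actions (their contribution is taken as 0)
    return float(-np.sum(pi[support] * q_log(pi[support], q)))


def entropic_index_schedule(step, total_steps, q_start=1.0, q_end=2.0):
    """Hypothetical linear curriculum raising q from q_start to q_end (assumed shape)."""
    frac = min(max(step / total_steps, 0.0), 1.0)  # clamp progress to [0, 1]
    return q_start + frac * (q_end - q_start)


if __name__ == "__main__":
    uniform = np.full(4, 0.25)
    peaked = np.array([0.97, 0.01, 0.01, 0.01])
    for q in (1.0, 1.5, 2.0):
        print(f"q={q}: uniform={tsallis_entropy(uniform, q):.4f}, "
              f"peaked={tsallis_entropy(peaked, q):.4f}")
    print([entropic_index_schedule(s, 100) for s in (0, 50, 100)])
```

For the uniform policy over four actions, q = 1 gives ln 4 and q = 2 gives 1 - 4·(0.25)² = 0.75. A larger index weights the regularizer differently and so changes how stochastic the resulting optimal policy tends to be, which is the behavior the curriculum exploits.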