Accurate tooth labeling on 3D dental surfaces is a vital task in computer-aided orthodontic treatment planning. Existing automated or semi-automated methods usually require human interaction, which is time-consuming. They also typically rely on simple geometric properties as segmentation criteria, which cannot adequately handle the high variation in tooth appearance across patients. Recently, several pioneering deep neural networks (e.g., PointNet) have been proposed in the computer vision and computer graphics communities to segment 3D shapes efficiently in an end-to-end manner. However, these methods do not perform well on our specific task of tooth labeling, largely because they cannot explicitly model the fine-grained local geometric context of teeth (each tooth occupies only a small portion of the dental surface, yet teeth differ in shape and appearance). In this paper, we propose a dedicated deep neural network (called MeshSNet) for end-to-end tooth segmentation on 3D dental surfaces captured by advanced intraoral scanners. Taking raw mesh data directly as input, MeshSNet adopts novel graph-constrained learning modules to hierarchically extract multi-scale contextual features, and then densely integrates local-to-global geometric features to comprehensively characterize mesh cells for the segmentation task. We evaluated the proposed method on an in-house clinical dataset via 3-fold cross-validation. The experimental results demonstrate the superior performance of MeshSNet compared with state-of-the-art deep learning methods for 3D shape segmentation.
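To make the idea of graph-constrained, multi-scale feature extraction concrete, the following is a minimal NumPy sketch and not the MeshSNet implementation. It assumes that neighborhoods are defined by a distance threshold between mesh-cell centroids and that features are aggregated by simple averaging; the function names, the radii, and the averaging rule are all illustrative assumptions, and the learned layers of an actual network are omitted.

```python
import numpy as np


def radius_adjacency(centroids, radius):
    """Binary adjacency between mesh cells whose centroids lie within `radius`.

    Assumption: the graph is built from centroid distances; every cell is
    its own neighbour because its distance to itself is zero.
    """
    diff = centroids[:, None, :] - centroids[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return (dist <= radius).astype(np.float64)


def graph_average(features, adjacency):
    """Average each cell's features over its graph neighbours."""
    degree = adjacency.sum(axis=1, keepdims=True)  # >= 1 thanks to self-loops
    return (adjacency @ features) / degree


def multi_scale_features(centroids, features, radii=(0.1, 0.2)):
    """Concatenate per-cell features with one neighbourhood average per radius.

    A small radius captures local context; a larger radius captures a
    wider context. The radii here are hypothetical values.
    """
    scales = [
        graph_average(features, radius_adjacency(centroids, r)) for r in radii
    ]
    return np.concatenate([features] + scales, axis=1)
```

Each output row holds the original cell features followed by one context vector per radius, so a downstream classifier sees both local and wider geometric context for every mesh cell.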