Most current localization algorithms are either range-based or vision-based. In many environments, a single type of sensor often cannot ensure successful localization; moreover, substituting low-priced range sensors for expensive but accurate laser scanners often leads to poor performance. This paper proposes an MCL-based localization method that robustly estimates the robot pose by fusing range information from a low-cost IR scanner with SIFT-based visual information from a monocular camera. With sensor fusion, the rough pose estimate from the range sensor is refined by the vision sensor, and slow object recognition is compensated for by frequent updates of the range information. To synchronize the two sensors, which operate at different rates, the encoder information accumulated during object recognition is exploited. This paper also suggests a method for evaluating localization performance based on the normalized probability of a vision sensor model. Various experiments show that the proposed algorithm estimates the robot pose reasonably well and accurately evaluates localization performance.
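The fusion scheme described above can be sketched in generic MCL terms: each particle is propagated by the encoder motion accumulated since the last update, and its importance weight is the product of a range likelihood and a vision likelihood. This is a minimal illustrative sketch, not the paper's implementation; the Gaussian sensor models, the noise parameters, and all function names (`motion_update`, `fused_weights`) are assumptions for exposition.

```python
import math
import random

def motion_update(particles, odom, noise=0.02):
    """Propagate each particle (x, y, theta) by an encoder increment.

    The paper synchronizes the slow vision update with the fast range
    update by replaying encoder motion gathered during object
    recognition; here that is modeled as one (dx, dy, dtheta) step
    with additive Gaussian noise (an assumed noise model).
    """
    dx, dy, dth = odom
    return [(x + dx + random.gauss(0, noise),
             y + dy + random.gauss(0, noise),
             th + dth + random.gauss(0, noise))
            for (x, y, th) in particles]

def gaussian_likelihood(measured, expected, sigma):
    """Assumed Gaussian sensor model p(z | x) up to a constant factor."""
    return math.exp(-0.5 * ((measured - expected) / sigma) ** 2)

def fused_weights(particles, z_range, z_vision,
                  expected_range, expected_vision,
                  sigma_range=0.1, sigma_vision=0.3):
    """Fuse the two sensor likelihoods multiplicatively per particle.

    expected_range[i] / expected_vision[i] are the measurements each
    particle's pose would predict from the map (computed elsewhere).
    The weights are normalized so they form a probability distribution.
    """
    weights = []
    for i in range(len(particles)):
        w_r = gaussian_likelihood(z_range, expected_range[i], sigma_range)
        w_v = gaussian_likelihood(z_vision, expected_vision[i], sigma_vision)
        weights.append(w_r * w_v)
    total = sum(weights) or 1.0
    return [w / total for w in weights]
```

A particle whose pose predicts both measurements well (e.g. `expected_range[i] == z_range`) receives the highest normalized weight; the normalization step also mirrors the abstract's use of a normalized sensor-model probability as a localization-quality indicator.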