Text appearing in video provides semantic knowledge and significant information for video indexing and retrieval system. This paper proposes an effective method for text detection in video based on the similarity in stroke width of text (which is defined as the distance between two edges of a stroke). From the observation that text regions can be characterized by a dominant fixed stroke width, edge detection with local adaptive thresholds is first devised to keep text- while reducing background-regions. Second, morphological dilation operator with adaptive structuring element size determined by stroke width value is exploited to roughly localize text regions. Finally, to reduce false alarm and refine text location, a new multi-frame refinement method is applied. Experimental results show that the proposed method is not only robust to different levels of background complexity, but also effective to different fonts (size, color) and languages of text.