As video information proliferates, managing video sources becomes increasingly important: indices must be constructed to support future retrieval. We distinguish two categories of indexing: (i) general-purpose indexing, which makes no use of domain-specific knowledge, and (ii) application-dependent indexing. In this paper, we present our work in both categories within the VideoBook project. We discuss how to structure video data into shots (physical units) and clusters (semantic units), and describe a video partitioning algorithm whose effectiveness and efficiency lie in its use of both statistical and spatial information in the images without, however, having to examine the entire images. To improve querying efficiency, we investigate two directions: deriving higher-level indices through classification, and finding targets of interest through interactive learning. The first technique exploits domain knowledge of the underlying applications. The second accounts for quantization effects and noise in images and accommodates "learning from negative examples", yielding good discriminating power. Experimental results demonstrate the effectiveness of our approach.
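The abstract does not spell out the partitioning algorithm itself. As a minimal sketch of the general idea only — comparing statistics computed over a fixed spatial subsample of pixel positions, so that entire images need not be examined — the following illustrates shot-boundary detection by histogram differencing. All names, the grey-level histogram signature, the sampling scheme, and the thresholding rule are illustrative assumptions, not the paper's actual method.

```python
import random

def frame_signature(frame, n_bins=16, sample_frac=0.1, seed=0):
    """Grey-level histogram over a fixed random sample of pixel
    positions (spatial subsampling); the fixed seed means every
    frame is sampled at the same positions."""
    h, w = len(frame), len(frame[0])
    rng = random.Random(seed)
    n_samples = max(1, int(h * w * sample_frac))
    positions = [(rng.randrange(h), rng.randrange(w)) for _ in range(n_samples)]
    hist = [0] * n_bins
    for r, c in positions:
        hist[frame[r][c] * n_bins // 256] += 1  # assumes 8-bit grey levels
    return hist

def detect_shot_boundaries(frames, threshold=0.3):
    """Declare a shot boundary wherever the normalized L1 distance
    between consecutive frame signatures exceeds the threshold."""
    boundaries = []
    prev = frame_signature(frames[0])
    for i in range(1, len(frames)):
        cur = frame_signature(frames[i])
        diff = sum(abs(a - b) for a, b in zip(prev, cur)) / sum(prev)
        if diff > threshold:
            boundaries.append(i)
        prev = cur
    return boundaries
```

For example, a sequence of three dark frames followed by three bright frames would yield a single boundary at the first bright frame; a real implementation would of course operate on decoded video frames rather than nested lists.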