----------------------------------------------------------------------------------------------------------------------------------------------
TITLE : A Fast Filtering Scheme for Large Database Cleansing

ABSTRACT

Existing data cleansing methods are costly and take a very long time to cleanse large databases. Since large databases are common nowadays, it is necessary to reduce the cleansing time. Data cleansing consists of two main components: a detection method and a comparison method. In this paper, we first propose a simple and fast comparison method, TI-Similarity, which reduces the time for each comparison. Based on TI-Similarity, we propose a new detection method, RAR, to further reduce the number of comparisons.

With RAR and TI-Similarity, our new approach for cleansing large databases is composed of two processes: a filtering process and a pruning process. In the filtering process, a fast scan of the database is carried out with RAR and TI-Similarity. This process guarantees the detection of potential duplicate records but may introduce false positives. In the pruning process, the duplicate result from the filtering process is pruned to eliminate the false positives using more trustworthy comparison methods.

The performance study shows that our approach is efficient and scalable for cleansing large databases, and is about an order of magnitude faster than existing cleansing methods.

ABOUT THE SPEAKER
---------------------------------
Prof. Sam Y. Sung received his B.Sc. from National Taiwan University in 1973, and his M.Sc. and Ph.D. in computer science from the University of Minnesota in 1979 and 1983, respectively. He was with the University of Oklahoma and the University of Memphis in the USA before joining the University of Singapore in 1989. His research interests include information retrieval, data mining, pictorial databases and mobile computing.
He has published extensively in various conferences and journals, including IEEE Transactions on Software Engineering and IEEE Transactions on Knowledge and Data Engineering.