Abstract—Accuracy and consistency are the most important factors in any database, but the increasing size of data has made them a major challenge to maintain. Detecting duplicate records is an important and very difficult task in large databases containing millions of records. Field matching is a key step in duplicate record detection. This paper presents a brief survey of field matching techniques and their efficiency.
Index Terms—Duplicate detection, character based similarity metrics, edit distance, Jaro distance, Q-Grams.
Mohammad Reza Feizi Derakhshi is with the Department of Computer, University of Tabriz, Tabriz, Iran (e-mail: mfeizi@tabrizu.ac.ir).
Mahsa Sabbagh Nobarian is with the Department of Computer, Islamic Azad University, Shabestar Branch, Shabestar, Iran (e-mail: msn.sabbagh@yahoo.com).
Cite: Mahsa Sabbagh Nobarian and Mohammad Reza Feizi Derakhshi, "The Review of Fields Similarity Estimation Methods," International Journal of Machine Learning and Computing, vol. 2, no. 5, pp. 614-617, 2012.