Abstract—In this paper we adopt similarity upper approximation based clustering of web logs using various similarity/distance metrics, and we show the viability of this methodology. Web logs capture information about the web sites visited as well as the sequence of the visits. The sequence of visits provides important insight into the behavior of the user. Rough set theory, a soft computing technique, deals with the vagueness present in data and captures indiscernibility at different levels of granularity. We report results on the msnbc data set with different similarity measures, along with an explanation of the results.
Index Terms—Clustering, sequential data, similarity upper approximation.
The authors are with the Information Technology & Systems Department, Indian Institute of Management Lucknow, Uttar Pradesh, India (e-mail: rajhans@iiml.org; pradeepkumar@iiml.ac.in).
Cite: Rajhans Mishra and Pradeep Kumar, "Clustering Web Logs Using Similarity Upper Approximation with Different Similarity Measures," International Journal of Machine Learning and Computing, vol. 2, no. 3, pp. 219-221, 2012.