SURVEY OF SIMILARITY JOIN ALGORITHMS BASED ON MAPREDUCE

Authors

  • Amer Al-Badarneh Computer Information System Department, Jordan University of Science and Technology, Irbid, Jordan
  • Amnah Al-Abdi Computer Science Department, Jordan University of Science and Technology, Irbid, Jordan
  • Sana’a Al-Shboul Computer Science Department, Jordan University of Science and Technology, Irbid, Jordan
  • Hassan Najadat Computer Information System Department, Jordan University of Science and Technology, Irbid, Jordan

DOI:

https://doi.org/10.20319/mijst.2016.s21.214234

Keywords:

Hadoop, MapReduce, Similarity Join

Abstract

Similarity join is a data processing and analysis operation that retrieves all pairs of data objects whose distance is less than a predefined threshold. Similarity join algorithms are used in many real-world applications, such as finding similar documents, images, and strings. In this survey, we explain several similarity join algorithms based on the MapReduce approach: Set-Similarity Join, SSJ-2R, MRSimJoin, pair-wise similarity, the multi-sig-er method, Trie-join, and PreJoin. We then compare these algorithms according to several criteria and discuss the results.
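
To make the definition above concrete, the following is a minimal sketch, not the authors' code and not Hadoop code, that simulates in memory one map-shuffle-reduce pass of a set-similarity self-join using prefix filtering, in the spirit of the Set-Similarity Join approach. Assumptions: records are sets of tokens, the measure is Jaccard similarity, and the names `similarity_join`, `prefix_length`, and `jaccard` are hypothetical. Because Jaccard is a similarity rather than a distance, "distance below a threshold" becomes "similarity at least t", which is equivalent to a Jaccard distance 1 - J of at most 1 - t.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def jaccard(a, b):
    # |a ∩ b| / |a ∪ b|; only called on sets that share a token, so the union is non-empty
    return len(a & b) / len(a | b)


def prefix_length(size, t):
    # Prefix-filtering bound for a Jaccard threshold t: two sets with J >= t
    # must share at least one token within their first (size - ceil(t*size) + 1) tokens
    return size - math.ceil(t * size) + 1


def similarity_join(records, t):
    """Return all id pairs (i, j), i < j, with Jaccard(records[i], records[j]) >= t.

    records: dict mapping a sortable record id to an iterable of tokens (hypothetical input format).
    """
    # Global token order shared by all mappers: rarest tokens first, ties broken by the token itself
    freq = Counter(tok for toks in records.values() for tok in set(toks))

    def order(tok):
        return (freq[tok], tok)

    # Map phase: each record emits (prefix token, record id); the dict stands in for the shuffle
    groups = defaultdict(list)
    for rid, toks in records.items():
        ordered = sorted(set(toks), key=order)
        for tok in ordered[:prefix_length(len(ordered), t)]:
            groups[tok].append(rid)

    # Reduce phase: verify every candidate pair that meets under the same prefix token
    result = set()  # a set removes pairs found under more than one shared token
    for rids in groups.values():
        for a, b in combinations(sorted(rids), 2):
            if jaccard(set(records[a]), set(records[b])) >= t:
                result.add((a, b))
    return sorted(result)
```

For example, with `records = {1: ["a", "b", "c", "d"], 2: ["a", "b", "c", "e"], 3: ["x", "y", "z"], 4: ["a", "b", "x"]}` and `t = 0.5`, only records 1 and 2 (Jaccard 3/5) qualify, so `similarity_join(records, 0.5)` returns `[(1, 2)]`. Ordering tokens by ascending frequency keeps rare tokens in the prefixes, which keeps the reduce groups small; a real MapReduce job would also have to compute the token frequencies and remove duplicate pairs in separate stages.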

References

Baraglia, R., Morales, G. D. F., & Lucchese, C. (2010, December). Document similarity self-join with MapReduce. In 2010 IEEE International Conference on Data Mining (pp. 731-736). IEEE.

Gouda, K., & Rashad, M. (2012, May). PreJoin: An efficient trie-based string similarity join algorithm. In Informatics and Systems (INFOS), 2012 8th International Conference on (pp. DE-37). IEEE.

Kolb, L., Thor, A., & Rahm, E. (2013, June). Don't match twice: redundancy-free similarity computation with MapReduce. In Proceedings of the Second Workshop on Data Analytics in the Cloud (pp. 1-5). ACM.

Pang, J., Gu, Y., Xu, J., Bao, Y., & Yu, G. (2014, June). Efficient Graph Similarity Join with Scalable Prefix-Filtering Using MapReduce. In International Conference on Web-Age Information Management (pp. 415-418). Springer International Publishing.

Silva, Y. N., & Reed, J. M. (2012, May). Exploiting MapReduce-based similarity joins. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data (pp. 693-696). ACM.

Silva, Y. N., Reed, J. M., & Tsosie, L. M. (2012, August). MapReduce-based similarity join for metric spaces. In Proceedings of the 1st International Workshop on Cloud Intelligence (p. 3). ACM.

Vernica, R., Carey, M. J., & Li, C. (2010, June). Efficient parallel set-similarity joins using MapReduce. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (pp. 495-506). ACM.

Wang, J., Feng, J., & Li, G. (2010). Trie-join: Efficient trie-based string similarity joins with edit-distance constraints. Proceedings of the VLDB Endowment, 3(1-2), 1219-1230.

Yan, C., Song, Y., Wang, J., & Guo, W. (2015, May). Eliminating the Redundancy in MapReduce-based Entity Resolution. In Cluster, Cloud and Grid Computing (CCGrid), 2015 15th IEEE/ACM International Symposium on (pp. 1233-1236). IEEE.

Published

2016-12-19

How to Cite

Al-Badarneh, A., Al-Abdi, A., Al-Shboul, S., & Najadat, H. (2016). SURVEY OF SIMILARITY JOIN ALGORITHMS BASED ON MAPREDUCE. MATTER: International Journal of Science and Technology, 2(1), 214–234. https://doi.org/10.20319/mijst.2016.s21.214234