INDEX-BASED JOIN IN MAPREDUCE USING HADOOP MAPFILES
DOI: https://doi.org/10.20319/mijst.2016.s21.200213
Keywords: Hadoop, BigData, MapReduce, Join Algorithms, Indexing.
Abstract
MapReduce remains an important method for processing semi-structured or unstructured big data files. However, querying such data usually requires a join operation to assemble the desired result from multiple huge files. Indexing, on the other hand, remains the best way to reach specific records in a timely manner. In this paper, the authors investigate the performance gain obtained by combining MapFile indexing with join algorithms.
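To make the idea of an index-based join concrete, the following is a minimal sketch in plain Python and is not the authors' implementation. It assumes an in-memory stand-in for a MapFile-style structure: records sorted by key plus a sparse index that keeps every `interval`-th key. The class `SparseIndexedFile`, the function `index_join`, and the sample product and order data are all hypothetical names introduced for illustration. In an actual Hadoop job, the lookup would more likely go through `MapFile.Reader` inside the mapper.

```python
from bisect import bisect_right


class SparseIndexedFile:
    """Hypothetical in-memory stand-in for a sorted key/value file with a sparse index."""

    def __init__(self, records, interval=2):
        # Data is kept sorted by key (keys are assumed unique).
        self.data = sorted(records, key=lambda kv: kv[0])
        self.interval = interval
        # The sparse index stores every `interval`-th key and its position in `data`.
        self.index_pos = list(range(0, len(self.data), interval))
        self.index_keys = [self.data[i][0] for i in self.index_pos]

    def get(self, key):
        # Binary-search the sparse index for the last indexed key <= key.
        slot = bisect_right(self.index_keys, key) - 1
        if slot < 0:
            return None
        # Scan at most `interval` records starting from the indexed position.
        start = self.index_pos[slot]
        for k, v in self.data[start:start + self.interval]:
            if k == key:
                return v
            if k > key:
                break
        return None


def index_join(big_records, indexed, key_of):
    """Inner join: probe the indexed side once per record of the streamed side."""
    for rec in big_records:
        match = indexed.get(key_of(rec))
        if match is not None:
            yield rec, match


# Hypothetical sample data: a small indexed table and a larger streamed table.
products = [(3, "Helmet"), (1, "Bike"), (2, "Lock"), (5, "Pump"), (4, "Tire")]
orders = [("o1", 2), ("o2", 5), ("o3", 9)]

product_index = SparseIndexedFile(products, interval=2)
joined = list(index_join(orders, product_index, key_of=lambda order: order[1]))
print(joined)
```

The design choice illustrated here is that the streamed side is never sorted or shuffled on the join key. Each of its records costs one binary search over the small index plus a short scan, which is the kind of saving an index-based join aims for.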
License
Copyright (c) 2016 Authors
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Copyright of Published Articles
Author(s) retain the article copyright and publishing rights without any restrictions.