INDEX-BASED JOIN IN MAPREDUCE USING HADOOP MAPFILES

Amer Al Badarneh; Mohammed Al-Rudaini; Faisal Ali; Hassan Najadat

doi:10.20319/mijst.2016.s21.200213

Authors

Amer Al Badarneh, Jordan University of Science and Technology, Irbid, Jordan
Mohammed Al-Rudaini, Jordan University of Science and Technology, Irbid, Jordan
Faisal Ali, Jordan University of Science and Technology, Irbid, Jordan
Hassan Najadat, Jordan University of Science and Technology, Irbid, Jordan

Keywords:

Hadoop, Big Data, MapReduce, Join Algorithms, Indexing

Abstract

MapReduce remains an important method for processing semi-structured or unstructured big data files. However, querying such data usually requires a join procedure to assemble the desired result from multiple huge files. Indexing, on the other hand, remains the best way to reach specific records in a timely manner. In this paper, the authors investigate the performance gain obtained by combining MapFile indexing with join algorithms.
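To make the idea of an index-based join concrete, the following is a minimal in-memory sketch in Python. It is not the authors' implementation and does not use the Hadoop API. The class `SortedIndexedFile`, the function `index_join`, the `index_interval` parameter, and the sample records are all hypothetical names and data chosen for illustration. The sketch assumes unique keys in the indexed side and mimics the general layout of a sorted key/value file with a sparse key index: a lookup first searches the small index, then scans only a short run of the sorted data.

```python
from bisect import bisect_right


class SortedIndexedFile:
    """Toy stand-in for a sorted key/value file with a sparse index.

    Hypothetical illustration only; this is not the Hadoop MapFile API.
    Keys are assumed to be unique.
    """

    def __init__(self, records, index_interval=2):
        # Keep the data sorted by key, as an indexed file would on disk.
        self.data = sorted(records, key=lambda kv: kv[0])
        self.index_interval = index_interval
        # Sparse index: the key of every index_interval-th record.
        # Index slot i corresponds to data position i * index_interval.
        self.index_keys = [k for k, _ in self.data[::index_interval]]

    def get(self, key):
        # Locate the last indexed key that is <= the requested key.
        slot = bisect_right(self.index_keys, key) - 1
        if slot < 0:
            return None  # key is smaller than every stored key
        # Scan only the short run of records covered by that index slot.
        start = slot * self.index_interval
        for k, v in self.data[start:start + self.index_interval]:
            if k == key:
                return v
        return None


def index_join(left_records, right_file, key_of):
    """Join each left record with its match in the indexed right side."""
    for rec in left_records:
        match = right_file.get(key_of(rec))
        if match is not None:
            yield rec, match


# Hypothetical sample data: products keyed by id, orders referencing them.
products = SortedIndexedFile([(3, "Helmet"), (1, "Bike"), (2, "Tire")])
orders = [("o1", 2), ("o2", 3), ("o3", 9)]

joined = list(index_join(orders, products, key_of=lambda o: o[1]))
print(joined)
```

In this sketch the order `o3` is dropped because its key has no match, which corresponds to an inner join. The point of the design is that the larger input is streamed once while the other side is probed through its index rather than being scanned in full for every record.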
