INDEX-BASED JOIN IN MAPREDUCE USING HADOOP MAPFILES

Authors

  • Amer Al Badarneh, Jordan University of Science and Technology, Irbid, Jordan
  • Mohammed Al Rudaini, Jordan University of Science and Technology, Irbid, Jordan
  • Faisal Ali, Jordan University of Science and Technology, Irbid, Jordan
  • Hassan Najadat, Jordan University of Science and Technology, Irbid, Jordan

DOI:

https://doi.org/10.20319/mijst.2016.s21.200213

Keywords:

Hadoop, Big Data, MapReduce, Join Algorithms, Indexing.

Abstract

MapReduce remains an important method for processing semi-structured or unstructured big data files. However, querying such data usually requires a join to assemble the desired result from multiple large files. Indexing, on the other hand, remains the best way to reach specific records in a timely manner. In this paper, the authors investigate the performance gained by combining MapFile indexing with join algorithms.
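The abstract does not spell out how the index and the join fit together, so the following is only a minimal in-memory sketch of the general idea, not the authors' implementation. It assumes a MapFile-like layout: a Hadoop MapFile is a directory holding a key-sorted data file plus a smaller index file containing a fraction of the keys, and a lookup searches the index and then scans the data file forward. Here that layout is imitated with plain Java lists (no Hadoop dependency), and the join probes the indexed dataset once per record of the other dataset. All class names, method names, and sample data (`IndexedJoinSketch`, `IndexedFile`, `join`, the customer/order records) are hypothetical; in a real Hadoop job the lookup would go through `MapFile.Reader.get(key, value)` instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical in-memory sketch of an index-based join over a MapFile-like layout. */
public class IndexedJoinSketch {

    /** A key/value record; keys are assumed unique within the indexed dataset. */
    static final class Entry {
        final String key;
        final String value;
        Entry(String key, String value) { this.key = key; this.value = value; }
    }

    /** Imitates a MapFile: key-sorted data plus a sparse index of every k-th key. */
    static final class IndexedFile {
        private final List<Entry> data;                            // records sorted by key
        private final List<String> indexKeys = new ArrayList<>();  // every k-th key
        private final List<Integer> indexPos = new ArrayList<>();  // its position in data

        IndexedFile(List<Entry> sortedData, int interval) {
            this.data = sortedData;
            for (int i = 0; i < sortedData.size(); i += interval) {
                indexKeys.add(sortedData.get(i).key);
                indexPos.add(i);
            }
        }

        /** Returns the value stored under key, or null if the key is absent. */
        String get(String key) {
            // Binary search the sparse index for the last indexed key <= key.
            int lo = 0, hi = indexKeys.size() - 1, start = -1;
            while (lo <= hi) {
                int mid = (lo + hi) >>> 1;
                if (indexKeys.get(mid).compareTo(key) <= 0) {
                    start = mid;
                    lo = mid + 1;
                } else {
                    hi = mid - 1;
                }
            }
            if (start < 0) return null; // key sorts before the first record
            // Scan the data forward from that position until the key is found or passed.
            for (int i = indexPos.get(start); i < data.size(); i++) {
                int c = data.get(i).key.compareTo(key);
                if (c == 0) return data.get(i).value;
                if (c > 0) break;
            }
            return null;
        }
    }

    /** Inner join: each probe record triggers one indexed lookup instead of a full scan. */
    static List<String> join(List<Entry> probe, IndexedFile indexed) {
        List<String> out = new ArrayList<>();
        for (Entry e : probe) {
            String match = indexed.get(e.key);
            if (match != null) out.add(e.key + "\t" + e.value + "\t" + match);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Entry> customers = Arrays.asList(
                new Entry("c1", "Alice"), new Entry("c2", "Bob"),
                new Entry("c3", "Carol"), new Entry("c4", "Dan"));
        IndexedFile idx = new IndexedFile(customers, 2); // index every 2nd key
        List<Entry> orders = Arrays.asList(
                new Entry("c3", "order-17"), new Entry("c9", "order-18"),
                new Entry("c1", "order-19"));
        join(orders, idx).forEach(System.out::println);
    }
}
```

Running `main` prints one tab-separated line per matched order (`c3` with Carol and `c1` with Alice); `c9` has no matching customer and is dropped. The design point the sketch illustrates is that the sorted, sparsely indexed side never has to be read in full: each lookup touches one small index search plus a short forward scan.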

Published

2016-12-19

How to Cite

Al Badarneh, A., Al-Rudaini, M., Ali, F., & Najadat, H. (2016). INDEX-BASED JOIN IN MAPREDUCE USING HADOOP MAPFILES. MATTER: International Journal of Science and Technology, 2(1), 200–213. https://doi.org/10.20319/mijst.2016.s21.200213
