RDF DATABASES – CASE STUDY AND PERFORMANCE EVALUATION

Authors

  • Tony Nacional, School of Business and Technology, Webster University Thailand, Bangkok, Thailand
  • Marko Niinimaki, School of Business and Technology, Webster University Thailand, Bangkok, Thailand
  • Matti Heikkurinen, PROCESS, Ludwig-Maximilians-Universität, Munich, Germany

DOI:

https://doi.org/10.20319/mijst.2019.53.0114

Keywords:

RDF, Database, NoSQL, Benchmarking, Big Data, Query Performance

Abstract

The Resource Description Framework (RDF) data representation model and the SPARQL query language have been at the core of semantic web technologies since the early 2000s. In this article, we evaluate three RDF storage technologies. Our motivation is to find a storage solution that can be used to process “big data” RDF sets. Our method is based on measuring query response times with large samples (hundreds of thousands of RDF documents, millions of RDF statements). We find that all three technologies perform much better than querying RDF data stored in files. However, with 300 000 documents, an aggregation query still takes more than 100 seconds in our environment, even with the fastest technology. As a further performance improvement, we test the same data and queries with MongoDB and demonstrate its performance (10 seconds instead of 100) and scalability (up to 1 000 000 documents). Despite these benefits, we must note that, because of its data representation and query limitations, MongoDB probably cannot serve as generic storage for all kinds of RDF documents.
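
The kind of measurement described above can be illustrated with a minimal sketch. It is not the setup used in the article: the synthetic data, the `example.org` URIs, all function names, and the use of an in-memory SQLite triple table as a stand-in for a storage technology are assumptions made for illustration only. The sketch times one aggregation query (documents per year) twice, once by scanning raw N-Triples lines as one would with RDF kept in files, and once against an indexed store.

```python
import sqlite3
import time
from collections import Counter

# Hypothetical predicate URI, used only for this illustration.
YEAR = "<http://example.org/year>"


def make_ntriples(n_docs):
    """Generate synthetic N-Triples lines, two statements per document."""
    lines = []
    for i in range(n_docs):
        subj = f"<http://example.org/doc/{i}>"
        lines.append(f'{subj} {YEAR} "{2000 + i % 5}" .')
        lines.append(f'{subj} <http://example.org/title> "Title {i}" .')
    return lines


def parse_line(line):
    """Split one simple N-Triples line into (subject, predicate, object)."""
    subj, pred, obj = line.rstrip(" .").split(" ", 2)
    return subj, pred, obj


def count_by_year_scan(lines):
    """Aggregate by scanning every line, as with RDF stored in plain files."""
    counts = Counter()
    for line in lines:
        _, pred, obj = parse_line(line)
        if pred == YEAR:
            counts[obj.strip('"')] += 1
    return dict(counts)


def load_sqlite(lines):
    """Load the triples into an indexed in-memory table (stand-in store)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    conn.executemany(
        "INSERT INTO triples VALUES (?, ?, ?)",
        (parse_line(line) for line in lines),
    )
    conn.execute("CREATE INDEX idx_po ON triples (p, o)")
    conn.commit()
    return conn


def count_by_year_sql(conn):
    """Run the same aggregation as a GROUP BY query against the store."""
    rows = conn.execute(
        "SELECT o, COUNT(*) FROM triples WHERE p = ? GROUP BY o", (YEAR,)
    )
    return {obj.strip('"'): n for obj, n in rows}


def timed(fn, *args):
    """Return the function result and its wall-clock response time in seconds."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


lines = make_ntriples(10_000)
conn = load_sqlite(lines)
scan_result, scan_secs = timed(count_by_year_scan, lines)
sql_result, sql_secs = timed(count_by_year_sql, conn)
print(f"file-style scan: {scan_secs:.4f} s, indexed store: {sql_secs:.4f} s")
```

Both approaches should return the same counts, so any difference lies only in response time. Absolute timings depend on the machine and data size and say nothing about the technologies evaluated in the article.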

Published

2019-11-15

How to Cite

Nacional, T., Niinimaki, M., & Heikkurinen, M. (2019). RDF DATABASES – CASE STUDY AND PERFORMANCE EVALUATION. MATTER: International Journal of Science and Technology, 5(3), 01–14. https://doi.org/10.20319/mijst.2019.53.0114