Large scale graph processing systems: survey and an experimental evaluation

Omar Batarfi, Radwa El Shawi, Ayman G. Fayoumi, Reza Nouri, Seyed Mehdi Reza Beheshti, Ahmed Barnawi, Sherif Sakr

Research output: Contribution to journal, Article, Research, peer-review

30 Citations (Scopus)

Abstract

A graph is a fundamental data structure that captures relationships between data entities. In practice, graphs are widely used to model complex data in application domains such as social networks, protein networks, transportation networks, bibliographical networks, knowledge bases and many more. Graphs with millions or even billions of nodes and edges have become common, and graph analytics is an important big data discovery technique. With the increasing abundance of large graphs, designing scalable systems for processing and analyzing them has therefore become one of the most timely problems facing the big data research community. Scalable processing of big graphs is challenging because of their size and the inherently irregular structure of graph computations. In recent years, this has driven unprecedented interest in building big graph processing systems that attempt to tackle these challenges. In this article, we provide a comprehensive survey of the state of the art in large scale graph processing platforms. In addition, we present an extensive experimental study of five popular systems in this domain, namely GraphChi, Apache Giraph, GPS, GraphLab and GraphX. In particular, we report and analyze the performance characteristics of these systems using five common graph processing algorithms and seven large graph datasets. Finally, we identify a set of current open research challenges and discuss some promising directions for future research in the domain of large scale graph processing.

Original language: English
Pages (from-to): 1189-1213
Number of pages: 25
Journal: Cluster Computing
Volume: 18
Issue number: 3
DOI: https://doi.org/10.1007/s10586-015-0472-6
Publication status: Published - 30 Sep 2015

Keywords

  • Big graph
  • Experimental evaluation
  • Graph processing

Cite this

Batarfi, O., Shawi, R. E., Fayoumi, A. G., Nouri, R., Beheshti, S. M. R., Barnawi, A., & Sakr, S. (2015). Large scale graph processing systems: survey and an experimental evaluation. Cluster Computing, 18(3), 1189-1213. https://doi.org/10.1007/s10586-015-0472-6
Batarfi, Omar; Shawi, Radwa El; Fayoumi, Ayman G.; Nouri, Reza; Beheshti, Seyed Mehdi Reza; Barnawi, Ahmed; Sakr, Sherif. / Large scale graph processing systems: survey and an experimental evaluation. In: Cluster Computing. 2015; Vol. 18, No. 3. pp. 1189-1213.
@article{7f03ad94b66240a38d9f63fc62638a64,
title = "Large scale graph processing systems: survey and an experimental evaluation",
abstract = "Graph is a fundamental data structure that captures relationships between different data entities. In practice, graphs are widely used for modeling complicated data in different application domains such as social networks, protein networks, transportation networks, bibliographical networks, knowledge bases and many more. Currently, graphs with millions and billions of nodes and edges have become very common. In principle, graph analytics is an important big data discovery technique. Therefore, with the increasing abundance of large graphs, designing scalable systems for processing and analyzing large scale graphs has become one of the most timely problems facing the big data research community. In general, scalable processing of big graphs is a challenging task due to their size and the inherent irregular structure of graph computations. Thus, in recent years, we have witnessed an unprecedented interest in building big graph processing systems that attempted to tackle these challenges. In this article, we provide a comprehensive survey over the state-of-the-art of large scale graph processing platforms. In addition, we present an extensive experimental study of five popular systems in this domain, namely, GraphChi, Apache Giraph, GPS, GraphLab and GraphX. In particular, we report and analyze the performance characteristics of these systems using five common graph processing algorithms and seven large graph datasets. Finally, we identify a set of the current open research challenges and discuss some promising directions for future research in the domain of large scale graph processing.",
keywords = "Big graph, Experimental evaluation, Graph processing",
author = "Omar Batarfi and Shawi, {Radwa El} and Fayoumi, {Ayman G.} and Reza Nouri and Beheshti, {Seyed Mehdi Reza} and Ahmed Barnawi and Sherif Sakr",
year = "2015",
month = "9",
day = "30",
doi = "10.1007/s10586-015-0472-6",
language = "English",
volume = "18",
pages = "1189--1213",
journal = "Cluster Computing",
issn = "1386-7857",
publisher = "Kluwer Academic Publishers",
number = "3",
}

Batarfi, O, Shawi, RE, Fayoumi, AG, Nouri, R, Beheshti, SMR, Barnawi, A & Sakr, S 2015, 'Large scale graph processing systems: survey and an experimental evaluation', Cluster Computing, vol. 18, no. 3, pp. 1189-1213. https://doi.org/10.1007/s10586-015-0472-6

Large scale graph processing systems: survey and an experimental evaluation. / Batarfi, Omar; Shawi, Radwa El; Fayoumi, Ayman G.; Nouri, Reza; Beheshti, Seyed Mehdi Reza; Barnawi, Ahmed; Sakr, Sherif.

In: Cluster Computing, Vol. 18, No. 3, 30.09.2015, p. 1189-1213.

TY - JOUR
T1 - Large scale graph processing systems
T2 - survey and an experimental evaluation
AU - Batarfi, Omar
AU - Shawi, Radwa El
AU - Fayoumi, Ayman G.
AU - Nouri, Reza
AU - Beheshti, Seyed Mehdi Reza
AU - Barnawi, Ahmed
AU - Sakr, Sherif
PY - 2015/9/30
Y1 - 2015/9/30
AB - Graph is a fundamental data structure that captures relationships between different data entities. In practice, graphs are widely used for modeling complicated data in different application domains such as social networks, protein networks, transportation networks, bibliographical networks, knowledge bases and many more. Currently, graphs with millions and billions of nodes and edges have become very common. In principle, graph analytics is an important big data discovery technique. Therefore, with the increasing abundance of large graphs, designing scalable systems for processing and analyzing large scale graphs has become one of the most timely problems facing the big data research community. In general, scalable processing of big graphs is a challenging task due to their size and the inherent irregular structure of graph computations. Thus, in recent years, we have witnessed an unprecedented interest in building big graph processing systems that attempted to tackle these challenges. In this article, we provide a comprehensive survey over the state-of-the-art of large scale graph processing platforms. In addition, we present an extensive experimental study of five popular systems in this domain, namely, GraphChi, Apache Giraph, GPS, GraphLab and GraphX. In particular, we report and analyze the performance characteristics of these systems using five common graph processing algorithms and seven large graph datasets. Finally, we identify a set of the current open research challenges and discuss some promising directions for future research in the domain of large scale graph processing.
KW - Big graph
KW - Experimental evaluation
KW - Graph processing
UR - http://www.scopus.com/inward/record.url?scp=84942551417&partnerID=8YFLogxK
U2 - 10.1007/s10586-015-0472-6
DO - 10.1007/s10586-015-0472-6
M3 - Article
VL - 18
SP - 1189
EP - 1213
JO - Cluster Computing
JF - Cluster Computing
SN - 1386-7857
IS - 3
ER -

Batarfi O, Shawi RE, Fayoumi AG, Nouri R, Beheshti SMR, Barnawi A et al. Large scale graph processing systems: survey and an experimental evaluation. Cluster Computing. 2015 Sep 30;18(3):1189-1213. https://doi.org/10.1007/s10586-015-0472-6