What is a Graph?
Graphs have been around for a long time. They are used in many familiar fields, such as medicine, the World Wide Web and computing. But only recently have many of us started to take notice of graphs and of the ways they can be stored (graph databases) and processed (graph analytics engines).
So what has caused this buzz around graphs and graph databases in particular?
The answer lies in the efficient handling of large volumes of data arriving at breakneck speed, aka Big Data. More and more applications are trying to extract meaningful information from large volumes of mostly unstructured data. This data needs to be stored and should be accessible in near real time. A relatively new family of technologies that has emerged to handle Big Data is NoSQL. Graph databases fall under the NoSQL category.
NoSQL and Graph Databases
To understand graph databases better, we should first understand the different types of NoSQL databases available today. NoSQL databases can be categorized as follows:
- Key Value Pair based databases: Amazon’s Dynamo paper helped popularize the K-V pair database. The data is stored as a collection of K-V pairs. These databases are extremely efficient at handling large amounts of data. Riak and Amazon Dynamo are among the most widely used K-V pair based databases.
- Column Family based databases: This model was introduced in Google’s Bigtable paper and handles more complex data sets much better than K-V pair databases. Column family databases include HBase and Cassandra.
- Document based databases: Document based databases were inspired by Lotus Notes. They store each individual record as a document and map that document to a key. The most commonly used document based databases are CouchDB and MongoDB.
- Graph Databases: Graph databases are inspired by Euler’s work and the graph theory that grew out of it. They mainly consist of entities (also called nodes or vertices) and the relationships (or edges) connecting them. Both nodes and relationships can have K-V pairs as properties. Well-known graph databases include AllegroGraph and Neo4j.
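To make the four categories above more concrete, here is a rough sketch of how one small piece of information might be shaped in each model. Plain Python dictionaries and lists stand in for the databases; the record ("alice", "bob", the city, the dates) and every variable name are made up for illustration and do not reflect the API of any real product.

```python
# Hypothetical record: a user "alice" who lives in Pune and follows "bob".

# Key-value: an opaque value looked up by a single key.
key_value_store = {"user:alice": '{"name": "Alice", "city": "Pune"}'}

# Column family: a row key maps to named families, each holding columns.
column_family_store = {
    "alice": {
        "profile": {"name": "Alice", "city": "Pune"},
        "activity": {"last_login": "2024-01-01"},
    }
}

# Document: a key maps to a self-contained, possibly nested document.
document_store = {
    "alice": {"name": "Alice", "address": {"city": "Pune"}, "follows": ["bob"]}
}

# Graph: nodes and relationships are both first-class and both carry properties.
graph_nodes = {
    "alice": {"name": "Alice", "city": "Pune"},
    "bob": {"name": "Bob"},
}
graph_relationships = [
    {"from": "alice", "to": "bob", "type": "FOLLOWS", "since": 2020},
]
```

Note how only the graph form stores the "follows" connection as a record of its own, with its own properties, rather than as a value buried inside another record.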
So where do graph databases stand in comparison with other NoSQL databases? When we talk about big data, both the size and the complexity of the data should be considered, and these databases compare differently once we combine the two factors.
As seen in the graph, K-V pair databases can handle extremely large data sets that have a simple schema. Graph databases, on the other hand, can handle highly complex data but rank lower on data volume than the other databases. Even though graph databases sit lowest on the data size scale, they can still handle billions of nodes and relationships, which covers the vast majority of application use cases.
How is data stored in a Graph database?
In a graph database, data is stored as nodes. Every node can have a set of properties, which are key-value pairs. Different nodes can have different sets of properties, which is what makes the model well suited to complex data. A node is connected to another node in the database through a relationship. Relationships, like nodes, can have properties stored as key-value pairs, and a node can have more than one relationship with another node. Because the connections are stored directly, it is trivial to traverse the graph to find out whether any two nodes are connected and, if they are, how far apart they are, that is, their degree of separation. This is a very useful representation for data that is interconnected by any kind of relationship.
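The storage model described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not how any particular graph database is implemented; the class names (`Node`, `Relationship`, `PropertyGraph`) and the sample data are assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    properties: dict = field(default_factory=dict)  # arbitrary key-value pairs


@dataclass
class Relationship:
    start: str
    end: str
    rel_type: str
    properties: dict = field(default_factory=dict)  # relationships carry properties too


class PropertyGraph:
    """Hypothetical in-memory property graph, for illustration only."""

    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)

    def relate(self, start, end, rel_type, **properties):
        # Both endpoints must already exist as nodes.
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("both nodes must exist before relating them")
        self.relationships.append(Relationship(start, end, rel_type, properties))

    def relationships_between(self, a, b):
        # Return every relationship joining a and b, in either direction.
        return [r for r in self.relationships if {r.start, r.end} == {a, b}]


graph = PropertyGraph()
graph.add_node("alice", name="Alice", age=30)
graph.add_node("acme", name="Acme Corp", industry="Retail")  # a different property set
graph.relate("alice", "acme", "WORKS_AT", since=2019)
graph.relate("alice", "acme", "OWNS_SHARES_IN", shares=100)  # second relationship, same pair

print([r.rel_type for r in graph.relationships_between("alice", "acme")])
# ['WORKS_AT', 'OWNS_SHARES_IN']
```

The two nodes have different property keys, and the same pair of nodes is joined by two relationships, each with its own properties, which mirrors the points made in the paragraph above.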
Graph Database and RDBMS
The RDBMS still reigns as the preferred database for the majority of applications. A big reason for this is that it has been around for a very long time. But the NoSQL data models that have recently arrived on the scene have seriously challenged it. This is because the rigid schema an RDBMS needs in order to store data makes it very difficult to handle data that is arriving very fast. The complexity of the data that modern applications need to ingest makes the situation worse.
In a graph database, the schema is not rigid. Here the data itself is the schema, which makes it much easier to handle complex data. And when it comes to interconnected data, a graph database handles it much more efficiently and gracefully.
Let’s see how interconnected data is processed in an RDBMS and in a graph database. In an RDBMS, if there are interconnected entities, we normally have a mapping table that stores the entities that are directly related. To find two entities that are directly connected, we need to join the master table with the mapping table. To find two entities that are connected, but not directly, the query becomes even more complicated, because every extra hop adds another join, and the execution time increases significantly. This is because joins in an RDBMS are expensive. And as the data grows, the situation worsens.
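The mapping-table approach can be illustrated with Python's built-in `sqlite3` module. The table names (`person`, `friendship`), the column names and the three sample rows are hypothetical, and the friendship is treated as one-directional to keep the queries short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE friendship (
        person_id INTEGER NOT NULL REFERENCES person(id),
        friend_id INTEGER NOT NULL REFERENCES person(id)
    );
    INSERT INTO person (id, name) VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO friendship (person_id, friend_id) VALUES (1, 2), (2, 3);
""")

# Direct connections: master table -> mapping table -> master table.
direct = conn.execute("""
    SELECT friend.name
    FROM person AS p
    JOIN friendship AS f ON f.person_id = p.id
    JOIN person AS friend ON friend.id = f.friend_id
    WHERE p.name = 'Alice'
""").fetchall()

# Two hops away: the mapping table has to be joined one more time.
two_hops = conn.execute("""
    SELECT fof.name
    FROM person AS p
    JOIN friendship AS f1 ON f1.person_id = p.id
    JOIN friendship AS f2 ON f2.person_id = f1.friend_id
    JOIN person AS fof ON fof.id = f2.friend_id
    WHERE p.name = 'Alice'
""").fetchall()

print(direct)    # [('Bob',)]
print(two_hops)  # [('Carol',)]
```

Each additional degree of separation needs one more join on the mapping table, and a query for connections at an unknown depth needs something heavier still, such as a recursive query.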
In comparison, because of the way graph databases store interconnected data, it is a trivial task to find out whether two entities are connected and, if so, how far apart they are in the graph.
So, should graph databases replace the RDBMS? Well, the answer is no. As with any other NoSQL model, graph databases are not a replacement for the RDBMS. They are meant to be used alongside it. We should use graph databases where the data model is highly interconnected. That is where the specialty of graph databases lies.
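As a rough illustration of why this is simple once connections are stored directly, the sketch below runs a breadth-first search over a plain adjacency dictionary to find the degree of separation between two entities. It shows the general technique only and is not how any specific graph database executes queries; the function name `degrees_of_separation` and the sample data are assumptions.

```python
from collections import deque


def degrees_of_separation(adjacency, start, target):
    """Return the number of hops from start to target, or None if unconnected."""
    if start == target:
        return 0
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor == target:
                return distance + 1
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, distance + 1))
    return None


# Hypothetical friendship graph; each key lists its directly connected nodes.
friends = {
    "alice": ["bob"],
    "bob": ["alice", "carol"],
    "carol": ["bob", "dave"],
    "dave": ["carol"],
    "erin": [],
}

print(degrees_of_separation(friends, "alice", "dave"))  # 3
print(degrees_of_separation(friends, "alice", "erin"))  # None
```

The same function answers both questions from the text: a result of `None` means the two entities are not connected, and any other result is the distance between them, with no change to the code as the depth grows.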
Summary
Graph databases are good at maintaining complex, interconnected data. Interestingly, almost every application has a good portion of its data model that is interconnected. If data growth is a very important consideration in your application, then a graph database should definitely be given serious consideration.