Best Graph Modelling Practices

Reeshabh Choudhary
5 min read · Jan 8, 2024

👷♂️ Software Architecture Series — Part 18

In the previous article, we had an overview of graph databases. When it comes to modelling or designing, there is hardly ever a one-stop solution, as every design or model comes with its own trade-offs. The same applies when we talk about modelling data in databases. Numerous models can be formed; however, the primary goal should be ease of querying. To ensure that, we can follow some guidelines:

Understanding User Stories

A database is supposed to let users interact with data as efficiently as possible. Graph databases are typically used when there are numerous entities with many-to-many relationships between them. However, while representing these relationships in a graph, one needs to account for what users want to achieve with the data and what information they seek. This helps capture the key aspects of the domain that the model should consider, and helps in deciding on nodes, relationships, and their properties.

Graph databases are easily adaptable; however, too much iteration may pull the model away from its actual purpose. One sound way to approach graph data modelling is to capture user stories in each iteration cycle and ensure the model remains aligned with evolving user needs. This can be done in a structured manner following the Agile lifecycle. Here is a breakdown of possible steps:

End-user stakeholders can be engaged in discussion via interviews, workshops, or surveys. Business stakeholders can join the design discussion soon after requirement gathering. Based on the discussion, the immediate course of action should be captured in user stories.

The next step is to analyse the gathered user stories to extract common themes or patterns. Once patterns start to emerge, we can identify the relevant entities, their relationships, and the properties associated with each.

Then the modelling itself can be performed by mapping the identified entities (nodes) and relationships (edges) into a graph model. This model should be validated repeatedly against the user stories to ensure it accurately represents the domain. We must also periodically evaluate query performance and optimize the model based on real performance metrics and bottlenecks.

Proper documentation of the graph model can be an added step to ensure clear communication among stakeholders about the rationale behind the chosen model and its implications for querying and performance.
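The validation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the labels, relationship types, and user stories below are hypothetical, and a real project would more likely validate against actual queries than against hand-written triples.

```python
# Hypothetical model: the (source label, relationship type, target label)
# triples that the current graph model supports.
model_relationships = {
    ("Employee", "REPORT_TO", "Employee"),
    ("Employee", "MEMBER_OF", "Team"),
}

# Hypothetical user stories, each listing the triples its query would traverse.
user_stories = {
    "As a manager, I want to see my direct reports": [
        ("Employee", "REPORT_TO", "Employee"),
    ],
    "As an employee, I want to see who I work with": [
        ("Employee", "WORK_WITH", "Employee"),
    ],
}


def uncovered_stories(stories, relationships):
    """Return each story together with the triples the model cannot yet answer."""
    gaps = {}
    for story, needed in stories.items():
        missing = [triple for triple in needed if triple not in relationships]
        if missing:
            gaps[story] = missing
    return gaps


# Stories that the model does not cover yet, with the missing triples.
gaps = uncovered_stories(user_stories, model_relationships)
```

Here the second story surfaces a gap: the model has no ‘WORK_WITH’ relationship yet, which is a signal to extend the model in the next iteration.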

A user-centric approach ensures that the graph model addresses real needs, leading to a more effective and valuable data model for users to interact with and derive insights from.

Multiple Relationships and Use Cases

In the real world, entities have multiple relationships between them, and relationships keep evolving over time. For example, in an organization, the relationships between employees can evolve in various ways. An employee can ‘WORK_WITH’ other employees and at the same time ‘REPORT_TO’ an employee or a set of employees. If, in time, the employee gets promoted over other employees, the relationship dynamics change considerably, and the model should capture these changes effectively.

Having different relationship types between the same nodes for various use cases significantly influences the queryability and flexibility of a graph database model. Different relationship types add specificity and context to connections. This helps define clearer paths for graph traversals, ensuring queries target the intended relationships accurately. Moreover, graph databases like Neo4j make it easy to introduce new relationships, unlike the traditional relational approach, where one typically needs to create additional tables or alter the schema. This reduces development complexity. On top of that, the added relationships make querying easier, because queries can follow different paths. This traversal-based approach facilitates efficient navigation across the graph.
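To make the idea concrete, here is a minimal in-memory sketch in Python. It is not how Neo4j or any other graph database stores data; the `Graph` class and the employee names are hypothetical and exist only to show how typed relationships between the same nodes answer different questions.

```python
from collections import defaultdict


class Graph:
    """A toy in-memory graph whose edges carry a relationship type."""

    def __init__(self):
        # (source node, relationship type) -> set of target nodes
        self._out = defaultdict(set)

    def relate(self, source, rel_type, target):
        self._out[(source, rel_type)].add(target)

    def unrelate(self, source, rel_type, target):
        self._out[(source, rel_type)].discard(target)

    def neighbours(self, source, rel_type):
        # Return a copy so callers cannot mutate the graph by accident.
        return set(self._out.get((source, rel_type), set()))


org = Graph()
org.relate("asha", "WORK_WITH", "ben")
org.relate("asha", "REPORT_TO", "chitra")
org.relate("ben", "REPORT_TO", "chitra")

# Each relationship type answers a different question about the same nodes.
peers_of_asha = org.neighbours("asha", "WORK_WITH")
managers_of_asha = org.neighbours("asha", "REPORT_TO")

# Asha is promoted over Ben: drop the old REPORT_TO edge and add a new one.
# No schema change is needed, only the edges change.
org.unrelate("ben", "REPORT_TO", "chitra")
org.relate("ben", "REPORT_TO", "asha")
managers_of_ben = org.neighbours("ben", "REPORT_TO")
```

After the promotion, a query along ‘REPORT_TO’ from Ben reaches Asha, while the ‘WORK_WITH’ relationship between them is left untouched.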

Granular Approach

Database modelling techniques call for normalization, which focuses on simplifying data models so that data is isolated for addition, modification, and deletion, and changing data at one granular level does not force changes in other levels or parts of the model.

In a relational database, we do this by dividing larger tables into smaller, less redundant tables. However, once we have to query aggregated data, we need to perform multiple join operations, which can be costly. In contrast, normalization in a graph data model is much cheaper, because the equivalent of a join is a traversal along stored relationships. And since these traversals are cheap, there is a tendency to create thin nodes and relationships (i.e. nodes and relationships with fewer properties on them). For the sake of an effective design, we must consider the trade-offs between representing data as properties on nodes or relationships and creating separate nodes to hold specific properties.

Suppose we are working on a social networking platform where users are connected based on their interests. Each user will have a set of interests, but how do we model the link between users and interests effectively?

There can be two approaches:

1. Store interests as properties on the user nodes.

2. Create separate interest nodes.

Let us discuss the favorable scenarios for both approaches:

If the property doesn’t significantly impact the traversal pattern or isn’t frequently queried during traversal, then user interests can be stored as properties on the user node (e.g., ‘interests: [music, sports, cooking]’). This approach is simple to model and works well when the data does not influence query paths.

If the property is crucial for traversal patterns or is frequently queried during traversal, we can create a separate node category for interests, where each interest (e.g., ‘music’, ‘sports’, ‘cooking’) becomes a node connected to users. This approach facilitates efficient traversal and querying when exploring connections based on shared interests among users.
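The difference between the two approaches can be illustrated with a small Python sketch. The user names and data structures are hypothetical, and plain dictionaries stand in for a graph store; the point is only to show which nodes each approach has to touch to answer "who shares an interest with this user?".

```python
from collections import defaultdict

# Approach 1: interests stored as a list property on each user node.
users = {
    "asha": {"interests": ["music", "sports"]},
    "ben": {"interests": ["sports", "cooking"]},
    "chitra": {"interests": ["cooking"]},
}


def shares_interest_by_scan(user):
    """Find users with a common interest by scanning every user node."""
    mine = set(users[user]["interests"])
    return {
        other
        for other, props in users.items()
        if other != user and mine & set(props["interests"])
    }


# Approach 2: each interest is its own node, linked to users by an edge.
interested_in = defaultdict(set)  # user node -> interest nodes
followers = defaultdict(set)      # interest node -> user nodes

for user, props in users.items():
    for interest in props["interests"]:
        interested_in[user].add(interest)
        followers[interest].add(user)


def shares_interest_by_traversal(user):
    """Walk user -> interest -> user; only connected nodes are touched."""
    found = set()
    for interest in interested_in[user]:
        found |= followers[interest]
    found.discard(user)
    return found
```

Both functions return the same users, but the first inspects every user node, while the second only follows edges from the starting user. That is why separate interest nodes pay off when shared interests drive the traversal.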

Note: a pattern emerges in the usage of graph databases: decision making is query-centric. We must analyze the nature of queries to determine the significance of a property in traversal and retrieval. Keeping the model simple is beneficial, but not at the expense of query efficiency when certain properties significantly impact queries.

We can also leverage in-graph indexes for properties stored in separate nodes to further enhance query performance by optimizing property-based lookups within the graph itself. The goal is to create a model that not only reflects the data accurately but also optimizes query efficiency for the anticipated use cases.

Use Property Graph Modeling Patterns

We should refer to established patterns commonly employed in property graph modelling to represent data structures and relationships effectively. Some common patterns are:

1. Hierarchical Modeling: It is used to represent parent-child relationships or hierarchical structures within the data.

2. Many-to-Many Relationships: If relationships between entities are complex, such that multiple entities of one type can be connected to multiple entities of another type, we can leverage this pattern.

3. Attribute Nodes: If attributes are shared by multiple entities, we can represent them as nodes to avoid redundant properties across multiple nodes and to maintain consistency.

4. Label-Based Schema: We can use labels to categorize nodes based on their type or role. Labels group nodes and aid in efficient indexing and retrieval of nodes based on their categories.
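Two of these patterns, hierarchical modelling and a label-based schema, can be sketched together in Python. The category tree, the product, and the ‘CHILD_OF’ and ‘IN_CATEGORY’ relationship names are all hypothetical, and dictionaries stand in for a real graph store.

```python
from collections import defaultdict

# Hypothetical nodes, each tagged with a label.
labels = {
    "electronics": "Category",
    "phones": "Category",
    "smartphones": "Category",
    "phone-x": "Product",
}

# Hierarchical modelling: CHILD_OF edges between categories.
child_of = {
    "phones": "electronics",
    "smartphones": "phones",
}

# A product is attached to the hierarchy through an IN_CATEGORY edge.
in_category = {"phone-x": "smartphones"}

# Label-based schema: group node ids by label so that a lookup by label
# does not need to scan every node.
nodes_by_label = defaultdict(set)
for node, label in labels.items():
    nodes_by_label[label].add(node)


def ancestors(category):
    """Follow CHILD_OF edges from a category up to the root."""
    path = []
    while category in child_of:
        category = child_of[category]
        path.append(category)
    return path


def categories_of(product):
    """Return the product's own category followed by all its ancestors."""
    direct = in_category[product]
    return [direct] + ancestors(direct)


all_products = nodes_by_label["Product"]
product_categories = categories_of("phone-x")
```

The label lookup narrows the starting set of nodes, and the hierarchy is then walked edge by edge, which mirrors how a label is often used as the entry point for a traversal.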

By following these best practices and aligning our graph model with the questions and use cases driving the data exploration, we can create a model that maximizes queryability, supports efficient traversals, and allows for scalable and performant interactions with the graph data.

Reeshabh Choudhary

Software Architect and Developer | Author : Objects, Data & AI.