Graph Algorithms in Neo4j: Triangle Count & Clustering Coefficient


Graph algorithms provide the means to understand, model and predict complicated dynamics such as the flow of resources or information, the pathways through which contagions or network failures spread, and the influences on and resiliency of groups.

This blog series is designed to help you better utilize graph analytics and graph algorithms so you can effectively innovate and develop intelligent solutions faster using a graph database like Neo4j.

Last week we once again looked at Community Detection algorithms, with a focus on the Louvain Modularity algorithm.

Learn more about Triangle Count and Clustering Coefficient graph algorithms in Neo4j.


This week we wrap up our exploration of Community Detection algorithms, with a look at the Triangle Count and Average Clustering Coefficient algorithm, which measures how many nodes have triangles and the degree to which nodes tend to cluster together. The average clustering coefficient is 1 when there is a clique, and 0 when there are no connections.

About Triangle Count and Average Clustering Coefficient


Triangle Count is a community detection graph algorithm that is used to determine the number of triangles passing through each node in the graph. A triangle is a set of three nodes, where each node has a relationship to all other nodes.

Triangle counting gained popularity in social network analysis, where it is used to detect communities and measure the cohesiveness of those communities. It is also used to determine the stability of a graph and is often used as part of the computation of network indices, such as the clustering coefficient.

There are two types of clustering coefficients:

Local clustering coefficient

The local clustering coefficient of a node is the likelihood that its neighbors are also connected. The computation of this score involves triangle counting.

Global clustering coefficient

The global clustering coefficient is the normalized sum of those local clustering coefficients. The transitivity coefficient of a graph is sometimes used, which is three times the number of triangles divided by the number of triples in the graph. For more information, see Finding, Counting and Listing all Triangles in Large Graphs, An Experimental Study.”

When Should I Use Triangle Count and Clustering Coefficient?


Triangles Example


Let’s see how the Triangle Count and Clustering Coefficient algorithm works on a small dataset. The following Cypher statement creates a graph with people and KNOWS relationships between them.

MERGE (alice:Person{id:"Alice"})
MERGE (michael:Person{id:"Michael"})
MERGE (karin:Person{id:"Karin"})
MERGE (chris:Person{id:"Chris"})
MERGE (will:Person{id:"Will"})
MERGE (mark:Person{id:"Mark"})
MERGE (michael)-[:KNOWS]->(karin)
MERGE (michael)-[:KNOWS]->(chris)
MERGE (will)-[:KNOWS]->(michael)
MERGE (mark)-[:KNOWS]->(michael)
MERGE (mark)-[:KNOWS]->(will)
MERGE (alice)-[:KNOWS]->(michael)
MERGE (will)-[:KNOWS]->(chris)
MERGE (chris)-[:KNOWS]->(karin);



The following query finds all the KNOWS triangles between people in our graph.

CALL algo.triangle.stream("Person","KNOWS")
YIELD nodeA,nodeB,nodeC
MATCH (a:Person) WHERE id(a) = nodeA
MATCH (b:Person) WHERE id(b) = nodeB
MATCH (c:Person) WHERE id(c) = nodeC
RETURN a.id AS nodeA, b.id AS nodeB, c.id AS node



We can see that there are KNOWS triangles containing “Will, Michael and Chris”, “Will, Mark and Michael”, and “Michael, Karin and Chris.” This means that everybody in the triangle knows each other. We work out the clustering coefficient of each person by running the following algorithm.

CALL algo.triangleCount.stream('Person', 'KNOWS')
YIELD nodeId, triangles, coefficient
MATCH (p:Person) WHERE id(p) = nodeId
RETURN p.id AS name, triangles, coefficient
ORDER BY coefficient DESC



We learn that Michael is part of the most triangles, but it’s Karin and Mark who are the best at introducing their friends – all of the people who know them, know each other!

Conclusion


As we’ve seen, the Average Clustering Coefficient is often used to estimate whether a network might exhibit “small-world” behaviors that are based on tightly knit clusters.

Next week we’ll look at graph algorithms in practice, where we’ll learn how to apply them in data-intensive applications using data from Yelp’s annual dataset challenge.


Find the patterns in your connected data
Learn about the power of graph algorithms in the O’Reilly book,
Graph Algorithms: Practical Examples in Apache Spark and Neo4j by the authors of this article. Click below to get your free ebook copy.


Get the O’Reilly Ebook