Discovering Graph Neural Networks

Exploring the shortcomings of traditional deep learning for graph-structured information and the unique capabilities of GNNs in modeling complex interdependencies.

Onepagecode

Jun 04, 2025

Important note: You're diving into the first piece of our big four-part series on Graph Neural Networks. Think of the whole thing as a book, broken into chapters, with those chapters bundled into four main sections. This long read is Part One, and it pulls together a bunch of those early chapters. That's why it's pretty beefy – we're laying down all the foundational material here before we jump into more advanced topics in the next three parts.

The Challenge of Relational Data in Traditional Deep Learning

Traditional machine learning and deep learning models have revolutionized many fields, excelling at tasks like image recognition, natural language processing, and tabular data analysis. However, they face significant limitations when dealing with data that inherently possesses complex, non-Euclidean relationships—what we call relational data. This type of data is best represented as a graph, where entities are nodes and their connections are edges.

Understanding Relational Data: Beyond Tables

Imagine a social network. You’re not just interested in individual users (nodes) but also in their friendships, followers, and interactions (edges). Similarly, in chemistry, atoms are nodes, and chemical bonds are edges. These relationships are fundamental to understanding the system.

It’s crucial to distinguish between “relational data” in the context of graphs and “relational databases.” While relational databases store data in structured tables with defined relationships (e.g., a Customers table linked to an Orders table by a customer_id), they primarily manage structured collections of records. The relationships are implicitly defined by foreign keys and table joins.

In contrast, graph data explicitly models relationships as first-class citizens. Each edge between nodes carries meaning and can have its own properties.

Consider this example:

  • Relational Database:

    • Posts table: id, content, user_id

    • Likes table: user_id, post_id
      Here, the fact that a User likes a Post is recovered by joining the two tables on user_id and post_id; the relationship is only inferred, never modeled directly.

  • Graph Data:

    • Nodes: User(A), User(B), Post(X), Post(Y)

    • Edges: (User(A), LIKES, Post(X)), (User(A), FOLLOWS, User(B)), (User(B), CREATED, Post(Y))
      Here, LIKES, FOLLOWS, and CREATED are explicit edge types, and the relationships are directly modeled as connections. This direct representation allows algorithms to traverse and analyze the network structure directly.
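
To make the contrast concrete, here is a minimal sketch of the graph version. The networkx library is an assumption made purely for illustration; the node names and edge types mirror the example above:

# Minimal sketch: the same social-network example as a graph with explicit, typed edges.
# networkx is assumed here purely for illustration; any graph library would do.
import networkx as nx

G = nx.MultiDiGraph()  # directed graph that allows multiple typed edges between nodes

# Nodes (entities)
G.add_nodes_from(["User_A", "User_B", "Post_X", "Post_Y"])

# Edges (relationships as first-class citizens, each carrying its own type)
G.add_edge("User_A", "Post_X", type="LIKES")
G.add_edge("User_A", "User_B", type="FOLLOWS")
G.add_edge("User_B", "Post_Y", type="CREATED")

# The structure can be traversed directly - no joins required
for u, v, data in G.edges(data=True):
    print(u, data["type"], v)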

Limitations of Traditional ML/DL for Graphs

When faced with graph-structured data, traditional machine learning and deep learning models struggle for several reasons:

  1. Fixed Input Size Assumption: Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) expect data in a structured, grid-like format (images, sequences). Graphs, however, are non-Euclidean and irregular; they don’t have a fixed spatial order or size. The number of neighbors for a node can vary, and there’s no inherent “up,” “down,” “left,” or “right.”

  2. Loss of Structural Information: Traditional methods often resort to flattening graph data into feature vectors. This means converting nodes and edges into a tabular format, which inevitably leads to the loss of crucial relational information.

The “Indirect Way”: Feature Engineering Pitfalls

Historically, to apply deep learning to graph data, practitioners would employ extensive feature engineering. This involved calculating static, aggregate features for each node or edge based on its position within the graph. Examples include:

  • Node Degree: The number of connections a node has.

  • Clustering Coefficient: A measure of how connected a node’s neighbors are to one another.

  • Betweenness Centrality: How often a node lies on the shortest path between other nodes.

  • PageRank Score: An importance score derived from link structure.

While these features provide some insight, they are a static snapshot and often fail to capture the dynamic, evolving nature of relationships. They are “indirect” because they don’t allow the model to learn directly from the graph’s topology during training.

Consider a simple analogy: Imagine trying to understand a complex group conversation by only knowing how many friends each person has and how many times they speak. You would miss the actual content of the conversation, who is influencing whom, and the subtle dynamics of the discussion. Similarly, with feature engineering, the deep learning model receives pre-digested information, losing the ability to truly “see” and leverage the underlying network structure. The model learns from the features derived from the graph, not the graph itself.

This approach is often brittle, time-consuming, and fails to generalize well to new or evolving graph structures. It requires domain expertise to select the right features, and it’s difficult to capture higher-order relationships effectively.
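
For a sense of what this indirect pipeline looks like in practice, here is a rough sketch of computing the features listed above and flattening them into a plain table for a conventional model. The networkx library and its built-in example graph are assumptions chosen purely for illustration:

# Sketch of the "indirect way": pre-compute static graph features, then hand a flat
# table to a conventional model. networkx is assumed here purely for illustration.
import networkx as nx

G = nx.karate_club_graph()  # small built-in example graph

degree = dict(G.degree())                   # node degree
clustering = nx.clustering(G)               # clustering coefficient
betweenness = nx.betweenness_centrality(G)  # betweenness centrality
pagerank = nx.pagerank(G)                   # PageRank score

# Flatten into one feature vector per node; the graph structure itself is discarded,
# so a downstream model only ever sees this static snapshot.
feature_table = [
    [degree[n], clustering[n], betweenness[n], pagerank[n]]
    for n in G.nodes()
]
print(len(feature_table), "nodes,", len(feature_table[0]), "hand-crafted features each")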

Introducing Graph Neural Networks (GNNs)

Graph Neural Networks (GNNs) emerged as a powerful solution to these challenges, specifically designed to process and learn from graph-structured data. They extend the deep learning paradigm to leverage the inherent connectivity and relationships within graphs.

What Are GNNs?

At their core, GNNs are neural networks that operate directly on graphs. They learn to represent nodes and edges in a low-dimensional vector space (embeddings) by aggregating information from their local neighborhood. Unlike traditional neural networks that operate on independent data points, GNNs understand that the representation of a node should be influenced by its neighbors and the edges connecting them.

The Core Idea: Message Passing and Aggregation

The fundamental mechanism behind most GNNs is message passing (also known as neighborhood aggregation or information propagation). This process allows information to flow across the graph, enabling each node to recursively gather and integrate features from its immediate and extended neighborhood.

Here’s a high-level conceptual overview:

  1. Message Generation: Each node generates a “message” based on its own features and the features of its outgoing edges.

  2. Message Aggregation: Each node collects messages from all its incoming neighbors. It then aggregates these messages using a permutation-invariant function (e.g., sum, mean, max). This ensures that the order of neighbors doesn’t affect the outcome.

  3. Feature Update: The node updates its own feature representation (embedding) by combining its previous representation with the aggregated message from its neighbors. This update typically involves a neural network layer.

This process is typically repeated for several “layers” or “hops,” allowing information to propagate further across the graph. After multiple rounds, a node’s final embedding will encapsulate information not just from its direct neighbors but also from its neighbors’ neighbors, and so on, effectively capturing local and global graph structure.

Let’s illustrate this with a small, conceptual implementation of a single message-passing step. The sketch below uses plain NumPy purely for illustration; it favors clarity over efficiency:

# Conceptual (but runnable) representation of a single GNN layer's forward pass
import numpy as np

class GNNLayer:
    def __init__(self, input_dim, output_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.output_dim = output_dim
        # Learnable weights: one matrix turns a node's features into a message,
        # another updates a node from [its own features | aggregated messages]
        self.message_transform_weights = rng.normal(0.0, 0.1, (input_dim, output_dim))
        self.node_transform_weights = rng.normal(0.0, 0.1, (input_dim + output_dim, output_dim))

    def forward(self, node_features, adjacency_matrix):
        # node_features: (num_nodes, input_dim) array; row i holds node i's features
        # adjacency_matrix: (num_nodes, num_nodes) array; A[u, v] = 1 means an edge u -> v
        num_nodes = node_features.shape[0]

        # Step 1: Message Generation and Collection (iterating over edges)
        incoming_messages = {node_id: [] for node_id in range(num_nodes)}
        for node_u, node_v in zip(*np.nonzero(adjacency_matrix)):
            # Node 'u' sends a message to node 'v', based on u's transformed features
            message_from_u = node_features[node_u] @ self.message_transform_weights
            incoming_messages[node_v].append(message_from_u)

        # Step 2: Message Aggregation using a permutation-invariant function (sum);
        # a node with no incoming messages gets a zero vector
        aggregated_features = {
            node_id: np.sum(messages, axis=0) if messages else np.zeros(self.output_dim)
            for node_id, messages in incoming_messages.items()
        }

        # Step 3: Feature Update - concatenate each node's current features with its
        # aggregated neighborhood message and pass the result through a dense layer + ReLU
        new_node_features = np.zeros((num_nodes, self.output_dim))
        for node_id in range(num_nodes):
            combined_input = np.concatenate([node_features[node_id], aggregated_features[node_id]])
            new_node_features[node_id] = np.maximum(combined_input @ self.node_transform_weights, 0.0)
        return new_node_features

# The process iterates over multiple layers, allowing information to propagate further.
# Each layer refines the node embeddings.
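
As a quick sanity check, two such layers can be stacked so each node’s embedding reflects its two-hop neighborhood. The graph, feature sizes, and adjacency below are made-up illustration values, continuing from the GNNLayer sketch above:

# Tiny made-up example: stack two layers so information propagates two hops
num_nodes, input_dim, hidden_dim = 4, 8, 16
rng = np.random.default_rng(1)
features = rng.normal(size=(num_nodes, input_dim))
adjacency = np.array([[0, 1, 0, 0],
                      [0, 0, 1, 1],
                      [1, 0, 0, 0],
                      [0, 0, 1, 0]])

layer1 = GNNLayer(input_dim, hidden_dim)
layer2 = GNNLayer(hidden_dim, hidden_dim)
embeddings = layer2.forward(layer1.forward(features, adjacency), adjacency)
print(embeddings.shape)  # (4, 16): each node now encodes information from its neighbors' neighbors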

Why GNNs Excel with Graph Data

GNNs are superior to traditional methods for graph-structured data because they:

  1. Dynamically Learn Features: Instead of relying on static, hand-crafted features, GNNs learn features directly from the graph structure during the training process. The message passing mechanism allows them to automatically capture complex, multi-hop relationships.

  2. Preserve Graph Structure: By design, GNNs operate on the graph’s topology. They inherently understand that nodes are connected and that these connections carry meaning. This prevents the loss of crucial relational information that occurs when flattening graph data.

  3. Are Permutation-Invariant: The aggregation functions (like sum or mean) used in GNNs are permutation-invariant. This means that if you reorder a node’s neighbors, the aggregated message remains the same, which is essential because there’s no inherent order to neighbors in a graph (a short numerical sketch of this follows the list below).

  4. Are Inductive: Many GNN architectures can generalize to unseen nodes or even entirely new graphs. Once trained, they can generate embeddings for new nodes by applying the same message-passing rules, making them suitable for dynamic graphs.
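
To see the permutation-invariance point numerically, here is a tiny made-up example: summing a node’s incoming messages gives the same result regardless of neighbor order, whereas an order-sensitive combination such as concatenation does not.

# Sum aggregation is permutation-invariant; concatenation is not (illustrative values only)
import numpy as np

msg_a = np.array([1.0, 0.0])
msg_b = np.array([0.0, 2.0])
msg_c = np.array([3.0, 3.0])

print(np.allclose(msg_a + msg_b + msg_c, msg_c + msg_a + msg_b))   # True: order does not matter
print(np.array_equal(np.concatenate([msg_a, msg_b, msg_c]),
                     np.concatenate([msg_c, msg_a, msg_b])))       # False: order matters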

Real-World Applications of GNNs

GNNs have found widespread success across various domains where relational data is prevalent. Their ability to model complex dependencies makes them ideal for tasks that were previously challenging for traditional machine learning.

Fraud Detection: A Detailed Example
