Generating embeddings for nodes in a graph using random walks and sequence modeling

kelvin-jose/deepwalk

DeepWalk - Graph Embedding and Node Classification

Overview

This repository contains a Python implementation of DeepWalk, a graph embedding technique, together with code that uses the resulting embeddings for node classification. DeepWalk learns latent representations of the nodes in a graph in two stages: it first performs random walks on the graph, then treats each walk as a sentence and trains a Word2Vec model on those sentences to obtain one embedding vector per node. These embeddings can then be used for downstream tasks such as node classification.
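To make "treating a walk as a sentence" concrete, the sketch below lists the (center, context) node pairs that a window-based model would see for one walk. It is a simplified illustration, not code from this repository: the helper name `context_pairs` is hypothetical, and real Word2Vec implementations add details such as subsampling and negative sampling.

```python
from typing import List, Tuple


def context_pairs(walk: List[int], window: int) -> List[Tuple[int, int]]:
    """Return (center, context) pairs for every node in the walk.

    A node's context is every other node at most `window` steps
    before or after it within the same walk.
    """
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        pairs.extend((center, walk[j]) for j in range(lo, hi) if j != i)
    return pairs


pairs = context_pairs([0, 1, 2, 3], window=1)
print(pairs)  # [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]
```

Nodes that often appear close together on walks produce many such pairs, which is what pushes their embeddings towards each other during training.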

Contents

  1. deepwalk.py: Implementation of the DeepWalk algorithm.
  2. train.py: Training the DeepWalk model and generating node embeddings.
  3. classifier.py: Building and training a node classifier using the embeddings.

deepwalk.py

Introduction

The deepwalk.py file contains the implementation of the DeepWalk algorithm, which involves generating random walks on a given graph and training a Word2Vec model to learn embeddings for nodes.

Class: DeepWalk

Methods

  • __init__(self, graph: Graph): Initializes the DeepWalk algorithm with a given graph.
  • random_walk(self, node: int, walk_length: int) -> List: Performs a random walk starting from a given node.
  • generate_train_samples(self, num_samples: int, walk_length: int) -> List: Generates training samples for the DeepWalk model.
  • train(self, X: List, embed_dim: int = 128, window: int = 5, min_count: int = 1, workers: int = 4) -> W2V: Trains the DeepWalk model using the generated training samples.
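The two walk-related methods can be illustrated with a minimal, standard-library sketch. It is not the repository's code. The adjacency-dictionary graph, the free-function signatures, and the choice to start each of the `num_samples` walks from a uniformly sampled node are assumptions made for illustration; the actual class takes a `Graph` object.

```python
import random
from typing import Dict, List


def random_walk(adj: Dict[int, List[int]], node: int, walk_length: int,
                rng: random.Random) -> List[int]:
    """Uniform random walk of at most `walk_length` nodes, starting at `node`."""
    walk = [node]
    while len(walk) < walk_length:
        neighbors = adj[walk[-1]]
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk


def generate_train_samples(adj: Dict[int, List[int]], num_samples: int,
                           walk_length: int, seed: int = 0) -> List[List[int]]:
    """Generate `num_samples` walks, each from a randomly chosen start node."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    return [random_walk(adj, rng.choice(nodes), walk_length, rng)
            for _ in range(num_samples)]


# Toy undirected graph stored as adjacency lists (assumed format).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walks = generate_train_samples(adj, num_samples=10, walk_length=5)
```

Each element of `walks` is a list of node ids in which consecutive nodes are connected by an edge.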

train.py

Introduction

The train.py file demonstrates how to use the DeepWalk algorithm to generate node embeddings from a graph and provides an example using a dataset.

Steps

  1. Load the necessary libraries and set constants and hyperparameters.
  2. Load and preprocess the dataset (e.g., Cora dataset).
  3. Generate training samples by applying DeepWalk to the graph.
  4. Train a Word2Vec model using the generated training samples.
  5. Visualize the embeddings using PCA.
  6. Save the trained model.

classifier.py

Introduction

The classifier.py file showcases how to use the embeddings generated by DeepWalk for node classification. It provides an example of building a simple neural network classifier and training it on the embeddings.

Steps

  1. Load the Word2Vec model trained using DeepWalk.
  2. Load and preprocess the dataset (e.g., Cora dataset).
  3. Create a neural network classifier and set hyperparameters.
  4. Split the data into training and testing sets.
  5. Initialize the neural network classifier and optimizer.
  6. Train the classifier on the embeddings.
  7. Plot the training and testing loss to visualize the training progress.
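Steps 3 to 6 can be sketched as follows. This is an illustrative outline under stated assumptions, not the repository's classifier: the two-layer network, the hidden size, the Adam optimizer, the full-batch training loop, and both function names are choices made here for brevity. `X` is expected to be an array of node embeddings and `y` the matching integer class labels.

```python
import random
from typing import List, Tuple


def train_test_split_indices(n: int, test_fraction: float = 0.2,
                             seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle the indices 0..n-1 and split them into train and test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_fraction))
    return idx[n_test:], idx[:n_test]


def train_classifier(X, y, num_classes: int, epochs: int = 50,
                     lr: float = 1e-2, hidden: int = 64):
    """Fit a small feed-forward classifier on embeddings (requires PyTorch)."""
    # Imported lazily so the split helper works without PyTorch installed.
    import torch
    from torch import nn

    X = torch.as_tensor(X, dtype=torch.float32)
    y = torch.as_tensor(y, dtype=torch.long)
    train_idx, test_idx = train_test_split_indices(len(X))

    model = nn.Sequential(
        nn.Linear(X.shape[1], hidden), nn.ReLU(), nn.Linear(hidden, num_classes)
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()

    train_losses, test_losses = [], []
    for _ in range(epochs):
        model.train()
        optimizer.zero_grad()
        loss = loss_fn(model(X[train_idx]), y[train_idx])
        loss.backward()
        optimizer.step()

        model.eval()
        with torch.no_grad():  # evaluate on held-out nodes without gradients
            test_loss = loss_fn(model(X[test_idx]), y[test_idx])
        train_losses.append(loss.item())
        test_losses.append(test_loss.item())
    return model, train_losses, test_losses


train_idx, test_idx = train_test_split_indices(10, test_fraction=0.2)
```

The two returned loss lists are what step 7 would plot, for example with Matplotlib, to compare training and testing loss per epoch.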

Usage

To use this code for your own graph data and tasks:

  1. Install the required libraries, such as NetworkX, Gensim, Node2Vec, and PyTorch.
  2. Prepare your graph data and make any necessary modifications to the code.
  3. Execute train.py to generate node embeddings.
  4. Execute classifier.py to train a classifier on the embeddings.

Please note that this README provides a high-level overview, and you should refer to the individual code files for detailed information and customization.

Dependencies

  • NetworkX
  • Gensim
  • Node2Vec
  • PyTorch
  • Pandas
  • Matplotlib
  • Scikit-learn

License

This code is provided under the MIT License. See the LICENSE file for more details.

Acknowledgments

This code is based on the DeepWalk algorithm and borrows from various open-source projects and libraries. Please refer to the relevant documentation and licenses of these projects.


Feel free to modify and expand upon this code to suit your specific needs. If you have any questions or encounter any issues, please create an issue on GitHub or contact the author.
