DeepWalk - Graph Embedding and Node Classification

Overview

This repository contains Python code that implements DeepWalk, a graph embedding technique, and uses the resulting embeddings for node classification. DeepWalk learns latent representations of nodes by generating truncated random walks over the graph, treating each walk as a "sentence" of node IDs, and training a Word2Vec (skip-gram) model on those walks. The learned embeddings can then be used for downstream tasks such as node classification.
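
For reference, this is the skip-gram objective described in the original DeepWalk paper, written out here for clarity rather than taken from this repository's code. For a single walk $v_1, \dots, v_T$ with window size $w$ (the `window` parameter below), DeepWalk learns an embedding map $\Phi: V \to \mathbb{R}^d$ (with $d$ = `embed_dim`) together with separate context vectors $\Phi'$. It minimizes the following loss, summed over all generated walks:

```latex
\min_{\Phi,\,\Phi'} \; -\sum_{i=1}^{T} \sum_{\substack{j = i-w \\ j \neq i,\; 1 \le j \le T}}^{i+w} \log \Pr\big(v_j \mid \Phi(v_i)\big),
\qquad
\Pr\big(v_j \mid \Phi(v_i)\big) = \frac{\exp\big(\Phi'(v_j)^{\top} \Phi(v_i)\big)}{\sum_{u \in V} \exp\big(\Phi'(u)^{\top} \Phi(v_i)\big)}
```

Computing the full softmax over $V$ is expensive. The paper approximates it with hierarchical softmax, whereas Gensim's `Word2Vec` uses negative sampling by default.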

Contents

  1. deepwalk.py: Implementation of the DeepWalk algorithm.
  2. train.py: Training the DeepWalk model and generating node embeddings.
  3. classifier.py: Building and training a node classifier using the embeddings.

deepwalk.py

Introduction

The deepwalk.py file contains the implementation of the DeepWalk algorithm, which involves generating random walks on a given graph and training a Word2Vec model to learn embeddings for nodes.

Class: DeepWalk

Methods

  • __init__(self, graph: Graph): Initializes the DeepWalk algorithm with a given graph.
  • random_walk(self, node: int, walk_length: int) -> List: Performs a random walk starting from a given node.
  • generate_train_samples(self, num_samples: int, walk_length: int) -> List: Generates random walks to use as training samples for the DeepWalk model.
  • train(self, X: List, embed_dim: int = 128, window: int = 5, min_count: int = 1, workers: int = 4) -> W2V: Trains a Word2Vec model on the generated training samples (random walks) and returns it.
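
The class can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the repository's actual code. It assumes `graph` is a `networkx.Graph` (or any object with `.nodes` and `.neighbors(node)`) and that `num_samples` means "walks started from every node". The `seed` argument is a hypothetical addition for reproducibility. `train` assumes Gensim 4 or later, where the dimension parameter is called `vector_size`.

```python
import random
from collections.abc import Hashable
from typing import List, Optional


class DeepWalk:
    """Minimal DeepWalk sketch (assumed structure, not the repository's code).

    `graph` is assumed to be a networkx.Graph, or any object exposing
    `.nodes` and `.neighbors(node)` in the same way.
    """

    def __init__(self, graph, seed: Optional[int] = None):
        self.graph = graph
        self.rng = random.Random(seed)  # hypothetical seed for reproducible walks

    def random_walk(self, node: Hashable, walk_length: int) -> List[str]:
        """Uniform random walk of at most `walk_length` nodes starting at `node`."""
        walk = [node]
        while len(walk) < walk_length:
            neighbors = list(self.graph.neighbors(walk[-1]))
            if not neighbors:  # dead end (e.g. isolated node): stop early
                break
            walk.append(self.rng.choice(neighbors))
        # Word2Vec expects string tokens, so node IDs are converted to str
        return [str(v) for v in walk]

    def generate_train_samples(self, num_samples: int, walk_length: int) -> List[List[str]]:
        """Assumption: `num_samples` is the number of walks started from every node."""
        nodes = list(self.graph.nodes)
        walks = []
        for _ in range(num_samples):
            self.rng.shuffle(nodes)  # visit start nodes in a new order each pass
            walks.extend(self.random_walk(n, walk_length) for n in nodes)
        return walks

    def train(self, X: List[List[str]], embed_dim: int = 128, window: int = 5,
              min_count: int = 1, workers: int = 4):
        from gensim.models import Word2Vec  # imported lazily; assumes gensim >= 4

        # sg=1 selects skip-gram (Gensim defaults to CBOW). Gensim's default
        # negative sampling is kept; hs=1, negative=0 would mirror the paper.
        return Word2Vec(sentences=X, vector_size=embed_dim, window=window,
                        min_count=min_count, workers=workers, sg=1)
```

With NetworkX and Gensim installed, a call such as `dw = DeepWalk(nx.karate_club_graph(), seed=0)` followed by `model = dw.train(dw.generate_train_samples(10, 40))` should give a model in which `model.wv["0"]` is node 0's embedding.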

train.py

Introduction

The train.py file demonstrates how to use the DeepWalk algorithm to generate node embeddings from a graph and walks through an example on the Cora dataset.

Steps

  1. Import the required libraries and set constants and hyperparameters.
  2. Load and preprocess the dataset (e.g., Cora dataset).
  3. Generate training samples by applying DeepWalk to the graph.
  4. Train a Word2Vec model using the generated training samples.
  5. Visualize the embeddings using PCA.
  6. Save the trained model.

classifier.py

Introduction

The classifier.py file showcases how to use the embeddings generated by DeepWalk for node classification. It provides an example of building a simple neural network classifier and training it on the embeddings.

Steps

  1. Load the Word2Vec model trained using DeepWalk.
  2. Load and preprocess the dataset (e.g., Cora dataset).
  3. Define the neural network classifier and set hyperparameters.
  4. Split the data into training and testing sets.
  5. Instantiate the classifier and its optimizer.
  6. Train the classifier on the embeddings.
  7. Plot the training and testing loss to visualize the training progress.
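
The training loop these steps describe might look roughly like the PyTorch sketch below. It is not the repository's classifier.py. The sketch assumes that the embeddings have already been stacked into a NumPy array `X` with shape `(num_nodes, embed_dim)`, for example via `Word2Vec.load(...)` followed by `model.wv[[str(n) for n in nodes]]`. It also assumes the labels `y` are integers in `0..C-1`. The MLP architecture, the hyperparameters and the names `NodeClassifier` and `train_classifier` are hypothetical.

```python
import numpy as np
import torch
from sklearn.model_selection import train_test_split
from torch import nn


class NodeClassifier(nn.Module):
    """Small MLP over node embeddings (architecture is an assumption)."""

    def __init__(self, embed_dim: int, num_classes: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden), nn.ReLU(), nn.Linear(hidden, num_classes)
        )

    def forward(self, x):
        return self.net(x)  # raw logits; CrossEntropyLoss applies log-softmax


def train_classifier(X: np.ndarray, y: np.ndarray, epochs=100, lr=1e-2,
                     test_size=0.2, seed=0):
    torch.manual_seed(seed)
    # Stratified train/test split so every class appears in both sets
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y)
    X_tr = torch.tensor(X_tr, dtype=torch.float32)
    X_te = torch.tensor(X_te, dtype=torch.float32)
    y_tr = torch.tensor(y_tr, dtype=torch.long)
    y_te = torch.tensor(y_te, dtype=torch.long)

    model = NodeClassifier(X.shape[1], int(y.max()) + 1)  # labels assumed 0..C-1
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    train_losses, test_losses = [], []

    for _ in range(epochs):
        model.train()
        optimizer.zero_grad()
        loss = loss_fn(model(X_tr), y_tr)  # full-batch training step
        loss.backward()
        optimizer.step()
        train_losses.append(loss.item())

        model.eval()
        with torch.no_grad():  # test loss only, no gradient updates
            test_losses.append(loss_fn(model(X_te), y_te).item())
    return model, train_losses, test_losses
```

For step 7, plot `train_losses` and `test_losses` against the epoch index with Matplotlib, using `plt.plot(train_losses, label="train")` and `plt.plot(test_losses, label="test")`.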

Usage

To use this code for your own graph data and tasks:

  1. Install the required libraries listed under Dependencies (NetworkX, Gensim, Node2Vec, PyTorch, and the others).
  2. Prepare your graph data and make any necessary modifications to the code.
  3. Execute train.py to generate node embeddings.
  4. Execute classifier.py to train a classifier on the embeddings.
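
For step 2, one way to bring your own graph in is to read it from an edge list. This is a minimal sketch that assumes a whitespace-separated file with one `u v` pair of integer node IDs per line. The helper name `load_edge_list` and the file name are hypothetical, and the repository may load data differently.

```python
import networkx as nx


def load_edge_list(path: str) -> nx.Graph:
    """Read a whitespace-separated edge list ("u v" per line) into an undirected graph."""
    graph = nx.read_edgelist(path, nodetype=int)  # node IDs parsed as ints
    # Drop self-loops, if any; they add no neighborhood information to the walks
    graph.remove_edges_from(list(nx.selfloop_edges(graph)))
    return graph
```

The resulting `nx.Graph` can then be passed wherever the code expects a graph, such as the DeepWalk constructor.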

This README is only a high-level overview; refer to the individual code files for details and customization options.

Dependencies

  • NetworkX
  • Gensim
  • Node2Vec
  • PyTorch
  • Pandas
  • Matplotlib
  • Scikit-learn
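
A quick way to check your environment is sketched below. It assumes the usual PyPI package names. Note that scikit-learn is imported as `sklearn` and PyTorch as `torch`. The helper name `missing_dependencies` is hypothetical.

```python
import importlib.util

# PyPI package name -> import name (they differ for scikit-learn)
DEPENDENCIES = {
    "networkx": "networkx",
    "gensim": "gensim",
    "node2vec": "node2vec",
    "torch": "torch",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
}


def missing_dependencies(deps=DEPENDENCIES):
    """Return the PyPI names of packages whose import name cannot be found."""
    return [pkg for pkg, mod in deps.items() if importlib.util.find_spec(mod) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    print("Missing:", ", ".join(missing) if missing else "none")
```

Any reported packages can then be installed with `pip install` followed by their PyPI names.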

License

This code is provided under the MIT License. See the LICENSE file for more details.

Acknowledgments

This code is based on the DeepWalk algorithm and borrows from various open-source projects and libraries. Please refer to the relevant documentation and licenses of these projects.


Feel free to modify and extend this code to suit your needs. If you have questions or run into problems, please open an issue on GitHub or contact the author.