DeepWalk - Graph Embedding and Node Classification

Overview

This repository contains a Python implementation of DeepWalk, a graph embedding technique, together with an example that uses the resulting embeddings for node classification. DeepWalk learns latent representations of nodes by sampling random walks on the graph and treating each walk as a sentence for Word2Vec, so that nodes which co-occur on walks receive similar vectors. The embeddings can then be used for downstream tasks such as node classification.

deepwalk.py

Introduction

The deepwalk.py file contains the implementation of the DeepWalk algorithm, which involves generating random walks on a given graph and training a Word2Vec model to learn embeddings for nodes.

Class: `DeepWalk`

Methods

`__init__(self, graph: Graph)`: Initializes the DeepWalk algorithm with a given graph.
`random_walk(self, node: int, walk_length: int) -> List`: Performs a random walk starting from a given node.
`generate_train_samples(self, num_samples: int, walk_length: int) -> List`: Generates training samples for the DeepWalk model.
`train(self, X: List, embed_dim: int = 128, window: int = 5, min_count: int = 1, workers: int = 4) -> W2V`: Trains the DeepWalk model using the generated training samples.
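
The walk-generation methods above can be sketched roughly as follows. This is a minimal illustration, not the code in deepwalk.py: the class name `DeepWalkSketch`, the adjacency-dictionary graph format, the `seed` argument, and the reading of `num_samples` as "number of walks started from each node" are all assumptions made for the example.

```python
import random
from typing import Dict, List


class DeepWalkSketch:
    """Illustrative stand-in for the DeepWalk class; not the repository's code."""

    def __init__(self, adjacency: Dict[int, List[int]], seed: int = 0):
        # Assumption: the graph is given as a mapping node -> list of neighbours.
        self.adjacency = adjacency
        self.rng = random.Random(seed)

    def random_walk(self, node: int, walk_length: int) -> List[str]:
        # Uniform random walk: at each step, move to a randomly chosen neighbour.
        walk = [node]
        for _ in range(walk_length - 1):
            neighbours = self.adjacency[walk[-1]]
            if not neighbours:  # dead end: stop the walk early
                break
            walk.append(self.rng.choice(neighbours))
        # Word2Vec works on string tokens, so node ids are converted to strings.
        return [str(n) for n in walk]

    def generate_train_samples(self, num_samples: int, walk_length: int) -> List[List[str]]:
        # Assumption: num_samples is the number of walks started from every node.
        nodes = list(self.adjacency)
        walks = []
        for _ in range(num_samples):
            self.rng.shuffle(nodes)  # visit start nodes in a fresh random order
            walks.extend(self.random_walk(n, walk_length) for n in nodes)
        return walks


# Toy graph: the 4-cycle 0-1-2-3-0.
toy = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
walks = DeepWalkSketch(toy).generate_train_samples(num_samples=2, walk_length=5)
```

Each element of `walks` is one "sentence" of node ids, which is the input shape that a Word2Vec trainer expects.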

train.py

Introduction

The train.py file demonstrates how to use the DeepWalk algorithm to generate node embeddings from a graph and provides an example using a dataset.

Steps

Load the necessary libraries and set constants and hyperparameters.
Load and preprocess the dataset (e.g., Cora dataset).
Generate training samples by applying DeepWalk to the graph.
Train a Word2Vec model using the generated training samples.
Visualize the embeddings using PCA.
Save the trained model.

classifier.py

Introduction

The classifier.py file showcases how to use the embeddings generated by DeepWalk for node classification. It provides an example of building a simple neural network classifier and training it on the embeddings.

Steps

Load the Word2Vec model trained using DeepWalk.
Load and preprocess the dataset (e.g., Cora dataset).
Create a neural network classifier and set hyperparameters.
Split the data into training and testing sets.
Initialize the neural network classifier and optimizer.
Train the classifier on the embeddings.
Plot the training and testing loss to visualize the training progress.
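
The training loop described by these steps might look something like the sketch below. To keep it self-contained, it uses random vectors in place of the DeepWalk embeddings and synthetic labels derived from them; the network shape, learning rate, epoch count, and 80/20 split are assumptions chosen for illustration and may differ from classifier.py.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in data: random vectors instead of DeepWalk embeddings, with
# synthetic labels derived from the vectors so that the task is learnable.
num_nodes, embed_dim, num_classes = 200, 16, 3
X = torch.randn(num_nodes, embed_dim)
y = X[:, :num_classes].argmax(dim=1)

# Split node indices into training (80%) and testing (20%) sets.
perm = torch.randperm(num_nodes)
split = int(0.8 * num_nodes)
train_idx, test_idx = perm[:split], perm[split:]

# A small feed-forward classifier, its optimizer, and the loss function.
classifier = nn.Sequential(
    nn.Linear(embed_dim, 32),
    nn.ReLU(),
    nn.Linear(32, num_classes),
)
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

train_losses, test_losses = [], []
for epoch in range(50):
    # One full-batch gradient step on the training nodes.
    classifier.train()
    optimizer.zero_grad()
    loss = loss_fn(classifier(X[train_idx]), y[train_idx])
    loss.backward()
    optimizer.step()
    train_losses.append(loss.item())

    # Track the loss on held-out nodes without computing gradients.
    classifier.eval()
    with torch.no_grad():
        test_losses.append(loss_fn(classifier(X[test_idx]), y[test_idx]).item())
```

Plotting `train_losses` and `test_losses` against the epoch index with Matplotlib gives the training-progress curves mentioned in the last step. With real data, `X` would be built by looking up each node's vector in the loaded Word2Vec model.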

Usage

To use this code for your own graph data and tasks:

Install the required libraries, such as NetworkX, Gensim, Node2Vec, and PyTorch.
Prepare your graph data and make any necessary modifications to the code.
Execute train.py to generate node embeddings.
Execute classifier.py to train a classifier on the embeddings.

This README is a high-level overview; refer to the individual code files for implementation details and customization options.

Dependencies

NetworkX
Gensim
Node2Vec
PyTorch
Pandas
Matplotlib
Scikit-learn

License

This code is provided under the MIT License. See the LICENSE file for more details.

Acknowledgments

This code is based on the DeepWalk algorithm and borrows from various open-source projects and libraries. Please refer to the relevant documentation and licenses of these projects.

Feel free to modify and expand upon this code to suit your specific needs. If you have any questions or encounter any issues, please create an issue on GitHub or contact the author.
