This repository contains a Python implementation of DeepWalk, a graph embedding technique, and uses the resulting embeddings for node classification. DeepWalk learns latent representations of nodes by running random walks on the graph and training Word2Vec on those walks, treating each walk as a sentence of node IDs. The embeddings can then be used for downstream tasks such as node classification. The repository consists of three files:
- deepwalk.py: Implementation of the DeepWalk algorithm.
- train.py: Training the DeepWalk model and generating node embeddings.
- classifier.py: Building and training a node classifier using the embeddings.
The deepwalk.py file contains the implementation of the DeepWalk algorithm, which involves generating random walks on a given graph and training a Word2Vec model to learn embeddings for nodes.
- __init__(self, graph: Graph): Initializes the DeepWalk algorithm with a given graph.
- random_walk(self, node: int, walk_length: int) -> List: Performs a random walk starting from a given node.
- generate_train_samples(self, num_samples: int, walk_length: int) -> List: Generates training samples for the DeepWalk model.
- train(self, X: List, embed_dim: int = 128, window: int = 5, min_count: int = 1, workers: int = 4) -> W2V: Trains the DeepWalk model using the generated training samples.
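The walk-generation step can be sketched without any third-party libraries. This is a minimal illustration, not the code in deepwalk.py: it assumes the graph is a plain adjacency dictionary rather than the repository's Graph type, and it interprets num_samples as the number of walks started from each node, which may differ from the actual implementation. The rng and seed parameters are hypothetical additions for reproducibility.

```python
import random
from typing import Dict, List


def random_walk(adj: Dict[int, List[int]], node: int, walk_length: int,
                rng: random.Random) -> List[int]:
    """Uniform random walk containing at most walk_length nodes."""
    walk = [node]
    while len(walk) < walk_length:
        neighbors = adj.get(walk[-1], [])
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk


def generate_train_samples(adj: Dict[int, List[int]], num_samples: int,
                           walk_length: int, seed: int = 0) -> List[List[int]]:
    """Start num_samples walks from every node (assumed meaning of num_samples)."""
    rng = random.Random(seed)
    nodes = list(adj)
    walks = []
    for _ in range(num_samples):
        rng.shuffle(nodes)  # visit start nodes in a fresh order on each pass
        for start in nodes:
            walks.append(random_walk(adj, start, walk_length, rng))
    return walks


# Toy undirected graph: the 4-cycle 0-1-2-3-0.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
walks = generate_train_samples(adj, num_samples=2, walk_length=5)
```

Each walk is a list of node IDs in which consecutive entries are connected by an edge. These lists play the role of sentences when Word2Vec is trained.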
The train.py file demonstrates how to use the DeepWalk algorithm to generate node embeddings from a graph and provides an example using a dataset.
- Load the necessary libraries and set constants and hyperparameters.
- Load and preprocess the dataset (e.g., Cora dataset).
- Generate training samples by applying DeepWalk to the graph.
- Train a Word2Vec model using the generated training samples.
- Visualize the embeddings using PCA.
- Save the trained model.
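The training and visualization steps above might look roughly like the sketch below. It is not the contents of train.py: it assumes gensim 4.x (where the embedding size parameter is named vector_size) and scikit-learn, and the helper names walks_to_sentences, train_embeddings, and project_2d are hypothetical. Using skip-gram (sg=1) is also an assumption; the repository may rely on gensim's default.

```python
from typing import List


def walks_to_sentences(walks: List[List[int]]) -> List[List[str]]:
    """Word2Vec expects token sequences, so node IDs are converted to strings."""
    return [[str(node) for node in walk] for walk in walks]


def train_embeddings(walks, embed_dim=128, window=5, min_count=1, workers=4):
    """Fit Word2Vec on the walks; defaults mirror the documented train() signature."""
    from gensim.models import Word2Vec  # imported lazily; requires gensim >= 4.0

    return Word2Vec(
        sentences=walks_to_sentences(walks),
        vector_size=embed_dim,
        window=window,
        min_count=min_count,
        workers=workers,
        sg=1,  # assumption: skip-gram; the repository may use gensim's default
    )


def project_2d(model):
    """Reduce the learned vectors to two PCA components for plotting."""
    from sklearn.decomposition import PCA  # imported lazily

    nodes = list(model.wv.index_to_key)
    coords = PCA(n_components=2).fit_transform(model.wv[nodes])
    return nodes, coords


# The conversion step on its own, using two short example walks.
sentences = walks_to_sentences([[0, 1, 2], [2, 3, 0]])
```

With gensim installed, a model returned by train_embeddings can be written to disk with model.save(path) and reloaded later with Word2Vec.load(path); the path itself is up to you.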
The classifier.py file showcases how to use the embeddings generated by DeepWalk for node classification. It provides an example of building a simple neural network classifier and training it on the embeddings.
- Load the Word2Vec model trained using DeepWalk.
- Load and preprocess the dataset (e.g., Cora dataset).
- Create a neural network classifier and set hyperparameters.
- Split the data into training and testing sets.
- Initialize the neural network classifier and optimizer.
- Train the classifier on the embeddings.
- Plot the training and testing loss to visualize the training progress.
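The split and training steps can be sketched as follows. This is an illustrative outline under stated assumptions, not the network defined in classifier.py: the two-layer architecture, the hidden size, the Adam optimizer, the learning rate, and the helper names split_indices and train_classifier are all hypothetical choices. X is assumed to be a float tensor of shape (n, embed_dim) holding one embedding per node, and y a long tensor of class labels.

```python
import random
from typing import List, Tuple


def split_indices(n: int, test_fraction: float = 0.2,
                  seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle the indices 0..n-1 and split them into (train, test) lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def train_classifier(X, y, num_classes, epochs=50, lr=1e-2, hidden=64):
    """Train a small MLP on the embeddings and record per-epoch losses."""
    import torch  # imported lazily; requires PyTorch
    from torch import nn

    train_idx, test_idx = split_indices(X.shape[0])
    train_idx, test_idx = torch.tensor(train_idx), torch.tensor(test_idx)

    model = nn.Sequential(
        nn.Linear(X.shape[1], hidden),
        nn.ReLU(),
        nn.Linear(hidden, num_classes),
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()

    history = []  # (train_loss, test_loss) per epoch, ready for plotting
    for _ in range(epochs):
        model.train()
        optimizer.zero_grad()
        loss = loss_fn(model(X[train_idx]), y[train_idx])
        loss.backward()
        optimizer.step()

        model.eval()
        with torch.no_grad():
            test_loss = loss_fn(model(X[test_idx]), y[test_idx])
        history.append((loss.item(), test_loss.item()))
    return model, history


# The split step on its own, for ten nodes.
train_idx, test_idx = split_indices(10, test_fraction=0.2)
```

The history list holds one pair of losses per epoch, so plotting its two columns with Matplotlib gives the training and testing curves mentioned in the last step.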
To use this code for your own graph data and tasks:
- Install the required libraries, such as NetworkX, Gensim, Node2Vec, and PyTorch.
- Prepare your graph data and make any necessary modifications to the code.
- Execute train.py to generate node embeddings.
- Execute classifier.py to train a classifier on the embeddings.
Please note that this README provides a high-level overview, and you should refer to the individual code files for detailed information and customization.
- NetworkX
- Gensim
- Node2Vec
- PyTorch
- Pandas
- Matplotlib
- Scikit-learn
This code is provided under the MIT License. See the LICENSE file for more details.
This code is based on the DeepWalk algorithm and borrows from various open-source projects and libraries. Please refer to the relevant documentation and licenses of these projects.
Feel free to modify and expand upon this code to suit your specific needs. If you have any questions or encounter any issues, please create an issue on GitHub or contact the author.