Link-Prediction-Graph-Analytics

Use graph analytics to predict relationships between entities

Project Background:

This repo implements the link prediction concepts described in Chapter 8 of the book "Graph Algorithms" by Mark Needham and Amy E. Hodler. A free copy of this excellent book can be downloaded at https://neo4j.com/graph-algorithms-book/. In the book, the authors implement link prediction using PySpark ML algorithms.

In this repo I share an implementation of link prediction using the popular Sklearn and Pandas Python libraries (instead of PySpark) and also demonstrate how to use Neo4j (graph database) connectors to send data and commands between Python and a Neo4j server.
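As a rough illustration of the Python-to-Neo4j round trip, here is a minimal sketch using the official `neo4j` Python driver. The connection URI, credentials, `Author` label, and the helper names `run_cypher` and `main` are assumptions made for this example; they are not taken from the repo's code.

```python
def run_cypher(driver, query, params=None):
    """Run a Cypher query and return the rows as a list of plain dicts."""
    with driver.session() as session:
        result = session.run(query, params or {})
        # Each record is converted to a dict so it can be handed to Pandas.
        return [record.data() for record in result]


def main():
    # Imported here so run_cypher can be reused without the driver installed.
    from neo4j import GraphDatabase

    # Hypothetical connection details for a local Neo4j Desktop instance.
    uri, user, password = "bolt://localhost:7687", "neo4j", "password"
    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        # Assumes author nodes carry an `Author` label.
        rows = run_cypher(driver, "MATCH (a:Author) RETURN count(a) AS authors")
        print(rows)
    finally:
        driver.close()
```

Calling `main()` requires a running Neo4j server. The list of dicts returned by `run_cypher` can be passed directly to `pandas.DataFrame(rows)` for further processing.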

We'll use a paper citation dataset containing information about research papers, authors, author relationships, and citation relationships. We'll create a coauthors graph based on authors who have collaborated on papers and then predict future collaborations between pairs of authors. We're only interested in collaborations between authors who haven't collaborated before; we're not concerned with multiple collaborations between pairs of authors. (See the Dataset section below for more details.)
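To make the coauthors graph concrete, the sketch below derives one edge per author pair from a list of papers and records only the year of their first joint paper, so repeat collaborations are ignored. The record layout (`year`, `authors`), the toy data, and the function name `build_coauthor_edges` are assumptions for illustration; the repo itself builds this graph in Neo4j.

```python
from itertools import combinations


def build_coauthor_edges(papers):
    """Map each unordered author pair to the year of their first joint paper."""
    first_year = {}
    for paper in papers:
        # Sort so that (A, B) and (B, A) map to the same key.
        authors = sorted(set(paper["authors"]))
        for pair in combinations(authors, 2):
            year = paper["year"]
            if pair not in first_year or year < first_year[pair]:
                first_year[pair] = year
    return first_year


# Toy input; the real dataset is far larger.
papers = [
    {"year": 2001, "authors": ["Ann", "Bob"]},
    {"year": 2003, "authors": ["Ann", "Bob", "Cy"]},
    {"year": 2005, "authors": ["Cy", "Dee"]},
]
edges = build_coauthor_edges(papers)
```

Pairs that never appear in `edges` (for example Ann and Dee) are the candidates for which a future collaboration would be predicted.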

Technologies used in the repo:

Python (Sklearn, Pandas and Neo4j libraries)
Neo4j Desktop Edition
Cypher Query Language

Prerequisites:

Graph Theory Concepts (presented very well in the book)
Familiarity with the Neo4j graph database
Basic knowledge of Cypher query language
Knowledge of Sklearn and Pandas libraries

Dataset (info taken from page 200 of "Graph Algorithms" book)
We use the Citation Network Dataset, a research dataset extracted from DBLP, ACM, and MAG. The dataset is described in the paper “ArnetMiner: Extraction and Mining of Academic Social Networks”, by J. Tang et al. The dataset version used in the book (DBLP-Citation-network V10) contains 3,079,007 papers, 1,766,547 authors, 9,437,718 author relationships, and 25,166,994 citation relationships. We’ll be working with a subset focused on articles that appeared in the following publications:
• Lecture Notes in Computer Science
• Communications of the ACM
• International Conference on Software Engineering
• Advances in Computing and Communications

Our resulting dataset contains 51,956 papers, 80,299 authors, 140,575 author relationships, and 28,706 citation relationships.

Process Overview (see Python Code for details)
Load data into Neo4j
Create balanced data and split samples into Pandas DataFrames for training and testing
Implement ML methods for link prediction
Train and evaluate various versions of ML prediction models, starting with basic graphy features and adding more graph algorithm features extracted using Neo4j
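The balancing, splitting, and training steps above can be sketched roughly as follows with Pandas and Sklearn. This is a simplified illustration, not the repo's code: the toy feature values, the column names (`commonNeighbors`, `prefAttachment`, `totalNeighbors`, `label`), the random train/test split, and the hyperparameters are all assumptions. In the actual workflow the features would be computed in Neo4j.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def down_sample(df, label_col="label", seed=42):
    """Randomly drop majority-class rows until both classes have equal size."""
    n = df[label_col].value_counts().min()
    parts = [grp.sample(n=n, random_state=seed) for _, grp in df.groupby(label_col)]
    # Shuffle so the classes are interleaved.
    return pd.concat(parts).sample(frac=1, random_state=seed).reset_index(drop=True)


# Toy author pairs: 20 positives (label 1) and 80 negatives (label 0).
rows = []
for i in range(100):
    label = 1 if i < 20 else 0
    rows.append({
        "commonNeighbors": (i % 5) + (3 if label else 0),
        "prefAttachment": (i % 7) + 1,
        "totalNeighbors": (i % 11) + 2,
        "label": label,
    })
pairs = pd.DataFrame(rows)

balanced = down_sample(pairs)
features = ["commonNeighbors", "prefAttachment", "totalNeighbors"]
X_train, X_test, y_train, y_test = train_test_split(
    balanced[features], balanced["label"],
    test_size=0.25, stratify=balanced["label"], random_state=42,
)

model = RandomForestClassifier(n_estimators=30, max_depth=10, random_state=42)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
```

Down-sampling the negative class is one simple way to handle the imbalance between pairs that did and did not go on to collaborate; further graph algorithm features can be added as extra columns and the model retrained to compare results.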

ashu649/Graph-Analytics-Link-Prediction
