Use-case article: Representation Learning on Graph Structured Data #25

ricsi98 · 2023-12-18T20:49:32Z

This article covers two popular algorithms for node representation learning.

Outline:

- Introduction to node representation learning + introducing demo dataset
- Introduction to Node2Vec + code example on demo dataset
- Introduction to GraphSAGE + code example on demo dataset
- Conclusion: interpreting results and a final comparison of the two algorithms

morkapronczay

Thank you very much, great work all in all. Some minor suggestions below.

morkapronczay · 2023-12-19T13:54:12Z

docs/use_cases/node_representation_learning.md

Please put SEO here and remove POST2 from the article title; these are published automatically.

morkapronczay · 2023-12-19T14:02:38Z

docs/use_cases/node_representation_learning.md

+The random walks are sampled according to a policy, which is guided by 2 parameters: return $p$, and in-out $q$.
+
+- The return parameter $p$ impacts the likelihood of returning to the previous node. A lower $p$ makes returning more likely and leads to more locally focused walks.
+- The in-out parameter $q$ affects the likelihood of moving to nodes further away from the previous node. A lower $q$ encourages Depth First Search-like exploration, while a higher $q$ promotes Breadth First Search-like behavior.
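
To make the role of $p$ and $q$ concrete, here is a minimal sketch of one biased walk. It assumes the weighting from the original node2vec formulation (unnormalized weight $1/p$ for stepping back to the previous node, $1$ for a node that is also a neighbour of the previous node, $1/q$ for a node further away). The `adj` dictionary format and the function name `node2vec_walk` are hypothetical and only for illustration; this is not the code used in the article.

```python
import random


def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """Sample one second-order biased walk on an unweighted graph.

    adj maps each node to the set of its neighbours (toy format, assumed here).
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbours = sorted(adj[cur])  # sorted for reproducible sampling
        if not neighbours:
            break
        if len(walk) == 1:
            # The first step has no previous node, so pick uniformly.
            walk.append(rng.choice(neighbours))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbours:
            if nxt == prev:
                weights.append(1.0 / p)  # distance 0 from prev: return
            elif nxt in adj[prev]:
                weights.append(1.0)      # distance 1 from prev: stay close
            else:
                weights.append(1.0 / q)  # distance 2 from prev: move away
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


# Toy undirected graph: a triangle (0, 1, 2) with a tail 1 - 3 - 4.
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1, 4}, 4: {3}}
walk = node2vec_walk(adj, start=0, length=10, p=0.25, q=4.0, rng=random.Random(0))
print(walk)
```

With a small `p` and a large `q`, as in the call above, the weights favour returning and staying near the previous node, so the walk tends to remain close to where it started.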


I think if these are mentioned, they should be explained a bit more, e.g. with one more sentence each.

morkapronczay · 2023-12-19T17:34:04Z

docs/use_cases/node_representation_learning.md

+
+Different types of information, like words, pictures, and connections between things, show us different sides of the world. Relationships, especially, are interesting because they show how things interact and create networks. In this post, we'll talk about how we can use these relationships to understand and describe things in a network better.
+
+We're diving into a real-life example to explain how entities can be turned into vectors using their connections, a common practice in machine learning. The dataset we're going to work with is a subset of the Cora citation network. It comprises 2708 scientific papers (nodes), and the connections indicate citations between them. Each paper has a BoW (Bag-of-Words) descriptor containing 1433 words. The challenge at hand involves predicting the specific scientific category to which each paper belongs, selecting from a pool of seven distinct categories.


I'd propose adding a descriptive statistic about the dataset. Let's calculate the cosine similarity between all pairs of items, and create a chart that shows:

- bins of cosine similarity ranges in terms of BoW representations (1-0.98, 0.98-0.96, etc.)
- against the probability of having a citation between them (or just the counts of pairs having or not having a citation connection, on a bidirectional bar chart like this)

This would show how connected the two aspects are, and how much information there is in incorporating both aspects into our vectors.
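
A rough sketch of the statistic proposed here, using only the standard library and a tiny made-up dataset: every pair of items is placed into an equal-width cosine similarity bin, and each bin reports how many pairs it holds and what fraction of them are connected by a citation. The names `cosine` and `citation_rate_by_bin`, the toy `features` and the toy `edges` are all assumptions for illustration, not the Cora data.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two non-negative BoW vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def citation_rate_by_bin(features, edges, n_bins=5):
    """Per equal-width similarity bin: (number of pairs, cited pairs, cited fraction)."""
    edge_set = {frozenset(e) for e in edges}  # treat citations as undirected
    counts = [[0, 0] for _ in range(n_bins)]
    for i, j in combinations(range(len(features)), 2):
        sim = cosine(features[i], features[j])
        b = min(int(sim * n_bins), n_bins - 1)  # sim == 1.0 falls into the last bin
        counts[b][0] += 1
        counts[b][1] += frozenset((i, j)) in edge_set
    return [(total, cited, cited / total if total else None) for total, cited in counts]


# Toy data: four "papers" with 4-word BoW vectors and two citation links.
features = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 1, 1, 0]]
edges = [(0, 1), (2, 3)]
result = citation_rate_by_bin(features, edges)
for b, (total, cited, rate) in enumerate(result):
    print(f"bin {b}: pairs={total} cited={cited} rate={rate}")
```

If the cited fraction rises with the similarity bin, the BoW features and the graph carry overlapping information; if it stays flat, the two aspects are largely independent.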

This is the distribution of the pairwise cosine similarities.

*In the second bullet point, do you want to show how well the cosine similarities reflect connections in the graph?
I don't exactly get how the plot should look.
Additionally, I can visualize the ROC curve for predicting whether nodes are connected based on BoW feature cosine similarity - that would tell us something like *.
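
As an illustration of the ROC idea, the area under the ROC curve can be computed directly from its rank interpretation: the probability that a randomly chosen connected pair has a higher cosine similarity than a randomly chosen unconnected pair, with ties counted as one half. The sketch below uses made-up scores and labels; the function name `roc_auc` is hypothetical. On the full set of pairs, a library routine such as scikit-learn's `roc_auc_score` would be the more practical choice, since this version compares every positive with every negative.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a connected pair > score of an unconnected pair), ties = 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one connected and one unconnected pair")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up cosine similarities for six node pairs; label 1 means "citation exists".
scores = [0.9, 0.4, 0.35, 0.1, 0.0, 0.0]
labels = [1, 0, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
print(auc)
```

An AUC near 0.5 would mean the BoW similarity says little about citations, while a value near 1.0 would mean the two views are almost redundant.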

Added this part in the latest commit. To me it feels a bit odd; we should tell the reader why we need this statistic. Do you have any idea how to blend it more into the "story line"?

morkapronczay · 2023-12-19T17:36:53Z

docs/use_cases/node_representation_learning.md

+
+The results are slightly worse (3%) than the results we got by combining Node2Vec with BoW features; however, remember that with this model we can embed completely new nodes too. If our scenario requires inductiveness, GraphSAGE might be a better solution; in a transductive setting, however, Node2Vec would give us a better solution.
+
+## Conclusion
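
The inductive property mentioned in the quoted paragraph can be illustrated with a single GraphSAGE-style layer using a mean aggregator: the layer only needs a node's own features and its neighbours' features, so a node that was not in the graph at training time can still be embedded. This is a simplified sketch under assumptions: the weight matrices below are made-up stand-ins for trained parameters, and the function name `sage_mean_layer` is hypothetical.

```python
def sage_mean_layer(x_self, x_neighbours, w_self, w_neigh):
    """One GraphSAGE-style layer: ReLU(W_self @ x_self + W_neigh @ mean(x_neighbours))."""
    if x_neighbours:
        mean = [sum(col) / len(x_neighbours) for col in zip(*x_neighbours)]
    else:
        mean = [0.0] * len(x_self)  # isolated node: no neighbour signal
    out = []
    for row_s, row_n in zip(w_self, w_neigh):
        z = sum(a * b for a, b in zip(row_s, x_self))
        z += sum(a * b for a, b in zip(row_n, mean))
        out.append(max(0.0, z))  # ReLU
    return out


# Made-up "trained" weights (2 input features -> 2 output features).
w_self = [[1.0, 0.0], [0.0, 1.0]]
w_neigh = [[0.5, 0.0], [0.0, -1.0]]

# A node unseen during training, together with the features of the papers it cites.
new_node = [1.0, 0.2]
cited = [[0.0, 1.0], [2.0, 1.0]]
embedding = sage_mean_layer(new_node, cited, w_self, w_neigh)
print(embedding)
```

Node2Vec, by contrast, learns one embedding per node identity, so a new node has no embedding until the walks are resampled and the model is retrained or updated.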


Don't you think it would be worth embedding the text of the papers with some sentence transformer model as well, and repeating the scenarios where it is concatenated to Node2Vec?
Does GraphSAGE work on the vectors, or does it embed the text itself? It could be worth adding it to that scenario as well. This is a reasonably sized, relatively well-performing model.

Sure, I will try to do that. Unfortunately the torch_geometric dataset does not contain the text of the articles. However, I found the original data (from which the torch_geometric dataset should be derived) that contains paper extracts. I will try to match the paper IDs and embed the abstracts with the LLM.

GraphSAGE uses the BoW features as input. We can also try to train the GraphSAGE model with the LLM features.
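
A minimal sketch of the matching and concatenation step discussed here: graph embeddings and text embeddings are joined on paper ID, and each block is L2-normalized before concatenation so that neither dominates purely because of its scale. The normalization is an assumption of this sketch rather than something stated in the thread, and the paper IDs, vectors and function names are made up.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm else list(v)


def concat_features(*blocks):
    """Concatenate per-node feature blocks after normalizing each block."""
    out = []
    for block in blocks:
        out.extend(l2_normalize(block))
    return out


def join_by_paper_id(graph_emb, text_emb):
    """Keep only papers present in both sources and concatenate their vectors."""
    shared = sorted(graph_emb.keys() & text_emb.keys())
    return {pid: concat_features(graph_emb[pid], text_emb[pid]) for pid in shared}


# Hypothetical paper IDs and embeddings.
graph_emb = {"p1": [3.0, 4.0], "p2": [1.0, 0.0]}
text_emb = {"p1": [0.0, 2.0, 0.0], "p3": [1.0, 1.0, 1.0]}
joined = join_by_paper_id(graph_emb, text_emb)
print(joined)
```

Papers that appear in only one of the two sources (`p2` and `p3` here) are dropped, which is worth reporting if the ID matching turns out to be incomplete.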

morkapronczay

Loving this! I suggested 2 small typo changes, this can go to Robert!

morkapronczay · 2023-12-20T17:12:44Z

docs/use_cases/node_representation_learning.md

+
+In this plot, we divided the groups (shown on the y-axis) to have about the same number of pairs in each. The only exception was the 0-0.04 group, where lots of pairs had no similar words - they couldn't be split into smaller groups.
+
+From the plot, it's clear that connected nodes usually have higher cosine similarities. This means papers that cite each other often use similar words. But when we ignore zero similarities, papers that have note cited each other seem to have a wide range of common words.
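
The binning described in the quoted paragraph can be sketched as equal-frequency (quantile) binning. When many pairs share the exact same value, several quantile cut points coincide, and those pairs end up in one oversized bin that cannot be split. This is an assumed reconstruction for illustration with made-up similarities, not the code behind the plot; `equal_frequency_edges` is a hypothetical name.

```python
def equal_frequency_edges(values, n_bins):
    """Bin edges taken at quantiles, dropping duplicate edges caused by ties."""
    ordered = sorted(values)
    n = len(ordered)
    edges = [ordered[0]]
    for k in range(1, n_bins):
        cut = ordered[min(n - 1, (k * n) // n_bins)]
        if cut > edges[-1]:  # skip cut points that repeat the previous edge
            edges.append(cut)
    if ordered[-1] > edges[-1]:
        edges.append(ordered[-1])
    return edges


# Made-up similarities: six pairs share no words at all, four pairs do.
sims = [0.0] * 6 + [0.1, 0.2, 0.3, 0.4]
bin_edges = equal_frequency_edges(sims, n_bins=5)
print(bin_edges)
```

Five bins were requested, but the six zero-similarity pairs collapse two of the cut points, so only three bins remain and the first one is larger than the others.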


Suggested change

-From the plot, it's clear that connected nodes usually have higher cosine similarities. This means papers that cite each other often use similar words. But when we ignore zero similarities, papers that have note cited each other seem to have a wide range of common words.
+From the plot, it's clear that connected nodes usually have higher cosine similarities. This means papers that cite each other often use similar words. But when we ignore zero similarities, papers that have not cited each other seem to have a wide range of common words.

docs/use_cases/node_representation_learning.md

Co-authored-by: Mór Kapronczay <mor.kapronczay@gmail.com>

krichard98 added 13 commits December 17, 2023 22:50

Initial post

1e0043b

Improve introduction and N2V

e08d7ef

.

e998181

reset table

4cd3f2e

Improve N2V GraphSAGE transition

c973f7e

Add GraphSAGE intro

455ff2e

GraphSAGE results

7d010e2

Fix batch size

4155619

Minor fixes

5e4e78d

Update table with actual results

4197bd4

V1

f59b7a7

Rename article file

07b2b85

Update title

77ad963

morkapronczay added the stage: content review PR under review of the high level content direction label Dec 19, 2023

morkapronczay self-assigned this Dec 19, 2023

morkapronczay requested changes Dec 19, 2023


krichard98 added 8 commits December 20, 2023 14:44

Add SEO text

07ddfbb

Add cosine similarity plot

42d9070

Add bin chart for N2V + rephrased some parts

b5ae0e6

Update numbers and plots

2eabcd9

Add LLM results

bb43c7d

Minor improvements

95c8f14

fixes in conclusion

910fa3e

Better explanation for N2V p,q parameters

baf4c2f

morkapronczay approved these changes Dec 21, 2023


morkapronczay added stage: style review PR under review for style guide compliance ( https://hub.superlinked.com/contributing ) and removed stage: content review PR under review of the high level content direction labels Dec 21, 2023

ricsi98 and others added 3 commits December 21, 2023 12:03

Update docs/use_cases/node_representation_learning.md

f794fd6

Co-authored-by: Mór Kapronczay <mor.kapronczay@gmail.com>

Update docs/use_cases/node_representation_learning.md

05abc51

Co-authored-by: Mór Kapronczay <mor.kapronczay@gmail.com>

Merge branch 'main' into stage2

4ce45ad

robertdhayanturner merged commit 035426e into superlinked:main Jan 2, 2024
1 check passed




