
Hetero subgraph with dispatching #43

Open · wants to merge 16 commits into master
Conversation

ZenoTan (Member) commented May 4, 2022

No description provided.

codecov-commenter commented May 5, 2022

Codecov Report

Merging #43 (48c115d) into master (afcc419) will decrease coverage by 0.75%.
The diff coverage is 93.50%.

@@            Coverage Diff             @@
##           master      #43      +/-   ##
==========================================
- Coverage   97.27%   96.51%   -0.76%     
==========================================
  Files          10       12       +2     
  Lines         220      287      +67     
==========================================
+ Hits          214      277      +63     
- Misses          6       10       +4     
Impacted Files Coverage Δ
pyg_lib/csrc/sampler/cpu/mapper.h 80.00% <75.00%> (ø)
pyg_lib/csrc/utils/hetero_dispatch.h 77.77% <77.77%> (ø)
pyg_lib/csrc/sampler/cpu/subgraph_kernel.cpp 100.00% <100.00%> (ø)
pyg_lib/csrc/sampler/subgraph.cpp 100.00% <100.00%> (ø)
pyg_lib/csrc/utils/types.h 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update afcc419...48c115d.

@ZenoTan ZenoTan changed the title [WIP] Hetero subgraph API Hetero subgraph API May 5, 2022
@ZenoTan ZenoTan changed the title Hetero subgraph API Hetero subgraph with dispatching May 5, 2022
@ZenoTan ZenoTan self-assigned this May 6, 2022
@ZenoTan ZenoTan requested review from rusty1s and yaoyaowd and removed request for yaoyaowd and rusty1s May 6, 2022 12:05
yaoyaowd previously approved these changes May 6, 2022
@yaoyaowd yaoyaowd dismissed their stale review May 6, 2022 17:30: "was checking out wrong commit this morning."

@ZenoTan ZenoTan requested a review from yaoyaowd May 12, 2022 12:25
pyg_lib/csrc/sampler/cpu/mapper.h (Outdated)
@@ -23,26 +25,28 @@ class Mapper {

void fill(const scalar_t* nodes_data, const scalar_t size) {
if (use_vec) {
for (scalar_t i = 0; i < size; ++i)
for (scalar_t i = 0; i < size; ++i) {
Contributor:
Let me post my question here: I read some documents, and based on my understanding scalar_t covers float, double, int32, and int64 during compilation. But in a lot of our use cases we are iterating over integers. How does PyTorch avoid compiling the float variants of these functions? Is there a better way to be more specific about the data types here?

Member Author:

There are some helper functions like is_integral for a dtype, but IMO that is mostly runtime checking. We could also use STL type traits for compile-time checks.

Member:

The AT_DISPATCH_INTEGRAL_TYPES call handles which types scalar_t can take (during compile time).
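To illustrate the point, here is a minimal, self-contained sketch of the dispatch pattern (hypothetical names; the real ATen macro expands to a switch just like this): the lambda body is only instantiated for the integral scalar types, so no float/double variant of the kernel is ever compiled.

```cpp
#include <cstdint>

// Simplified stand-in for ATen's AT_DISPATCH_INTEGRAL_TYPES (hypothetical
// ScalarType/dispatch names): the generic lambda is instantiated once per
// *integral* scalar type only.
enum class ScalarType { Int, Long };

template <typename F>
void dispatch_integral_types(ScalarType t, F&& f) {
  switch (t) {
    case ScalarType::Int:
      f(int32_t{0});  // instantiates the body with scalar_t = int32_t
      break;
    case ScalarType::Long:
      f(int64_t{0});  // instantiates the body with scalar_t = int64_t
      break;
  }
}

// Mirrors the fill() loop above: iterate with the dispatched integer type.
int64_t sum_indices(ScalarType t, int64_t size) {
  int64_t out = 0;
  dispatch_integral_types(t, [&](auto zero) {
    using scalar_t = decltype(zero);
    for (scalar_t i = 0; i < static_cast<scalar_t>(size); ++i)
      out += i;
  });
  return out;
}
```

Because the type set is fixed at the dispatch call site, the compile-time instantiations are limited without any runtime is_integral checks.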

pyg_lib/csrc/utils/types.h (Outdated)
});

return std::make_tuple(out_rowptr, out_col, out_edge_id);
return subgraph_bipartite(rowptr, col, nodes, nodes, return_edge_id);
Contributor:

The code structure looks a little weird to me, because csrc/sampler/cpu/subgraph_kernel.cpp exists to register TORCH_LIBRARY_IMPL yet uses a general implementation in csrc/sampler/subgraph.cpp. How about reorganizing the code like this:

csrc
  - ops
    # all ops exposed to pytorch.
    - sampler
  # all general graph operations.
  - sampler

We don't need to refactor the code structure now, but I want to hear your opinion.

Contributor:

Never mind, it seems subgraph.cpp also defines the library. Why not merge them together, since sampler/subgraph.cpp also runs on CPU only?

Member Author:

We could follow the style of other PyG repos: put CPU/GPU-specific implementations in separate folders and provide a common interface in a higher-level directory.

pyg_lib/csrc/utils/hetero_dispatch.h (Outdated)
@ZenoTan ZenoTan requested review from yaoyaowd May 13, 2022 22:17

auto res = subgraph_with_mapper<scalar_t>(rowptr, col, src_nodes,
mapper, return_edge_id);
out_rowptr = std::get<0>(res);
Contributor:

Or maybe we could do std::tie(out_rowptr, out_col, out_edge_id) = res?

Member:

+1
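For reference, a minimal sketch of the suggested change, with plain ints standing in for the tensors returned by subgraph_with_mapper: std::tie unpacks an existing tuple into already-declared variables in one statement, replacing the three separate std::get<N>(res) assignments.

```cpp
#include <tuple>

// Stand-in for subgraph_with_mapper's (rowptr, col, edge_id) result.
std::tuple<int, int, int> make_res() { return std::make_tuple(1, 2, 3); }

int tie_demo() {
  int out_rowptr = 0, out_col = 0, out_edge_id = 0;
  // One statement instead of three std::get<N>(res) assignments.
  std::tie(out_rowptr, out_col, out_edge_id) = make_res();
  return out_rowptr * 100 + out_col * 10 + out_edge_id;
}
```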


for (const auto& kv : rowptr) {
const auto& edge_type = kv.key();
bool pass = filter_args_by_edge(edge_type, src_nodes_arg, dst_nodes_arg,
Contributor:

I'd still prefer

pass = src_nodes_arg.filter_by_edge(edge_type) && dst_nodes_arg.filter_by_edge(edge_type) && edge_id_arg.filter_by_edge(edge_type)

Or, from an efficiency point of view:

auto dst = get_dst(edge_type);
auto src = get_src(edge_type);
bool pass = return_edge_id.counts(edge_type) > 0 && src_nodes.counts(src) > 0 && dst_nodes.counts(dst) > 0;

const auto& r = rowptr.at(edge_type);
const auto& c = col.at(edge_type);
res.insert(edge_type,
subgraph_bipartite(r, c, std::get<0>(vals), std::get<1>(vals),
Contributor:

and here would just be

subgraph_bipartite(r, c, src_nodes.at(src), dst_nodes.at(dst), return_edge_id.at(edge_type));

CHANGELOG.md (Outdated)
@@ -25,10 +28,42 @@ std::tuple<at::Tensor, at::Tensor, c10::optional<at::Tensor>> subgraph(
return op.call(rowptr, col, nodes, return_edge_id);
}

c10::Dict<utils::edge_t,
Member:

I actually would have expected us to return a tuple of dictionaries, similar to how the input looks.
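A minimal sketch of the suggested return-shape change, with std::map standing in for c10::Dict and int standing in for at::Tensor (all names here are illustrative, not the pyg_lib API): turn the dict-of-tuples produced per edge type into a tuple-of-dicts, mirroring the per-edge-type dictionary inputs.

```cpp
#include <map>
#include <string>
#include <tuple>

using Dict = std::map<std::string, int>;
using DictOfTuples = std::map<std::string, std::tuple<int, int>>;

// Restructure {edge_type -> (rowptr, col)} into ({edge_type -> rowptr},
// {edge_type -> col}) so the output shape matches the input dictionaries.
std::tuple<Dict, Dict> to_tuple_of_dicts(const DictOfTuples& in) {
  Dict rowptr, col;
  for (const auto& kv : in) {
    rowptr[kv.first] = std::get<0>(kv.second);  // first output per edge type
    col[kv.first] = std::get<1>(kv.second);     // second output per edge type
  }
  return std::make_tuple(rowptr, col);
}
```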

test/csrc/utils/test_utils.cpp (Outdated)
offset++;
}
AT_DISPATCH_INTEGRAL_TYPES(
nodes.scalar_type(), "subgraph_kernel_with_mapper", [&] {
Member:

Can we make this a one-liner again?

} // namespace

TORCH_LIBRARY_IMPL(pyg, CPU, m) {
m.impl(TORCH_SELECTIVE_NAME("pyg::subgraph"), TORCH_FN(subgraph_kernel));
m.impl(TORCH_SELECTIVE_NAME("pyg::subgraph_bipartite"),
Member:

Any reason we want to expose that? Looks more like an internal function to me.

Member Author:

If the user wants to build a subgraph of a bipartite graph, they can use it.

}

c10::Dict<utils::EdgeType,
std::tuple<at::Tensor, at::Tensor, c10::optional<at::Tensor>>>
Member:

IMO, the output should be a tuple of dictionaries (similar to the input).

if (pass) {
const auto& r = rowptr.at(edge_type);
const auto& c = col.at(edge_type);
res.insert(edge_type, subgraph_bipartite(
Member:

Shouldn't we use the mapper here? Otherwise, we will re-map across every edge type.

Member Author:

Yes, it has a cost, but the mapper is more read-intensive. I will add a TODO here.


inline NodeType get_dst(const EdgeType& e) {
return e.substr(e.find_last_of(SPLIT_TOKEN) + 1);
}
rusty1s (Member) commented May 15, 2022:

We could also add a function that maps tuples to strings and vice versa.

Member Author:

Good idea.
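A minimal sketch of such a pair of helpers, following the "src__rel__dst" layout implied by the get_dst() snippet above (kSplitToken and both function names are hypothetical, not the pyg_lib API):

```cpp
#include <string>
#include <tuple>

const std::string kSplitToken = "__";  // same token as SPLIT_TOKEN above

// (src, rel, dst) -> "src__rel__dst"
std::string to_edge_string(const std::string& src, const std::string& rel,
                           const std::string& dst) {
  return src + kSplitToken + rel + kSplitToken + dst;
}

// "src__rel__dst" -> (src, rel, dst); splits on the first and last token.
std::tuple<std::string, std::string, std::string> to_edge_tuple(
    const std::string& e) {
  const auto first = e.find(kSplitToken);
  const auto last = e.rfind(kSplitToken);
  return std::make_tuple(
      e.substr(0, first),
      e.substr(first + kSplitToken.size(), last - first - kSplitToken.size()),
      e.substr(last + kSplitToken.size()));
}
```

Note this assumes node and relation names never contain the split token themselves; otherwise the first/last-occurrence split is ambiguous.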

std::tuple<at::Tensor, at::Tensor, c10::optional<at::Tensor>>>
hetero_subgraph(const utils::EdgeTensorDict& rowptr,
const utils::EdgeTensorDict& col,
const utils::NodeTensorDict& src_nodes,
Member:

Not sure why we have both src_nodes and dst_nodes. IMO, these can be safely merged as in https://pytorch-geometric.readthedocs.io/en/latest/modules/data.html#torch_geometric.data.HeteroData.subgraph.

Member Author:

Separating src and dst just gives some flexibility. We could also have the merged API, though.

ZenoTan and others added 3 commits May 15, 2022 17:36
Co-authored-by: Matthias Fey <matthias.fey@tu-dortmund.de>
Co-authored-by: Matthias Fey <matthias.fey@tu-dortmund.de>