
Hetero subgraph with dispatching #43

Open · wants to merge 16 commits into master
Conversation

ZenoTan (Member) commented May 4, 2022

No description provided.

codecov-commenter commented May 5, 2022

Codecov Report

Merging #43 (48c115d) into master (afcc419) will decrease coverage by 0.75%.
The diff coverage is 93.50%.

@@            Coverage Diff             @@
##           master      #43      +/-   ##
==========================================
- Coverage   97.27%   96.51%   -0.76%     
==========================================
  Files          10       12       +2     
  Lines         220      287      +67     
==========================================
+ Hits          214      277      +63     
- Misses          6       10       +4     
Impacted Files Coverage Δ
pyg_lib/csrc/sampler/cpu/mapper.h 80.00% <75.00%> (ø)
pyg_lib/csrc/utils/hetero_dispatch.h 77.77% <77.77%> (ø)
pyg_lib/csrc/sampler/cpu/subgraph_kernel.cpp 100.00% <100.00%> (ø)
pyg_lib/csrc/sampler/subgraph.cpp 100.00% <100.00%> (ø)
pyg_lib/csrc/utils/types.h 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update afcc419...48c115d.

@ZenoTan ZenoTan changed the title [WIP] Hetero subgraph API Hetero subgraph API May 5, 2022
@ZenoTan ZenoTan changed the title Hetero subgraph API Hetero subgraph with dispatching May 5, 2022
@ZenoTan ZenoTan self-assigned this May 6, 2022
@ZenoTan ZenoTan requested review from rusty1s and yaoyaowd and removed request for yaoyaowd and rusty1s May 6, 2022 12:05
yaoyaowd previously approved these changes May 6, 2022
@yaoyaowd yaoyaowd dismissed their stale review May 6, 2022 17:30: "was checking out wrong commit this morning."

@ZenoTan ZenoTan requested a review from yaoyaowd May 12, 2022 12:25
pyg_lib/csrc/sampler/cpu/mapper.h (Outdated)
@@ -23,26 +25,28 @@ class Mapper {

void fill(const scalar_t* nodes_data, const scalar_t size) {
if (use_vec) {
for (scalar_t i = 0; i < size; ++i)
for (scalar_t i = 0; i < size; ++i) {
Contributor:
Let me post my question here: I read some documents, and based on my understanding scalar_t covers float, double, int32, and int64 during compilation. But in a lot of our use cases we are iterating over integers. How does PyTorch avoid compiling the float variants of these functions? Is there a better way to be more specific about the data types here?

Member Author:

There are some helper functions like is_integral for a dtype, but IMO that is mostly runtime checking. We could also use STL type traits for compile-time checks.

Member:

The AT_DISPATCH_INTEGRAL_TYPES call handles which types scalar_t can take (during compile time).
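To illustrate the point, here is a minimal, self-contained sketch of the dispatch pattern (hypothetical names; the real ATen macro expands to a switch just like this): the lambda body is only instantiated for the integral scalar types, so no float/double variant of the kernel is ever compiled.

```cpp
#include <cstdint>

// Simplified stand-in for ATen's AT_DISPATCH_INTEGRAL_TYPES (hypothetical
// ScalarType/dispatch names): the generic lambda is instantiated once per
// *integral* scalar type only.
enum class ScalarType { Int, Long };

template <typename F>
void dispatch_integral_types(ScalarType t, F&& f) {
  switch (t) {
    case ScalarType::Int:
      f(int32_t{0});  // instantiates the body with scalar_t = int32_t
      break;
    case ScalarType::Long:
      f(int64_t{0});  // instantiates the body with scalar_t = int64_t
      break;
  }
}

// Mirrors the fill() loop above: iterate with the dispatched integer type.
int64_t sum_indices(ScalarType t, int64_t size) {
  int64_t out = 0;
  dispatch_integral_types(t, [&](auto zero) {
    using scalar_t = decltype(zero);
    for (scalar_t i = 0; i < static_cast<scalar_t>(size); ++i)
      out += i;
  });
  return out;
}
```

Because the type set is fixed at the dispatch call site, the compile-time instantiations are limited without any runtime is_integral checks.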

pyg_lib/csrc/utils/types.h (Outdated)
});

return std::make_tuple(out_rowptr, out_col, out_edge_id);
return subgraph_bipartite(rowptr, col, nodes, nodes, return_edge_id);
Contributor:

The code structure looks a little weird to me, because csrc/sampler/cpu/subgraph_kernel.cpp exists to register TORCH_LIBRARY_IMPL yet uses a general implementation in csrc/sampler/subgraph.cpp. How about reorganizing the code like this:

csrc
  - ops
    # all ops exposed to pytorch.
    - sampler
  # all general graph operations.
  - sampler

We don't need to refactor the code structure now, but I want to hear your opinion.

Contributor:

Never mind, it seems subgraph.cpp also defines the library. Why not merge them together, since sampler/subgraph.cpp also runs on CPU only?

Member Author:

We could follow the style of other PyG repos: put CPU/GPU-specific implementations in separate folders and provide a common interface in a higher-level directory.

pyg_lib/csrc/utils/hetero_dispatch.h (Outdated)
@ZenoTan ZenoTan requested review from yaoyaowd May 13, 2022 22:17

auto res = subgraph_with_mapper<scalar_t>(rowptr, col, src_nodes,
mapper, return_edge_id);
out_rowptr = std::get<0>(res);
Contributor:

Or maybe we could do std::tie(out_rowptr, out_col, out_edge_id) = res?

Member:

+1
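For reference, a minimal sketch of the suggested change, with plain ints standing in for the tensors returned by subgraph_with_mapper: std::tie unpacks an existing tuple into already-declared variables in one statement, replacing the three separate std::get<N>(res) assignments.

```cpp
#include <tuple>

// Stand-in for subgraph_with_mapper's (rowptr, col, edge_id) result.
std::tuple<int, int, int> make_res() { return std::make_tuple(1, 2, 3); }

int tie_demo() {
  int out_rowptr = 0, out_col = 0, out_edge_id = 0;
  // One statement instead of three std::get<N>(res) assignments.
  std::tie(out_rowptr, out_col, out_edge_id) = make_res();
  return out_rowptr * 100 + out_col * 10 + out_edge_id;
}
```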


for (const auto& kv : rowptr) {
const auto& edge_type = kv.key();
bool pass = filter_args_by_edge(edge_type, src_nodes_arg, dst_nodes_arg,
Contributor:

I'd still prefer

pass = src_nodes_arg.filter_by_edge(edge_type) && dst_nodes_arg.filter_by_edge(edge_type) && edge_id_arg.filter_by_edge(edge_type)

Or, from an efficiency point of view:

auto dst = get_dst(edge_type);
auto src = get_src(edge_type);
bool pass = return_edge_id.counts(edge_type) > 0 && src_nodes.counts(src) > 0 && dst_nodes.counts(dst) > 0;

const auto& r = rowptr.at(edge_type);
const auto& c = col.at(edge_type);
res.insert(edge_type,
subgraph_bipartite(r, c, std::get<0>(vals), std::get<1>(vals),
Contributor:

and here would just be

subgraph_bipartite(r, c, src_nodes.at(src), dst_nodes.at(dst), return_edge_id.at(edge_type));

CHANGELOG.md (Outdated)
@@ -25,10 +28,42 @@ std::tuple<at::Tensor, at::Tensor, c10::optional<at::Tensor>> subgraph(
return op.call(rowptr, col, nodes, return_edge_id);
}

c10::Dict<utils::edge_t,
Member:

I actually would have expected us to return a tuple of dictionaries, similar to how the input looks.
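A minimal sketch of the suggested return-shape change, with std::map standing in for c10::Dict and int standing in for at::Tensor (all names here are illustrative, not the pyg_lib API): turn the dict-of-tuples produced per edge type into a tuple-of-dicts, mirroring the per-edge-type dictionary inputs.

```cpp
#include <map>
#include <string>
#include <tuple>

using Dict = std::map<std::string, int>;
using DictOfTuples = std::map<std::string, std::tuple<int, int>>;

// Restructure {edge_type -> (rowptr, col)} into ({edge_type -> rowptr},
// {edge_type -> col}) so the output shape matches the input dictionaries.
std::tuple<Dict, Dict> to_tuple_of_dicts(const DictOfTuples& in) {
  Dict rowptr, col;
  for (const auto& kv : in) {
    rowptr[kv.first] = std::get<0>(kv.second);  // first output per edge type
    col[kv.first] = std::get<1>(kv.second);     // second output per edge type
  }
  return std::make_tuple(rowptr, col);
}
```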

test/csrc/utils/test_utils.cpp (Outdated)
offset++;
}
AT_DISPATCH_INTEGRAL_TYPES(
nodes.scalar_type(), "subgraph_kernel_with_mapper", [&] {
Member:

Can we make this a one-liner again?

} // namespace

TORCH_LIBRARY_IMPL(pyg, CPU, m) {
m.impl(TORCH_SELECTIVE_NAME("pyg::subgraph"), TORCH_FN(subgraph_kernel));
m.impl(TORCH_SELECTIVE_NAME("pyg::subgraph_bipartite"),
Member:

Any reason we want to expose that? Looks more like an internal function to me.

Member Author:

If the user wants to build a subgraph of a bipartite graph, they can use it.

}

c10::Dict<utils::EdgeType,
std::tuple<at::Tensor, at::Tensor, c10::optional<at::Tensor>>>
Member:

IMO, the output should be a tuple of dictionaries (similar to the input).

if (pass) {
const auto& r = rowptr.at(edge_type);
const auto& c = col.at(edge_type);
res.insert(edge_type, subgraph_bipartite(
Member:

Shouldn't we use the mapper here? Otherwise, we will re-map across every edge type.

Member Author:

Yes, it has a cost, but the mapper is more read-intensive. I will add a TODO here.


inline NodeType get_dst(const EdgeType& e) {
return e.substr(e.find_last_of(SPLIT_TOKEN) + 1);
}
rusty1s (Member) commented May 15, 2022:

We could also add a function that maps tuples to strings and vice versa.

Member Author:

Good idea.
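A minimal sketch of such a pair of helpers, following the "src__rel__dst" layout implied by the get_dst() snippet above (kSplitToken and both function names are hypothetical, not the pyg_lib API):

```cpp
#include <string>
#include <tuple>

const std::string kSplitToken = "__";  // same token as SPLIT_TOKEN above

// (src, rel, dst) -> "src__rel__dst"
std::string to_edge_string(const std::string& src, const std::string& rel,
                           const std::string& dst) {
  return src + kSplitToken + rel + kSplitToken + dst;
}

// "src__rel__dst" -> (src, rel, dst); splits on the first and last token.
std::tuple<std::string, std::string, std::string> to_edge_tuple(
    const std::string& e) {
  const auto first = e.find(kSplitToken);
  const auto last = e.rfind(kSplitToken);
  return std::make_tuple(
      e.substr(0, first),
      e.substr(first + kSplitToken.size(), last - first - kSplitToken.size()),
      e.substr(last + kSplitToken.size()));
}
```

Note this assumes node and relation names never contain the split token themselves; otherwise the first/last-occurrence split is ambiguous.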

std::tuple<at::Tensor, at::Tensor, c10::optional<at::Tensor>>>
hetero_subgraph(const utils::EdgeTensorDict& rowptr,
const utils::EdgeTensorDict& col,
const utils::NodeTensorDict& src_nodes,
Member:

Not sure why we have both src_nodes and dst_nodes. IMO, these can be safely merged as in https://pytorch-geometric.readthedocs.io/en/latest/modules/data.html#torch_geometric.data.HeteroData.subgraph.

Member Author:

Separating src and dst just gives some flexibility. We could also have the merged API, though.

ZenoTan and others added 3 commits May 15, 2022 17:36
Co-authored-by: Matthias Fey <matthias.fey@tu-dortmund.de>
Co-authored-by: Matthias Fey <matthias.fey@tu-dortmund.de>