-
Notifications
You must be signed in to change notification settings - Fork 112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gds: init scc computation using kosaraju's algorithm #4893
base: master
Are you sure you want to change the base?
Conversation
void compute() { | ||
for (const auto& [tableID, numNodes] : numNodesMap) { | ||
sccState.pinTable(tableID); | ||
for (auto& [fromEntry, toEntry, relEntry] : graph->getRelFromToEntryInfos()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a better way to loop through the vertices and their neighbors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine. Though I would constraint kosaraju to single rel table for now until we figure out what's the right API for DFS. Let's validate this in StronglyConnectedComponent::bind()
return; | ||
}; | ||
sccState.updateVisit(node.offset, true); | ||
auto scanState = graph->prepareRelScan(relEntry, ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This state should be recreated in each recursion, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes but the performance might be terrible. I would do something like
auto state = graph->prepareRelScan(); // Always reuse this scan state
dfs(n):
nbrs;
for chunk : scanFwd(state) {
// update nbrs
}
for nbr : nbrs : {
dfs(nbr)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I implemented this if I understood you correctly. Is it the case that maintaining a second copy of the graph is faster than scanning the main graph?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand what do you by a copy of graph. I was trying to reuse scanState in the example above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM. Let's discuss a bit more with @semihsalihoglu-uw
return; | ||
}; | ||
sccState.updateVisit(node.offset, true); | ||
auto scanState = graph->prepareRelScan(relEntry, ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes but the performance might be terrible. I would do something like
auto state = graph->prepareRelScan(); // Always reuse this scan state
dfs(n):
nbrs;
for chunk : scanFwd(state) {
// update nbrs
}
for nbr : nbrs : {
dfs(nbr)
}
} | ||
|
||
private: | ||
std::vector<nodeID_t> queue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this will perform if the size of nodes is 1B. Let's see it's performance first.
Benchmark ResultMaster commit hash:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just have minor comments.
Let's get some benchmark numbers both in terms of performance and correctness. You can validate this against Neo4j. If the benchmark goes fine, let's extend it to other algorithms as well, Howe did some setup here. You can take a look how much of it will be useful.
Also let's understand if std::queue will scale or not.
return true; | ||
} | ||
|
||
bool getVisit(offset_t offset) { return visits[offset]; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename to visited
for (auto chunk : graph->scanBwd(nodeID, scanState)) { | ||
chunk.forEach([&](auto nbrNodeID, auto) { nbrs.push_back(nbrNodeID.offset); }); | ||
} | ||
neighbors[node] = nbrs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to save nbrs into neighbors
. I thought you could just keep it as local variable. Same for fwdDFS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This helps reuse the allocation from forwardDFS
in backwardsDFS
.
return; | ||
}; | ||
sccState.updateVisit(node.offset, true); | ||
auto scanState = graph->prepareRelScan(relEntry, ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand what do you by a copy of graph. I was trying to reuse scanState in the example above.
void compute() { | ||
for (const auto& [tableID, numNodes] : numNodesMap) { | ||
sccState.pinTable(tableID); | ||
for (auto& [fromEntry, toEntry, relEntry] : graph->getRelFromToEntryInfos()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine. Though I would constraint kosaraju to single rel table for now until we figure out what's the right API for DFS. Let's validate this in StronglyConnectedComponent::bind()
Description
Single-threaded implementation of SCC using Kosaraju's_algorithm.
EdgeCompute
, but instead is directly called fromexec()
.Contributor agreement