Backward passes for SpMM and SDDMM are supported in branch `dev_spmm`.
However, in the `pass` sampler, the backward of `gs.ops.u_mul_v(subA, u_feats @ W_2, v_feats @ W_2)`, i.e. `dX = gspmm(_gidx, "mul", "sum", Y, dZ, rev_format)`, produces NaN values even though its inputs contain no NaN values.
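To cross-check the kernel, a framework-free reference for what this backward should compute may help. This is a minimal sketch, assuming the sampled matrix `subA` is available as a plain list of `(src, dst)` edge pairs (COO) and the features as lists of floats; `u_mul_v_forward` and `u_mul_v_backward` are hypothetical helper names, not part of `gs`. The gradient with respect to the source features sums `Y[dst] * dZ[e]` over each node's edges, which is the mul/sum SpMM on the reversed graph mentioned above, and nodes with no incident edges keep an exact zero gradient.

```python
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]      # (src, dst) pair; COO layout is an assumption
Matrix = List[List[float]]  # one feature row per node (or per edge)


def u_mul_v_forward(edges: Sequence[Edge], x: Matrix, y: Matrix) -> Matrix:
    """SDDMM with 'mul': z[e] = x[src] * y[dst], elementwise, one row per edge."""
    return [[a * b for a, b in zip(x[u], y[v])] for u, v in edges]


def u_mul_v_backward(
    edges: Sequence[Edge], x: Matrix, y: Matrix, dz: Matrix
) -> Tuple[Matrix, Matrix]:
    """Gradients of u_mul_v w.r.t. x and y, given the upstream gradient dz.

    dx[u] = sum over edges e = (u, v) of y[v] * dz[e]  (mul/sum SpMM, reversed graph)
    dy[v] = sum over edges e = (u, v) of x[u] * dz[e]
    """
    dim = len(x[0]) if x else 0
    # Zero-initialised buffers: nodes with no incident edges get an exact 0 gradient.
    dx = [[0.0] * dim for _ in x]
    dy = [[0.0] * dim for _ in y]
    for e, (u, v) in enumerate(edges):
        for k in range(dim):
            dx[u][k] += y[v][k] * dz[e][k]
            dy[v][k] += x[u][k] * dz[e][k]
    return dx, dy


if __name__ == "__main__":
    edges = [(0, 0), (0, 1), (1, 1)]
    x = [[1.0, 2.0], [3.0, 4.0]]
    y = [[5.0, 6.0], [7.0, 8.0]]
    dz = [[1.0, 1.0]] * 3
    print(u_mul_v_forward(edges, x, y))       # [[5.0, 12.0], [7.0, 16.0], [21.0, 32.0]]
    print(u_mul_v_backward(edges, x, y, dz))  # ([[12.0, 14.0], [7.0, 8.0]], [[1.0, 2.0], [4.0, 6.0]])
```

Feeding the same small subgraph and features (e.g. converted with `.tolist()`) to both this reference and `gspmm` should indicate whether the NaNs originate in the kernel itself or in its inputs.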
To reproduce:
$ git checkout origin/dev_spmm
$ # build and install the project
$ cd examples/pass
$ python train_minibatch.py
Namespace(device='cuda', use_uva=None, dataset='reddit', batchsize=512, samples='10,10', num_workers=0)
Graph(num_nodes=232965, num_edges=114848857,
ndata_schemes={}
edata_schemes={})
WARNING: Logging before InitGoogleLogging() is written to STDERR
I20230207 03:35:00.528640 18177 graph.cc:19] Loaded CSC with 232965 nodes and 114848857 edges
Check load successfully: [None, None, tensor([1., 1., 1., ..., 1., 1., 1.], device='cuda:0'), tensor([ 0, 2205, 2360, ..., 114848225, 114848365,
114848857], device='cuda:0'), tensor([225202, 177307, 107546, ..., 232594, 232634, 232964], device='cuda:0')]
memory allocated before training: 2.2396583557128906 GB
0%|| 0/300 [00:00<?, ?it/s]
/home/ubuntu/aws_projects/graph_sampling/examples/pass/train_minibatch.py:148: UserWarning: Anomaly Detection has been enabled. This mode will increase the runtime and should only be enabled for debugging.
with torch.autograd.detect_anomaly():
/home/ubuntu/aws_projects/graph_sampling/examples/pass/train_minibatch.py:156: UserWarning: Anomaly Detection has been enabled. This mode will increase the runtime and should only be enabled for debugging.
with torch.autograd.detect_anomaly():
/home/ubuntu/anaconda3/envs/dgl/lib/python3.9/site-packages/torch/autograd/__init__.py:173: UserWarning: Error detected in GSDDMMBackward. Traceback of forward call that caused the error:
File "/home/ubuntu/aws_projects/graph_sampling/examples/pass/train_minibatch.py", line 247, in <module>
train(dataset, args)
File "/home/ubuntu/aws_projects/graph_sampling/examples/pass/train_minibatch.py", line 125, in train
input_nodes, output_nodes, blocks, loss_tuple = compiled_func(
File "/home/ubuntu/aws_projects/graph_sampling/examples/pass/train_minibatch.py", line 33, in matrix_sampler
att2 = torch.sum(gs.ops.u_mul_v(subA, u_feats @ W_2,
File "/home/ubuntu/anaconda3/envs/dgl/lib/python3.9/site-packages/gs-0.1-py3.9.egg/gs/ops/sddmm.py", line 115, in func
return gsddmm(g, binary_op, x, y,
File "/home/ubuntu/anaconda3/envs/dgl/lib/python3.9/site-packages/gs-0.1-py3.9.egg/gs/ops/sddmm.py", line 72, in gsddmm
return gsddmm_internal(
File "/home/ubuntu/anaconda3/envs/dgl/lib/python3.9/site-packages/gs-0.1-py3.9.egg/gs/ops/sparse.py", line 286, in gsddmm
return GSDDMM.apply(gidx, op, lhs_data, rhs_data, lhs_target, rhs_target, on_format)
(Triggered internally at /opt/conda/conda-bld/pytorch_1656352657443/work/torch/csrc/autograd/python_anomaly_mode.cpp:102.)
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
0%|| 0/300 [00:01<?, ?it/s]
Traceback (most recent call last):
File "/home/ubuntu/aws_projects/graph_sampling/examples/pass/train_minibatch.py", line 247, in <module>
train(dataset, args)
File "/home/ubuntu/aws_projects/graph_sampling/examples/pass/train_minibatch.py", line 157, in train
sample_loss.backward()
File "/home/ubuntu/anaconda3/envs/dgl/lib/python3.9/site-packages/torch/_tensor.py", line 396, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/home/ubuntu/anaconda3/envs/dgl/lib/python3.9/site-packages/torch/autograd/__init__.py", line 173, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: Function 'GSDDMMBackward' returned nan values in its 0th output.
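To pinpoint where `GSDDMMBackward` first emits a NaN, the kernel's gradient can be compared entry by entry against a trusted reference. This is a minimal sketch, assuming both gradients are available as nested Python lists (for example via `tensor.detach().cpu().tolist()`); `first_bad_entry` and its tolerances are hypothetical choices, not part of `gs` or PyTorch.

```python
import math
from typing import Optional, Sequence, Tuple


def first_bad_entry(
    kernel: Sequence[Sequence[float]],
    reference: Sequence[Sequence[float]],
    rel_tol: float = 1e-5,
    abs_tol: float = 1e-6,
) -> Optional[Tuple[int, int, float, float]]:
    """Return (row, col, kernel_value, reference_value) for the first entry where
    the kernel output is NaN/inf or differs from the reference; None if all match."""
    if len(kernel) != len(reference):
        raise ValueError(f"row count mismatch: {len(kernel)} vs {len(reference)}")
    for i, (k_row, r_row) in enumerate(zip(kernel, reference)):
        if len(k_row) != len(r_row):
            raise ValueError(f"column count mismatch in row {i}")
        for j, (k, r) in enumerate(zip(k_row, r_row)):
            # Non-finite kernel values are always flagged, even if the reference agrees.
            if not math.isfinite(k) or not math.isclose(k, r, rel_tol=rel_tol, abs_tol=abs_tol):
                return i, j, k, r
    return None
```

A `None` result means the two gradients agree; otherwise the returned row index identifies the node whose gradient row (and its edges in `subA`) is worth inspecting first.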