some problems when I run ./script/template.sh #4

lingtengqiu · 2022-03-10T10:16:43Z

Thanks for your great work.
When I run your code to fit my own video, I hit a problem in train_one_epoch().
I debugged it and found that the error occurs at

self._num_faces_per_mesh.unique() == 1 (line 403 in pytorch3d/meshes.py)

I get a RuntimeError: std::bad_alloc.
I don't know why; all the environments were installed according to your yaml.

I am running your code on 2 x 3090 GPUs.

lingtengqiu · 2022-03-10T11:08:56Z

Also, when I delete the line

rendered_seq, aux_seq = self.eval()

in train.py, the problem does not happen. I am very confused about this.

gengshan-y · 2022-03-10T15:22:41Z

This issue happened to me when the environment or packages were not installed properly. There are two things to check:
Did you follow the readme to install into a new environment? Were you able to run it on the demo video?
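
One quick way to act on the first check is to list which package versions are actually installed in the active environment. The snippet below is only a sketch: installed_versions is a hypothetical helper, and the distribution names passed to it (torch, pytorch3d, trimesh, numpy) are assumptions taken from the paths in this thread, not an official checklist from the readme.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in the active environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed package names; adjust to whatever the environment file lists.
    for name, version in installed_versions(
        ["torch", "pytorch3d", "trimesh", "numpy"]
    ).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Comparing this output with the versions pinned in the environment file is a cheap way to spot a partially installed or mixed environment before debugging further.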

lingtengqiu · 2022-03-10T15:55:35Z

Of course, I created the conda environment from your misc/banmo-cu113.yml.
The other problem occurs when I run the step marked "# mode: fine tunning without pose correction".
It happens because vars_np['mesh_rest'] is an empty mesh, trimesh(0, 3), so vars_np['mesh_rest'].bounds returns None.

    pts = trimesh.bounds.corners(vars_np['mesh_rest'].bounds)
  File "/home/lingteng/anaconda3/envs/banmo-cu113/lib/python3.9/site-packages/trimesh/bounds.py", line 435, in corners
    raise ValueError('bounds must be (2,2) or (2,3)!')
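
As reported above, the bounds of an empty mesh are None, and trimesh.bounds.corners rejects that input. A caller could check for this case before computing corners. The following is a minimal sketch under stated assumptions: mesh_has_geometry is a hypothetical helper (not part of banmo or trimesh), it relies only on the mesh exposing vertices and bounds attributes, and the SimpleNamespace objects are stand-ins so the example runs without trimesh installed.

```python
from types import SimpleNamespace


def mesh_has_geometry(mesh):
    """Return True if the mesh has at least one vertex and usable bounds."""
    return len(mesh.vertices) > 0 and mesh.bounds is not None


# Stand-ins for mesh objects (assumption: real meshes expose the same attributes).
empty_mesh = SimpleNamespace(vertices=[], bounds=None)
unit_cube = SimpleNamespace(
    vertices=[(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    bounds=[(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
)

empty_ok = mesh_has_geometry(empty_mesh)  # False: skip the corners computation
cube_ok = mesh_has_geometry(unit_cube)    # True: safe to compute corners
```

Such a guard only turns the ValueError into an explicit check; it does not explain why the rest mesh is empty in the first place.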

I am very interested in your series of work. Could I get your contact information to ask you for more details about ViSER and banmo? ^ ^

gengshan-y · 2022-03-10T17:34:24Z

Thanks for the info. Can you provide a full backtrace log? I’ll look into this bug.

You may reach me by email.

lingtengqiu · 2022-03-11T07:01:08Z

Thank you for your kind reply.
Here is the log of the training issue:

/home/lingteng/anaconda3/envs/banmo-cu113/lib/python3.9/site-packages/numpy/core/_methods.py:189: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
  File "/home/lingteng/2022/project/banmo-main/main.py", line 54, in <module>
    app.run(main)
  File "/home/lingteng/anaconda3/envs/banmo-cu113/lib/python3.9/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/home/lingteng/anaconda3/envs/banmo-cu113/lib/python3.9/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/home/lingteng/2022/project/banmo-main/main.py", line 51, in main
    trainer.train()
  File "/home/lingteng/2022/project/banmo-main/nnutils/train_utils.py", line 701, in train
    self.train_one_epoch(epoch, log)
  File "/home/lingteng/2022/project/banmo-main/nnutils/train_utils.py", line 942, in train_one_epoch
    total_loss,aux_out = self.model(batch)
  File "/home/lingteng/anaconda3/envs/banmo-cu113/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lingteng/anaconda3/envs/banmo-cu113/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 886, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/lingteng/anaconda3/envs/banmo-cu113/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lingteng/2022/project/banmo-main/nnutils/banmo.py", line 652, in forward_default
    mesh_rest = pytorch3d.structures.meshes.Meshes(
  File "/home/lingteng/2022/project/banmo-main/third_party/pytorch3d/pytorch3d/structures/meshes.py", line 408, in __init__
    if len(self._num_faces_per_mesh.unique()) == 1:
  File "/home/lingteng/anaconda3/envs/banmo-cu113/lib/python3.9/site-packages/torch/_tensor.py", line 530, in unique
    return torch.unique(self, sorted=sorted, return_inverse=return_inverse, return_counts=return_counts, dim=dim)
  File "/home/lingteng/anaconda3/envs/banmo-cu113/lib/python3.9/site-packages/torch/_jit_internal.py", line 422, in fn
    return if_false(*args, **kwargs)
  File "/home/lingteng/anaconda3/envs/banmo-cu113/lib/python3.9/site-packages/torch/_jit_internal.py", line 422, in fn
    return if_false(*args, **kwargs)
  File "/home/lingteng/anaconda3/envs/banmo-cu113/lib/python3.9/site-packages/torch/functional.py", line 821, in _return_output
    output, _, _ = _unique_impl(input, sorted, return_inverse, return_counts, dim)
  File "/home/lingteng/anaconda3/envs/banmo-cu113/lib/python3.9/site-packages/torch/functional.py", line 735, in _unique_impl
    output, inverse_indices, counts = torch._unique2(
RuntimeError: std::bad_alloc
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 451799) of binary: /home/lingteng/anaconda3/envs/banmo-cu113/bin/python

lingtengqiu · 2022-03-11T08:17:05Z

As for the vars_np['mesh_rest'] issue, I found the cause: I had commented out the following line to keep training running because of the problem above.

# rendered_seq, aux_seq = self.eval()

This line turns out to be essential, since it updates mesh['mesh_rest'].
I do not know whether it is a 3090 problem. Whether I run your code on the demo video or on my own video, 'std::bad_alloc' happens. :(

gengshan-y · 2022-03-11T15:12:18Z

It was tested on a 3090, so I suppose it's not a hardware issue. Unfortunately, the error log is not very informative. It's likely a pytorch3d issue. While I try to reproduce it on a new machine, here are a few things to try:

Install with the [B. torch1.7+cu110] option.
Or try moving the verts and faces tensors to the GPU with .to(self.device) before passing them to pytorch3d.structures.meshes.Meshes here.

lingtengqiu · 2022-03-12T02:06:28Z

Thanks for your suggestions.
[B. torch1.7+cu110] works well.

cynthia-you · 2022-11-08T02:13:34Z

Hi, could you tell me the SM version of your 3090? If its compute capability is sm_86, is sm_86 compatible with torch 1.7 and CUDA 11.0? I have a 3070S GPU, and the NVIDIA CUDA documentation says CUDA 11.0 supports architectures up to sm_80. I'm so confused.

gengshan-y · 2022-11-13T22:21:30Z

To use more recent architectures, you may need to replace these two lines with a compatible cuda version.

For reference I was using

cudatoolkit               11.6.0              hecad31d_10    conda-forge
cudatoolkit-dev           11.3.1           py39h3811e60_0    conda-forge

for 3090s.

Let me know if it does not work.

gengshan-y · 2022-11-13T22:24:34Z

Indeed, you may directly use the cuda 11.3 option for installation here.
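
The compatibility question raised above can be reasoned about with two general CUDA rules: a binary built for sm_XY can run on a device of the same major version with an equal or higher minor version, and embedded PTX (compute_XY) can be JIT-compiled for any device with an equal or higher compute capability. The sketch below encodes just those two rules. arch_can_run is a hypothetical helper, and the architecture lists are made-up examples, not the actual build targets of any PyTorch release. With PyTorch installed, the real inputs would come from torch.cuda.get_device_capability() and torch.cuda.get_arch_list().

```python
def arch_can_run(capability, arch_list):
    """Check whether a device of the given (major, minor) compute capability
    can run code built for any entry of arch_list (e.g. 'sm_80', 'compute_75')."""
    major, minor = capability
    for arch in arch_list:
        kind, _, digits = arch.partition("_")
        if not digits.isdigit():
            continue  # skip entries this sketch does not understand
        built_major, built_minor = int(digits[:-1]), int(digits[-1])
        # Binary code: same major version, built minor <= device minor.
        if kind == "sm" and built_major == major and built_minor <= minor:
            return True
        # PTX: can be JIT-compiled for any equal or newer compute capability.
        if kind == "compute" and (built_major, built_minor) <= (major, minor):
            return True
    return False


# Hypothetical build targets, for illustration only.
rtx_30_series = (8, 6)  # RTX 3090 / 3070 class devices report sm_86
with_sm80 = arch_can_run(rtx_30_series, ["sm_70", "sm_75", "sm_80"])
without_ampere = arch_can_run(rtx_30_series, ["sm_70", "sm_75"])
with_ptx = arch_can_run(rtx_30_series, ["sm_70", "compute_75"])
```

Under these rules a build that targets sm_80 can still run on an sm_86 card, which would be consistent with the torch1.7+cu110 option working on a 3090 earlier in this thread. Whether a given build is also stable or fast on that card is a separate question.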

lingtengqiu closed this as completed Mar 12, 2022

gengshan-y mentioned this issue Mar 21, 2022

RuntimeError: std::bad_alloc #10

Closed

gengshan-y added the bug label Apr 20, 2022

gengshan-y added a commit that referenced this issue Jun 2, 2022

#4

7773a11

Fix issue #4 that causes error on certain archs for torch110+cu113 version

gengshan-y reopened this Nov 13, 2022

kts707 mentioned this issue Jun 13, 2024

how run camm on one RTX 3090 GPU? kts707/camm#2

Closed
