CRASHES WITH CUDA ERROR: DEVICE-SIDE ASSERT TRIGGERED #114

Open
gregbugaj opened this issue May 27, 2024 · 0 comments

gregbugaj commented May 27, 2024

Describe the bug

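The application crashes on the GPU with "CUDA error: device-side assert triggered" while classifying documents. Sample document IDs that trigger the crash are listed below.

Crashes with CUDA error: device-side assert triggered: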
204092337
203925852
204092927
204092966
204166227
204041606
204040160

205967262 - EOB (medical_page_classifier)
208788841 - CORR
209425570 - CORR
209805214 - CORR
209976466 - CORR
211567153 - CORR
212670800 - CORR
213805705 - CORR / ROTATED
213942700 - CORR / ROTATED
214292051 - CORR
214288815 - CORR
214291267 - CORR / ROTATED
214292900 - CORR / LARGE
214894529 - CORR / ENVELOPE

INFO   marie@37 Executing pipeline for document : PID_1956_9362_0_203925852.tif, lbxid > /tmp/generators/a9de56b33b040d12568f379e0078684a                                           
INFO   marie@37 Executing pipeline runtime_conf : {'name': 'default-corr', 'page_splitter': {'enabled': False}, 'type': 'pipeline', 'page_cleaner': {'enabled':                     
       False}, 'page_classifier': {'enabled': True}}                                                                                                                                
INFO   marie@37 Feature : page classifier enabled : True                                                                                                                            
INFO   marie@37 Feature : page indexer enabled : True                                                                                                                               
INFO   marie@37 Loaded classifiers : corr-classifier, 3                                                                                                                             
INFO   marie@37 Loaded classifiers : corr-payer-classifier, 3                                                                                                                       
INFO   marie@37 Restoring assets from s3://marie/lbxid/pid_1956_9362_0_203925852 to /tmp/generators/a9de56b33b040d12568f379e0078684a                             [05/15/24 14:38:45]
INFO   marie@37 Bursting frames for PID_1956_9362_0_203925852.tif                                                                                                                   
INFO   marie@37 Processing classifier pipeline/group :  default-corr, corr-classifier                                                                                               
../aten/src/ATen/native/cuda/Indexing.cu:1290: indexSelectLargeIndex: block: [166,0,0], thread: [127,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
ERROR  marie@37 Error classifying document : CUDA error: device-side assert triggered                                                                            [05/15/24 14:38:45]
       CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.                                                      
       For debugging consider passing CUDA_LAUNCH_BLOCKING=1.                                                                                                                       
       Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.                                                                                                          
                                                                                                                                                                                    
Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/marie/components/document_classifier/transformers.py", line 244, in predict
    for results in pipe_batched_results:
  File "/opt/venv/lib/python3.10/site-packages/transformers/pipelines/pt_utils.py", line 124, in __next__
    item = next(self.iterator)
  File "/opt/venv/lib/python3.10/site-packages/transformers/pipelines/pt_utils.py", line 125, in __next__
    processed = self.infer(item, **self.params)
  File "/opt/venv/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1067, in forward
    model_inputs = self._ensure_tensor_on_device(model_inputs, device=self.device)
  File "/opt/venv/lib/python3.10/site-packages/transformers/pipelines/base.py", line 972, in _ensure_tensor_on_device
    return UserDict({name: self._ensure_tensor_on_device(tensor, device) for name, tensor in inputs.items()})
  File "/opt/venv/lib/python3.10/site-packages/transformers/pipelines/base.py", line 972, in <dictcomp>
    return UserDict({name: self._ensure_tensor_on_device(tensor, device) for name, tensor in inputs.items()})
  File "/opt/venv/lib/python3.10/site-packages/transformers/pipelines/base.py", line 980, in _ensure_tensor_on_device
    return inputs.to(device)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
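
The traceback above ends in `inputs.to(device)`, but the error text itself warns that CUDA kernel errors can be reported asynchronously, so that frame may not be where the assert actually fired. A minimal sketch of the suggested debugging step, assuming the variable is set at the very top of the entry point before any CUDA work is done (where that entry point lives in this project is not shown in the log):

```python
import os

# Force synchronous kernel launches so a device-side assert is raised at the
# call that caused it rather than at a later, unrelated API call.
# This has to run before the process makes its first CUDA call.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

print(os.environ["CUDA_LAUNCH_BLOCKING"])
```

Synchronous launches slow inference down, so this is a setting for reproducing the crash on one of the sample documents, not something to leave enabled.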

ERROR  marie@37 Error while classifying documents: CUDA error: device-side assert triggered                                                                      [05/15/24 14:38:45]
       CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.                                                      
       For debugging consider passing CUDA_LAUNCH_BLOCKING=1.                                                                                                                       
       Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.  
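
The `indexSelectLargeIndex` assertion `srcIndex < srcSelectDimSize` in the log is typically raised when an index-based lookup (for example an embedding table) receives an index at or beyond the table size. Whether that is the cause here is not confirmed by the log. As a rough sketch of a CPU-side check that could run before the batch is moved to the GPU, the helper below reports every out-of-range index in a batch. The name `find_out_of_range`, the sample ids, and the table size are hypothetical and only for illustration; with real tensors the values would first be converted to nested lists.

```python
from typing import Iterable, List, Tuple


def find_out_of_range(
    indices: Iterable[Iterable[int]], table_size: int
) -> List[Tuple[int, int, int]]:
    """Return (row, column, value) for every index outside [0, table_size)."""
    bad = []
    for row, seq in enumerate(indices):
        for col, value in enumerate(seq):
            if value < 0 or value >= table_size:
                bad.append((row, col, value))
    return bad


# Hypothetical batch of ids and a hypothetical lookup-table size.
input_ids = [[0, 17, 250], [3, 999, 2]]
vocab_size = 256

print(find_out_of_range(input_ids, vocab_size))  # → [(1, 1, 999)]
```

Running a check like this against each index-like model input for one of the failing documents would show which input, if any, exceeds its table, and it fails with a readable message instead of poisoning the CUDA context.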

@gregbugaj gregbugaj self-assigned this May 27, 2024