SYNC-HAZARD-READ-AFTER-WRITE validation error (multithreaded, MoltenVK) #6344

Open
simonask opened this issue Sep 30, 2024 · 1 comment

@simonask

Description
Scenario:

  1. Multiple threads are submitting commands to the same device/queue pair (e.g. through test cases running with cargo test).
  2. Each thread has a complicated setup of buffers and textures with various combinations of buffer/texture usage flags.
  3. Each thread is submitting both render passes and compute passes.
  4. Occasionally, the Vulkan validation layers produce the error below, but it doesn't seem to be reproducible with --test-threads=1.
  5. The error also seems sensitive to the size of the involved buffers. For example, some of the involved vertex buffers occasionally have a size of 0.
  6. I'm using some unsafe APIs to create shader modules from raw precompiled SPIR-V and to enable some Vulkan-specific shader extensions. My understanding is that this validation error originates on the host/driver side and is not something an invalid shader can trigger, though I could be wrong. The SPIR-V is generated by slangc and should be correct and match the target environment (flavor profile glsl_460).
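
One way to narrow down point 4 without dropping all the way to --test-threads=1 might be to serialize only the submission step behind a process-wide lock, while leaving the rest of each test running in parallel. The sketch below uses only the standard library. The names `SUBMIT_LOCK` and `with_serialized_submission` are hypothetical helpers invented for illustration and are not part of wgpu.

```rust
use std::sync::Mutex;

// Hypothetical process-wide lock shared by every test thread.
static SUBMIT_LOCK: Mutex<()> = Mutex::new(());

/// Runs `f` while holding the global lock, so that at most one thread
/// is inside a wrapped section at any time.
fn with_serialized_submission<R>(f: impl FnOnce() -> R) -> R {
    // If another test panicked while holding the lock, keep going anyway:
    // the lock only protects ordering, not data.
    let _guard = SUBMIT_LOCK.lock().unwrap_or_else(|e| e.into_inner());
    f()
}
```

In a test, the closure would wrap whatever call performs the submission (for example a `queue.submit(...)` call). If the hazard stops appearing with the lock in place, that would suggest the interleaving of submissions from different threads matters. If it still appears, the cause is more likely elsewhere.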

Theories:

  1. It feels likely that this is caused by a missing barrier in wgpu. Maybe related to "Metal Autosync Doesn't Barrier Vertex Shader Write -> Copy Source Hazards" #4732 (but doesn't explicitly involve an overlap in writable resources between different command encoders), or to "Vulkan SYNC-HAZARD-WRITE-AFTER-WRITE validation errors with Bevy" #5373 (but is a different sync hazard kind, so potentially a different barrier type).
  2. Maybe a buffer memory alignment error, or a buffer barrier "provenance" error with zero-sized buffers?

Validation error:

2024-09-30T11:06:49.048183Z ERROR wgpu_hal::vulkan::instance: VALIDATION [SYNC-HAZARD-READ-AFTER-WRITE (0xe4d96472)]
        Validation Error: [ SYNC-HAZARD-READ-AFTER-WRITE ] Object 0: handle = 0x111751c18, type = VK_OBJECT_TYPE_QUEUE; | MessageID = 0xe4d96472 | vkQueueSubmit():  Hazard READ_AFTER_WRITE for entry 3, VkCommandBuffer 0x13202d138[], Submitted access info (submitted_usage: SYNC_VERTEX_ATTRIBUTE_INPUT_VERTEX_ATTRIBUTE_READ, command: vkCmdDraw, seq_no: 2, reset_no: 1). Access info (prior_usage: SYNC_COPY_TRANSFER_WRITE, write_barriers: SYNC_FRAGMENT_SHADER_COLOR_ATTACHMENT_READ|SYNC_FRAGMENT_SHADER_DEPTH_STENCIL_ATTACHMENT_READ|SYNC_FRAGMENT_SHADER_INPUT_ATTACHMENT_READ|SYNC_EARLY_FRAGMENT_TESTS_DEPTH_STENCIL_ATTACHMENT_READ|SYNC_EARLY_FRAGMENT_TESTS_DEPTH_STENCIL_ATTACHMENT_WRITE|SYNC_LATE_FRAGMENT_TESTS_DEPTH_STENCIL_ATTACHMENT_READ|SYNC_LATE_FRAGMENT_TESTS_DEPTH_STENCIL_ATTACHMENT_WRITE|SYNC_COLOR_ATTACHMENT_OUTPUT_COLOR_ATTACHMENT_READ|SYNC_COLOR_ATTACHMENT_OUTPUT_COLOR_ATTACHMENT_WRITE|SYNC_SUBPASS_SHADER_HUAWEI_INPUT_ATTACHMENT_READ, queue: VkQueue 0x111751c18[], submit: 0, batch: 0, batch_tag: 1, command: vkCmdCopyBuffer, command_buffer: VkCommandBuffer 0x111641e78[(wgpu internal) PendingWrites], seq_no: 7, reset_no: 1).    
2024-09-30T11:06:49.048666Z ERROR wgpu_hal::vulkan::instance:   objects: (type: QUEUE, hndl: 0x111751c18, name: ?)  
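
Reading the message above: the prior access is a transfer write from vkCmdCopyBuffer in the "(wgpu internal) PendingWrites" command buffer, and the later access is a vertex attribute read from vkCmdDraw. The write_barriers list names only fragment-shader and attachment accesses, so the vertex attribute read is not covered. The following is a deliberately simplified model of that check, not how the validation layer is implemented (real synchronization validation also tracks execution scopes and barrier chains). The type and function names are hypothetical. Only the usage strings are copied from the log.

```rust
use std::collections::HashSet;

/// Simplified stand-in for the "Access info" part of the log message.
struct PriorWrite {
    usage: &'static str,
    write_barriers: HashSet<&'static str>,
}

/// The prior write exactly as reported in the validation error above.
fn prior_write_from_log() -> PriorWrite {
    PriorWrite {
        usage: "SYNC_COPY_TRANSFER_WRITE",
        write_barriers: HashSet::from([
            "SYNC_FRAGMENT_SHADER_COLOR_ATTACHMENT_READ",
            "SYNC_FRAGMENT_SHADER_DEPTH_STENCIL_ATTACHMENT_READ",
            "SYNC_FRAGMENT_SHADER_INPUT_ATTACHMENT_READ",
            "SYNC_EARLY_FRAGMENT_TESTS_DEPTH_STENCIL_ATTACHMENT_READ",
            "SYNC_EARLY_FRAGMENT_TESTS_DEPTH_STENCIL_ATTACHMENT_WRITE",
            "SYNC_LATE_FRAGMENT_TESTS_DEPTH_STENCIL_ATTACHMENT_READ",
            "SYNC_LATE_FRAGMENT_TESTS_DEPTH_STENCIL_ATTACHMENT_WRITE",
            "SYNC_COLOR_ATTACHMENT_OUTPUT_COLOR_ATTACHMENT_READ",
            "SYNC_COLOR_ATTACHMENT_OUTPUT_COLOR_ATTACHMENT_WRITE",
            "SYNC_SUBPASS_SHADER_HUAWEI_INPUT_ATTACHMENT_READ",
        ]),
    }
}

/// Simplified rule: a read that follows a write is a hazard unless some
/// barrier applied to that write names the read's usage.
fn check_read(prior: &PriorWrite, read_usage: &str) -> Option<String> {
    if prior.write_barriers.contains(read_usage) {
        None
    } else {
        Some(format!("READ_AFTER_WRITE: {} after {}", read_usage, prior.usage))
    }
}
```

Under this model, `check_read(&prior_write_from_log(), "SYNC_VERTEX_ATTRIBUTE_INPUT_VERTEX_ATTRIBUTE_READ")` reports a hazard, while a fragment-shader color attachment read does not. That would be consistent with a barrier after the copy that targets the wrong destination stage/access for a buffer later bound as a vertex buffer.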

Repro steps
Unfortunately, I have found it extremely difficult to reproduce outside of my rather complicated code base. My approach has been to take the RUST_LOG=trace output and write code that produces the same log output using raw wgpu APIs, but with no luck so far. That attempt was also incomplete, in that I did not create all of the same resources (shader modules, pipeline layouts, pipelines, etc.).

Expected vs observed behavior
I see sync validation errors from Vulkan, and I expected wgpu to automatically insert all sync barriers. :-)

Extra materials
Since the bug is only (seemingly) apparent with multithreaded use, it's extremely difficult to get a useful trace output. Let me know if that would be helpful, though.

Platform
MacBook Pro M1 (macOS 15.0 Sequoia), latest MoltenVK on Vulkan apiVersion 1.2.283, custom rendering engine built on top of wgpu.

@simonask
Author

Alright, I'm actually able to reproduce this on Windows 11 as well (NVIDIA, latest Vulkan SDK).
