SYNC-HAZARD-READ-AFTER-WRITE validation error (multithreaded, MoltenVK) #6344

simonask · 2024-09-30T11:29:57Z

Description
Scenario:

Multiple threads are submitting commands to the same device/queue pair (e.g. through test cases running with cargo test).
Each thread has a complicated setup of buffers and textures with various combinations of buffer/texture usage flags.
Each thread is submitting both render passes and compute passes.
Occasionally, the Vulkan validation layers produce the error below, but it doesn't seem to be reproducible with --test-threads=1.
The error also seems sensitive to the size of the buffers involved. In particular, some of the vertex buffers involved occasionally have a size of 0.
I'm using some unsafe APIs to create shader modules from raw precompiled SPIR-V and to enable some Vulkan-specific shader extensions. My understanding is that this validation error is raised on the host side, so it should not be something an invalid shader can trigger, but I could be wrong. The SPIR-V is generated by slangc and should be correct and match the environment (profile glsl_460).

Theories:

This feels likely to be caused by a missing barrier in wgpu. It may be related to Metal Autosync Doesn't Barrier Vertex Shader Write -> Copy Source Hazards #4732 (although this case doesn't explicitly involve writable resources shared between different command encoders), or to Vulkan SYNC-HAZARD-WRITE-AFTER-WRITE validation errors with Bevy. #5373 (although that is a different hazard kind, so potentially a different barrier type).
Maybe a buffer memory alignment error, or a buffer barrier "provenance" error with zero-sized buffers?

Validation error:

2024-09-30T11:06:49.048183Z ERROR wgpu_hal::vulkan::instance: VALIDATION [SYNC-HAZARD-READ-AFTER-WRITE (0xe4d96472)]
        Validation Error: [ SYNC-HAZARD-READ-AFTER-WRITE ] Object 0: handle = 0x111751c18, type = VK_OBJECT_TYPE_QUEUE; | MessageID = 0xe4d96472 | vkQueueSubmit():  Hazard READ_AFTER_WRITE for entry 3, VkCommandBuffer 0x13202d138[], Submitted access info (submitted_usage: SYNC_VERTEX_ATTRIBUTE_INPUT_VERTEX_ATTRIBUTE_READ, command: vkCmdDraw, seq_no: 2, reset_no: 1). Access info (prior_usage: SYNC_COPY_TRANSFER_WRITE, write_barriers: SYNC_FRAGMENT_SHADER_COLOR_ATTACHMENT_READ|SYNC_FRAGMENT_SHADER_DEPTH_STENCIL_ATTACHMENT_READ|SYNC_FRAGMENT_SHADER_INPUT_ATTACHMENT_READ|SYNC_EARLY_FRAGMENT_TESTS_DEPTH_STENCIL_ATTACHMENT_READ|SYNC_EARLY_FRAGMENT_TESTS_DEPTH_STENCIL_ATTACHMENT_WRITE|SYNC_LATE_FRAGMENT_TESTS_DEPTH_STENCIL_ATTACHMENT_READ|SYNC_LATE_FRAGMENT_TESTS_DEPTH_STENCIL_ATTACHMENT_WRITE|SYNC_COLOR_ATTACHMENT_OUTPUT_COLOR_ATTACHMENT_READ|SYNC_COLOR_ATTACHMENT_OUTPUT_COLOR_ATTACHMENT_WRITE|SYNC_SUBPASS_SHADER_HUAWEI_INPUT_ATTACHMENT_READ, queue: VkQueue 0x111751c18[], submit: 0, batch: 0, batch_tag: 1, command: vkCmdCopyBuffer, command_buffer: VkCommandBuffer 0x111641e78[(wgpu internal) PendingWrites], seq_no: 7, reset_no: 1).    
2024-09-30T11:06:49.048666Z ERROR wgpu_hal::vulkan::instance:   objects: (type: QUEUE, hndl: 0x111751c18, name: ?)
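One way to read the log above: the prior access is a `vkCmdCopyBuffer` (SYNC_COPY_TRANSFER_WRITE) recorded in wgpu's internal `PendingWrites` command buffer, and the later access is the vertex-attribute fetch of a `vkCmdDraw`. The `write_barriers` list only names fragment-shader, depth/stencil and color-attachment stages, none of which is SYNC_VERTEX_ATTRIBUTE_INPUT, so from the layer's point of view nothing made the copy visible to the vertex fetch. The std-only toy model below sketches that check. `StageAccess` and `BufferState` are hypothetical names, and this is a simplification, not how the validation layer is actually implemented.

```rust
use std::collections::HashSet;

/// Hypothetical subset of the stage/access pairs that appear in the log.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum StageAccess {
    CopyTransferWrite,
    VertexAttributeInputRead,
    ColorAttachmentOutputWrite,
    FragmentShaderColorAttachmentRead,
}

/// Toy per-buffer state: the last write, plus the accesses that barriers
/// recorded after that write have made it visible to.
struct BufferState {
    last_write: Option<StageAccess>,
    write_barriers: HashSet<StageAccess>,
}

impl BufferState {
    fn new() -> Self {
        BufferState { last_write: None, write_barriers: HashSet::new() }
    }

    /// A new write invalidates all earlier barriers.
    fn write(&mut self, access: StageAccess) {
        self.last_write = Some(access);
        self.write_barriers.clear();
    }

    /// A barrier makes the last write visible to the given destination accesses.
    fn barrier(&mut self, dst: &[StageAccess]) {
        self.write_barriers.extend(dst.iter().copied());
    }

    /// Returns the prior write if this read is a READ_AFTER_WRITE hazard.
    fn read(&self, access: StageAccess) -> Option<StageAccess> {
        match self.last_write {
            Some(prior) if !self.write_barriers.contains(&access) => Some(prior),
            _ => None,
        }
    }
}

fn main() {
    let mut vertex_buffer = BufferState::new();
    // Upload via a copy, then a barrier that only covers attachment stages.
    vertex_buffer.write(StageAccess::CopyTransferWrite);
    vertex_buffer.barrier(&[
        StageAccess::FragmentShaderColorAttachmentRead,
        StageAccess::ColorAttachmentOutputWrite,
    ]);
    // The draw's vertex fetch is not covered, so this reports the copy.
    println!("{:?}", vertex_buffer.read(StageAccess::VertexAttributeInputRead));
    // A barrier that includes the vertex-input read resolves the hazard.
    vertex_buffer.barrier(&[StageAccess::VertexAttributeInputRead]);
    println!("{:?}", vertex_buffer.read(StageAccess::VertexAttributeInputRead));
}
```

In real Vulkan terms, the missing piece would be a buffer or memory barrier from the transfer stage (VK_ACCESS_TRANSFER_WRITE_BIT) to the vertex-input stage (VK_ACCESS_VERTEX_ATTRIBUTE_READ_BIT) between the copy and the draw.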

Repro steps
Unfortunately, I have found it extremely difficult to reproduce outside of my rather complicated code base. My approach has been to take the RUST_LOG=trace output and write code that produces the same log output using raw wgpu APIs, but with no luck. That attempt was also incomplete: I did not recreate all of the same resources (shader modules, pipeline layouts, pipelines, etc.).

Expected vs observed behavior
I see sync validation errors from Vulkan, and I expected wgpu to automatically insert all sync barriers. :-)

Extra materials
Since the bug is only (seemingly) apparent with multithreaded use, it's extremely difficult to get a useful trace output. Let me know if that would be helpful, though.

Platform
MacBook Pro M1 (macOS 15.0 Sequoia), latest MoltenVK on Vulkan apiVersion 1.2.283, custom rendering engine built on top of wgpu.


simonask · 2024-09-30T11:59:55Z

Alright, I'm actually able to reproduce this on Windows 11 as well (NVIDIA, latest Vulkan SDK).
