
Ensure safety of indirect dispatch #5714

Open
wants to merge 5 commits into base: trunk
Conversation

teoxoy
Member

@teoxoy teoxoy commented May 17, 2024

Ensure safety of indirect dispatch by injecting a compute shader that validates the content of the indirect buffer.

Part of #2431.
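The injected WGSL shader itself isn't shown in this excerpt. As a rough illustration only, here is a plain-Rust sketch of the check such a validation pass performs on the three `u32` values of an indirect dispatch buffer; the zeroing policy (turning an out-of-range dispatch into a no-op) is an assumption here, and the PR's actual shader and behavior may differ:

```rust
/// Hypothetical sketch (not the PR's actual WGSL shader): validate the
/// three workgroup counts read from an indirect buffer against the
/// device's `max_compute_workgroups_per_dimension` limit. If any
/// dimension exceeds the limit, zero the dispatch so it becomes a no-op
/// instead of undefined behavior.
fn validate_dispatch(src: [u32; 3], max_per_dimension: u32) -> [u32; 3] {
    if src.iter().all(|&d| d <= max_per_dimension) {
        src
    } else {
        [0, 0, 0]
    }
}

fn main() {
    // 65535 is the default limit value mentioned later in the thread.
    let max = 65_535;
    assert_eq!(validate_dispatch([1, 2, 3], max), [1, 2, 3]);
    assert_eq!(validate_dispatch([70_000, 1, 1], max), [0, 0, 0]);
}
```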

@teoxoy teoxoy requested a review from a team as a code owner May 17, 2024 15:38
@teoxoy
Member Author

teoxoy commented May 17, 2024

Hmm, I'm not sure how we should deal with the wgsl feature not being available.

@cwfitzgerald
Member

Let's just require it in core, honestly.

We could build a naga module in a build script, serialize it and store that in the binary, but that's wayyyy too much work for no real gain

@cwfitzgerald cwfitzgerald self-requested a review May 18, 2024 02:26
@teoxoy
Member Author

teoxoy commented May 20, 2024

We could build a naga module in a build script, serialize it and store that in the binary, but that's wayyyy too much work for no real gain

That's what I thought as well.

Requiring it in wgpu-core will make wgpu's wgsl feature meaningless as well. Removing the wgsl feature from both wgpu-core and wgpu will effectively revert #2890.

That PR was prefaced with:

It also might turn out entirely unnecessary if the compile-time savings aren't worth the added complexity.

This was also mentioned in #2549 (comment):

If you want to provide more concrete numbers on the binary size (that could be saved), that would help.

@daxpedda were the benefits of #2890 substantial in terms of compile time and binary size?

Member

@jimblandy jimblandy left a comment


In our office hours meeting this morning, we came up with a better way to test, by requesting a device with a lower limit than the adapter offers. Testing this way would give a more positive indication that the limit is being imposed.

@daxpedda
Contributor

daxpedda commented May 20, 2024

If you want to provide more concrete numbers on the binary size (that could be saved), that would help.

@daxpedda were the benefits of #2890 substantial in terms of compile time and binary size?

I just tried a minimal example when using GLSL shaders on Wasm with WebGL.
Using fat LTO, a single codegen unit, reference types and wasm-opt for all scenarios.

| | O3 (O4 for wasm-opt) | Os | Oz |
| --- | --- | --- | --- |
| without wgsl | 3 081 862 | 2 070 971 | 1 871 247 |
| with wgsl | 3 428 880 | 2 306 315 | 2 084 794 |
| diff | 347 018 (11.26%) | 235 344 (11.36%) | 213 547 (11.41%) |

Compile times are similar. Total compile time with O3 goes from 12.21s to 13.48s, so again around a 10% increase.
For Naga alone (according to Cargo timings) I'm getting an increase from 7.74s to 9.67s, so around 25% longer.

Keep in mind that this is a minimal example and might not be representative of a full-blown application.

@teoxoy
Member Author

teoxoy commented May 21, 2024

I would say 10% is significant, especially for web apps, but I'd be curious how much it decreases with larger applications.

It's unfortunate though that since WebGL doesn't support indirect calls, this increase doesn't bring anything of value for this use case.

Maybe we can feature gate the validation on the wgsl feature and enable wgc/wgsl when not(target_arch = "wasm32") in wgpu?

The downside being that wgpu's wgsl feature won't be respected (wgsl input will always be enabled) on native platforms (non-web).
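A sketch of what that gating could look like in wgpu's Cargo.toml (hypothetical: the actual dependency alias, path, and feature wiring in the workspace may differ):

```toml
# Hypothetical sketch: enable wgpu-core's `wgsl` feature only on non-wasm
# targets, where indirect dispatch (and thus the injected validation
# shader) actually exists.
[target.'cfg(not(target_arch = "wasm32"))'.dependencies.wgc]
package = "wgpu-core"
path = "../wgpu-core"
features = ["wgsl"]
```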

@teoxoy teoxoy force-pushed the validate-indirect-compute branch from 018b23b to a5bebb0 Compare May 21, 2024 12:14
@teoxoy
Member Author

teoxoy commented May 21, 2024

@jimblandy I lowered the required limit to 10 since the testing infra is not set up to require limits based on what the adapter supports. I think 10 should be safe; the default value is 65535.

@teoxoy teoxoy requested a review from jimblandy May 21, 2024 12:20
@teoxoy teoxoy force-pushed the validate-indirect-compute branch 4 times, most recently from ac3f089 to 36281af Compare May 21, 2024 14:35
@teoxoy
Member Author

teoxoy commented May 21, 2024

I added another test for the num_workgroups builtin (expected to fail on dx12 - for now).

@teoxoy
Member Author

teoxoy commented May 24, 2024

I wanted to get CI working so I just implemented #5714 (comment) for now.

@teoxoy
Member Author

teoxoy commented May 24, 2024

Filed #5739 for the metal issue which I've worked around in this PR.

@teoxoy
Member Author

teoxoy commented May 24, 2024

The wgpu_test::compute_pass_resource_ownership::compute_pass_resource_ownership test is reliably failing on all platforms.

This is because the internal call to device_create_bind_group tries to get the buffer via its id, which should no longer be used.

I was hoping to get away with calling wgpu-core's public APIs to implement this validation, but it seems that won't work.

by injecting a compute shader that validates the content of the indirect buffer
…ed to `min_uniform_buffer_offset_alignment`

also adds missing indirect buffer offset validation
@teoxoy teoxoy force-pushed the validate-indirect-compute branch from b57350e to e0bd41a Compare May 27, 2024 14:25
@teoxoy
Member Author

teoxoy commented May 28, 2024

Actually, I don't think the test is correct. Shouldn't our Drop implementation behave the same as calling .destroy() from the spec?

The destroyed property is checked by all Queue commands (notably .submit() in this case); the command only succeeding if none of the resources referenced by the command buffer have been tagged as destroyed.

The test was recently added in #5570.

Added a new test that ensures that resources passed into a ComputePass can be dropped before executing the compute pass.
Proved that the test would crash before these changes and passes with them.

@Wumpf could you chime in on this? Did we use to crash or raise a validation error?

@cwfitzgerald
Member

Actually, I don't think the test is correct. Shouldn't our Drop implementation behave the same as calling .destroy() from the spec?

No, drop just removes the user's handle to it, doesn't make access to the resource invalid.

Member

@jimblandy jimblandy left a comment


Some thoughts so far:

4 review threads on tests/tests/dispatch_workgroups_indirect.rs (all resolved; 2 outdated)
@teoxoy
Member Author

teoxoy commented May 29, 2024

Actually, I don't think the test is correct. Shouldn't our Drop implementation behave the same as calling .destroy() from the spec?

No, drop just removes the user's handle to it, doesn't make access to the resource invalid.

I see, I will try to convert the code doing bindgroup creation to use Arcs instead of Ids (moving in the direction of #5121) to see how feasible that is.

Member

@cwfitzgerald cwfitzgerald left a comment


Performance/code simplification thoughts.

Dispatches are unfortunately not trivial to validate all at once :(

Comment on lines +99 to +101
#[cfg(feature = "wgsl")]
pub(crate) dispatch_indirect_validation_pipeline:
Mutex<Option<(id::ComputePipelineId, id::BindGroupLayoutId)>>,
Member


Why can't we create this on device start?

Member Author


We can, and it will simplify things - will do that.

Member Author


This is turning out to be difficult because I have to convert all the code to hal code.

Member Author


I think I found a way to get this working: first initialize it with None, and later, once we have the Arc&lt;Device&gt; and can use all its methods, set it to Some.

let aligned_offset = offset
- offset % device.limits.min_storage_buffer_offset_alignment as u64;

let (bind_group_id, error) = self.device_create_bind_group::<A>(
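The snippet above rounds the indirect offset down to a multiple of `min_storage_buffer_offset_alignment` before binding, since storage-buffer binding offsets must be aligned. A self-contained sketch of that arithmetic (the `remainder` half, the distance the shader would have to index past the bound start, is my inference and not shown in the excerpt):

```rust
/// Split an arbitrary byte offset into an aligned binding offset plus the
/// remainder within the bound range. `alignment` stands in for the
/// device's `min_storage_buffer_offset_alignment`.
fn split_offset(offset: u64, alignment: u64) -> (u64, u64) {
    let aligned = offset - offset % alignment; // multiple of `alignment`
    let remainder = offset - aligned;          // distance past the bound start
    (aligned, remainder)
}

fn main() {
    assert_eq!(split_offset(300, 256), (256, 44));
    assert_eq!(split_offset(512, 256), (512, 0));
}
```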
Member


When do these bind groups/buffers get dropped? As it looks now they will exist forever.

Member Author


That's a good point. I was thinking it would happen eventually since we use Arcs, but I missed that we still have the registry and Ids 😅. I think what I have to do is just remove the id from the registry.

Comment on lines +976 to +985
let (dst_buffer_id, error) = self.device_create_buffer::<A>(
device_id,
&crate::resource::BufferDescriptor {
label: None,
size: 4 * 3,
usage: wgt::BufferUsages::INDIRECT | wgt::BufferUsages::STORAGE,
mapped_at_creation: false,
},
None,
);
Member


Creating a buffer and bind group every dispatch is going to be very expensive. In this case we can make a single buffer/bind group per queue and a single bind group per indirect command buffer. We'd trade one bind group for two, but it would mean we never have to create resources at runtime.

I really wish we had root-uavs/bda so we didn't need a bind group here.

Member Author

@teoxoy teoxoy May 30, 2024


Wouldn't a single dst buffer (with its own bind group) per queue become a bottleneck when a user issues 2 dispatchIndirect commands back to back, each with its own indirect buffer? With only one dst buffer, the 2 internal validation dispatches can't run in parallel, and I think the 2nd dispatchIndirect can't start before the first one has finished either.

&self,
device_id: DeviceId,
) -> Result<
(id::ComputePipelineId, id::BindGroupLayoutId),
Member


Could we wrap this functionality in an encapsulated type with constructor and execute functions, preferably in its own file?

Comment on lines +1921 to +1929
buffer.usage
// Allow indirect buffers to be used as storage buffers, allowing
// us to bind them and validate their content. Note that we already
// pass this usage to hal at buffer creation.
| if buffer.usage.contains(wgt::BufferUsages::INDIRECT) {
wgt::BufferUsages::STORAGE
} else {
wgt::BufferUsages::empty()
},
Member


Do we need to be passing these into check_buffer_usage? If the user does something wrong with usage, they will get an error message that includes a buffer usage they did not request. We should just assume our code is right (or strict_assert maybe).

Member Author


Good point!

4 participants