-
Notifications
You must be signed in to change notification settings - Fork 843
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure safety of indirect dispatch #5714
base: trunk
Are you sure you want to change the base?
Conversation
Hmm, I'm not sure how we should deal with the |
Let's just require it in core, honestly. We could build a naga module in a build script, serialize it and store that in the binary, but that's wayyyy to much work for no real gain |
That's what I thought as well. Requiring it in That PR was prefaced with:
This was also mentioned in #2549 (comment):
@daxpedda were the benefits of #2890 substantial in terms of compile time and binary size? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In our office hours meeting this morning, we came up with a better way to test, by requesting a device with a lower limit than the adapter offers. Testing this way would give a more positive indication that the limit is being imposed.
I just tried a minimal example when using GLSL shaders on Wasm with WebGL.
Compile times are similar. Total compile time with O3 goes from 12.21s to 13.48s, so again around a 10% increase. Keep in mind that this is a minimal example and might not be representative for a full blown application. |
I would say 10% is significant especially for web apps but I'd be curious how much it decreases with larger applications. It's unfortunate though that since WebGL doesn't support indirect calls, this increase doesn't bring anything of value for this use case. Maybe we can feature gate the validation on the The downside being that |
018b23b
to
a5bebb0
Compare
@jimblandy I lowered the limit required to 10 since the testing infra is not setup to be able to require limits based on what the adapter supports. I think 10 should be safe, the default value is 65535. |
ac3f089
to
36281af
Compare
I added another test for the |
I wanted to get CI working so I just implemented #5714 (comment) for now. |
Filed #5739 for the metal issue which I've worked around in this PR. |
The And is because the internal call to I was hoping to get away with calling |
by injecting a compute shader that validates the content of the indirect buffer
…ed to `min_uniform_buffer_offset_alignment` also adds missing indirect buffer offset validation
b57350e
to
e0bd41a
Compare
Actually, I don't think the test is correct. Shouldn't our The The test was recently added in #5570.
@Wumpf could you chime in on this? Did we use to crash or raise a validation error? |
No, drop just removes the user's handle to it, doesn't make access to the resource invalid. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some thoughts so far:
I see, I will try to convert the code doing bindgroup creation to use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Performance/code simplification thoughts.
Dispatches are unfortunately not trivial to validate all at once :(
#[cfg(feature = "wgsl")] | ||
pub(crate) dispatch_indirect_validation_pipeline: | ||
Mutex<Option<(id::ComputePipelineId, id::BindGroupLayoutId)>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't we create this on device start?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can, and it will simplify things - will do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is turning out difficult to do because I have to convert all code to hal code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I found a way to get this working by first initializing it with None
and later when we have the Arc<Device>
and can use all its methods, I set it to Some
.
let aligned_offset = offset | ||
- offset % device.limits.min_storage_buffer_offset_alignment as u64; | ||
|
||
let (bind_group_id, error) = self.device_create_bind_group::<A>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When do these bind groups/buffers get dropped? As it looks now they will exist forever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point, I was thinking it will happen eventually since we use Arc
s but I missed we still have the registry and Id
s 😅. I think what I have to do is just remove the id from the registry.
let (dst_buffer_id, error) = self.device_create_buffer::<A>( | ||
device_id, | ||
&crate::resource::BufferDescriptor { | ||
label: None, | ||
size: 4 * 3, | ||
usage: wgt::BufferUsages::INDIRECT | wgt::BufferUsages::STORAGE, | ||
mapped_at_creation: false, | ||
}, | ||
None, | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Creating a buffer and bind group every dispatch is going to be very expensive. In this case we can make a single buffer/bind group per queue and a single bind group per indirect command buffer. We'd trade one bind group for two, but mean we never have to create resources at runtime.
I really wish we had root-uavs/bda so we didn't need a bind group here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't a single dst buffer (with its own bind group) per queue become a bottleneck in cases where a user issues 2 dispatchIndirect
commands back to back each with their own indirect buffer? By only having one dst buffer, the 2 internal dispatche
s we do for validation can't run in parallel and I think nor can the 2nd dispatchIndirect
start before the first dispatchIndirect
finished.
&self, | ||
device_id: DeviceId, | ||
) -> Result< | ||
(id::ComputePipelineId, id::BindGroupLayoutId), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we wrap this functionality in an encapsulated type with constructor and execute functions, preferably in its own file?
buffer.usage | ||
// Allow indirect buffers to be used as storage buffers, allowing | ||
// us to bind them and validate their content. Note that we already | ||
// pass this usage to hal at buffer creation. | ||
| if buffer.usage.contains(wgt::BufferUsages::INDIRECT) { | ||
wgt::BufferUsages::STORAGE | ||
} else { | ||
wgt::BufferUsages::empty() | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to be passing these into check_buffer_usage? If the user will do something wrong with usage, they will get an error message that includes a buffer usage that they did not request. We should just assume our code is right (or strict_assert maybe).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point!
Ensure safety of indirect dispatch by injecting a compute shader that validates the content of the indirect buffer.
Part of #2431.