Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opengl: add support for 16hf formats #265

Closed
wants to merge 3 commits into from
Closed

Conversation

ruihe774
Copy link
Contributor

Fixes #263.

@ruihe774 ruihe774 force-pushed the gl-16hf branch 3 times, most recently from 6be6b6c to 48a0670 Compare May 20, 2024 15:36
@haasn
Copy link
Owner

haasn commented May 21, 2024

Note: We don't currently have a conceivable use case for half float vertex data. It may make sense to reduce the check to just color renderable half float textures (and drop the corresponding V capability), which will also make it a bit less redundant with the existing formats_float.

Looks good either way.

@ruihe774
Copy link
Contributor Author

Note: We don't currently have a conceivable use case for half float vertex data. It may make sense to reduce the check to just color renderable half float textures (and drop the corresponding V capability), which will also make it a bit less redundant with the existing formats_float.

PL_FMT_CAP_VERTEX is compared in cmp_fmt. Formats without this capacity will not take precedence.

@haasn
Copy link
Owner

haasn commented May 21, 2024

Note: We don't currently have a conceivable use case for half float vertex data. It may make sense to reduce the check to just color renderable half float textures (and drop the corresponding V capability), which will also make it a bit less redundant with the existing formats_float.

PL_FMT_CAP_VERTEX is compared in cmp_fmt. Formats without this capacity will not take precedence.

Oh, I see. That sounds like unintended behavior. Give me a moment to prepare a patch

Capabilities like vertex, texel buffer support or readback shouldn't
influence the sorting, because this requirement will typically be
explicitly filtered for when searching for such formats.

This capability sorting only exists to allow prioritizing formats that
have the optional ability to leverage more performant code paths, e.g.
by allowing blits, linear sampling or texture storage. And in either
case, we wouldn't want to pick a worse rendering format just for some
minor side benefit from having it be compute-dispatchable when blending
or some other obscure edge case like that.

(And lastly, there is also the user friendliness consideration of
wanting "primitive" texture formats near the top of the list)
@haasn
Copy link
Owner

haasn commented May 21, 2024

@ruihe774
Copy link
Contributor Author

It is a GLES 2.0 extension, and we do not enable 16f formats with GLES 2.0 at present.
@haasn
Copy link
Owner

haasn commented May 29, 2024

It seems that NaNs do not survive an upload/download round-trip on my end. They get consistently written back as 0xFE00.

We could probably modify our test suite to not generate these NaNs in the first place, but given the fact that it works for "native" formats like 32-bit floats, as well as 16-bit floats on Vulkan, I rather suspect that the issue here is that the 16hf code path on my GL driver is actually emulated internally.

Should we mark them as emulated?

testing texture roundtrip for format rgba16hf
... 1D
upload time: 2760, download time: 2200
... 2D
upload time: 3000, download time: 2760
... 3D
=== FAILED: memcmp(src, dst, bytes) == 0 at ../src/tests/gpu_tests.h:201
at position 160: 0x0d != 0x00

416 bytes of 'src' at offset 0:
c6 16 21 4c ac b9 32 a8 c5 59 c8 e4 7d f5 12 8c
68 82 3f 33 8e 0f 5a b6 46 38 ae 2a 6c a5 c1 33
bc e3 7f 68 9c b2 10 62 0b d9 46 89 ce 59 15 37
dc 55 6a 6a 65 c4 21 ab fc cf d6 69 74 97 9c 31
7a 1b 9a 17 ce ab 79 d9 85 c0 63 54 1a 79 8b f6
ce f5 60 34 b9 81 df b6 51 b5 1f c6 4d bb f7 c8
d7 92 df a5 3d 59 7e c3 1a e2 17 34 5b a2 2b 2a
98 8c 5e 51 0e 3e 08 5f f4 27 25 41 e2 1d 0a b9
af e9 5e ed 43 dd b0 5e bf c8 92 1b 6b be 45 03
4a a4 55 58 e2 5d b7 d7 84 dc 18 67 fa 23 20 a9
0d 7f 97 50 5d 48 ae 1d 10 41 38 7b ff 7e 7e 49
^^^^^                               ^^^^^
22 d3 a1 05 31 58 dc b5 35 f5 1d 2f 18 3e d9 25
bd 70 76 1b b8 24 38 c9 66 71 44 65 f0 c2 ae 13
96 50 18 c7 a9 f5 7d df ea 9a 0f 02 d8 e8 28 96
59 9e b2 12 c3 eb db 29 5d 1f 8e 4d e2 3d 60 79
8e 79 40 37 6e be 17 58 59 26 5b 32 0e 84 c9 68
22 7b 7a e5 66 55 0f c3 74 9d 10 56 db 71 d0 6a
ea 10 a1 58 cf b9 b1 28 df 0d 5b ee 91 24 56 b3
9f d0 99 05 25 a9 c9 9a 47 da f1 22 4c c1 8c 36
d2 2e 8f a1 e7 41 ca c6 4e 25 b4 df 49 0b 93 e9
db 2d ee 00 d6 b8 9a 1d 92 8c 40 df 4d cd 15 1f
fb a5 c0 e3 e6 8b aa 35 b0 5f 14 fa 6a a8 e3 45
d6 d2 46 ac 8b e1 ca 1e 6d 0a fd ba d7 12 da d3
b8 9a b7 9e 26 61 d3 d7 c0 e8 d1 2b 91 b5 71 67
88 b7 13 13 99 de 31 06 e8 2e c1 c0 41 9b 93 f9
36 4a 98 5d ac 6c 34 6d 54 06 98 e6 bb 09 4d 44

416 bytes of 'dst' at offset 0:
c6 16 21 4c ac b9 32 a8 c5 59 c8 e4 7d f5 12 8c
68 82 3f 33 8e 0f 5a b6 46 38 ae 2a 6c a5 c1 33
bc e3 7f 68 9c b2 10 62 0b d9 46 89 ce 59 15 37
dc 55 6a 6a 65 c4 21 ab fc cf d6 69 74 97 9c 31
7a 1b 9a 17 ce ab 79 d9 85 c0 63 54 1a 79 8b f6
ce f5 60 34 b9 81 df b6 51 b5 1f c6 4d bb f7 c8
d7 92 df a5 3d 59 7e c3 1a e2 17 34 5b a2 2b 2a
98 8c 5e 51 0e 3e 08 5f f4 27 25 41 e2 1d 0a b9
af e9 5e ed 43 dd b0 5e bf c8 92 1b 6b be 45 03
4a a4 55 58 e2 5d b7 d7 84 dc 18 67 fa 23 20 a9
00 fe 97 50 5d 48 ae 1d 10 41 38 7b 00 fe 7e 49
^^^^^                               ^^^^^
22 d3 a1 05 31 58 dc b5 35 f5 1d 2f 18 3e d9 25
bd 70 76 1b b8 24 38 c9 66 71 44 65 f0 c2 ae 13
96 50 18 c7 a9 f5 7d df ea 9a 0f 02 d8 e8 28 96
59 9e b2 12 c3 eb db 29 5d 1f 8e 4d e2 3d 60 79
8e 79 40 37 6e be 17 58 59 26 5b 32 0e 84 c9 68
22 7b 7a e5 66 55 0f c3 74 9d 10 56 db 71 d0 6a
ea 10 a1 58 cf b9 b1 28 df 0d 5b ee 91 24 56 b3
9f d0 99 05 25 a9 c9 9a 47 da f1 22 4c c1 8c 36
d2 2e 8f a1 e7 41 ca c6 4e 25 b4 df 49 0b 93 e9
db 2d ee 00 d6 b8 9a 1d 92 8c 40 df 4d cd 15 1f
fb a5 c0 e3 e6 8b aa 35 b0 5f 14 fa 6a a8 e3 45
d6 d2 46 ac 8b e1 ca 1e 6d 0a fd ba d7 12 da d3
b8 9a b7 9e 26 61 d3 d7 c0 e8 d1 2b 91 b5 71 67
88 b7 13 13 99 de 31 06 e8 2e c1 c0 41 9b 93 f9
36 4a 98 5d ac 6c 34 6d 54 06 98 e6 bb 09 4d 44

Full log: testlog.txt

@ruihe774
Copy link
Contributor Author

ruihe774 commented May 29, 2024

It seems that NaNs do not survive an upload/download round-trip on my end. They get consistently written back as 0xFE00.

It is implementation-dependent. e.g. iris fails the test while swrast passes it.

According to OpenGL spec (ver 4.6, sec 2.3.4), passing NaN to GL commands is an UB, despite 16hf or 32f:

Any representable floating-point value is legal as input to a GL command that requires floating-point data. The result of providing a value that is not a floating-point number to such a command is unspecified, but must not lead to GL interruption or termination. In IEEE arithmetic, for example, providing a negative zero or a denormalized number to a GL command yields predictable results, while providing a NaN or an infinity yields unspecified results.

IMO we should not assume the result of an UB in tests.

@haasn
Copy link
Owner

haasn commented May 29, 2024

Okay, then we have no choice but to make our test framework smarter when generating/testing floating point values. I'll write a patch for it.

@ruihe774
Copy link
Contributor Author

@haasn
Copy link
Owner

haasn commented May 29, 2024

@haasn
Copy link
Owner

haasn commented Jun 1, 2024

Rebased + merged this manually, thanks

@haasn haasn closed this Jun 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

gpu: suboptimal preference of formats - emulated formats can have better performance
2 participants