
gpu: suboptimal preference of formats - emulated formats can have better performance #263

Closed
ruihe774 opened this issue May 18, 2024 · 0 comments

ruihe774 (Contributor) commented May 18, 2024

In cmp_fmt(), non-emulated formats with more caps are always preferred. However, some GPUs, e.g. my Intel Arc A750 (and perhaps other Intel GPUs), have better performance with rgba16f, which is an emulated format, than with rgba32f, which is non-emulated. This is confirmed by my testing.

Content of my gpu->formats:

NAME                 TYPE   SIZE COMP CAPS         EMU DEPTH         HOST_BITS     GLSL_TYPE  GLSL_FMT   FOURCC
r8                   UNORM  1    R    SsLRbBV--HWG n   {8  0  0  0 } {8  0  0  0 } float      r8         R8    
rg8                  UNORM  2    RG   SsLRbBV--HWG n   {8  8  0  0 } {8  8  0  0 } vec2       rg8        GR88  
rgba8                UNORM  4    RGBA SsLRbBV--HWG n   {8  8  8  8 } {8  8  8  8 } vec4       rgba8      AB24  
bgra8                UNORM  4    BGRA SsLRbBV--HWG n   {8  8  8  8 } {8  8  8  8 } vec4       rgba8      AR24  
r16                  UNORM  2    R    SsLRbBV--HWG n   {16 0  0  0 } {16 0  0  0 } float      r16        R16   
rg16                 UNORM  4    RG   SsLRbBV--HWG n   {16 16 0  0 } {16 16 0  0 } vec2       rg16       GR32  
rgba16               UNORM  8    RGBA SsLRbBV--HWG n   {16 16 16 16} {16 16 16 16} vec4       rgba16           
r32f                 FLOAT  4    R    SsLRbBV--HWG n   {32 0  0  0 } {32 0  0  0 } float      r32f             
rg32f                FLOAT  8    RG   SsLRbBV--HWG n   {32 32 0  0 } {32 32 0  0 } vec2       rg32f            
rgba32f              FLOAT  16   RGBA SsLRbBV--HWG n   {32 32 32 32} {32 32 32 32} vec4       rgba32f          
r8u                  UINT   1    R    Ss-R-BV--HWG n   {8  0  0  0 } {8  0  0  0 } uint       r8ui             
rg8u                 UINT   2    RG   Ss-R-BV--HWG n   {8  8  0  0 } {8  8  0  0 } uvec2      rg8ui            
rgba8u               UINT   4    RGBA Ss-R-BV--HWG n   {8  8  8  8 } {8  8  8  8 } uvec4      rgba8ui          
r16u                 UINT   2    R    Ss-R-BV--HWG n   {16 0  0  0 } {16 0  0  0 } uint       r16ui            
rg16u                UINT   4    RG   Ss-R-BV--HWG n   {16 16 0  0 } {16 16 0  0 } uvec2      rg16ui           
rgba16u              UINT   8    RGBA Ss-R-BV--HWG n   {16 16 16 16} {16 16 16 16} uvec4      rgba16ui         
r16f                 FLOAT  4    R    SsLRbB---HWG y   {16 0  0  0 } {32 0  0  0 } float      r16f             
rg16f                FLOAT  8    RG   SsLRbB---HWG y   {16 16 0  0 } {32 32 0  0 } vec2       rg16f            
rgba16f              FLOAT  16   RGBA SsLRbB---HWG y   {16 16 16 16} {32 32 32 32} vec4       rgba16f          
rgb8                 UNORM  3    RGB  S-LRbBV--H-G y   {8  8  8  0 } {8  8  8  0 } vec3                  BG24  
rgb16                UNORM  6    RGB  S-LRbBV--H-G y   {16 16 16 0 } {16 16 16 0 } vec3                        
rgb32f               FLOAT  12   RGB  S-LRbBV--H-G y   {32 32 32 0 } {32 32 32 0 } vec3                        
rgb16f               FLOAT  12   RGB  S-LRbB---H-G y   {16 16 16 0 } {32 32 32 0 } vec3                        
rgb8u                UINT   3    RGB  S-----V--H-G y   {8  8  8  0 } {8  8  8  0 } uvec3                       
rgb16u               UINT   6    RGB  S-----V--H-G y   {16 16 16 0 } {16 16 16 0 } uvec3                       

It is not surprising that rgba16f performs better in practice even though it is emulated: the GPU can use internal SIMD on 16-bit floats.

@ruihe774 ruihe774 changed the title gpu: suboptimal selection of format in pl_find_fmt() gpu: suboptimal preference of formats - emulated formats can have better performance May 18, 2024
@ruihe774 ruihe774 closed this as completed Jun 1, 2024