Apply Pixman WASM SIMD Patch #143

Open
Ghabry opened this issue Mar 17, 2023 · 5 comments
Comments

@Ghabry
Member

Ghabry commented Mar 17, 2023

The problem is that WASM SIMD support is still a bit experimental, and most browsers only got it in 2022. So it is IMO risky to roll this out, as it will likely break on some devices.
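
For illustration, a page could probe for SIMD support before deciding which build to load. This is only a minimal sketch: `hasWasmSimd` is a hypothetical name, and the probe is a tiny hand-assembled module whose single function uses v128 instructions, so an engine without SIMD should reject it in `WebAssembly.validate`. Whether this catches every failing device is an assumption, not something measured here.

```typescript
// Tiny module: one function returning v128, body = i32.const 0; i8x16.splat; i8x16.popcnt
const SIMD_PROBE = new Uint8Array([
  0, 97, 115, 109, 1, 0, 0, 0,             // "\0asm" magic + version 1
  1, 5, 1, 96, 0, 1, 123,                  // type section: () -> v128
  3, 2, 1, 0,                              // function section: one function of type 0
  10, 10, 1, 8, 0, 65, 0, 253, 15, 253, 98, 11, // code section using SIMD opcodes
]);

// Hypothetical helper: true if the engine accepts a module containing SIMD opcodes.
function hasWasmSimd(): boolean {
  // Looked up via globalThis so the sketch also works where WebAssembly is missing.
  const wasm = (globalThis as unknown as {
    WebAssembly?: { validate(bytes: Uint8Array): boolean };
  }).WebAssembly;
  if (!wasm || typeof wasm.validate !== "function") {
    return false;
  }
  try {
    return wasm.validate(SIMD_PROBE);
  } catch {
    return false; // treat any unexpected failure as "no SIMD"
  }
}

console.log(`WASM SIMD available: ${hasWasmSimd()}`);
```

A check like this only says that the engine validates SIMD opcodes; it does not prove that a full SIMD build will run well on that device.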

There is this patch by LibreOffice: https://cgit.freedesktop.org/libreoffice/core/commit/?id=d5f5f0984510d6c1b453e31c1ad58fb29fed278b

And here are some benchmarks:

lol. nope. Better luck next time. The node provided by emscripten is too old. System Node works >.>

RuntimeError: Aborted(CompileError: WebAssembly.instantiate():
Compiling function #41284:"_pixman_compute_composite_region32" failed:
invalid simd opcode @+7672059)

I marked the interesting ones with a <-. Top: No SIMD. Bottom: SIMD

Bitmap:

Unfortunately the normal Blit does not become faster :/. This is the typical operation.

------------------------------------------------------------------------                                                  
Benchmark                              Time             CPU   Iterations                                                  
------------------------------------------------------------------------                                                  
BM_FindFormatSingle                 6.22 ns         6.22 ns    111794665                                                  
BM_FindFormat                       23.7 ns         23.7 ns     29602334                                                  
BM_ComputeImageOpacity             25185 ns        25185 ns        27740  <-                                                
BM_ComputeImageOpacityChipset      62613 ns        62613 ns        11134  <-                                              
BM_Create                          12401 ns        12401 ns        56128                                                  
BM_Blit                            43057 ns        43057 ns        16243                                              
BM_BlitFast                        23854 ns        23854 ns        29167                                                  
BM_TiledBlit                         364 ns          364 ns      1922473                                                  
BM_TiledBlitOffset                  6786 ns         6786 ns       102803                                                  
BM_StretchBlit                     62384 ns        62384 ns        11198                                                  
BM_StretchBlitRect                 62219 ns        62219 ns        11224                                                  
BM_FlipBlit                        62320 ns        62320 ns        11207                                                  
BM_ZoomOpacityBlit                  3820 ns         3820 ns       183017                                                  
BM_RotateZoomOpacityBlit            4078 ns         4078 ns       171257                                                  
BM_WaverBlit                      155964 ns       155964 ns         4485  <-                                                
BM_Fill                            25066 ns        25066 ns        27915  <-                                                
BM_FillRect                       293335 ns       293335 ns         2382  <-                                                
BM_Clear                           12309 ns        12309 ns        56928                                                  
BM_ClearRect                       25110 ns        25110 ns        27890  <-                                                
BM_HueChangeBlit                  751022 ns       751022 ns          935  <-                                                
BM_ToneBlit                       125215 ns       125215 ns         5590                                                  
BM_BlendBlit                      431912 ns       431912 ns         1614  <-                                                
BM_Flip                            66692 ns        66692 ns        10486                                                  
BM_MaskedBlit                     492941 ns       492941 ns         1420  <-                                                
BM_MaskedColorBlit                388289 ns       388289 ns         1800  <-                                              
BM_Blit2x                          38527 ns        38527 ns        18163                                              
BM_TransformRectangle               86.0 ns         86.0 ns      8136947                                                  
BM_EffectsBlit                    150537 ns       150537 ns         4368  <-

------------------------------------------------------------------------
Benchmark                              Time             CPU   Iterations
------------------------------------------------------------------------
BM_FindFormatSingle                 6.16 ns         6.16 ns    113345317
BM_FindFormat                       23.6 ns         23.6 ns     29618854
BM_ComputeImageOpacity              7474 ns         7474 ns        94677  <-
BM_ComputeImageOpacityChipset      35757 ns        35757 ns        19544  <-
BM_Create                          12144 ns        12144 ns        57417
BM_Blit                            43108 ns        43108 ns        16242
BM_BlitFast                        23532 ns        23532 ns        29834
BM_TiledBlit                         364 ns          364 ns      1920530
BM_TiledBlitOffset                  6776 ns         6776 ns       103035
BM_StretchBlit                     62228 ns        62228 ns        11196
BM_StretchBlitRect                 62181 ns        62181 ns        11252
BM_FlipBlit                        62331 ns        62331 ns        11226
BM_ZoomOpacityBlit                  3804 ns         3804 ns       183853
BM_RotateZoomOpacityBlit            4085 ns         4085 ns       171303
BM_WaverBlit                      144637 ns       144637 ns         4838  <-
BM_Fill                             9979 ns         9979 ns        70038  <-
BM_FillRect                       120635 ns       120635 ns         5785  <-
BM_Clear                           12000 ns        12000 ns        58375  
BM_ClearRect                        9965 ns         9965 ns        69936  <-
BM_HueChangeBlit                  426308 ns       426308 ns         1665  <-
BM_ToneBlit                       124925 ns       124926 ns         5599
BM_BlendBlit                      188520 ns       188520 ns         3716  <-
BM_Flip                            66193 ns        66193 ns        10576
BM_MaskedBlit                     181776 ns       181776 ns         3975  <-
BM_MaskedColorBlit                144807 ns       144807 ns         4824  <-
BM_Blit2x                          38458 ns        38458 ns        18189
BM_TransformRectangle               86.1 ns         86.1 ns      8135229
BM_EffectsBlit                    144725 ns       144725 ns         4600  <-
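
To make the two bitmap tables easier to compare, the speedup is simply the no-SIMD time divided by the SIMD time. The sketch below uses a handful of rows copied from the tables above; the `Row` type and `speedup` function are illustrative names, not part of the benchmark suite.

```typescript
// Times in nanoseconds, copied from the bitmap tables (top: no SIMD, bottom: SIMD).
type Row = { name: string; baselineNs: number; simdNs: number };

const rows: Row[] = [
  { name: "BM_ComputeImageOpacity", baselineNs: 25185, simdNs: 7474 },
  { name: "BM_Blit", baselineNs: 43057, simdNs: 43108 },
  { name: "BM_Fill", baselineNs: 25066, simdNs: 9979 },
  { name: "BM_MaskedBlit", baselineNs: 492941, simdNs: 181776 },
];

// A value above 1 means the SIMD build is faster; about 1 means no change.
function speedup(row: Row): number {
  return row.baselineNs / row.simdNs;
}

for (const row of rows) {
  console.log(`${row.name}: ${speedup(row).toFixed(2)}x`);
}
```

BM_Blit comes out at roughly 1.0x, which matches the note that the normal Blit does not become faster, while the fill and masked operations land well above 2x.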

Draw:

(Crashes with out of memory in both builds)

Font:

----------------------------------------------------------                                                                
Benchmark                Time             CPU   Iterations                                                                
----------------------------------------------------------                                                                
BM_FontSizeStr        3000 ns         3000 ns       232883                                                                
BM_FontSizeChar       62.3 ns         62.3 ns     11231302                                                                
BM_vRender             195 ns          195 ns      3584303                                                                
BM_Render             8427 ns         8427 ns        79988 <-   


----------------------------------------------------------                                                                
Benchmark                Time             CPU   Iterations                                                                
----------------------------------------------------------                                                                
BM_FontSizeStr        3006 ns         3006 ns       232460                                                                
BM_FontSizeChar       62.3 ns         62.3 ns     11227807                                                                
BM_vRender             188 ns          188 ns      3728400                                                                
BM_Render             7214 ns         7214 ns        96501 <-

Pixel Format:

Interestingly, ARGB and ABGR become roughly twice as fast, but we already figured this out years ago and use RGBA and BGRA by default.

--------------------------------------------------------                                                                  
Benchmark              Time             CPU   Iterations                                                                  
--------------------------------------------------------                                                                  
BM_BlitBGRA_a      43159 ns        43159 ns        16222                                                                  
BM_BlitRGBA_a      43076 ns        43076 ns        16249                                                                  
BM_BlitABGR_a     208881 ns       208881 ns         3366 <-                                                                
BM_BlitARGB_a     410677 ns       410677 ns         1706 <-                                                               
BM_BlitBGRA_n      23190 ns        23190 ns        30179                                                                  
BM_BlitRGBA_n      23203 ns        23203 ns        30199                                                                  
BM_BlitABGR_n     157621 ns       157622 ns         4454                                                                  
BM_BlitARGB_n      23233 ns        23233 ns        29990   

--------------------------------------------------------                                                                  
Benchmark              Time             CPU   Iterations                                                                  
--------------------------------------------------------                                                                  
BM_BlitBGRA_a      43074 ns        43074 ns        16249                                                                  
BM_BlitRGBA_a      43125 ns        43125 ns        16234                                                                  
BM_BlitABGR_a     130252 ns       130252 ns         5308 <-                                                                  
BM_BlitARGB_a     148160 ns       148160 ns         4688 <-                                                               
BM_BlitBGRA_n      23426 ns        23426 ns        29832                                                                  
BM_BlitRGBA_n      23398 ns        23398 ns        29877                                                                  
BM_BlitABGR_n      60078 ns        60078 ns        11441                                                                  
BM_BlitARGB_n      23456 ns        23456 ns        29836

Text:

------------------------------------------------------------------                                                        
Benchmark                        Time             CPU   Iterations                                                        
------------------------------------------------------------------                                                        
BM_TextDrawStrSystem        334333 ns       334334 ns         2085 <-                                                      
BM_TextDrawStrColor         146541 ns       146541 ns         4754 <-                                                     
BM_TextDrawCharSystem         8454 ns         8454 ns        83365 <-                                                       
BM_TextDrawCharSystemEx       14.5 ns         14.5 ns     48445341                                                        
BM_TextDrawCharColor          3506 ns         3506 ns       200249 <-                                                       
BM_TextDrawCharColorEx        8.70 ns         8.70 ns     80582626

------------------------------------------------------------------                                                        
Benchmark                        Time             CPU   Iterations                                                        
------------------------------------------------------------------                                                        
BM_TextDrawStrSystem        285243 ns       285243 ns         2423 <-                                                     
BM_TextDrawStrColor         128538 ns       128538 ns         5430 <-                                                     
BM_TextDrawCharSystem         7254 ns         7254 ns        96237 <-                                                     
BM_TextDrawCharSystemEx       14.5 ns         14.5 ns     48438472                                                        
BM_TextDrawCharColor          3189 ns         3189 ns       226936 <-                                                     
BM_TextDrawCharColorEx        8.69 ns         8.69 ns     80346327
@Ghabry
Member Author

Ghabry commented Apr 24, 2024

About the state of SIMD: This patch has been rolled out for ynoproject for a while, and they have been getting complaints for months that the Player fails to start.

The issue is not SIMD support in the browser, though, but old CPUs from 10 years ago that lack certain SIMD instructions. 😅
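
One way a site could soften this kind of failure, sketched under assumptions: try the SIMD build first and fall back to a non-SIMD build if loading throws. `loadFirstWorking` is a hypothetical helper, not something the Player or ynoproject is known to use, and it assumes the failure surfaces as a rejected promise that can be caught.

```typescript
// Hypothetical helper: tries each loader in order and resolves with the
// first one that neither throws nor rejects.
async function loadFirstWorking<T>(loaders: Array<() => Promise<T>>): Promise<T> {
  let lastError: unknown = new Error("no loaders given");
  for (const load of loaders) {
    try {
      return await load();
    } catch (err) {
      // Remember the failure (e.g. a CompileError) and try the next build.
      lastError = err;
    }
  }
  // Every loader failed: surface the most recent error to the caller.
  throw lastError;
}
```

In a browser, the loaders would wrap something like `WebAssembly.instantiateStreaming` for a SIMD build and a plain build, in that order. Note that Emscripten's generated glue code may abort on its own before such a fallback gets a chance, so this would need to hook in at the right layer.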

@Desdaemon
Contributor

Desdaemon commented Jun 19, 2024

Apologies if this is off-topic, but recently I've been hearing reports of iOS players not being able to run the SIMD-less version of ynoproject's Player, due to the presence of SIMD code. Is it something fixable in the foreseeable future, or does the team need more feedback from players? Thanks!

@Ghabry
Member Author

Ghabry commented Jun 19, 2024

We only accept bug reports when they are reproducible in the official web player: https://easyrpg.org/play/master

@Desdaemon
Contributor

Desdaemon commented Nov 1, 2024

I observed that pixman doesn't actually have many fast paths for SSSE3; most of them are for SSE2, so I opted to patch for SSE2 instead. BM_Blit and several other benchmarks saw great improvements, at the cost of some others. It is also worth noting that enabling SSSE3 with SSE2 as a fallback seems to have a negative effect, perhaps due to branching code.

Here are some very unscientific measurements:

bench/bitmap.cpp (baseline)
------------------------------------------------------------------------
Benchmark                              Time             CPU   Iterations
------------------------------------------------------------------------
BM_FindFormatSingle                 3.49 ns         3.49 ns    200809868
BM_FindFormat                       13.0 ns         13.0 ns     53818665
BM_ComputeImageOpacity              4294 ns         4294 ns       162646
BM_ComputeImageOpacityChipset      23996 ns        23996 ns        29018
BM_Create                           6081 ns         6081 ns       117449
BM_Blit                            34395 ns        34395 ns        20275
BM_BlitFast                         5765 ns         5765 ns       118172
BM_TiledBlit                         214 ns          214 ns      3201599
BM_TiledBlitOffset                  3575 ns         3575 ns       195864
BM_StretchBlit                     42705 ns        42705 ns        16307
BM_StretchBlitRect                 43079 ns        43079 ns        16319
BM_FlipBlit                        42844 ns        42844 ns        15974
BM_ZoomOpacityBlit                  2241 ns         2241 ns       311206
BM_RotateZoomOpacityBlit            2422 ns         2422 ns       294152
BM_WaverBlit                       88215 ns        88215 ns         7954
BM_Fill                             8565 ns         8565 ns        81945
BM_FillRect                       171672 ns       171672 ns         4047
BM_Clear                            5845 ns         5845 ns       117351
BM_ClearRect                        8613 ns         8613 ns        81666
BM_HueChangeBlit                  474084 ns       474084 ns         1477
BM_ToneBlit                        85586 ns        85586 ns         8126
BM_BlendBlit                      277076 ns       277077 ns         2548
BM_Flip                           101741 ns       101741 ns         6750
BM_MaskedBlit                     295763 ns       295764 ns         2358
BM_MaskedColorBlit                239980 ns       239980 ns         2894
BM_Blit2x                          26050 ns        26051 ns        26904
BM_TransformRectangle               44.9 ns         44.9 ns     15608992
BM_EffectsBlit                     89569 ns        89569 ns         7856

bench/bitmap.cpp (SSE2)
------------------------------------------------------------------------
Benchmark                              Time             CPU   Iterations
------------------------------------------------------------------------
BM_FindFormatSingle                 3.60 ns         3.60 ns    194575457
BM_FindFormat                       13.3 ns         13.3 ns     52517737
BM_ComputeImageOpacity              4500 ns         4500 ns       156938
BM_ComputeImageOpacityChipset      24523 ns        24523 ns        28682
BM_Create                           6130 ns         6130 ns       118493
BM_Blit                            11356 ns        11356 ns        62753 (faster)
BM_BlitFast                         5917 ns         5917 ns       117949
BM_TiledBlit                         168 ns          168 ns      4167409
BM_TiledBlitOffset                  2182 ns         2182 ns       314725 (faster)
BM_StretchBlit                     61337 ns        61337 ns        11449 (slower)
BM_StretchBlitRect                 61469 ns        61469 ns        11535 (slower)
BM_FlipBlit                        61309 ns        61309 ns        11476 (slower)
BM_ZoomOpacityBlit                  3222 ns         3222 ns       217607 (slower)
BM_RotateZoomOpacityBlit            3355 ns         3355 ns       208259 (slower)
BM_WaverBlit                      107832 ns       107832 ns         6333 (slower)
BM_Fill                             4205 ns         4205 ns       166465 (faster)
BM_FillRect                        46137 ns        46137 ns        15269 (faster)
BM_Clear                            6035 ns         6035 ns       116866 (faster)
BM_ClearRect                        4194 ns         4194 ns       166409 (faster)
BM_HueChangeBlit                  207792 ns       207792 ns         3342 (faster)
BM_ToneBlit                        62345 ns        62346 ns        11146 (faster)
BM_BlendBlit                       67146 ns        67146 ns        10455 (faster)
BM_Flip                           103784 ns       103784 ns         6603
BM_MaskedBlit                      71068 ns        71068 ns         9418 (faster)
BM_MaskedColorBlit                 59258 ns        59258 ns        12253 (faster)
BM_Blit2x                          28626 ns        28626 ns        24228
BM_TransformRectangle               49.1 ns         49.1 ns     14092031
BM_EffectsBlit                    116290 ns       116290 ns         6097 (slower)

bench/font.cpp (baseline)
----------------------------------------------------------
Benchmark                Time             CPU   Iterations
----------------------------------------------------------
BM_FontSizeStr        1772 ns         1772 ns       402582
BM_FontSizeChar       40.6 ns         40.6 ns     17183091
BM_vRender            97.4 ns         97.4 ns      7058369
BM_Render             3464 ns         3464 ns       198550

bench/font.cpp (SSE2)
----------------------------------------------------------
Benchmark                Time             CPU   Iterations
----------------------------------------------------------
BM_FontSizeStr        1745 ns         1745 ns       405199
BM_FontSizeChar       39.0 ns         39.0 ns     17993729
BM_vRender            95.3 ns         95.3 ns      7136918
BM_Render             2676 ns         2676 ns       261644 (faster)

bench/pixel_format.cpp (baseline)
--------------------------------------------------------
Benchmark              Time             CPU   Iterations
--------------------------------------------------------
BM_BlitBGRA_a      36895 ns        36895 ns        18855
BM_BlitRGBA_a      37173 ns        37173 ns        18761
BM_BlitABGR_a     141954 ns       141954 ns         4850
BM_BlitARGB_a     266456 ns       266456 ns         2640
BM_BlitBGRA_n       6639 ns         6639 ns       109780
BM_BlitRGBA_n       6704 ns         6704 ns       101545
BM_BlitABGR_n      66521 ns        66521 ns        10104
BM_BlitARGB_n       6722 ns         6722 ns       102618

bench/pixel_format.cpp (SSE2)
--------------------------------------------------------
Benchmark              Time             CPU   Iterations
--------------------------------------------------------
BM_BlitBGRA_a      11941 ns        11941 ns        59098 (faster)
BM_BlitRGBA_a      12101 ns        12101 ns        58016 (faster)
BM_BlitABGR_a      46923 ns        46923 ns        15021 (faster)
BM_BlitARGB_a      53954 ns        53954 ns        12877 (faster)
BM_BlitBGRA_n       7065 ns         7065 ns        97536 (slower)
BM_BlitRGBA_n       7016 ns         7016 ns        95469 (slower)
BM_BlitABGR_n      27180 ns        27180 ns        25899 (faster)
BM_BlitARGB_n       6612 ns         6612 ns       102371

bench/text.cpp (baseline)
------------------------------------------------------------------
Benchmark                        Time             CPU   Iterations
------------------------------------------------------------------
BM_TextDrawStrSystem        139300 ns       139300 ns         4959
BM_TextDrawStrColor          62378 ns        62378 ns        10875
BM_TextDrawCharSystem         3538 ns         3538 ns       196086
BM_TextDrawCharSystemEx       7.31 ns         7.31 ns     94806000
BM_TextDrawCharColor          1564 ns         1564 ns       446296
BM_TextDrawCharColorEx        5.56 ns         5.56 ns    125895293

bench/text.cpp (SSE2)
------------------------------------------------------------------
Benchmark                        Time             CPU   Iterations
------------------------------------------------------------------
BM_TextDrawStrSystem        120893 ns       120893 ns         5727 (faster)
BM_TextDrawStrColor          57627 ns        57627 ns        11817 (faster)
BM_TextDrawCharSystem         3004 ns         3004 ns       234003 (faster)
BM_TextDrawCharSystemEx       7.37 ns         7.37 ns     93764693
BM_TextDrawCharColor          1427 ns         1427 ns       484388
BM_TextDrawCharColorEx        5.59 ns         5.59 ns    117193273

@Ghabry
Member Author

Ghabry commented Nov 1, 2024

Cool. The normal Blit operation is the most common, so a speedup in it helps a lot :).

Desdaemon added a commit to Desdaemon/easyrpg-buildscripts that referenced this issue Nov 2, 2024
SSE2 chosen based on discussions from EasyRPG#143