Apply Pixman WASM SIMD Patch #143
Comments
About the state of SIMD: this patch has been rolled out for ynoproject for a while, and for months they have been getting complaints that the Player fails to start. The issue is not SIMD support in the browser, though, but CPUs from 10 years ago that lack certain SIMD instructions. 😅
Apologies if this is off-topic, but recently I've been hearing reports of iOS players not being able to run the SIMD-less version of ynoproject's Player, due to the presence of SIMD code. Is it something fixable in the foreseeable future, or does the team need more feedback from players? Thanks!
We only accept bug reports when they are reproducible in the official web player: https://easyrpg.org/play/master
I observed that pixman actually doesn't have many fast paths for SSSE3, but rather for SSE2, so I opted to patch for SSE2 instead. BM_Blit and several other benchmarks saw great improvements, at the cost of others. It is also noteworthy that enabling SSSE3 with SSE2 as a fallback seems to have a negative effect, perhaps due to branching code. Here are some very unscientific measurements:

bench/bitmap.cpp (baseline)
------------------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------------------
BM_FindFormatSingle 3.49 ns 3.49 ns 200809868
BM_FindFormat 13.0 ns 13.0 ns 53818665
BM_ComputeImageOpacity 4294 ns 4294 ns 162646
BM_ComputeImageOpacityChipset 23996 ns 23996 ns 29018
BM_Create 6081 ns 6081 ns 117449
BM_Blit 34395 ns 34395 ns 20275
BM_BlitFast 5765 ns 5765 ns 118172
BM_TiledBlit 214 ns 214 ns 3201599
BM_TiledBlitOffset 3575 ns 3575 ns 195864
BM_StretchBlit 42705 ns 42705 ns 16307
BM_StretchBlitRect 43079 ns 43079 ns 16319
BM_FlipBlit 42844 ns 42844 ns 15974
BM_ZoomOpacityBlit 2241 ns 2241 ns 311206
BM_RotateZoomOpacityBlit 2422 ns 2422 ns 294152
BM_WaverBlit 88215 ns 88215 ns 7954
BM_Fill 8565 ns 8565 ns 81945
BM_FillRect 171672 ns 171672 ns 4047
BM_Clear 5845 ns 5845 ns 117351
BM_ClearRect 8613 ns 8613 ns 81666
BM_HueChangeBlit 474084 ns 474084 ns 1477
BM_ToneBlit 85586 ns 85586 ns 8126
BM_BlendBlit 277076 ns 277077 ns 2548
BM_Flip 101741 ns 101741 ns 6750
BM_MaskedBlit 295763 ns 295764 ns 2358
BM_MaskedColorBlit 239980 ns 239980 ns 2894
BM_Blit2x 26050 ns 26051 ns 26904
BM_TransformRectangle 44.9 ns 44.9 ns 15608992
BM_EffectsBlit 89569 ns 89569 ns 7856
bench/bitmap.cpp (SSE2)
------------------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------------------
BM_FindFormatSingle 3.60 ns 3.60 ns 194575457
BM_FindFormat 13.3 ns 13.3 ns 52517737
BM_ComputeImageOpacity 4500 ns 4500 ns 156938
BM_ComputeImageOpacityChipset 24523 ns 24523 ns 28682
BM_Create 6130 ns 6130 ns 118493
BM_Blit 11356 ns 11356 ns 62753 (faster)
BM_BlitFast 5917 ns 5917 ns 117949
BM_TiledBlit 168 ns 168 ns 4167409
BM_TiledBlitOffset 2182 ns 2182 ns 314725 (faster)
BM_StretchBlit 61337 ns 61337 ns 11449 (slower)
BM_StretchBlitRect 61469 ns 61469 ns 11535 (slower)
BM_FlipBlit 61309 ns 61309 ns 11476 (slower)
BM_ZoomOpacityBlit 3222 ns 3222 ns 217607 (slower)
BM_RotateZoomOpacityBlit 3355 ns 3355 ns 208259 (slower)
BM_WaverBlit 107832 ns 107832 ns 6333 (slower)
BM_Fill 4205 ns 4205 ns 166465 (faster)
BM_FillRect 46137 ns 46137 ns 15269 (faster)
BM_Clear 6035 ns 6035 ns 116866
BM_ClearRect 4194 ns 4194 ns 166409 (faster)
BM_HueChangeBlit 207792 ns 207792 ns 3342 (faster)
BM_ToneBlit 62345 ns 62346 ns 11146 (faster)
BM_BlendBlit 67146 ns 67146 ns 10455 (faster)
BM_Flip 103784 ns 103784 ns 6603
BM_MaskedBlit 71068 ns 71068 ns 9418 (faster)
BM_MaskedColorBlit 59258 ns 59258 ns 12253 (faster)
BM_Blit2x 28626 ns 28626 ns 24228
BM_TransformRectangle 49.1 ns 49.1 ns 14092031
BM_EffectsBlit 116290 ns 116290 ns 6097 (slower)
bench/font.cpp (baseline)
----------------------------------------------------------
Benchmark Time CPU Iterations
----------------------------------------------------------
BM_FontSizeStr 1772 ns 1772 ns 402582
BM_FontSizeChar 40.6 ns 40.6 ns 17183091
BM_vRender 97.4 ns 97.4 ns 7058369
BM_Render 3464 ns 3464 ns 198550
bench/font.cpp (SSE2)
----------------------------------------------------------
Benchmark Time CPU Iterations
----------------------------------------------------------
BM_FontSizeStr 1745 ns 1745 ns 405199
BM_FontSizeChar 39.0 ns 39.0 ns 17993729
BM_vRender 95.3 ns 95.3 ns 7136918
BM_Render 2676 ns 2676 ns 261644 (faster)
bench/pixel_format.cpp (baseline)
--------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------
BM_BlitBGRA_a 36895 ns 36895 ns 18855
BM_BlitRGBA_a 37173 ns 37173 ns 18761
BM_BlitABGR_a 141954 ns 141954 ns 4850
BM_BlitARGB_a 266456 ns 266456 ns 2640
BM_BlitBGRA_n 6639 ns 6639 ns 109780
BM_BlitRGBA_n 6704 ns 6704 ns 101545
BM_BlitABGR_n 66521 ns 66521 ns 10104
BM_BlitARGB_n 6722 ns 6722 ns 102618
bench/pixel_format.cpp (SSE2)
--------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------
BM_BlitBGRA_a 11941 ns 11941 ns 59098 (faster)
BM_BlitRGBA_a 12101 ns 12101 ns 58016 (faster)
BM_BlitABGR_a 46923 ns 46923 ns 15021 (faster)
BM_BlitARGB_a 53954 ns 53954 ns 12877 (faster)
BM_BlitBGRA_n 7065 ns 7065 ns 97536 (slower)
BM_BlitRGBA_n 7016 ns 7016 ns 95469 (slower)
BM_BlitABGR_n 27180 ns 27180 ns 25899 (faster)
BM_BlitARGB_n 6612 ns 6612 ns 102371
bench/text.cpp (baseline)
------------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------------
BM_TextDrawStrSystem 139300 ns 139300 ns 4959
BM_TextDrawStrColor 62378 ns 62378 ns 10875
BM_TextDrawCharSystem 3538 ns 3538 ns 196086
BM_TextDrawCharSystemEx 7.31 ns 7.31 ns 94806000
BM_TextDrawCharColor 1564 ns 1564 ns 446296
BM_TextDrawCharColorEx 5.56 ns 5.56 ns 125895293
bench/text.cpp (SSE2)
------------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------------
BM_TextDrawStrSystem 120893 ns 120893 ns 5727 (faster)
BM_TextDrawStrColor 57627 ns 57627 ns 11817 (faster)
BM_TextDrawCharSystem 3004 ns 3004 ns 234003 (faster)
BM_TextDrawCharSystemEx 7.37 ns 7.37 ns 93764693
BM_TextDrawCharColor 1427 ns 1427 ns 484388
BM_TextDrawCharColorEx 5.59 ns 5.59 ns 117193273
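To make the faster/slower marks easier to compare, here is a minimal sketch that turns a few of the rows above into speedup factors (baseline time divided by SSE2 time). The names `BenchRow`, `Speedup`, `Classify`, `SampleRows` and `PrintReport` are hypothetical helpers for illustration, and the 10% tolerance is an assumption, not the threshold used for the marks in the tables.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// One benchmark row: name plus wall time in nanoseconds for both builds.
struct BenchRow {
    std::string name;
    double baseline_ns;
    double sse2_ns;
};

// Speedup factor: above 1 means the SSE2 build is faster, below 1 slower.
double Speedup(const BenchRow& row) {
    assert(row.sse2_ns > 0.0);
    return row.baseline_ns / row.sse2_ns;
}

// Label a row; changes within the tolerance are treated as noise.
// The 10% default is an assumed value for illustration only.
const char* Classify(const BenchRow& row, double tolerance = 0.10) {
    const double s = Speedup(row);
    if (s > 1.0 + tolerance) return "faster";
    if (s < 1.0 - tolerance) return "slower";
    return "similar";
}

// A few rows copied from the bench/bitmap.cpp tables above.
std::vector<BenchRow> SampleRows() {
    return {
        {"BM_Blit", 34395.0, 11356.0},
        {"BM_StretchBlit", 42705.0, 61337.0},
        {"BM_BlendBlit", 277076.0, 67146.0},
        {"BM_Flip", 101741.0, 103784.0},
    };
}

// Print one line per row: name, speedup factor, label.
void PrintReport() {
    for (const BenchRow& row : SampleRows()) {
        std::printf("%-16s %6.2fx %s\n", row.name.c_str(), Speedup(row),
                    Classify(row));
    }
}
```

Calling `PrintReport()` from a `main` function lists the factor for each sample row. Since the measurements are described as very unscientific, the factors should be read as rough indications only.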
Cool. The normal Blit operation is the most common one, so a speedup there helps a lot :).
SSE2 chosen based on discussions from EasyRPG#143
The problem is that WASM SIMD support is still a bit experimental, and most browsers only got it in 2022. So IMO it is risky to roll this out, as it will likely break on some devices.
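One reason this is risky: a WASM module that contains SIMD instructions is rejected as a whole by an engine without SIMD support, so the choice has to be made at build time rather than by branching at runtime inside the module. The sketch below shows a compile-time check, assuming a Clang-based toolchain where `__wasm_simd128__` is defined when building with `-msimd128` and `__SSE2__` when the SSE2 headers are enabled. `CompiledSimdPath` and `IsSimdBuild` are hypothetical names, not part of the Player or pixman.

```cpp
#include <cassert>
#include <cstring>

// Report which SIMD code path this translation unit was compiled for.
// Assumption: Clang/GCC-style predefined macros; other compilers may
// not define __SSE2__ even when SSE2 is available.
const char* CompiledSimdPath() {
#if defined(__wasm_simd128__)
    return "wasm-simd128";  // WASM build with -msimd128
#elif defined(__SSE2__)
    return "sse2";          // native x86 build with SSE2 enabled
#else
    return "scalar";        // no SIMD path compiled in
#endif
}

// True when the binary contains a SIMD path. A WASM build for which
// this returns true will not load on engines without SIMD support.
bool IsSimdBuild() {
    const char* path = CompiledSimdPath();
    assert(path != nullptr);
    return std::strcmp(path, "scalar") != 0;
}
```

A deployment could use such a check to label its artifacts and serve a separate scalar build to browsers that lack WASM SIMD.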
There is this patch by libreoffice: https://cgit.freedesktop.org/libreoffice/core/commit/?id=d5f5f0984510d6c1b453e31c1ad58fb29fed278b
And here are some benchmarks:
lol. nope. Better luck next time. The node provided by emscripten is too old. System Node works >.>

I marked the interesting ones with a `<-`. Top: No SIMD. Bottom: SIMD

Bitmap:
Unfortunately the normal Blit does not become faster :/. This is the typical operation.
Draw:
(Crashes with out of memory in both builds)
Font:
Pixel Format:
Interestingly ARGB and ABGR become twice as fast but we already figured this out years ago and use RGBA and BGRA by default.
Text: