
On GLES 2.0, enabling FBE displays a black screen in all games, while disabling it causes missing graphics #1045

Closed
mckimiaklopa opened this issue Jul 1, 2016 · 64 comments


@mckimiaklopa

Enabling FBE causes a black screen in all games.

Disabling it will either:
A) still display a black screen, or
B) show only some textures, in black or white.

Mario 64 has white textures on the logo and save screens but no in-game graphics.
Mario Kart only displays graphics on the race main menu screen.

My device is a Samsung GT-I9082, but based on this http://www.paulscode.com/forum/index.php?topic=8820.30 it seems to happen on other devices too.

@loganmc10
Contributor

If you are able to compile and run GLideN64 with debugging enabled, you will probably see some kind of error in gliden64.log, or an assert will trigger. I'm not sure how that works on Android devices, @fzurita, but typically I don't think a fully black screen happens without some kind of error being triggered in debug mode.

@fzurita
Contributor

fzurita commented Jul 2, 2016

I may have to make a build with debug mode enabled to pass around, to try to debug this kind of error.

@fzurita
Contributor

fzurita commented Jul 7, 2016

That phone has a VideoCore GPU, the same type as the Raspberry Pi, but I bet the drivers are way outdated since it's Samsung.

@fzurita
Contributor

fzurita commented Jul 7, 2016

Good news, I can reproduce this in my old Android phone, and I'm pretty sure this was working correctly before.

@loganmc10
Contributor

I don't know if this is the same issue, but after the new blending code I got a black screen on the Raspberry Pi. I documented it here:

#982

And here is the pull request that fixed it:
#993

It doesn't like variable indexing, so I had to make that ugly patch.

Basically, look at the stuff wrapped inside #ifdef VC and #ifndef VC in ShaderUtils.cpp; that might help (but I could be completely wrong; I had a black screen even with FBE disabled).

@loganmc10
Contributor

@gonetz I should point out that when you run GLideN64 through apitrace on a device with Intel Graphics, you get this warning:
Unsupported form of variable indexing in FS; falling back to very inefficient code generation

So it seems like other drivers tolerate variable indexing, but they don't like it either.

@fzurita
Contributor

fzurita commented Jul 9, 2016

Shaders are compiling fine on my old GLES 2.0 device. I found the commit that broke my device. It was this one:

b7369e9

@gonetz Everything I read online tells me that discards are a horrible performance penalty for tile renderers. For example, see here: http://cdn.imgtec.com/sdk-documentation/PowerVR+Performance+Recommendations.The+Golden+Rules.pdf and search for discard.

Edit: Here is another interesting article: http://mainroach.blogspot.com/2014/04/dont-alpha-that-pixel.html

@loganmc10
Contributor

loganmc10 commented Jul 9, 2016

@fzurita that is easy enough to test, you can just remove all the "discard" lines from the code and run it. I need to get the mupen64plus OSD working on the Raspberry Pi so I can see the framerate and be a bit more scientific about these things...

@gonetz
Owner

gonetz commented Jul 9, 2016

@fzurita I don't have time to read the articles now, I'll do it later.
Anyway, what is the alternative to 'discard'?
If the alpha test fails, the fragment calculation must finish without writing any results: both the color and depth buffers must remain untouched.

@fzurita
Contributor

fzurita commented Jul 9, 2016

Both articles say that alpha blending is the other option. This can be done by calling glEnable(GL_BLEND) and setting the appropriate alpha for the fragColor.

I did try this, and it didn't fix the issue for that device, though. I didn't see much difference in performance either, but that could be because the graphics are bugged on this device.

@gonetz
Owner

gonetz commented Jul 9, 2016

Both articles say that alpha blending is the other option. This can be done by calling glEnable(GL_BLEND) and setting the appropriate alpha for the fragColor.

It will not help. The pixel will be invisible, but its z-value will still be stored in the depth buffer, which will cause glitches.

@fzurita
Contributor

fzurita commented Jul 9, 2016

There is apparently a solution for that. Looks simple:

https://www.sjbaker.org/steve/omniv/alpha_sorting.html

But maybe I'm misinterpreting.

Fortunately, OpenGL has a function that can prevent pixels with a specified set of alpha values from writing the colour or Z buffers. For example:

 glAlphaFunc ( GL_GREATER, 0.1 ) ;
 glEnable ( GL_ALPHA_TEST ) ;

@gonetz
Owner

gonetz commented Jul 9, 2016

glAlphaFunc isn't available in OpenGL ES 2.0; you have to implement it in a fragment shader, via discard.

@fzurita
Contributor

fzurita commented Jul 10, 2016

Ok, back to what actually broke my device. It was the change to this method:

void ShaderCombiner::updateAlphaTestInfo(bool _bForce) {

If I revert that to the previous version, my old device works fine. If I revert any of the other GLES 2.0 changes, the exact same issue remains. For reference, here is what the issue looks like on my phone:

img_20160709_230031

I have to take a picture because this old version of Android doesn't support screen capture, and the screenshot functionality is broken in this old version of the plugin as well.

It really looks like we are not discarding when we should be discarding. I have the feeling that something is going wrong in the GLSL.

Edit 1: If I force m_uniforms.uEnableAlphaTest.set(1, _bForce); in ShaderCombiner::updateAlphaTestInfo, it fixes the trees and butterflies, but the windows remain broken.

Edit 2: In the old code, the alpha test value would only vary between 0.0, 0.125, and 0.5. In the new code, I'm seeing 0.0, 0.5, and this weird 265222766839497874294702080.000000. When this weird number shows up, though, uEnableAlphaTest is 0.

Edit 3: And that is because we don't set the alpha test value if uEnableAlphaTest is 0. Setting it to zero anyway didn't fix the issue.

Edit 4: Setting uEnableAlphaTest to always be enabled and setting an alpha test value of 0.125 also fixes the issue. It does add some buggy graphics at the top and bottom of the screen, where it would normally be black.

Edit 5: One thing I notice is that the alpha test remains disabled most of the time in the new version, while it was enabled much more often in the old version. Since this works on newer devices, I'm thinking there is some sort of lag in setting the uniforms.

Edit 6: OMG! Horrible driver bug! If there is already a discard in the code, any second discard never gets hit, no matter what; it's as if it doesn't exist. This happens even if the first discard is never executed! I was able to fix the bug by having only one discard in the GLSL code, placed at the end of the shader in this fashion:

" if ( ( uCvgXAlpha != 0 && alpha2 == 0.0 ) || ( uEnableAlphaTest != 0 && alphaValue < alphaTestValue) ) discard; \n"

This code is also much slower than the previous commit; it loses about 25% in performance.

@fzurita
Contributor

fzurita commented Jul 10, 2016

Having no discards in the GLSL code triples performance in this old device!

Edit 1: It's not the discard that is slowing things down. It's the above if statement. Keeping the if statement around and performing any action inside it reduces the frame rate tremendously.

@loganmc10
Contributor

Apple has some good guidelines that apply to shaders on GLES devices:

https://developer.apple.com/library/ios/documentation/3DDrawing/Conceptual/OpenGLES_ProgrammingGuide/BestPracticesforShaders/BestPracticesforShaders.html#//apple_ref/doc/uid/TP40008793-CH7-SW11

From what I read, if statements in general aren't great in a fragment shader, and if possible it can be better to have different shaders handle the different scenarios, especially on GLES2 devices. It also says that branching based on a value computed inside the shader is especially slow. In your if statement, I believe that is true for alpha2, alphaValue, and alphaTestValue.

@fzurita
Contributor

fzurita commented Jul 10, 2016

Yeah, that's a good point, but that would mean switching shaders at a very high rate, which is really bad for GLES 2.0 since it has no shader cache.

I was thinking of doing math substitutions for all the if statements.

@loganmc10
Contributor

If you want to see what it might be like with optimized GLSL code you could take a look at https://github.com/aras-p/glsl-optimizer.

I got it working with GLideN64, you can see the changes I made here (only did it for GLES2 for testing):
loganmc10@9bc63a8

The performance difference on the Pi was very noticeable. The "Statue" level on GoldenEye used to be constantly choppy and now it's smooth. The only thing I noticed is that the explosions were green, not their regular orange. Other games I tested looked normal. It's probably not the kind of thing that would be easy to integrate into GLideN64, but it can help test how optimized the current GLSL code is.

I don't know how hard it would be to get working on Android, it seems that some people have got it working (aras-p/glsl-optimizer#127)

@fzurita
Contributor

fzurita commented Jul 10, 2016

Actually, I did make an attempt at getting this library working. And it seemed like I got it to work, but it produced a lot of corrupt graphics, so I gave up on it.

Edit 1: So I went back to the latest version and applied a similar fix, but I only had to do it in ShaderUtils.cpp since both discards were moved there. This cleaned it up a lot, but there are still issues. These new issues only started showing up after this commit:

0ad53ad

This is where the new blending was introduced. Here is what it looks like after I make the code use only one discard operation:

img_20160710_160721

The sky is black, water doesn't look right, and shadows are all messed up.

Edit 2: Ok, found a fix! If I include ShaderUtils_VC.h, the problem is fixed! I think I'm going to submit a pull request that sets this as the default implementation in GLES 2.0 mode. I wonder if there is some sort of extension we can check for that would tell us whether we can use the other method.

Edit 3: This device loses 50% performance after the above commit... so I'm now running at 10 fps!

@fzurita
Contributor

fzurita commented Jul 10, 2016

@mckimiaklopa Please try this build with frame buffer emulation disabled:

https://drive.google.com/file/d/0B57Ioy26LWegcnRfd2FOU0JBQ1E/view?usp=sharing

Does it display graphics correctly? I'm sure speed won't be that great though.

@fzurita
Contributor

fzurita commented Jul 11, 2016

So replacing:

    _strShader.append(
        "  if ( (uCvgXAlpha != 0 && alpha2 < 0.125) || ( uEnableAlphaTest != 0 && alphaValue < alphaTestValue) ) discard; \n"
    );

with

    _strShader.append(
        "  if ( uCvgXAlpha != 0 || uEnableAlphaTest != 0) discard; \n"
    );

doubles the frame rate at the cost of incorrect graphics. That tells me that what loganmc10 suspects is correct.

@gonetz
Owner

gonetz commented Jul 11, 2016

Double?!
I'm starting to think about a "lite" version of the plugin: without alpha test, with the old blending method, etc.

@fzurita
Contributor

fzurita commented Jul 11, 2016

That may not be a bad idea, unless we can figure out how to improve the performance of the new blending on older devices. It could be done through a config parameter, I think, to avoid having to compile multiple versions, which can be a pain for Android since we already have 4 versions.

Edit 1: If we were to use alpha blending instead of alpha test, then the above if statement could be replaced with some math instead, which would be a lot faster.

@gonetz
Owner

gonetz commented Jul 11, 2016

If we were to use alpha blending instead of alpha test, then the above if statement could be replaced with some math instead, which would be a lot faster.

No, it could not. Even if we ignore the fact that blending is a bad replacement for discard, blending can only be used to make the pixel invisible. However, the condition for which pixels should be invisible still depends on the pixel's alpha. It would have to be something like this:

    _strShader.append(
        "  if ( (uCvgXAlpha != 0 && alpha2 < 0.125) || ( uEnableAlphaTest != 0 && alphaValue < alphaTestValue) ) alpha2 = 0.0; \n"
    );

@fzurita
Contributor

fzurita commented Jul 11, 2016

What about this using alpha blending?

    int alpha2_125Test = int(0.125/alpha2);
    int alphaValueTest = int(alphaTestValue/alphaValue);

    alpha2 = alpha2 - clamp(uCvgXAlpha, 0, 1)*clamp(alpha2_125Test, 0, 1)*alpha2;
    alpha2 = alpha2 - clamp(uEnableAlphaTest, 0, 1)*clamp(alphaValueTest, 0,1)*alpha2;
    alpha2 = clamp(alpha2, 0.0, 1.0);

This would replace the boolean logic.

@loganmc10
Contributor

@fzurita
I tested that; I had to modify 2 lines to look like this for it to work, though:

alpha2 = alpha2 - clamp(float(uCvgXAlpha), 0.0, 1.0)*clamp(float(alpha2_125Test), 0.00, 1.0)*alpha2;
alpha2 = alpha2 - clamp(float(uEnableAlphaTest), 0.0, 1.0)*clamp(float(alphaValueTest), 0.0,1.0)*alpha2;

And I got rid of if (uEnableAlphaTest != 0) { as well

The performance seems around the same on the Raspberry Pi. The interesting thing is that that change fixes the explosion colour in GoldenEye using the GLSL Optimiser, everything looks normal now for me using that library.

I'll play around with it a bit more to make sure I'm not doing something wrong.

@fzurita
Contributor

fzurita commented Jul 11, 2016

I have yet to try this on my device. I wrote that while I'm still here at work to illustrate the concept. I think it will make my device faster though.

It's possible that GLSL optimizer is already doing something like this under the hood.

@loganmc10
Contributor

I just tested it with Super Smash Bros and there are black boxes around things, I assume that is the alpha not working correctly?

@fzurita
Contributor

fzurita commented Jul 11, 2016

You have to enable alpha blending. I think it's getting disabled at times. Try getting rid of all the glDisable(GL_BLEND) calls.

@fzurita
Contributor

fzurita commented Jul 11, 2016

I did attempt this on my old device and it doubled the frame rate. But I don't think I have GL_BLEND set correctly, since I'm not getting all transparencies working correctly.

Edit 1: Never mind, it only provides about a 20% performance boost. I wasn't measuring the same spot.

Edit 2: It looks like I need to set the correct glBlendFunc as well. I don't have the understanding to do that.

@fzurita
Contributor

fzurita commented Jul 13, 2016

These optimizations leave the plugin with broken graphics. I'm just making them to see the potential improvements in performance that alpha blending could give to older devices.

@fzurita
Contributor

fzurita commented Jul 13, 2016

I'm going to test these changes on my newer phone to see what they do to performance later tonight.

Edit 1: I just tried this on my newer Adreno device and it gave zero performance advantage. It doesn't seem to hurt performance either though.

Edit 2: @gonetz Can you try this branch on your device and see if it changes performance at all? (It will have broken graphics, though, due to GL_BLEND and the blending function not being set correctly.) https://github.com/mupen64plus-ae/mupen64plus-ae/tree/GLideN64_fix_GLES2

@gonetz
Owner

gonetz commented Jul 13, 2016

I tested on my Note2 (Mali 400 GPU): no difference from current master.

@fzurita
Contributor

fzurita commented Jul 13, 2016

So it must only affect specific devices. I'm going to try it on my PowerVR device later tonight. I've read strong warnings about using discard with those.

Edit 1: I see no significant difference with PowerVR devices either. I don't think it's worth adding alpha blending for an old, obsolete device.

@loganmc10
Contributor

loganmc10 commented Jul 14, 2016

So what actually fixed the black screen and undisplayed graphics? Was it using ShaderUtils_VC.h?

@fzurita
Contributor

fzurita commented Jul 14, 2016

Yes, and only having one discard operation per shader. I'm not sure about making a pull request with those fixes, since it slows things down a lot for devices that already work correctly.

The other option is run-time checks for specific devices, so I would use ShaderUtils_VC.h only when a specific device is detected.

@loganmc10
Contributor

loganmc10 commented Jul 14, 2016

@fzurita with regards to this line:

if (uCvgXAlpha != 0 && alpha2 < 0.125) discard;

@gonetz asked if this would give better performance:

if (uCvgXAlpha != 0) {
  if (alpha2 < 0.125)
    discard;
}

And you said it didn't. I believe that is because GLSL compilers automatically combine two consecutive if statements with && (the GLSL Optimizer does this, and it is based on Mesa), effectively reversing that change. But if you do something like this:

if (uCvgXAlpha != 0) {
  lowp float testing = clamp(alpha2, 0.0, 0.125);
  if (testing != 0.125) discard;
}

Then it does it properly. I think I did the clamp() correctly? I'm curious whether that improves performance on your older device, since it won't be evaluating the "shader computed" variable as much.

@gonetz
Owner

gonetz commented Jul 15, 2016

I don't think it's worth adding alpha blending for an old obsolete device.

Well, shader based alpha blending can be made optional with old blending method as the alternative.

@fzurita
Contributor

fzurita commented Jul 17, 2016

I believe that is because GLSL compilers automatically do &&'ing if there are 2 if statements one after the other (the GLSL Optimizer does this, which is based off of Mesa), effectively reversing that change

Normally there are short-circuit mechanisms: if a boolean operation evaluates to a point where its outcome can no longer change, the rest of the operation is not performed. See this for a more detailed example: http://www.mathworks.com/help/matlab/ref/logicaloperatorsshortcircuit.html?requestedDomain=www.mathworks.com

This has applied to every programming language I have used so far. I would be surprised if GLSL didn't implement boolean short circuits.

This does mean, though, that reversing the order of the conditions may have a positive effect.

@fzurita
Contributor

fzurita commented Jul 17, 2016

I think I'm going to work on adding the old blending back as an option. So it can still be used with a configuration parameter.

@loganmc10
Contributor

I think the difference with GLSL is that (I don't know if this is true for every piece of hardware), based on what I read, most GPUs will pre-compile the shader based on the uniform values, so they get "hard-coded" into the shader for that run until a uniform changes. So if you have an if statement with just a uniform in it, and that statement evaluates to false, it just skips over that block of code. However, if there is a "shader computed" value in the if statement, it needs to evaluate the statement every time it gets there.

But I guess it goes back to what you're saying, I don't know if every GPU would be smart enough to short circuit that if statement. I thought maybe it would fix your "double discard" issue, but I guess that's kind of a shot in the dark.

@fzurita
Contributor

fzurita commented Jul 17, 2016

Gotcha. I'm thinking about adding 3 different blending modes. Here is what I think they should be:

  • Accurate ( using code after this commit: 0ad53ad)
  • Fast ( using code before this commit: 0ad53ad)
  • Accurate with device fix ( using code after commit 0ad53ad but with the ShaderUtils_VC.h implementation of the GLSL code)

I wish there was a way to optimize the "Accurate" path, if there is a way, I'm sure the fast path could be removed.

To simplify things, I think a fast blending mode could have fog always enabled, but that would make it less fast, although still faster than the accurate mode.

@loganmc10
Contributor

Well, something to try would be to compile different shaders for the different "if" options instead of branching on the uniforms. You could also try commenting out these lines:

https://github.com/gonetz/GLideN64/blob/master/src/ShaderUtils.cpp#L319-L322

To see if the variable indexing is causing the performance problems (commenting out those lines causes messed-up graphics, obviously).

@fzurita
Contributor

fzurita commented Jul 17, 2016

Yeah, compiling different shaders depending on the different if options may be worth trying for performance.

The bad thing about GLES 2.0, though, is that there is no shader cache, so each time we switch shaders the GLSL needs to be recompiled. Those switches may not happen very often, though. It's worth trying.

On my old device, variable indexing causes incorrect graphics. If I swap the variable indexing for the ShaderUtils_VC.h version, graphics look correct, but the frame rate is reduced by a good amount. I will try completely commenting out those lines to see how much of a performance hit they are.

@gonetz
Owner

gonetz commented Jul 17, 2016

legacy blending:
cd1b134

@gonetz
Owner

gonetz commented Jul 17, 2016

@fzurita @loganmc10 please check my legacy blending branch: is it worth keeping that option for slow devices that have problems with the new blending?

@fzurita
Contributor

fzurita commented Jul 17, 2016

I will try this tonight. If the legacy blending option is toggled on or off, will the shader cache be regenerated when switching?

@gonetz
Owner

gonetz commented Jul 18, 2016

New option: enableLegacyBlending. I've added it to the list of options checked by the shader cache, so the cache must update when the option is switched.

@fzurita
Contributor

fzurita commented Jul 18, 2016

Gotcha, I see it now in ShaderCombiner::getShaderCombinerOptionsSet.

@gonetz
Owner

gonetz commented Jul 18, 2016

I forgot to mention that the legacy blending option includes only "legacy blending", without the hacks for special blending modes. Thus several games, e.g. Mace, Perfect Dark, and Bomberman, will have various glitches. This option is for speed.

@loganmc10
Contributor

This works very well on the Raspberry Pi; there is a noticeable improvement in GoldenEye and Perfect Dark, which are both games that have slowdowns for me. I'm okay with some graphical glitches if it improves the speed. For devices that "can't keep up", like GLES2 devices, there is going to have to be a trade-off between accuracy and speed.

@fzurita
Contributor

fzurita commented Jul 18, 2016

This is a very big speedup on my old device. Mario 64 can run at full speed. There are still issues with having more than one discard in the code at the same time, though.

@gonetz
Owner

gonetz commented Jul 18, 2016

Thanks for the info. I'll put it to master then.

@fzurita
Contributor

fzurita commented Jul 18, 2016

This is a HUGE speedup for PowerVR devices: about a 40% improvement in frame rate. @gonetz You should try your devices to see if it fixes the GLES 3.0 slowdowns.

@fzurita
Contributor

fzurita commented Jul 18, 2016

I updated the master branch of Mupen64plus-AE with this change:

https://github.com/mupen64plus-ae/mupen64plus-ae

@fzurita
Contributor

fzurita commented Jul 20, 2016

I'm getting reports that this fixes the issues on older devices, so I think this should be closed.

@gonetz
Owner

gonetz commented Jul 20, 2016

Ok
