
Float16sr subnormals also rounds somewhere else than up/down #62

Open
milankl opened this issue Jan 12, 2023 · 14 comments · Fixed by #67 or #73
Labels: bug, float16, subnormals

Comments

@milankl (Owner) commented Jan 12, 2023

This test failure keeps coming up

 Test for U(0,1): Test Failed at /home/runner/work/StochasticRounding.jl/StochasticRounding.jl/test/float16sr.jl:114
  Expression: Ndown + Nup == N
   Evaluated: 99998 == 100000

Meaning that some stochastic rounding with Float16sr doesn't go to either the previous or the next representable number but to something else.
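
For reference, a rough way to see this failure mode outside the test suite (a sketch, assuming the package's Float16_stochastic_round and a Float16(::Float16sr) conversion; not the test file's exact code):

using StochasticRounding

x = rand(Float32)
f16 = Float16(x)    # deterministic round-to-nearest
# the two Float16 neighbours enclosing x
lo, hi = Float32(f16) <= x ? (f16, nextfloat(f16)) : (prevfloat(f16), f16)

# every stochastically rounded result should be one of these two neighbours
all(Float16(Float16_stochastic_round(x)) in (lo, hi) for _ in 1:100_000)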

@milankl added the bug label Jan 12, 2023
@milankl added this to the v0.7 milestone Jan 12, 2023
@tomkimpson (Contributor) commented

This test works for me locally...

This was referenced Jan 18, 2023
@milankl (Owner, Author) commented Jan 19, 2023

This test works for me locally...

Yes, it depends on the state of your RNG. There's probably a small subset of Float16s where something odd happens and you'd likely miss them in tests... but apparently not always! Feel free to have a look at why and where this happens.

@milankl (Owner, Author) commented Jun 29, 2023

@avleenk2312 Apparently it's a problem with the subnormals in Float16sr, but only one of them

[ Info: Test failed for -3.874302f-6, 1000000001000001

The bits are the Float16 that the Float32 was stochastically rounded to.

julia> bitstring(-3.874302f-6,:split)
"1 01101101 00000100000000000000000"

It's a Float32 that is exactly representable as a Float16 subnormal, as mantissa bits >10 are all zero. So I reckon it's incorrectly rounded up or down with some very rare chance, which we'd need to get to zero to make the tests pass again!
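
(A quick sketch of that check, added here for clarity: exact representability is the same as a lossless round trip through Float16.)

julia> Float32(Float16(-3.874302f-6)) == -3.874302f-6   # true, so stochastic rounding should never move this value
true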

@avleenk2312 (Contributor) commented

It's a Float32 that is exactly representable as a Float16 subnormal, as mantissa bits >10 are all zero. So I reckon it's incorrectly rounded up or down with some very rare chance, which we'd need to get to zero to make the tests pass again!

I have tried this test for U(0,1) on my end several times and it works fine.

@testset "Test for U(0,1)" begin
           for x in rand(Float32,10_000)
               Ndown,Nup,N,p = test_chances_round(x)
               @test Ndown + Nup == N
               @test isapprox(Ndown/N,1-p,atol=2e-2)
               @test isapprox(Nup/N,p,atol=2e-2)
           end
       end
Test Summary:   |  Pass  Total  Time
Test for U(0,1) | 30000  30000  6.9s
Test.DefaultTestSet("Test for U(0,1)", Any[], 30000, false, false, true, 1.688065293434874e9, 1.688065300322289e9)

For the following example:

julia> bitstring(-3.874302f-6,:split)
"1 01101101 00000100000000000000000"

Say x=-3.874302f-6; then Float16_chance_roundup(x)=0.0f0 always, as xround==x, so Float16sr(x)=Float16sr(-3.874302e-6) always. I mean, this particular case shouldn't round up or down, and it doesn't for me. I am going to try it on my lab machine to see if the results differ.
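
For illustration, a minimal sketch of what such a round-up probability computes (a hypothetical chance_roundup, not necessarily the test helper's exact code): the relative distance of x to its lower Float16 neighbour, which is zero whenever x is already exactly representable.

function chance_roundup(x::Float32)
    f16 = Float16(x)                        # deterministic round-to-nearest
    Float32(f16) == x && return 0f0         # exactly representable: never round up
    lo = Float32(f16) < x ? f16 : prevfloat(f16)
    hi = nextfloat(lo)
    return (x - Float32(lo)) / (Float32(hi) - Float32(lo))
end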

@avleenk2312 (Contributor) commented Jun 29, 2023

It is failing on my lab machine. It keeps failing for the following examples with different values of Ndown and Nup.

For x=1.2934208e-5, round=Float16sr(1.2934208e-5), Ndown=99996, Nup=3, N=100000, Nup+Ndown=99999, p=0.0
For x=5.453825e-5, round=Float16sr(5.453825e-5), Ndown=99995, Nup=3, N=100000, Nup+Ndown=99998, p=0.0
For x=1.2934208e-5, round=Float16sr(1.2934208e-5), Ndown=99998, Nup=1, N=100000, Nup+Ndown=99999, p=0.0
For x=5.453825e-5, round=Float16sr(5.453825e-5), Ndown=99995, Nup=2, N=100000, Nup+Ndown=99997, p=0.0
For x=1.2934208e-5, round=Float16sr(1.2934208e-5), Ndown=99997, Nup=2, N=100000, Nup+Ndown=99999, p=0.0
For x=5.453825e-5, round=Float16sr(5.453825e-5), Ndown=99996, Nup=1, N=100000, Nup+Ndown=99997, p=0.0
For x=5.453825e-5, round=Float16sr(5.453825e-5), Ndown=99991, Nup=7, N=100000, Nup+Ndown=99998, p=0.0

According to the definition, the values must be Nup=0 and Ndown=100000 for all of these cases.

When I redefined the test_chances_round function with the following roundup and rounddown definitions:

    if Float32(f16_round) == f32
        # x is already exactly representable in Float16: up and down coincide
        f16_roundup = f16_round
        f16_rounddown = f16_round
    elseif Float32(f16_round) < f32
        f16_rounddown = f16_round
        f16_roundup = nextfloat(f16_round)
    else
        f16_roundup = f16_round
        f16_rounddown = prevfloat(f16_round)
    end

Then, all the tests passed:

@testset "Test for U(0,1)" begin
           for x in rand(Float32,10_000)
               Ndown,Nup,N,p= test_chances_round(x)
               if Nup+Ndown != N
                   println("x=",x, " p=",p," Nup=", Nup, " Ndown=", Ndown," Nup+Ndown=",Nup+Ndown, " N=", N)
               end
               @test Ndown + Nup == N
               @test isapprox(Ndown/N,1-p,atol=2e-2)
               @test isapprox(Nup/N,p,atol=2e-2)
           end
       end
Test Summary:   |  Pass  Total  Time
Test for U(0,1) | 30000  30000  5.2s
Test.DefaultTestSet("Test for U(0,1)", Any[], 30000, false, false, true, 1.688119768151741e9, 1.688119773349186e9)

Do you observe this on your machine as well? Is it okay to define the tests this way for such cases?

@milankl (Owner, Author) commented Jun 30, 2023

Yes, I agree one could redefine what up/down means in the case where the result is already exact in the subnormals. But that shouldn't solve the problem:

The subnormals are currently stochastically rounded (which should be revised, see below) with float arithmetic following

$$\mathrm{round}\left(x + \epsilon(r-1.5)\right)$$

with $r \in [1,2)$ because that's easy to generate with a PRNG, and $\epsilon$ the (constant) distance between subnormals, so that we may end up anywhere in $[x - \epsilon/2, x + \epsilon/2)$. The idea is then to use in practice something slightly smaller than $\epsilon$ to prevent rounding away from $x$ in the exact case where our tests fail. Turns out this wasn't so smart, because this still happens:

With

const ϵ = prevfloat(Float32(nextfloat(zero(Float16))))

function g(x::Float32,r::Float32)
    x + ϵ*(r-1.5f0)
end

function h(x::Float32)
    # deterministically rounded correct value in uint16
    c = Int(reinterpret(UInt16,Float16(x)))

    v = zeros(Int16,2^23)
    r = 1f0

    # loop over all possible random perturbations r in order
    for i in 0:2^23-1
        # subtract the correct c in int arithmetic to have -1 for round down, 0 for no rounding, 1 round up
        v[i+1] = Int(reinterpret(UInt16,Float16(g(x,r)))) - c
        r = nextfloat(r)
    end
    return v
end

I get

julia> sum(h(x) .== -1)
64

julia> sum(h(x) .== 0)
8388479

julia> sum(h(x) .== 1)
63

So of the 2^23 random numbers $r$ we can have, the first 64 actually round $x$ down, the middle 8388479 are fine, but the largest 63 round up.

@milankl (Owner, Author) commented Jun 30, 2023

If we make $\epsilon$ smaller, i.e.

const ϵ = prevfloat(Float32(nextfloat(zero(Float16))),2^8+1)

which is 5.960373f-8 instead of 5.960464f-8, then it works:

julia> sum(h(x) .== -1)
0

julia> sum(h(x) .== 0)
8388608

julia> sum(h(x) .== 1)
0

No incorrect rounding away from x anymore. Sanity check that nextfloat(x) and prevfloat(x) have a non-zero chance to be rounded up and down, respectively:

julia> sum(h(nextfloat(x)) .== 1)
127

julia> sum(h(prevfloat(x)) .== -1)
128

👍🏼

But why does it have to be 5.960373f-8 instead of 5.960464f-8? I don't know. That's where float arithmetic gets really complicated, and I try to do as much as possible with integer arithmetic because it's just way easier to understand.

@avleenk2312 (Contributor) commented

Yeah, I didn't expect that definition of roundup and rounddown to work, as Float16_stochastic_round should still be rounding things to something else. Somehow I got no error even though I tried several times on my lab machine. I guess I need to keep trying to see if the error shows up again. I tried with a larger number of tests as well and it still works.

Thanks a lot for explaining things so clearly. You've probably seen this: JuliaLang/julia#49698

@milankl (Owner, Author) commented Jun 30, 2023

I've just implemented a float-arithmetic-free version:

function Float16_stochastic_round(x::Float32)
    rbits = rand(Xor128[],UInt32)   # create random bits

    ui = reinterpret(UInt32,x)
    shift = ((0x7f800000 & ui) >> 23) - 103    # get shift from exponent bits: 10 for floatmin(Float16), +-1 for every power of 2
    mask = 0x007fffff >> max(0,min(shift,10))    # normal mask 0x0000_1fff but << 1 for every power 2 in the subnormal range

    ui += (rbits & mask)        # add perturbation in [0,u)
    ui &= ~mask                 # round to zero

    # via conversion to Float16 to adjust exponent bits
    return Float16sr(reinterpret(Float32,ui))
end

which seems to pass all tests. While this version gets (most of) the subnormals right, it cannot easily deal with x < minpos(Float16) due to the power-of-2 logic; the Float32s in the subnormal range of Float16 are, however, uniformly distributed. For example, minpos/2 should round 50% to zero and 50% to minpos, but it'll only round to zero:

julia> x = Float32(nextfloat(zero(Float16)))/2
2.9802322f-8

julia> Float16_stochastic_round(x)
Float16sr(0.0)

It's also somewhat slower, so I'm considering sticking to the current idea of branching off when abs(x) < floatmin(Float16) and then using float arithmetic.
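
A minimal sketch of that branching idea (hypothetical name, assuming the ϵ and Float16_stochastic_round definitions from above; not the package's final implementation):

function Float16_stochastic_round_branched(x::Float32)
    if abs(x) < floatmin(Float16)
        # subnormal/underflow range: perturb with float arithmetic as before
        r = 1f0 + rand(Float32)                 # r in [1,2)
        return Float16sr(x + ϵ*(r - 1.5f0))
    else
        # normal range: integer-arithmetic version from above
        return Float16_stochastic_round(x)
    end
end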

@milankl changed the title from "Float16sr also rounds somewhere else than up/down" to "Float16sr subnormals also rounds somewhere else than up/down" Jul 7, 2023
@milankl reopened this Jul 11, 2023
@milankl (Owner, Author) commented Jul 11, 2023

Reopened as the problem apparently persists, see the failed test in b4b3068 from #61.

@avleenk2312 (Contributor) commented

I'll try again to see if the problem shows up on my Mac; last time it never failed there even though I tried several times.

@milankl (Owner, Author) commented Dec 22, 2023

Solved with 20aa678: the random perturbation for Float16sr subnormals is made somewhat smaller with a tweaked mask

function rand_subnormal(rbits::UInt32)
    lz = leading_zeros(rbits)   # count leading zeros for correct probabilities of exponent
    e = ((101 - lz) % UInt32) << 23
    e |= (rbits << 31)          # use last bit for sign

    # combine exponent with random mantissa
    # the mask should be 0x007f_ffff but that can cause a
    # Float16-representable number to be rounded away at a very low chance,
    # hence tweak the mask so that the largest perturbation is a tiny bit smaller
    return reinterpret(Float32,e | (rbits & 0x007f_fdff))
end
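
A quick illustration (added here, not from the commit) of why the leading_zeros trick gives the right probabilities for the exponent:

# leading_zeros of uniform random bits is geometrically distributed,
# P(lz = k) ≈ 2^-(k+1), so each smaller power-of-two perturbation is
# drawn with half the probability of the previous one
lzs = [leading_zeros(rand(UInt32)) for _ in 1:1_000_000]
[count(==(k), lzs) / length(lzs) for k in 0:3]   # ≈ [0.5, 0.25, 0.125, 0.0625]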

@milankl linked a pull request Dec 22, 2023 that will close this issue
@milankl (Owner, Author) commented Dec 28, 2023

Stupid subnormals, there's still something wrong here

[ Info: 5.5491924e-5 although in [Float16(5.55e-5),Float16(5.555e-5)] was rounded to Float16(5.543e-5)
[ Info: 5.5491924e-5 although in [Float16(5.55e-5),Float16(5.555e-5)] was rounded to Float16(5.543e-5)
Test for U(0,1): Test Failed at /home/runner/work/StochasticRounding.jl/StochasticRounding.jl/test/general_tests.jl:184
  Expression: Ndown + Nup == N
   Evaluated: 99998 == 100000

Stacktrace:
 [1] macro expansion
   @ /opt/hostedtoolcache/julia/1.10.0/x64/share/julia/stdlib/v1.10/Test/src/Test.jl:672 [inlined]
 [2] macro expansion
   @ ~/work/StochasticRounding.jl/StochasticRounding.jl/test/general_tests.jl:184 [inlined]
 [3] macro expansion
   @ /opt/hostedtoolcache/julia/1.10.0/x64/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
 [4] macro expansion
   @ ~/work/StochasticRounding.jl/StochasticRounding.jl/test/general_tests.jl:182 [inlined]
 [5] macro expansion
   @ /opt/hostedtoolcache/julia/1.10.0/x64/share/julia/stdlib/v1.10/Test/src/Test.jl:1669 [inlined]
 [6] top-level scope
   @ ~/work/StochasticRounding.jl/StochasticRounding.jl/test/general_tests.jl:1690
[ Info: Test failed for 5.5491924e-5, 00111000011010001100000000000000

when rounding 5.5491924f-5, which is a Float32 that's already exactly representable as a Float16 subnormal (mantissa bits 11-23 are zero), meaning it shouldn't be rounded at all, but somehow this still happens with a tiny chance.

@milankl (Owner, Author) commented Dec 28, 2023

Reopened due to the last post ☝🏼

@milankl reopened this Dec 28, 2023
@milankl removed this from the v0.7 milestone Mar 26, 2024