Norm computation issue #31

antoine-morvan · 2023-08-14T14:28:02Z

Hello,

Problem

I tried a configuration that makes the computation go way off (E+03). However, when using --norms to check the deviation, the max error combined is 0. This is because 0 is greater than -999.

Still, the computation is wrong and this should get caught.

My wrapper script missed this deviation because it looks only at the combined result.

Setup

To reproduce:

compile with aocc 4.1.0; OpenMPI 4.1.5; OpenBLAS 0.3.23; FFTW 3.3.10

export CFLAGS="-O3 -march=native -mtune=native"
export CXXFLAGS="$CFLAGS"
export FCFLAGS="$CFLAGS"

could reproduce on latest AMD & Intel CPUs
run ectrans-benchmark-dp --norms -n 5 -l 137 -t 319 --vordiv --scders

Some Leads ?

After some investigation, I spotted two potential causes:

The max error is initialized with zmaxerr(:) = -999.0. It would be wiser to initialize the max error with 0.

ectrans/src/programs/ectrans-benchmark.F90

Line 756 in 8d13aa3

zmaxerr(:) = -999.0

When enabling verbosity (and printing the divider), we could observe half of the arrays znormvor(:), znormdiv(:), znormt(:), znormsp(:) coming with NaN values.

ectrans/src/programs/ectrans-benchmark.F90

Lines 757 to 784 in 8d13aa3

    
  do ifld = 1, nflevg
    zerr(3) = abs(znormvor1(ifld)/znormvor(ifld) - 1.0d0)
    zmaxerr(3) = max(zmaxerr(3), zerr(3))
    if (verbosity >= 1) then
      write(nout,'("norm zspvor( ",i4,")     = ",f20.15,"        error = ",e10.3)') ifld, znormvor1(ifld), zerr(3)
    endif
  enddo
  do ifld = 1, nflevg
    zerr(2) = abs(znormdiv1(ifld)/znormdiv(ifld) - 1.0d0)
    zmaxerr(2) = max(zmaxerr(2),zerr(2))
    if (verbosity >= 1) then
      write(nout,'("norm zspdiv( ",i4,",:)   = ",f20.15,"        error = ",e10.3)') ifld, znormdiv1(ifld), zerr(2)
    endif
  enddo
  do ifld = 1, nflevg
    zerr(4) = abs(znormt1(ifld)/znormt(ifld) - 1.0d0)
    zmaxerr(4) = max(zmaxerr(4), zerr(4))
    if (verbosity >= 1) then
      write(nout,'("norm zspsc3a(",i4,",:,1) = ",f20.15,"        error = ",e10.3)') ifld, znormt1(ifld), zerr(4)
    endif
  enddo
  do ifld = 1, 1
    zerr(1) = abs(znormsp1(ifld)/znormsp(ifld) - 1.0d0)
    zmaxerr(1) = max(zmaxerr(1), zerr(1))
    if (verbosity >= 1) then
      write(nout,'("norm zspsc2( ",i4,",:)   = ",f20.15,"        error = ",e10.3)') ifld, znormsp1(ifld), zerr(1)
    endif
  enddo

Use this to print the divider:

  verbosity=1
  zmaxerr(:) = 0
  do ifld = 1, nflevg
    zerr(3) = abs(znormvor1(ifld)/znormvor(ifld) - 1.0d0)
    zmaxerr(3) = max(zmaxerr(3), zerr(3))
    if (verbosity >= 1) then
      write(nout,'("norm zspvor( ",i4,")     = ",f20.15,f20.15,"        error = ",e10.3)') ifld, znormvor1(ifld), znormvor(ifld), zerr(3)
    endif
  enddo
  do ifld = 1, nflevg
    zerr(2) = abs(znormdiv1(ifld)/znormdiv(ifld) - 1.0d0)
    zmaxerr(2) = max(zmaxerr(2),zerr(2))
    if (verbosity >= 1) then
      write(nout,'("norm zspdiv( ",i4,",:)   = ",f20.15,f20.15,"        error = ",e10.3)') ifld, znormdiv1(ifld), znormdiv(ifld), zerr(2)
    endif
  enddo
  do ifld = 1, nflevg
    zerr(4) = abs(znormt1(ifld)/znormt(ifld) - 1.0d0)
    zmaxerr(4) = max(zmaxerr(4), zerr(4))
    if (verbosity >= 1) then
      write(nout,'("norm zspsc3a(",i4,",:,1) = ",f20.15,f20.15,"        error = ",e10.3)') ifld, znormt1(ifld),znormt(ifld), zerr(4)
    endif
  enddo
  do ifld = 1, 1
    zerr(1) = abs(znormsp1(ifld)/znormsp(ifld) - 1.0d0)
    zmaxerr(1) = max(zmaxerr(1), zerr(1))
    if (verbosity >= 1) then
      write(nout,'("norm zspsc2( ",i4,",:)   = ",f20.15,f20.15,"        error = ",e10.3)') ifld, znormsp1(ifld),znormsp(ifld), zerr(1)
    endif
  enddo

This could come from these arrays being initialized by a function declared with a C binding, iterating over non-contiguous segments.

But that would require more investigation to confirm :)

Best.


wdeconinck · 2023-08-29T11:03:41Z

Thank you for this report. We will look into this.

samhatfield · 2023-08-29T11:04:08Z

Hi Antoine,

I don't understand how it's possible to get a max error of -999. The initial value of -999 is compared against the output of abs which is positive semidefinite. Even when e.g. znormvor(ifld) is NaN, and therefore zerr(3) is NaN, I found that max then also produces a NaN. NaN should then appear as the max error combined. This was for the Intel compiler.

Could this be compiler-specific? Does the operation max(-999, NaN) return -999 for AOCC?

antoine-morvan · 2023-10-03T09:06:45Z

Hello,

I observed this behavior with AOCC 4.1 only (well, I did not try the whole thing with all the SW stacks at hand). Despite the max operation behaving similarly with other compilers (see below), something must be wrong somewhere else too.

Regarding the result of this specific max operation, here are the results with some compilers:

! print NaN too
print *, max(-999, NaN), NaN

With default flags (e.g., gcc input.F90)

aocc:4.1.0      :           0           0
gcc:13.2.0      :  1064675189  1064675189
nvhpc:23.7      :           0           0
llvm:16.0.6     :           0           0
oneapi:2023.1.0 :        -999           0
ifort:2023.1.0  :           0           0

With aggressive flags (e.g., gcc -O3 -fastmath -march=native -mtune=native input.F90)

aocc:4.1.0      :           0           0
gcc:13.2.0      :           0           0
nvhpc:23.7      :        -999    15208769
llvm:16.0.6     :  1423361856   858928177
oneapi:2023.1.0 :        -999           0
ifort:2023.1.0  :           0           0

samhatfield · 2023-10-03T14:02:58Z

Is this pseudo-code, or should I actually be able to compile this?:

print *, max(-999, NaN), NaN

The reason I ask is because I don't recognise NaN as a Fortran keyword. And indeed, ifort gives

main.f90(4): error #6404: This name does not have a type, and must have an explicit type.   [NAN]
    print *, max(-999, NaN), NaN
-----------------------^

Also I'm not sure what result you should expect when comparing -999 (an integer literal) with NaN (a floating-point literal).
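For reference, a compilable variant of that test could look like the sketch below. It is not taken from the thread: the program name, the `dp` kind parameter and the variable `nan` are made up for illustration. It builds a quiet NaN through the intrinsic `ieee_arithmetic` module and keeps both arguments of `max` in double precision, mirroring the `zmaxerr`/`zerr` pattern in the benchmark. As far as I know the Fortran standard does not pin down what `max` returns when one argument is NaN, so the output may legitimately differ between compilers and flags.

```fortran
program max_nan_check
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan, ieee_is_nan
  implicit none
  integer, parameter :: dp = kind(1.0d0)   ! assumed: benchmark built in double precision
  real(dp) :: nan, zmaxerr

  ! Build a quiet NaN explicitly instead of relying on an undeclared name.
  nan = ieee_value(1.0_dp, ieee_quiet_nan)

  ! Same pattern as the benchmark: start from -999 and fold in the error.
  zmaxerr = -999.0_dp
  zmaxerr = max(zmaxerr, nan)

  ! Depending on the compiler and flags this may print -999 or NaN.
  print *, 'max(-999.0, NaN) = ', zmaxerr
  print *, 'result is NaN    : ', ieee_is_nan(zmaxerr)
end program max_nan_check
```

Note that aggressive floating-point flags (fast-math style options) may allow the compiler to assume NaNs never occur, in which case `ieee_is_nan` itself can become unreliable.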

Going back to the problem, I would need to be able to reproduce it exactly to figure out what's going wrong. Could you share the modifications you've made to produce these error norms? Perhaps I can reproduce the problem with ifort?

I tried running your benchmark command after building ecTrans with intel/2021.4.0 but it gives this

======= End of spectral transforms  =======

max error zspvor(1:nlev,:)    =  0.999E-14
max error zspdiv(1:nlev,:)    =  0.999E-14
max error zspsc3a(1:nlev,:,1) =  0.173E-13
max error zspsc2(1:1,:)       =  0.173E-13

max error combined =          =  0.173E-13

======= Start of time step stats =======

Again I can't see how it's possible for this calculation to give -0.999E+03:

! MUST be >= 0.0
zerr(3) = abs(znormvor1(ifld)/znormvor(ifld) - 1.0d0)
! Also MUST be >= 0.0
zmaxerr(3) = max(zmaxerr(3), zerr(3))

I think the problem must be related to NaNs somehow but unless I can reproduce it with ifort I'm not much help :(
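One way to make the check independent of how `max` treats NaN is to test each error explicitly before folding it into the maximum. The sketch below is only an illustration under assumptions: the array size, the sample values and the flag `lnan` are hypothetical, and a corrupted reference norm is faked with `ieee_value`. It is not the benchmark's actual code.

```fortran
program nan_aware_maxerr
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan, ieee_is_nan
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nflevg = 4            ! hypothetical number of levels
  real(dp) :: znormvor(nflevg), znormvor1(nflevg)
  real(dp) :: zerr, zmaxerr
  logical  :: lnan
  integer  :: ifld

  ! Made-up sample data: one level deviates, one reference norm is NaN.
  znormvor     = 1.0_dp
  znormvor1    = 1.0_dp
  znormvor1(3) = 1.5_dp
  znormvor(2)  = ieee_value(1.0_dp, ieee_quiet_nan)

  zmaxerr = 0.0_dp
  lnan    = .false.
  do ifld = 1, nflevg
    zerr = abs(znormvor1(ifld)/znormvor(ifld) - 1.0_dp)
    if (ieee_is_nan(zerr)) then
      ! Record the NaN separately so it cannot be dropped by max.
      lnan = .true.
    else
      zmaxerr = max(zmaxerr, zerr)
    end if
  end do

  print *, 'max error (finite entries) = ', zmaxerr
  print *, 'NaN encountered            = ', lnan
end program nan_aware_maxerr
```

With a separate flag, a wrapper script could treat any NaN as a failure even when the combined maximum looks small.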

wdeconinck · 2024-01-08T12:18:06Z

@antoine-morvan is there any update on this issue?

antoine-morvan · 2024-01-09T13:06:52Z

Hello,

I did not have time to work on this lately.

Best regards.

wdeconinck assigned samhatfield Aug 29, 2023
