Inlet test case blowing-up

Report or discuss software problems and other woes

Moderators: arango, robertson

mmahone
Posts: 4
Joined: Fri Feb 11, 2011 2:26 pm
Location: AMEC

Inlet test case blowing-up

#1 Unread post by mmahone »

Hello,
I am a new ROMS user. I recently upgraded my code to ROMS 3.4 revision 550 under Cygwin. I have successfully run several cases, both in serial and in parallel (canyon, test_chan, test_head), until I tried the inlet test case:
mpirun -np 2 ./oceanM coupling_inlet_test.in
The coupled model (ROMS + SWAN) ran for 19062 time steps and then blew up, saving the latest model state to the RESTART file.
Looking at the bed thickness (m) in this restart file, the minimum is 5.1000e-4, while the maximum is 1.000e+37.
I would appreciate some help solving this problem :(
Thanks,

m.hadfield
Posts: 521
Joined: Tue Jul 01, 2003 4:12 am
Location: NIWA

Re: Inlet test case blowing-up

#2 Unread post by m.hadfield »

Same here, so it is probably not a problem with your setup or mine, just an indication that the test case needs tweaking.

The value of 1.000e+37 means that the ROMS variable has gone out of the bounds supported by floating point values on your computer, i.e., the model has blown up (but you already knew that). Caution: when models go unstable, all the variables tend to increase without bounds together, so an increase in bed thickness is not necessarily the root cause of the crash.

jcwarner
Posts: 1172
Joined: Wed Dec 31, 2003 6:16 pm
Location: USGS, USA

Re: Inlet test case blowing-up

#3 Unread post by jcwarner »

Just did an svn update, saw many files wing past, then compiled inlet_test and it ran fine.

The magic 1E37 number that you have is most likely the fantastic FillValue that I am soooo fond of. It really helps to debug the code, doesn't it? In my version I have a cpp option to deactivate FillValues, so when the code crashes I can tell the fill locations apart from the bugs.
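
A minimal sketch of how one might keep fill values out of a min/max check when inspecting a field from the restart file. It assumes the field has already been read into a NumPy array; the cutoff `FILL_THRESHOLD`, the helper `valid_range`, and the sample numbers are all hypothetical and not part of ROMS.

```python
import numpy as np

# Assumption: fill locations carry a value of about 1e37, as in the restart file above.
FILL_THRESHOLD = 1.0e36  # hypothetical cutoff; anything this large is treated as fill


def valid_range(field, fill_threshold=FILL_THRESHOLD):
    """Return (min, max) of a field after masking fill and non-finite values."""
    data = np.asarray(field, dtype=float)
    is_fill = ~np.isfinite(data) | (np.abs(data) >= fill_threshold)
    masked = np.ma.masked_where(is_fill, data)
    if masked.count() == 0:
        raise ValueError("field contains only fill values")
    return float(masked.min()), float(masked.max())


# Made-up bed thickness values (m) with two fill points, for illustration only.
bed_thickness = np.array([[5.1e-4, 10.0, 1.0e37],
                          [9.98, 1.0e37, 10.02]])
low, high = valid_range(bed_thickness)
print(f"bed thickness range without fill values: {low:g} .. {high:g} m")
```

If the maximum is still unreasonable after masking, the large value is a genuine model result rather than a fill location.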

So in short, I cannot reproduce your error.
What compiler are you using?

-john

m.hadfield
Posts: 521
Joined: Tue Jul 01, 2003 4:12 am
Location: NIWA

Re: Inlet test case blowing-up

#4 Unread post by m.hadfield »

My ROMS source is up-to-date, I think. (I last downloaded changes on 6 May. I can't check for more recent updates as the svn client on the machine with the ROMS source cannot connect to the server, though my Web browser and a client on another machine can. Odd :? )

I used Gfortran 4.1.2 on SUSE Linux. Mmahone mentioned using Cygwin, so it's likely that the compiler was Gfortran as well.

Mine crashed at time step 19072. Mmahone said 19062.

I haven't had a look at the location and nature of the crash yet.

I may try another compiler & machine.

m.hadfield
Posts: 521
Joined: Tue Jul 01, 2003 4:12 am
Location: NIWA

Re: Inlet test case blowing-up

#5 Unread post by m.hadfield »

The crash is associated with a spike of large negative temperatures (-2500 degC) extending from surface to bottom at location (43,35) on the rho grid (i.e. adjacent to the land mask on the eastern side of the mouth of the inlet). Temperature is the only variable that appears to have unreasonable values.

At the time of the crash (time step 19072, t=1 02:29:20) the oscillating flow through the mouth has gone through several cycles, and the horizontal velocities during the final one are no larger than during the previous ones. However, at the time of the crash a checkerboard pattern is evident in the horizontal and vertical velocities.

The time step printout shows that Cw exceeds 1 at this location at time step 18642 and reaches as high as 2.
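
For readers unfamiliar with this diagnostic, here is a rough sketch, assuming Cw in the printout is the usual vertical Courant number |w| * dt / dz. A value above 1 then means the vertical velocity carries water across more than one layer per time step. DT = 5 s is inferred from step 19072 falling at t=1 02:29:20 (assuming the clock starts at zero); the layer thickness and the velocities are hypothetical.

```python
def vertical_courant(w, dt, dz):
    """Vertical Courant number |w| * dt / dz for a single grid cell."""
    if dz <= 0.0:
        raise ValueError("layer thickness must be positive")
    return abs(w) * dt / dz


DT = 5.0  # s, inferred from the reported crash step and time (assumption)
DZ = 0.5  # m, hypothetical layer thickness near the inlet mouth

# Hypothetical vertical velocities (m/s), chosen only to bracket Cw = 1.
for w in (0.05, 0.1, 0.2):
    cw = vertical_courant(w, DT, DZ)
    print(f"w = {w:.2f} m/s gives Cw = {cw:.2f}")
```

Under these assumed numbers, Cw of 2 would need a vertical velocity of about 0.2 m/s, which hints at how strong the grid-scale structure at that point must be.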

mmahone
Posts: 4
Joined: Fri Feb 11, 2011 2:26 pm
Location: AMEC

Re: Inlet test case blowing-up

#6 Unread post by mmahone »

Good morning to both of you,
@Mark: yes, the choice of bed thickness was random, as I had no idea where to start debugging this at the time I posted this message.
@John: I have been using gfortran for compiling ROMS+SWAN, and mpich2.
I have noticed, however, that I did not activate the enable-fast option when compiling mpich2, and I was wondering how sensitive the MCT library would be to this MPI option?

Any other feedback is most welcome.
Many, many thanks to both of you for replying.

jcwarner
Posts: 1172
Joined: Wed Dec 31, 2003 6:16 pm
Location: USGS, USA

Re: Inlet test case blowing-up

#7 Unread post by jcwarner »

I don't have access to gfortran. One issue that I have had lately with ifort is the need to use /fp:precise /fp:source to ensure that the compiler maintains the accuracy of the variables during computations. Sometimes, it seemed, ifort would try to make the code faster by not maintaining floating-point accuracy during math. I am not sure whether this is even relevant to gfortran, but you may want to look at the flags used for the code. I don't think it will matter much for MCT, as that is just transferring data (the inlet test does not use the sparse matrix interpolator).

m.hadfield
Posts: 521
Joined: Tue Jul 01, 2003 4:12 am
Location: NIWA

Re: Inlet test case blowing-up

#8 Unread post by m.hadfield »

I have carried out a couple more runs of INLET_TEST:
  • Rebuilt it with the PGI compiler (pgf90 7.1-6 64-bit target on x86-64 Linux -tp k8-64e): it crashes at approximately the same point as the Gfortran run.
  • Halved DT (and also NFAST since it's nowhere near the barotropic CFL limit): it crashes at the same simulated time (time step ~ 38000).
I suspect a model set-up issue rather than a compiler issue. It is developing fine-scale velocity structure in one of the areas of strongest flow. A bit more damping should keep it in check.
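
The arithmetic behind "same simulated time" can be checked with a short sketch. It assumes the model clock starts at zero on step 0, and uses the crash step and time reported in post #5 (step 19072 at t=1 02:29:20); the helper name is hypothetical.

```python
def elapsed_seconds(days, hours, minutes, seconds):
    """Convert a 'D HH:MM:SS' model time to seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


# Crash reported at time step 19072, t = 1 02:29:20.
crash_time = elapsed_seconds(1, 2, 29, 20)
crash_step = 19072

# Implied baroclinic time step, assuming the clock starts at zero (assumption).
dt = crash_time / crash_step

# With DT halved, the same simulated time is reached after twice as many steps.
halved_dt_step = crash_time / (dt / 2.0)

print(f"simulated time at crash: {crash_time} s")
print(f"implied DT: {dt} s")
print(f"equivalent step with DT halved: {halved_dt_step:.0f}")
```

A crash near step 38000 with the halved DT is therefore consistent with the instability being tied to the evolving flow rather than to the time step itself.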

I am using this script to build the model

https://www.myroms.org/svn/src/test/inl ... build.bash (revision 562)

and the input files in the same directory to run it.

Are you building & running it in exactly the same way, John?

mmahone
Posts: 4
Joined: Fri Feb 11, 2011 2:26 pm
Location: AMEC

Re: Inlet test case blowing-up

#9 Unread post by mmahone »

Hello:
So I have worked on the options for compiling with gfortran. As a first step I set 'export USE_DEBUG=on' in build.bash. This deactivated the -O3 and -ffast-math options in CYGWIN-gfortran.mk, and the inlet test case ran to the end (34560 time steps)...like a charm!
I am going to increase the number of time steps a bit to check that this fix is robust before I dig further into these gfortran compile options.
Many thanks again to both of you for all this helpful feedback.
Cheers,
maud

m.hadfield
Posts: 521
Joined: Tue Jul 01, 2003 4:12 am
Location: NIWA

Re: Inlet test case blowing-up

#10 Unread post by m.hadfield »

I have repeated mmahone's experiment of turning off optimisation with Gfortran (via USE_DEBUG=on). I too find that the INLET_TEST case runs to completion.

However I can also avoid a blow-up without turning off optimisation by increasing VISC2 from 1.0E-3 to 1.0E-2. This has very little effect on the flow overall, but does slightly reduce the grid-scale velocity structure at the neck of the inlet.

So I think this is more a model set-up issue than a compiler issue.

jcwarner
Posts: 1172
Joined: Wed Dec 31, 2003 6:16 pm
Location: USGS, USA

Re: Inlet test case blowing-up

#11 Unread post by jcwarner »

Thanks for continuing to look at this, but just because you change the viscosity does not mean the compiler issue is not still valid. In my opinion, you should not get a different answer if you use debug vs optimized. But you do, depending on the optimization flags chosen. For optimized mode, if you select /fp:precise and /fp:source (for ifort, not sure about gfortran) then you should get the 'same' answer as debug mode.

By changing the viscosity, you changed the problem. Kobayashi Maru.

m.hadfield
Posts: 521
Joined: Tue Jul 01, 2003 4:12 am
Location: NIWA

Re: Inlet test case blowing-up

#12 Unread post by m.hadfield »

You're right.

The culprit is almost certainly the -ffast-math option, which is described by the GCC manual thus:

-ffast-math
    Sets -fno-math-errno, -funsafe-math-optimizations, -ffinite-math-only, -fno-rounding-math, -fno-signaling-nans and -fcx-limited-range.

    This option causes the preprocessor macro __FAST_MATH__ to be defined.

    This option is not turned on by any -O option besides -Ofast since it can result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications for math functions. It may, however, yield faster code for programs that do not require the guarantees of these specifications. 
With this option removed (leaving -O3), the inlet_test case runs to completion.
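
One reason such options can change results: -funsafe-math-optimizations lets the compiler regroup floating-point operations, and floating-point addition is not associative. The sketch below only illustrates that effect in Python (which does not apply these optimisations itself); the numbers are chosen for illustration, and it is not established that this particular mechanism is what tips inlet_test over.

```python
# Four terms whose exact sum is 2.0.
big, small = 1.0e16, 1.0

# Evaluated strictly left to right: the first 1.0 is lost when added to 1e16,
# because neighbouring doubles near 1e16 are 2 apart.
as_written = ((big + small) - big) + small

# The same terms regrouped, as an optimiser allowed to reassociate might do.
regrouped = (big - big) + (small + small)

print(f"left-to-right sum: {as_written}")
print(f"regrouped sum:     {regrouped}")
print(f"results differ:    {as_written != regrouped}")
```

In a model that is already marginally stable, differences of this kind in the last bits can be enough to change whether and when a run blows up.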

jcwarner
Posts: 1172
Joined: Wed Dec 31, 2003 6:16 pm
Location: USGS, USA

Re: Inlet test case blowing-up

#13 Unread post by jcwarner »

Good. I am glad it worked for you.

m.hadfield
Posts: 521
Joined: Tue Jul 01, 2003 4:12 am
Location: NIWA

Re: Inlet test case blowing-up

#14 Unread post by m.hadfield »

m.hadfield wrote: You're right.
Perhaps I spoke too soon.

The inlet_test case, as implemented in the ROMS test suite

https://www.myroms.org/svn/src/test/inlet_test/

(with preprocessor options determined by a combination of build.bash and inlet_test.h) is characterised by large values of Cw at grid point (43,35,7) beginning at around 18500 time steps:
[Attached image: Cw_inlet_test_nearshore_mellor_05.png]
This leads (I think) to a tendency to blow up with large negative temperatures in this area. Whether or not the blow-up actually occurs is affected by changes in the compiler, the compiler options, the viscosity formulation and the viscosity values, but the high Cw values are there on every run.

When the inlet_test case is compiled using the source code on the trunk, with preprocessor options determined by the standard configuration file

https://www.myroms.org/svn/src/trunk/RO ... let_test.h

and run with the same input files it does not develop large Cw values and it does not blow up (in the tests I've run so far). Here is the plot of Cw
[Attached image: Cw_inlet_test_nearshore_mellor_08.png]
Examining the output in the two cases shows that the only difference in preprocessor options is in the nearshore radiation stress formulation (as you may have guessed from the names I've given those graphics files). In the test branch this is NEARSHORE_MELLOR05 and in the trunk it is NEARSHORE_MELLOR08. I'm not familiar with these parameterisations (I haven't read either of the papers yet) but I am tempted to conclude that NEARSHORE_MELLOR08 is more stable.

jcwarner
Posts: 1172
Joined: Wed Dec 31, 2003 6:16 pm
Location: USGS, USA

Re: Inlet test case blowing-up

#15 Unread post by jcwarner »

I did not know that Mellor 08 was now distributed on the Rutgers site. I am not supporting this on the Rutgers site, and don't really appreciate my work being distributed without my knowledge. I DO NOT approve of the way that there is a radiation_stress.F subroutine that takes different .h files. I am not going to support different versions of the Mellor method on the Rutgers site. I have a word in with "the management."
