two solutions for sst with the same configuration

xiaocongM · #1 · Sat Oct 14, 2023 2:36 pm

Dear ROMS users,
I run my application on different platforms. Since I am using the same configuration (cpp, initial, boundary, forcing), I expected the solutions to be the same, or at least close to each other. However, I got two very different solutions. The only difference between the two cases is the number of tiles. The relevant information is in the following pictures. I would very much appreciate any guidance.
The first figure is SST; the second shows information about the platforms.

pmaccc · #2 · Sun Oct 15, 2023 10:52 pm

The image on the right looks like it is having problems getting open boundary information into the domain. Are you using exactly the same open boundary conditions in the dot-in, and the same values in the boundary file? Also if you are using nudging to climatology, is the climatology the same, and is the file of nudging timescales the same?

ckharris · #3 · Mon Oct 16, 2023 5:11 pm

I had a similar problem several years ago where we got different solutions depending on how many tiles we used. It turned out to be a bug in the compiler that we were using at the time. The differences went away when we did not use the optimization in the compiler command (the -O3) but then the model ran very slowly. For us: we switched compilers and that fixed the problem.

In your case:
The two "compiler commands" are different, one is mpif90 and the other is mpiifort. Have you tried testing the tiling but using the same compiled code?

Good luck.
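
One way to make the tiling test suggested above concrete is to compare the two runs record by record and find where they first diverge. The following is a minimal sketch, not part of the original thread: the function names `max_abs_diff` and `first_divergent_record` are hypothetical, the data are synthetic stand-ins, and reading the SST fields from the model output files is left out. It assumes each run is already available as a sequence of 2-D fields (nested lists here; the same logic maps onto NumPy arrays).

```python
import math


def max_abs_diff(field_a, field_b):
    """Return (largest |a - b|, (j, i)) for two 2-D fields of equal shape.

    Points where either value is NaN (for example masked land) are skipped.
    If the fields agree everywhere, the result is (0.0, None).
    """
    worst, where = 0.0, None
    for j, (row_a, row_b) in enumerate(zip(field_a, field_b)):
        for i, (a, b) in enumerate(zip(row_a, row_b)):
            if math.isnan(a) or math.isnan(b):
                continue
            d = abs(a - b)
            if d > worst:
                worst, where = d, (j, i)
    return worst, where


def first_divergent_record(run_a, run_b, tol=0.0):
    """Index of the first record whose fields differ by more than tol, else None."""
    for t, (field_a, field_b) in enumerate(zip(run_a, run_b)):
        if max_abs_diff(field_a, field_b)[0] > tol:
            return t
    return None


# Synthetic stand-in data (an assumption for illustration): 3 records of a 2 x 3 field.
run_a = [[[15.0, 15.5, 16.0], [14.0, float("nan"), 14.5]] for _ in range(3)]
run_b = [[row[:] for row in record] for record in run_a]
run_b[1][0][2] += 0.25  # the runs start to differ at record 1
run_b[2][0][2] += 1.0   # and the difference grows at record 2

print(first_divergent_record(run_a, run_b))
print(max_abs_diff(run_a[2], run_b[2]))
```

Knowing the first differing record and the grid point with the largest difference can help tell a boundary problem (differences starting at the open boundary) from a tiling problem (differences starting along tile edges).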

xiaocongM · #4 · Wed Oct 18, 2023 12:41 pm

Thanks for your prompt reply! Yes, I have checked all the files you referred to, and found that they are the same.

pmaccc wrote: Sun Oct 15, 2023 10:52 pm The image on the right looks like it is having problems getting open boundary information into the domain. Are you using exactly the same open boundary conditions in the dot-in, and the same values in the boundary file? Also if you are using nudging to climatology, is the climatology the same, and is the file of nudging timescales the same?

Thanks for your kind reply. Yes, I ran the same case with different numbers of tiles, but the issue still exists. A teacher told me that the shared nodes, which are available to many users, may be faulty. I will try private nodes to see whether that helps.

ckharris wrote: Mon Oct 16, 2023 5:11 pm I had a similar problem several years ago where we got different solutions depending on how many tiles we used. It turned out to be a bug in the compiler that we were using at the time. The differences went away when we did not use the optimization in the compiler command (the -O3) but then the model ran very slowly. For us: we switched compilers and that fixed the problem.

In your case:
The two "compiler commands" are different, one is mpif90 and the other is mpiifort. Have you tried testing the tiling but using the same compiled code?
Good luck.

jivica · #5 · Thu Oct 19, 2023 8:21 am

The problem is not caused by running on different nodes of the cluster.
A node either works or crashes the model (for example, because of a memory problem).

You need to recompile your model with less aggressive optimization flags (i.e. do not use -O3) and then try again with exactly the same inputs.
Different compilers (gfortran, ifort, cray) have different flags, so be careful about mix and match.
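
One general reason why results can depend on the optimization level or on how the work is partitioned is that floating-point addition is not associative, so a different order of operations can give a slightly different answer. The sketch below is an illustration only, not part of the original thread: `tiled_sum` is a hypothetical helper, and nothing here establishes that this effect explains the large difference reported above.

```python
def tiled_sum(values, ntiles):
    """Sum `values` the way a tiled reduction might.

    The list is split into `ntiles` contiguous chunks, each chunk is summed
    on its own, and the partial sums are then added together.
    """
    n = len(values)
    bounds = [n * k // ntiles for k in range(ntiles + 1)]
    partials = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        s = 0.0
        for v in values[lo:hi]:
            s += v
        partials.append(s)
    total = 0.0
    for p in partials:
        total += p
    return total


values = [0.1, 0.2, 0.3]
one_tile = tiled_sum(values, 1)   # evaluated as (0.1 + 0.2) + 0.3
two_tiles = tiled_sum(values, 2)  # evaluated as 0.1 + (0.2 + 0.3)

print(one_tile == two_tiles)      # the two totals differ in the last bit
print(abs(one_tile - two_tiles))  # the difference is at round-off level
```

Differences of this kind start at round-off level. A difference that is already large after a short simulation, as in the figures described above, may point to something else, such as a compiler bug or a boundary problem.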

For example, I had a problem with the -O3 flag when compiling ARPACK/PARPACK (needed when you run 4D-VAR).

It is still striking to me that the difference is so large after such a short simulation.

Cheers,
Ivica

xiaocongM · #6 · Thu Jun 13, 2024 3:10 am

Thanks for your kind guidance. I tried several cpp flags, but none of them worked. I gave up working on this server. Apologies for the late reply.

xiaocongM · #7 · Thu Jun 13, 2024 3:11 am

Thanks for your kind guidance. I tried several cpp flags, but none of them worked. I gave up working on this server. Apologies for the late reply.

jivica wrote: Thu Oct 19, 2023 8:21 am It is not the problem of running on the different nodes on the cluster.
They are working or crashing the model in case memory problem etc.

You need to recompile your model with less aggressive optimization flags (i.e. do not use -O3) and then try again with the exactly the same inputs.
Different compilers (gfortran, ifort, cray) have different flags, so be careful about mix and match.

For example, I had a problem with compiling ARPACK/PARPACK and using -O3 flag (needed when you run 4D-VAR).

It is still striking to me that the difference is so large after not so long simulation.

Cheers,
Ivica
