Amarel

From WikiROMS
Using Amarel

Getting Started

This wiki page is a brief "Getting Started" introduction to running the ROMS ocean model, and analyzing model output, on the Rutgers Office of Advanced Research Computing (OARC) cluster computer Amarel. There is a comprehensive user guide at https://sites.google.com/view/cluster-user-guide to which you can refer for more detailed information.

Rutgers VPN

If you are not on campus, you will need to connect to the Rutgers VPN. To use Rutgers VPN, you must first enroll in Duo 2 Factor Authentication (2FA) (NetID+). Most users have probably already done this. If not, navigate to https://netid.rutgers.edu/setupTwoFactorAuthentication.htm and follow the instructions.

Once you are enrolled in NetID+ you will need to activate the VPN service, if you have not already. Navigate to https://soc.rutgers.edu/vpn/ and click the gray button titled Service Activation. If you are not already logged in you may be asked to log in and/or approve your login with the Duo Mobile app. To activate the VPN service, click the checkbox next to Remote Access VPN, Cisco AnyConnect Access for Rutgers and click the Activate Services button.

You are now ready to download the Cisco VPN client and connect to the Rutgers VPN. Complete instructions can be found here by clicking the red button titled General VPN Instructions to download the PDF. The instructions are geared towards Windows so if you are using a Mac you might find this page more helpful. In most cases, regardless of your operating system, pointing your browser to https://vpn.rutgers.edu/ and logging in with your NetID will lead you to downloading the correct VPN client.

Once installed, open the Cisco AnyConnect client and type vpn.rutgers.edu in the box and click Connect:

AnyConnect Connect.png

In the next window, the Username is your NetID and the Password is your NetID password. For 2FA, you have four options for what to enter in the Second Password/Duo Action field:

  1. Enter a 6 digit Duo Passcode. These are generated either by a Hard Token, showing the passcode in the Duo Mobile App, or from a previous “SMS” request. Simply type in the 6 numbers and hit OK.
  2. Type the word “push”. This will send a push notification to the primary device you have enrolled with Duo through NetID+ with the option to Accept or Deny.
  3. Type the word “phone”. You will receive a phone call to the primary device you have enrolled with Duo through NetID+ with touch tone options to Accept or Deny.
  4. Type the word “sms”. You will receive a text message to the primary device you have enrolled with Duo through NetID+ containing passcodes you can use to logon.

AnyConnect Credentials.png

Click OK and you should be connected to Rutgers VPN. You should have a small AnyConnect icon with a lock on it in the task tray (Windows) or in the menu bar (Mac OS).

Connecting to Amarel with SSH

In order to connect to Amarel and compile and run ROMS you will need an SSH/terminal client. If you already use an SSH/terminal application you are comfortable with then stick with that and adapt the instructions below accordingly. For these instructions to work you need to either be on campus or connected to the Rutgers VPN.

Mac OS

If you are on a Mac, open Terminal (found in the Applications -> Utilities folder) or iTerm2 and type:

ssh fakeuser@amarel.rutgers.edu

replacing fakeuser with your NetID. When it asks if you want to continue connecting, enter yes, then enter your NetID password.

Having to type the above command and NetID password every time you log in is moderately tedious, but this username/password authentication can become very annoying and time-consuming when repeatedly using the scp command to copy files to Amarel. Passwordless access and file transfer can be enabled using SSH keys by following the instructions below.

  1. If you are still connected to Amarel, disconnect by entering exit
  2. At the prompt for your local machine, enter nano ~/.ssh/config and paste the block below at the end of the file:
    Host amarel
    Hostname amarel.rutgers.edu
    HostKeyAlias amarel
    User [NetID]
    replacing [NetID] with your own NetID. Save and exit (Ctrl+o Ctrl+x). This allows you to type ssh amarel to connect.
  3. Enter ls ~/.ssh. If there is a file listed called id_rsa.pub, skip to step 5.
  4. Enter ssh-keygen -t rsa, then hit return to accept the default location and enter a passphrase twice (or leave it blank for no passphrase).
  5. Copy the public portion of the key to Amarel by entering scp ~/.ssh/id_rsa.pub amarel:.
  6. SSH to Amarel (ssh amarel) and check if you already have a .ssh folder:
    [rjdave@amarel1 ~]$ ls -l .ssh
    total 33
    -rw------- 1 rjdave rjdave 1069 Sep 8 2017 authorized_keys
    -rw-r--r-- 1 rjdave rjdave 434 Jun 4 2021 config
    -rw-r--r-- 1 rjdave rjdave 5992 Dec 2 13:47 known_hosts
    You probably won't have all the files above but if you instead get the message ls: cannot access .ssh: No such file or directory, enter the following command: mkdir .ssh && chmod 700 .ssh. This will create the .ssh folder and set the required permissions.
  7. Once you have a .ssh folder with the proper permissions, enter the following: cat ~/id_rsa.pub >> ~/.ssh/authorized_keys && chmod 640 ~/.ssh/authorized_keys and then exit back to your local machine and execute ssh amarel again. It should no longer ask for your password.
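Step 2 above can also be done without opening an editor. The following is a minimal sketch, not an official procedure: add_amarel_host is a hypothetical helper name, and the function only appends the host block when no "Host amarel" entry exists yet.

```shell
#!/bin/bash
# add_amarel_host is a hypothetical helper (not part of OpenSSH).
# It appends the "amarel" host block from step 2 to an SSH config file,
# unless a "Host amarel" entry is already present.
add_amarel_host() {
    local netid="$1"
    local config="${2:-$HOME/.ssh/config}"

    mkdir -p "$(dirname "$config")"
    touch "$config"

    # Do nothing if the alias is already defined.
    if grep -qE '^[[:space:]]*Host[[:space:]]+amarel[[:space:]]*$' "$config"; then
        echo "Host amarel already present in $config"
        return 0
    fi

    cat >> "$config" <<EOF

Host amarel
    Hostname amarel.rutgers.edu
    HostKeyAlias amarel
    User $netid
EOF
    # Keep the config readable and writable by the owner only.
    chmod 600 "$config"
}
```

For example, add_amarel_host fakeuser (replacing fakeuser with your NetID) writes to ~/.ssh/config, after which ssh amarel should connect as before.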

Windows

For Windows, we recommend MobaXterm installer edition.

  1. Once installed open it, choose light or dark theme and click the Session button in the upper left.
  2. Choose SSH and enter amarel.rutgers.edu for Remote host, check the Specify username box and enter your NetID in the box and click OK.
  3. You will be asked to type your NetID password and then asked if you want to save the password. If you choose yes you will be asked to set a master password to encrypt all your saved passwords.
  4. In the future you can click Sessions (not Session) and select amarel.rutgers.edu and not have to enter your password.

Setting up your .bashrc

The default software setup by OARC on Amarel does not include everything needed to compile and run ROMS. Ocean Modeling Group computing specialist David Robertson has configured, and maintains and updates, a repository of what you need for ROMS and other models that will be automatically loaded for every login session once you add a couple lines to the login script (.bashrc, .cshrc, etc.) in your home directory. Log in to Amarel with Terminal/iTerm2 or MobaXterm and edit your login script as described below.

Unless otherwise requested, your default shell will be bash. Add the following three lines (the ulimit line and the two export lines) with nano or your preferred editor near the top of your .bashrc, after the import of the global definitions, as shown:

# Source global definitions

if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi

ulimit -s unlimited

export MODULEPATH=/projects/dmcs_1/sw/modulefiles/Core:${MODULEPATH}
export SQUEUE_FORMAT="%.18i %.9P %.8j %.8u %.2t %.10M %.10l %.4C %.6D %R"

The first line allows you to take full advantage of computing resources (may be obsolete now but it won't hurt anything). The first export line will tell Lmod where to find the modules that will load the custom software. The second export line makes the squeue command (explained later) more useful. Once you have added these lines, log out and back in for them to take effect.
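If you would rather script the change than edit the file by hand, the sketch below adds each of the three lines only if it is not already present. The helper names append_once and setup_amarel_bashrc are hypothetical. Note that it appends to the end of the file, whereas the text above places the lines near the top, so move them by hand if your .bashrc returns early for some sessions.

```shell
#!/bin/bash
# append_once is a hypothetical helper: add a line to a file only if that
# exact line is not already there.
append_once() {
    local line="$1" file="$2"
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# setup_amarel_bashrc (also hypothetical) adds the three lines described
# above to a login script. The default target is ~/.bashrc.
setup_amarel_bashrc() {
    local rc="${1:-$HOME/.bashrc}"
    append_once 'ulimit -s unlimited' "$rc"
    append_once 'export MODULEPATH=/projects/dmcs_1/sw/modulefiles/Core:${MODULEPATH}' "$rc"
    append_once 'export SQUEUE_FORMAT="%.18i %.9P %.8j %.8u %.2t %.10M %.10l %.4C %.6D %R"' "$rc"
}
```

Running setup_amarel_bashrc twice leaves the file unchanged the second time, so it is safe to rerun.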

Checking Out the ROMS Source Code

Register as a ROMS user

If you are not already a ROMS user, you will need to fill out the registration form (found here) and wait for approval. Once approved, you will be able to check out the ROMS source code. There is a git repository available, but Subversion (SVN) is our recommended way to obtain the ROMS source code. If you prefer git, the initial process is a little more involved, but you can follow the directions found on the WikiROMS git page.

Checking out the ROMS source code

There is nothing special about checking out the ROMS source code on Amarel. The same svn commands you’re used to will work on Amarel. However, the first time you check the code out you will need to use the --username flag unless your NetID matches your ROMS username. Navigate to the directory where you want your ROMS source code to reside and execute this svn checkout command:

svn --username <user> co https://www.myroms.org/svn/src/trunk <my_src_dir>

replacing <user> with your ROMS username and <my_src_dir> with what you want the source code directory to be called. After typing your password it will ask you if you want to store your password. We recommend answering yes but it’s up to you. If you answer no you will have to type your ROMS password every time you do an svn checkout or svn update. After your first checkout you will no longer need the --username flag for svn operations to any of the myroms.org subversion repositories.

Loading and Unloading Modules

Like many clusters, Amarel uses environment modules to load and unload software and configure the environment. Some commands you will find useful are:

module help              Display help message
module help <m1>         Show help information for module <m1>
module available         Show all modules currently available
module whatis <m1>       Show brief information about module <m1>
module spider            List all possible modules even if not currently available
module list              List currently loaded modules
module load              Load the specified module(s)
module unload            Unload the specified module(s)
module swap <m1> <m2>    Unload <m1> and then load <m2> (for switching versions of the same software)
module purge             Unload all loaded modules

Loading the ROMS Module

Setting up your environment to compile and run ROMS is as simple as loading the roms module.

module load roms
module list
Currently Loaded Modules:
1) intel/17.0.4 2) mvapich2/2.2 3) mct-roms/2.6.0 4) netcdf/4.6.2
5) esmf/8.0.0_nc4 6) parpack-roms/2.1 7) hdf5/1.10.4 8) roms/intel_nc4

Notice that loading the roms module will actually load 7 other modules.

This will set up your environment to use the Intel compiler, so remember to set FORT to ifort in your build script.

Configuring and Compiling ROMS

Using the build script is the recommended method for compiling ROMS on Amarel. Some of the modules that load with the roms module also set environment variables that help simplify your roms build script. Starting from the latest build_roms.sh you will likely only need to change ROMS_APPLICATION and MY_ROMS_SRC and make sure FORT is set to ifort.

Warning: It is important to make sure USE_MY_LIBS is set to no or your compilation will fail.

Remembering the name you gave above for <my_src_dir>, you will find the latest build_roms.sh script in subdirectory ROMS/Bin. Copy that script to the directory you will work in to run ROMS.

Configure build_roms.sh by setting the line MY_ROMS_SRC to point to your choice for <my_src_dir>.

If this is your first time working with ROMS, a good starting place is to compile the default UPWELLING test case that is indicated by the build_roms.sh setting ROMS_APPLICATION=UPWELLING.

Copy to your working directory the file upwelling.h from subdirectory ROMS/Include of your source code.
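The two copy steps above can be sketched as a small function. This is only an illustration: stage_upwelling_build is a hypothetical helper name, and the final grep assumes the settings appear in build_roms.sh as NAME=value assignments, which may not hold for every version of the script.

```shell
#!/bin/bash
# stage_upwelling_build is a hypothetical helper that copies the two files
# named above from a ROMS source tree into a working directory.
stage_upwelling_build() {
    local src="$1"    # your <my_src_dir>
    local work="$2"   # directory you will compile and run in

    mkdir -p "$work"
    cp "$src/ROMS/Bin/build_roms.sh" "$work/" || return 1
    cp "$src/ROMS/Include/upwelling.h" "$work/" || return 1
    chmod +x "$work/build_roms.sh"

    # List the settings the text says to check so you can edit them by hand.
    # Assumption: they are written as NAME=value in the build script.
    grep -nE 'ROMS_APPLICATION=|MY_ROMS_SRC=|FORT=|USE_MY_LIBS=' \
        "$work/build_roms.sh" || true
}
```

The function deliberately does not edit build_roms.sh. It only prints the relevant lines, leaving the choice of MY_ROMS_SRC, FORT, and USE_MY_LIBS to you.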

Once your build script is configured and you have upwelling.h in your working directory you can compile ROMS by typing:

./build_roms.sh -j 4

where the number after -j indicates the number of compute cores to use in parallel to execute the compilation. The greater the number, the faster it goes.

However, the login node you will be compiling on is shared for the entire Amarel system. If you use a number larger than 4, or omit it altogether (which says use all cores on the login node) your build might be terminated by an administrator. Be a considerate user and keep the number low.

If compilation was successful, there will be a file named romsM that is the model executable.

Running on the Amarel Compute Nodes

When you ssh to Amarel you are connected to one of the login nodes. These nodes are to be used for file editing, transferring output and input files to/from your local computer, and modest code compiling tasks and analysis, but not for compute intensive tasks. Here we explain how to connect to the compute nodes for these larger tasks.

Using the SLURM batch job scheduler

Amarel uses the SLURM workload manager to schedule compute intensive tasks. The user configures a SLURM job script for each model run and submits this script with the sbatch command. The job script declares the resources required, such as number of CPUs for parallel jobs, maximum memory required, etc.

We have configured a simple template job script (for the ROMS UPWELLING example) that you can copy from /projects/dmcs_1/courses/job.sh into the directory that you will work from. The contents are shown below:

#!/bin/bash
#SBATCH --partition=dmcs_1 # Partition (job queue)
#SBATCH --job-name=upwelling # Assign a short name to your job
#SBATCH --nodes=1 # Number of nodes you require
#SBATCH --ntasks=4 # Total number of tasks you'll launch
#SBATCH --ntasks-per-node=4 # Number of tasks you'll launch on each node
#SBATCH --cpus-per-task=1 # Cores per task (>1 if multithread tasks)
#SBATCH --mem=6400 # Real memory (RAM) required (MB) per node
#SBATCH --time=00-00:05 # Total run time limit (DD-HH:MM)
#SBATCH --output=out.%N.%j.log # STDOUT output file
#SBATCH --error=err.%N.%j.log # STDERR output file (optional but recommended)
#SBATCH --export=ALL # Export your current env to the job env

## It is important to have --mpi=pmi2 here or ROMS will not run
srun --mpi=pmi2 ./romsM roms_upwelling.in

Running the ROMS Upwelling Example

You may have noticed that the srun command above includes a file named roms_upwelling.in. You will need to copy this file and varinfo.dat from the ROMS/External directory of your <my_src_dir> ROMS source code to the directory where you compiled ROMS.

After you copy the files, you will need to make a couple of small edits to roms_upwelling.in to get this to work.

  1. Change the line with VARNAME to read VARNAME = varinfo.dat (i.e. delete the ROMS/External part)
  2. Set both NtileI and NtileJ to 2 (i.e. NtileI == 2 and NtileJ == 2).

The product of NtileI and NtileJ, i.e. 2 x 2 = 4, is the number of cores the model will run on in parallel. This number must match the number in the SLURM job script at

Now you can submit your job with the command sbatch job.sh
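Before submitting, it can help to confirm that the tiling and the job script agree. The sketch below assumes the number to match is the --ntasks value in job.sh, and that the input file writes the tiling as "NtileI == 2" with an optional trailing comment. The function names roms_param, slurm_ntasks, and check_tiles are hypothetical.

```shell
#!/bin/bash
# Print the value assigned to a keyword (e.g. NtileI) in a ROMS input file.
# Assumes lines of the form "NtileI == 2   ! comment".
roms_param() {
    local key="$1" file="$2"
    awk -v key="$key" '$1 == key && $2 ~ /^==?$/ { print $3; exit }' "$file"
}

# Print N from the first "#SBATCH --ntasks=N" line of a job script.
# Lines such as --ntasks-per-node are not matched.
slurm_ntasks() {
    sed -n 's/^#SBATCH[[:space:]]*--ntasks=\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# check_tiles (hypothetical) succeeds only if NtileI * NtileJ equals --ntasks.
check_tiles() {
    local infile="$1" job="$2"
    local ni nj nt
    ni=$(roms_param NtileI "$infile")
    nj=$(roms_param NtileJ "$infile")
    nt=$(slurm_ntasks "$job")
    if [ -z "$ni" ] || [ -z "$nj" ] || [ -z "$nt" ]; then
        echo "could not read NtileI, NtileJ, or --ntasks" >&2
        return 2
    fi
    if [ $((ni * nj)) -eq "$nt" ]; then
        echo "OK: ${ni} x ${nj} = ${nt} tasks"
    else
        echo "MISMATCH: ${ni} x ${nj} = $((ni * nj)), but --ntasks=${nt}" >&2
        return 1
    fi
}
```

A typical use would be check_tiles roms_upwelling.in job.sh && sbatch job.sh, so the job is submitted only when the two numbers agree.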

Checking Job Status with squeue

Detailed documentation for monitoring your SLURM jobs can be found here. The easiest way to check whether your job is running is with the command squeue -p dmcs_1. You should see output something like this:

JOBID     PARTITION  NAME       USER    ST  TIME        TIME_LIMIT  CPUS  NODES  NODELIST(REASON)
17197219  dmcs_1     watl_psas  rjdave  R   3-04:07:00  3-20:00:00  4     1      hal0035
17146609  dmcs_1     upwelling  rjdave  R   00:01:10    00:05:00    4     1      hal0035

You can see that the upwelling job is running (the ‘R’ under ‘ST’), has been running for 1 minute, 10 seconds, and is running on node hal0035.

Checking Progress with tail

You can check ROMS progress by using the tail command on the output file. For the upwelling job above the file would be called out.hal0035.17146609.log so the command tail out.hal0035.17146609.log will show you the most recent 10 lines written to the output log and will most likely tell you what timestep the model is on.
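Because the log name includes the node name, which you do not know until the job starts, it can be convenient to look the file up by job ID alone. The sketch below assumes the out.%N.%j.log naming from the template job script, and job_log is a hypothetical helper name.

```shell
#!/bin/bash
# job_log is a hypothetical helper: print the STDOUT log for a job ID,
# assuming the "out.<node>.<jobid>.log" naming used in the template job.sh.
job_log() {
    local jobid="$1" dir="${2:-.}" f
    for f in "$dir"/out.*."$jobid".log; do
        # If nothing matches, the unexpanded pattern fails this test.
        [ -e "$f" ] && { printf '%s\n' "$f"; return 0; }
    done
    return 1
}
```

For the job above, tail "$(job_log 17146609)" shows the last 10 lines, and tail -f "$(job_log 17146609)" keeps following the file as ROMS writes to it (press Ctrl+c to stop).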

Cancelling a Job

The safest way to cancel a queued or running job is to use its job ID. A job can be canceled by name, but this is not recommended. To cancel the upwelling job from above you would issue the command scancel 17146609. If the job has not yet started, it will be removed from the queue. If it is running, all child processes will be killed and the job will be removed from the queue. You will only be able to cancel jobs that you own.


Running an interactive session on a compute node

It is possible to conduct your work interactively on one of the compute nodes (instead of the login node). For most work we will be doing in class this is not necessary, but if for some reason you have a job that needs many processors or a large amount of memory, and you want to run it briefly to make sure it starts, there are instructions in the Cluster User Guide here.

For compute intensive interactive work, such as model analysis using Python or MATLAB, we recommend using the OnDemand interface to launch an interactive session on a set of compute nodes.

Using OnDemand to launch a Personal Jupyter Notebook

To plot model output you can use MATLAB or Python through the Rutgers OnDemand system. Navigate your browser to https://ondemand.hpc.rutgers.edu/pun/sys/dashboard and log in with your NetID and password. At the top of the page click My Interactive Sessions.

For MATLAB:

  1. Select the MATLAB option in the left column, then choose your time, cores, memory, and MATLAB version, and enter dmcs_1 in partition.
  2. Click Launch and wait. This can take a couple minutes.
  3. Once the Launch noVNC in New Tab button appears, click it and a MATLAB GUI will open.

For Python:

  1. Select Personal Jupyter and choose time, cores, memory, and enter dmcs_1 in partition. Leave Reservation and Slurm feature blank, enter /projects/dmcs_1/miniconda3 for conda path, and /projects/dmcs_1/sw/packages/xroms/py38 for conda environment.
  2. Click Launch and wait. This can take a couple minutes.
  3. Once the Connect to Jupyter, Anaconda version 5.1.0 button appears, click it and wait again.
  4. Near the top-right click New -> Python 3 (ipykernel)
  5. Enter import xroms then hold down the shift key and hit the return/enter key. You will see an asterisk [*] in the square brackets.
  6. Once that asterisk changes to a 1, the xroms python module has been loaded.