In the face of wind and rain, everyone pulls together. Weather forecasting has long been a hard problem for the technology world, and we keep working at it with new cloud technologies.
Background Introduction
The Weather Research and Forecasting (WRF) model is known as the next-generation mesoscale weather forecast model, and many meteorological organizations use it for research and forecasting. Because of the huge volume of geographic and real-time meteorological data and the complex computation involved, high-performance computing clusters are required as the underlying infrastructure. Amazon provides rich, elastically scalable high-performance computing resources, such as Graviton2 instances built on custom 64-bit Arm Neoverse cores, which offer better price performance for workloads running on Amazon EC2.
Compared with the first-generation Amazon Graviton processor, the Amazon Graviton2 processor delivers a major leap in performance and functionality. Graviton2 powers Amazon EC2 T4g, M6g, C6g, and R6g instances and their variants with local NVMe-based SSD storage. For a wide range of workloads, including application servers, microservices, high-performance computing, electronic design automation, gaming, open-source databases, and in-memory caches, these instances provide up to 40% better price performance than comparable current-generation x86-based instances.
Leveraging the elasticity of the public cloud makes meteorological research and forecasting both efficient and cost-effective, and lets customers consume these resources in a more flexible manner.
This article walks through building an Amazon ParallelCluster cluster on Amazon Graviton2 instances, building WRF on it, and running a weather forecast with WRF as a parallel computing job, making it easier to carry out meteorological research and forecasting in the Amazon Web Services China Regions.
Building an Amazon ParallelCluster cluster
Amazon ParallelCluster is an Amazon-backed open source cluster management tool built on the open source CfnCluster project. It automatically provisions and manages computing resources and shared file systems based on the jobs you submit. Starting with version 2.8.0, ParallelCluster supports Arm instances on Ubuntu 18.04 and Amazon Linux 2, allowing us to run High Performance Computing (HPC) workloads more cost-effectively.
In addition, Amazon ParallelCluster supports various batch schedulers, such as Amazon Batch, SGE, Torque, and Slurm. This article uses Slurm as the batch scheduler. ParallelCluster also supports automated integration of the Amazon FSx for Lustre parallel file system and the DCV remote desktop. The following is the ParallelCluster architecture diagram:
1. Start the springboard instance
Open the Amazon management console, select the EC2 service, and click [Launch Instance] to launch a new instance. This instance corresponds to the springboard (jump host) in the architecture above and is used to configure and manage the ParallelCluster cluster. It does not need much performance; t2.micro or a similar instance type is sufficient.
2. Install Amazon CLI and ParallelCluster
Log in to the springboard instance via SSH, then install the awscli and aws-parallelcluster packages with pip (commands below).
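For example (the key file name and address below are placeholders; the login user depends on the AMI, e.g. ec2-user for Amazon Linux 2):
$ ssh -i ~/.ssh/your-key.pem ec2-user@<springboard-public-ip>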
$ pip3 install awscli -U --user
$ pip3 install aws-parallelcluster -U --user
Before installing, you can run pip3 --version to check whether pip is already installed. If it is not, install it first; see:
https://pip.pypa.io/en/stable/installing
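For reference, here is a minimal check, and one common way to install pip for the current user if it is missing (per the documentation linked above):
$ pip3 --version
$ curl -O https://bootstrap.pypa.io/get-pip.py
$ python3 get-pip.py --user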
3. Configure IAM Credentials
Open the Amazon console, select the IAM service, choose an IAM user with sufficient permissions (for example, administrator permissions), create a new access key, and record its Access Key ID and Secret Access Key; they will be used to configure access permissions on the instance.
Go back to the springboard instance, configure the IAM credentials, and fill in the Access Key ID and Secret Access Key generated above.
$ aws configure
AWS Access Key ID [None]: ABCD***********
AWS Secret Access Key [None]: wJalrX********
Default region name [us-east-1]: cn-northwest-1
Default output format [None]: json
4. Initialize ParallelCluster
Typically, to configure a ParallelCluster cluster, you use the command pcluster configure, then provide the requested information such as Region, Scheduler, and EC2 instance type, and finally generate the ~/.parallelcluster/config configuration file.
Alternatively, you can achieve quick configuration by creating a base configuration file and then customizing the file to include ParallelCluster-specific options.
The following commands generate a new key pair, query the springboard instance's metadata to obtain the subnet ID, VPC ID, and finally generate a configuration file. Additionally, you can edit this configuration file directly to add and change configuration options.
Set the default region AWS_DEFAULT_REGION
$ export AWS_DEFAULT_REGION=$(curl --silent http://169.254.169.254/latest/meta-data/placement/availability-zone | sed 's/[a-z]$//')
Generate a new key pair
$ aws ec2 create-key-pair --key-name lab-key --query KeyMaterial --output text > ~/.ssh/lab-key
$ chmod 600 ~/.ssh/lab-key
Get Amazon network information (ParallelCluster cluster will be deployed in this VPC)
$ IFACE=$(curl --silent http://169.254.169.254/latest/meta-data/network/interfaces/macs/)
$ SUBNET_ID=$(curl --silent http://169.254.169.254/latest/meta-data/network/interfaces/macs/${IFACE}/subnet-id)
$ VPC_ID=$(curl --silent http://169.254.169.254/latest/meta-data/network/interfaces/macs/${IFACE}/vpc-id)
$ REGION=$(curl --silent http://169.254.169.254/latest/meta-data/placement/availability-zone | sed 's/[a-z]$//')
5. Create an initial Amazon ParallelCluster configuration file
$ cat > wrf-on-graviton2.ini << EOF
[aws]
aws_region_name = ${REGION}
[global]
cluster_template = default
update_check = true
sanity_check = true
[cluster default]
key_name = lab-key
base_os = alinux2
vpc_settings = public
ebs_settings = myebs
fsx_settings = myfsx
master_instance_type = c6g.16xlarge
post_install = s3://YOUR_S3_BUCKET_NAME/pcluster_postinstall.sh
s3_read_write_resource = arn:aws-cn:s3:::YOUR_S3_BUCKET_NAME/*
scheduler = slurm
queue_settings = od-queue,spot-queue
[queue od-queue]
compute_resource_settings = c6g-od
compute_type = ondemand
placement_group = DYNAMIC
[compute_resource c6g-od]
instance_type = c6g.16xlarge
min_count = 0
max_count = 16
initial_count = 0
[queue spot-queue]
compute_resource_settings = c6g-spot
compute_type = spot
placement_group = DYNAMIC
[compute_resource c6g-spot]
instance_type = c6g.16xlarge
min_count = 0
max_count = 16
initial_count = 0
[vpc public]
vpc_id = ${VPC_ID}
master_subnet_id = ${SUBNET_ID}
[ebs myebs]
shared_dir = /shared
volume_type = gp2
volume_size = 20
[fsx myfsx]
shared_dir = /fsx
storage_capacity = 1200
deployment_type = SCRATCH_2
[aliases]
ssh = ssh {CFN_USER}@{MASTER_IP} {ARGS}
EOF
6. Edit the ParallelCluster configuration
View or edit the ParallelCluster configuration with the following command:
$ view wrf-on-graviton2.ini
The configuration file is divided into sections for the region, cluster settings, auto scaling, shared storage volumes, VPC and subnet settings, and so on. Pay particular attention to the following parameters:
- post_install sets the location of the script that runs when each cluster node boots. Change YOUR_S3_BUCKET_NAME to your own bucket name. Since WRF does not benefit from hyperthreading, this script turns off hyperthreading on the EC2 instances so that jobs run on physical cores only (a minimal sketch of such a script is shown after this list).
- s3_read_write_resource sets the S3 bucket accessed by the cluster. Modify YOUR_S3_BUCKET_NAME to your own bucket name.
- The compute node instance type is set by instance_type in the [compute_resource] sections. In parallel computing scenarios it is usually better to use a larger instance type; here it is set to c6g.16xlarge. New accounts often have low default instance limits, so it is recommended to open a support case in advance to raise the limit.
- master_instance_type sets the type of the master node, which is used to install and compile software, download data, and submit jobs rather than to run the main parallel computation. In the configuration above it is set to c6g.16xlarge, a Graviton2 instance matching the compute nodes, so the Arm binaries can be built natively.
- scheduler sets the batch scheduler; Slurm is used here.
- queue_settings specifies the queues the cluster uses instead of a single homogeneous compute fleet, which allows us to submit jobs to different queues. It is only available when scheduler is set to slurm.
- [queue] defines the configuration settings for a single queue; in the configuration file above, queues for On-Demand and Spot instances are defined.
- placement_group defines the cluster placement group.
- fsx_settings and the [fsx myfsx] section configure the Amazon FSx for Lustre parallel file system; the master and compute nodes mount it automatically once the cluster is deployed.
- master_subnet_id specifies the ID of the existing subnet in which to provision the master node. Compute nodes use the master node's subnet by default.
Deploying in a single Availability Zone and using a placement group reduces the latency of communication between cluster nodes and improves parallel computing efficiency; both are set in this configuration file.
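The actual pcluster_postinstall.sh is not reproduced in this article; the following is only a minimal sketch of what such a hyperthreading-off script might look like. It takes each hyperthread sibling CPU offline, leaving one thread per physical core, and must run as root (which is the case for ParallelCluster post-install scripts):
#!/bin/bash
# Sketch only: disable hyperthreading by taking sibling logical CPUs offline.
for sibs in /sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list; do
  [ -f "$sibs" ] || continue
  cpu=${sibs#/sys/devices/system/cpu/cpu}; cpu=${cpu%%/*}
  primary=$(cut -d, -f1 "$sibs" | cut -d- -f1)
  if [ "$cpu" != "$primary" ]; then
    echo 0 > /sys/devices/system/cpu/cpu${cpu}/online   # take the sibling thread offline
  fi
done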
7. Create a ParallelCluster cluster
$ pcluster create wrf-on-graviton2 -c wrf-on-graviton2.ini
Run the command above to create the cluster and wait for creation to complete. If creation fails, check whether the EC2 limit in the region is lower than the configured maximum number of nodes; if so, open a support case to raise the limit, or reduce the maximum number of nodes in the configuration.
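While the cluster is being created, you can follow its status from the springboard, and later log in to the master node directly with the pcluster CLI (ParallelCluster 2.x commands, using the same configuration file):
$ pcluster status -c wrf-on-graviton2.ini wrf-on-graviton2
$ pcluster ssh -c wrf-on-graviton2.ini wrf-on-graviton2 -i ~/.ssh/lab-key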
Install the WRF Modeling System and Its Dependencies
As the following flow chart shows, the main components of the WRF modeling system are:
- WRF Preprocessing System (WPS)
- OBSGRID
- WRF-DA
- ARW Solver (WRF calculation main program)
- Post-processing and visualization
The following steps cover the installation of the WRF main program, WPS, WRFDA, and OBSGRID.
WRF depends on the gfortran and gcc compilers and the cpp preprocessor. On top of these it needs the NetCDF libraries and an MPI library for parallel computing (OpenMPI is used in this article). Before running a WRF job, you also need to preprocess the input data with WPS (the WRF Preprocessing System).
Therefore, the installation proceeds as follows: first update the compilers and base libraries, then install the MPI and I/O libraries (OpenMPI, HDF5, NetCDF, and related packages), then build and compile WRF, and finally build and compile WPS in its own directory.
In this experiment, for better performance, we compile with GCC 10.2. GCC 10 includes a number of new architecture-related improvements that deliver better performance than older GCC versions. The figure below shows the performance improvement of GCC 8/9/10 relative to GCC 7. (Data from: https://community.arm.com/developer/tools-software/tools/b/tools-software-ides-blog/posts/gcc-10-better-and-faster)
1. Log in to the master node
Open the Amazon Web Services console, select the EC2 service, find the cluster master node (labeled Master by default), and log in with SSH. After logging in, you can see that the Amazon FSx for Lustre high-performance file system is mounted at /fsx. In this experiment, software source code is downloaded to /fsx/tools and the compiled binaries are installed under /fsx/wrf-arm. Create the relevant directories:
$ mkdir /fsx/wrf-arm
$ mkdir /fsx/tools
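Before going further, you can quickly confirm that the FSx for Lustre file system is mounted as expected:
$ df -h /fsx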
For convenience, we put the environment variables used during the installation in the file /fsx/wrf-arm/wrf-install.env:
$ view /fsx/wrf-arm/wrf-install.env
export DOWNLOAD=/fsx/tools
export WRF_INSTALL=/fsx/wrf-arm
export WRF_DIR=${WRF_INSTALL}/WRF-4.2.2
export GCC_VERSION=10.2.0
export OPENMPI_VERSION=4.1.0
export PATH=${WRF_INSTALL}/gcc-${GCC_VERSION}/bin:$PATH
export LD_LIBRARY_PATH=${WRF_INSTALL}/gcc-${GCC_VERSION}/lib64:$LD_LIBRARY_PATH
export CC=gcc
export CXX=g++
export FC=gfortran
export PATH=${WRF_INSTALL}/openmpi-${OPENMPI_VERSION}/bin:$PATH
export LD_LIBRARY_PATH=${WRF_INSTALL}/openmpi-${OPENMPI_VERSION}/lib:$LD_LIBRARY_PATH
export CC=mpicc
export CXX=mpic++
export FC=mpifort
export F90=mpifort
export CXX=mpicxx
export FC=mpif90
export F77=mpif90
export F90=mpif90
export CFLAGS="-g -O2 -fPIC"
export CXXFLAGS="-g -O2 -fPIC"
export FFLAGS="-g -fPIC -fallow-argument-mismatch"
export FCFLAGS="-g -fPIC -fallow-argument-mismatch"
export FLDFLAGS="-fPIC"
export F90LDFLAGS="-fPIC"
export LDFLAGS="-fPIC"
export HDF5=${WRF_INSTALL}/hdf5
export PNET=${WRF_INSTALL}/pnetcdf
export ZLIB=${WRF_INSTALL}/zlib
export CPPFLAGS="-I$HDF5/include -I${PNET}/include"
export CFLAGS="-I$HDF5/include -I${PNET}/include"
export CXXFLAGS="-I$HDF5/include -I${PNET}/include"
export FCFLAGS="-I$HDF5/include -I${PNET}/include"
export FFLAGS="-I$HDF5/include -I${PNET}/include"
export LDFLAGS="-I$HDF5/include -I${PNET}/include -L$ZLIB/lib -L$HDF5/lib -L${PNET}/lib"
export NCDIR=${WRF_INSTALL}/netcdf
export LD_LIBRARY_PATH=${NCDIR}/lib:${LD_LIBRARY_PATH}
export CPPFLAGS="-I$HDF5/include -I$NCDIR/include"
export CFLAGS="-I$HDF5/include -I$NCDIR/include"
export CXXFLAGS="-I$HDF5/include -I$NCDIR/include"
export FCFLAGS="-I$HDF5/include -I$NCDIR/include"
export FFLAGS="-I$HDF5/include -I$NCDIR/include"
export LDFLAGS="-L$HDF5/lib -L$NCDIR/lib"
export PHDF5=${WRF_INSTALL}/hdf5
export NETCDF=${WRF_INSTALL}/netcdf
export PNETCDF=${WRF_INSTALL}/pnetcdf
export PATH=${WRF_INSTALL}/netcdf/bin:${PATH}
export PATH=${WRF_INSTALL}/pnetcdf/bin:${PATH}
export PATH=${WRF_INSTALL}/hdf5/bin:${PATH}
export LD_LIBRARY_PATH=${WRF_INSTALL}/netcdf/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${WRF_INSTALL}/pnetcdf/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${WRF_INSTALL}/hdf5/lib:$LD_LIBRARY_PATH
export WRFIO_NCD_LARGE_FILE_SUPPORT=1
export NETCDF_classic=1
export F77=mpifort
export FFLAGS="-g -fPIC"
export FCFLAGS="-g -fPIC"
export JASPERLIB=${WRF_INSTALL}/jasper/lib
export JASPERINC=${WRF_INSTALL}/jasper/include
2. Install GCC 10.2
$ source /fsx/wrf-arm/wrf-install.env
$ cd $DOWNLOAD
$ wget https://ftp.gnu.org/gnu/gcc/gcc-${GCC_VERSION}/gcc-${GCC_VERSION}.tar.gz
$ tar -xzvf gcc-${GCC_VERSION}.tar.gz
$ cd gcc-${GCC_VERSION}
$ ./contrib/download_prerequisites
$ mkdir obj.gcc-${GCC_VERSION}
$ cd obj.gcc-${GCC_VERSION}
$ ../configure --disable-multilib --enable-languages=c,c++,fortran --prefix=${WRF_INSTALL}/gcc-${GCC_VERSION}
$ make -j $(nproc) && make install
The compilation process takes about 30 minutes.
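Once the build finishes, you can verify that the new compiler is the one picked up from the environment file (the PATH set in wrf-install.env puts it ahead of the system GCC):
$ source /fsx/wrf-arm/wrf-install.env
$ which gcc
$ gcc --version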
3. Install OpenMPI
Open https://www.open-mpi.org/ in a browser to check the download URL of the latest Open MPI source release. The following commands download and compile OpenMPI 4.1.0:
$ source /fsx/wrf-arm/wrf-install.env
$ cd $DOWNLOAD
$ wget -N https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.0.tar.gz
$ tar -xzvf openmpi-4.1.0.tar.gz
$ cd openmpi-4.1.0
$ mkdir build
$ cd build
$ ../configure --prefix=${WRF_INSTALL}/openmpi-${OPENMPI_VERSION} --enable-mpirun-prefix-by-default
$ make -j$(nproc) && make install
4. Install ZLIB
$ source /fsx/wrf-arm/wrf-install.env
$ cd $DOWNLOAD
$ wget -N http://www.zlib.net/zlib-1.2.11.tar.gz
$ tar -xzvf zlib-1.2.11.tar.gz
$ cd zlib-1.2.11
$ ./configure --prefix=${WRF_INSTALL}/zlib
$ make check && make install
5. Install HDF5
Open https://www.hdfgroup.org/downloads/hdf5/source-code/ in a browser to find the download URL of the latest HDF5 source release.
For example, the download URL of HDF5 1.12.0 is: https://www.hdfgroup.org/package/hdf5-1-12-0-tar-gz/?wpdmdl=14582
$ source /fsx/wrf-arm/wrf-install.env
$ cd $DOWNLOAD
$ curl -o hdf5-1.12.0.tar.gz -J -L https://www.hdfgroup.org/package/hdf5-1-12-0-tar-gz/?wpdmdl=14582
$ tar -xzvf hdf5-1.12.0.tar.gz
$ cd hdf5-1.12.0
$ ./configure --prefix=${WRF_INSTALL}/hdf5 --with-zlib=${WRF_INSTALL}/zlib --enable-parallel --enable-shared --enable-hl --enable-fortran
$ make -j$(nproc) && make install
6. Install Parallel-NETCDF
Parallel-NETCDF can be found on the official website:
https://parallel-netcdf.github.io/wiki/Download.html
Download the latest version.
$ source /fsx/wrf-arm/wrf-install.env
$ cd $DOWNLOAD
$ wget -N https://parallel-netcdf.github.io/Release/pnetcdf-1.12.2.tar.gz
$ tar -xzvf pnetcdf-1.12.2.tar.gz
$ cd pnetcdf-1.12.2
$ ./configure --prefix=${WRF_INSTALL}/pnetcdf --enable-fortran --enable-large-file-test --enable-shared
$ make -j$(nproc) && make install
7. Install NETCDF
NETCDF can be found on the official website: https://www.unidata.ucar.edu/downloads/netcdf/
Download the latest version.
7.1 Install NetCDF-C
$ source /fsx/wrf-arm/wrf-install.env
$ cd $DOWNLOAD
$ wget -N https://www.unidata.ucar.edu/downloads/netcdf/ftp/netcdf-c-4.7.4.tar.gz
$ tar -xzvf netcdf-c-4.7.4.tar.gz
$ cd netcdf-c-4.7.4
$ ./configure --prefix=$NCDIR CPPFLAGS="-I$HDF5/include -I$PNET/include" CFLAGS="-DHAVE_STRDUP -O3 -march=armv8.2-a+crypto+fp16+rcpc+dotprod" LDFLAGS="-L$HDF5/lib -L$PNET/lib" --enable-pnetcdf --enable-large-file-tests --enable-largefile --enable-parallel-tests --enable-shared --enable-netcdf-4 --with-pic --disable-doxygen --disable-dap
$ make -j$(nproc) && make install
7.2 Install NetCDF-F
$ source /fsx/wrf-arm/wrf-install.env
$ cd $DOWNLOAD
$ wget -N https://www.unidata.ucar.edu/downloads/netcdf/ftp/netcdf-fortran-4.5.3.tar.gz
$ tar -xzvf netcdf-fortran-4.5.3.tar.gz
$ cd netcdf-fortran-4.5.3
$ ./configure --prefix=$NCDIR --disable-static --enable-shared --with-pic --enable-parallel-tests --enable-large-file-tests --enable-largefile
$ make -j$(nproc) && make install
8. Install WRF
8.1 Download WRF
WRF can be downloaded on Github:
https://github.com/wrf-model/WRF/tree/master
$ source /fsx/wrf-arm/wrf-install.env
$ cd ${WRF_INSTALL}
$ curl -o WRF-v4.2.2.zip -J -L https://github.com/wrf-model/WRF/archive/v4.2.2.zip
$ unzip WRF-v4.2.2.zip
$ cd WRF-4.2.2
8.2 Edit the build configuration file
Using an editor you are familiar with (e.g. view), edit the file arch/configure.defaults and add the following stanza between the lines #insert new stanza here and #ARCH Fujitsu FX10/FX100...:
###########################################################
#ARCH Linux aarch64, GCC compiler OpenMPI # serial smpar dmpar dm+sm
#
DESCRIPTION = GCC ($SFC/$SCC): Aarch64
DMPARALLEL =
OMPCPP = -fopenmp
OMP = -fopenmp
OMPCC = -fopenmp
SFC = gfortran
SCC = gcc
CCOMP = gcc
DM_FC = mpif90
DM_CC = mpicc -DMPI2_SUPPORT
FC = CONFIGURE_FC
CC = CONFIGURE_CC
LD = $(FC)
RWORDSIZE = CONFIGURE_RWORDSIZE
PROMOTION =
ARCH_LOCAL =
CFLAGS_LOCAL = -w -O3 -c
LDFLAGS_LOCAL = -fopenmp
FCOPTIM = -Ofast -march=armv8.2-a+fp16+rcpc+dotprod+crypto -fopenmp -frecursive -funroll-loops
FCREDUCEDOPT = $(FCOPTIM)
FCNOOPT = -O0 -fopenmp -frecursive
FCDEBUG = -g $(FCNOOPT)
FORMAT_FIXED = -ffixed-form -ffixed-line-length-0 -fallow-argument-mismatch -fallow-invalid-boz
FORMAT_FREE = -ffree-form -ffree-line-length-0 -fallow-argument-mismatch -fallow-invalid-boz
FCSUFFIX =
BYTESWAPIO = -fconvert=big-endian -frecord-marker=4
FCBASEOPTS = -w $(FORMAT_FREE) $(BYTESWAPIO)
MODULE_SRCH_FLAG= -I$(WRF_SRC_ROOT_DIR)/main
TRADFLAG = -traditional-cpp
CPP = /lib/cpp CONFIGURE_CPPFLAGS
AR = ar
ARFLAGS = ru
M4 = m4 -B 14000
RANLIB = ranlib
RLFLAGS =
CC_TOOLS = $(SCC)
8.3 Compilation options
$ ./configure
Select the compile option:
8. (dm+sm) GCC (gfortran/gcc): Aarch64
For the nesting prompt, select option 1: Compile for nesting? (1=basic, 2=preset moves, 3=vortex following) [default 1]:
8.4 Execute compilation
$ ./compile -j $(nproc) em_real 2>&1 | tee compile_wrf.out
After the compilation succeeds, you can find the WRF executables in the WRF-4.2.2/main directory:
main/ndown.exe
main/real.exe
main/tc.exe
main/wrf.exe
9. Install WPS 4.2
9.1 Install Jasper
Download Jasper
$ source /fsx/wrf-arm/wrf-install.env
$ cd $DOWNLOAD
$ wget https://www2.mmm.ucar.edu/wrf/OnLineTutorial/compile_tutorial/tar_files/jasper-1.900.1.tar.gz
$ tar -xzvf jasper-1.900.1.tar.gz
Before compiling, we need to download an updated config.guess from http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.guess;hb=HEAD and overwrite acaux/config.guess in the jasper-1.900.1 directory, so that the aarch64 architecture is recognized. Execute the following commands to download it and overwrite the original config.guess file:
$ cd $DOWNLOAD/jasper-1.900.1
$ wget -N -O acaux/config.guess "http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.guess;hb=HEAD"
Install Jasper
$ cd $DOWNLOAD/jasper-1.900.1
$ ./configure --prefix=${WRF_INSTALL}/jasper
$ make -j$(nproc) install
9.2 Install WPS
Download the latest version of WPS https://github.com/wrf-model/WPS/releases/
The following example downloads and installs version 4.2
$ source /fsx/wrf-arm/wrf-install.env
$ cd $DOWNLOAD
$ curl -o WPS-v4.2.tar.gz -J -L https://github.com/wrf-model/WPS/archive/refs/tags/v4.2.tar.gz
$ tar -xzvf WPS-v4.2.tar.gz -C ${WRF_INSTALL}
$ cd ${WRF_INSTALL}/WPS-4.2
Using a text editor, add the following stanza at the top of the file arch/configure.defaults:
########################################################################################################################
#ARCH Linux aarch64, Arm compiler OpenMPI # serial smpar dmpar dm+sm
#
COMPRESSION_LIBS = CONFIGURE_COMP_L
COMPRESSION_INC = CONFIGURE_COMP_I
FDEFS = CONFIGURE_FDEFS
SFC = gfortran
SCC = gcc
DM_FC = mpif90
DM_CC = mpicc
FC = CONFIGURE_FC
CC = CONFIGURE_CC
LD = $(FC)
FFLAGS = -ffree-form -O -fconvert=big-endian -frecord-marker=4 -ffixed-line-length-0 -fallow-argument-mismatch -fallow-invalid-boz
F77FLAGS = -ffixed-form -O -fconvert=big-endian -frecord-marker=4 -ffree-line-length-0 -fallow-argument-mismatch -fallow-invalid-boz
FCSUFFIX =
FNGFLAGS = $(FFLAGS)
LDFLAGS =
CFLAGS =
CPP = /usr/bin/cpp -P -traditional
CPPFLAGS = -D_UNDERSCORE -DBYTESWAP -DLINUX -DIO_NETCDF -DBIT32 -DNO_SIGNAL CONFIGURE_MPI
RANLIB = ranlib
Compile and install
$ ./configure
Compile option selection
Linux aarch64, Arm compiler OpenMPI (dmpar)
If you encounter the following output prompt, just ignore it.
Your versions of Fortran and NETCDF are not consistent.
Before compiling, we also need to modify the WRF_LIB value in the file configure.wps in the WPS-4.2 directory: change -L$(NETCDF)/lib -lnetcdf to -L$(NETCDF)/lib -lnetcdff -lnetcdf -lgomp
As follows:
WRF_LIB = -L$(WRF_DIR)/external/io_grib1 -lio_grib1 \
-L$(WRF_DIR)/external/io_grib_share -lio_grib_share \
-L$(WRF_DIR)/external/io_int -lwrfio_int \
-L$(WRF_DIR)/external/io_netcdf -lwrfio_nf \
-L$(NETCDF)/lib -lnetcdff -lnetcdf -lgomp
Run the compilation
$ ./compile | tee compile_wps.out
After successful compilation, you can see 3 executable files in the current directory:
geogrid.exe→geogrid/src/geogrid.exe
ungrib.exe→ungrib/src/ungrib.exe
metgrid.exe→metgrid/src/metgrid.exe
10. Install WRFDA
WRFDA is a unified model-space data assimilation system (global or regional, multi-model, 3D/4D-Var). Its components and their relationships are shown in the figure.
Since version 4.0, WRFDA can be compiled directly from the WRF source code.
Extract WRF to directory WRFDA-4.2.2
$ source /fsx/wrf-arm/wrf-install.env
$ cd ${WRF_INSTALL}
$ unzip -d /tmp WRF-v4.2.2.zip
$ mv /tmp/WRF-4.2.2 ${WRF_INSTALL}/WRFDA-4.2.2
$ cd ${WRF_INSTALL}/WRFDA-4.2.2
Compile
$ ./configure wrfda
Compile option selection
3.(dmpar)
Run the compilation
$ ./compile all_wrfvar 2>&1 | tee compile_wrfda.out
After the compilation succeeds, you can find the WRFDA executable da_wrfvar.exe in the ${WRF_INSTALL}/WRFDA-4.2.2/var/build/ directory.
11. Install OBSGRID
Download OBSGRID
OBSGRID can be downloaded on Github:
https://github.com/wrf-model/OBSGRID
$ source /fsx/wrf-arm/wrf-install.env
$ cd ${WRF_INSTALL}
$ git clone https://github.com/wrf-model/OBSGRID
$ cd ${WRF_INSTALL}/OBSGRID
Using an editor you are familiar with (e.g. view), edit the file arch/configure.defaults and add the following stanza at the top:
###########################################################
#ARCH Linux aarch64, gfortran compiler
#
FC = gfortran
FFLAGS = -ffree-form -O -fconvert=big-endian -frecord-marker=4 -ffixed-line-length-0 -fallow-argument-mismatch -fallow-invalid-boz
F77FLAGS = -ffixed-form -O -fconvert=big-endian -frecord-marker=4 -ffree-line-length-0 -fallow-argument-mismatch -fallow-invalid-boz
FNGFLAGS = $(FFLAGS)
LDFLAGS =
CC = gcc
CFLAGS =
CPP = /usr/bin/cpp -P -traditional
CPPFLAGS = -D_UNDERSCORE -DBYTESWAP -DLINUX -DIO_NETCDF -DBIT32 -DNO_SIGNAL
Compile
$ ./configure
Select option 1. Linux aarch64, gfortran compiler
Modify the value of NETCDF_LIBS in the file configure.oa
NETCDF_LIBS = -L${NETCDF}/lib -lnetcdff -lnetcdf
Run the compilation
$ ./compile 2>&1 | tee -a compile_oa.out
The NCAR Graphics library is not installed, so the plotting utilities fail to link and you will see an error like the one below; this does not affect the compilation of OBSGRID itself:
gfortran -o plot_soundings.exe plot_soundings.o module_mapinfo.o module_report.o module_skewt.o date_pack_module.o -L/lib -lncarg -lncarg_gks -lncarg_c -lX11 -lm -lcairo -L/fsx/wrf-arm/netcdf/lib -lnetcdff -lnetcdf -I/fsx/wrf-arm/netcdf/include
/usr/bin/ld: cannot find -lncarg
/usr/bin/ld: cannot find -lncarg_gks
/usr/bin/ld: cannot find -lncarg_c
/usr/bin/ld: cannot find -lcairo
collect2: error: ld returned 1 exit status
After the compilation succeeds, you can find the OBSGRID executable obsgrid.exe in the ${WRF_INSTALL}/OBSGRID directory.
WPS data preprocessing and WRF parallel computing
Before running a WRF job, data needs to be prepared and preprocessed. The data includes static geographic data and real-time meteorological data, which can be obtained from the NCEP official website. The WPS programs geogrid, ungrib, and metgrid then preprocess the data to generate the corresponding input files, after which the WRF job can be run, as shown in the following diagram:
1. Download static geographic data
Create a new folder data in the /fsx directory and download the data into it. The data can be obtained from the official website: http://www2.mmm.ucar.edu/wrf/users/download/get_sources_wps_geog.html
$ cd /fsx
$ mkdir data
$ cd data
$ wget https://www2.mmm.ucar.edu/wrf/src/wps_files/geog_high_res_mandatory.tar.gz
Then decompress and untar the static geographic data; the 2.6 GB archive expands to about 29 GB, so this takes a while. The extracted folder is named WPS_GEOG.
$ gunzip geog_high_res_mandatory.tar.gz
$ tar -xf geog_high_res_mandatory.tar
Then modify the &geogrid section in the namelist.wps file to provide the static file directory to the geogrid program.
$ cd /fsx/wrf-arm/WPS-4.2
$ view namelist.wps
geog_data_path ='/fsx/data/WPS_GEOG/'
2. Download real-time weather data
Real-time weather data is available from the official website: ftp://ftpprd.ncep.noaa.gov/pub/data/nccf/com/gfs/prod
Create a directory weather_data under /fsx/data and download the real-time data into it. In this example, the f000, f006, and f012 forecast files from June 22, 2021 are downloaded as test data; you can choose other real-time data as needed.
$ cd /fsx/data
$ mkdir weather_data
$ cd weather_data
$ wget ftp://ftpprd.ncep.noaa.gov/pub/data/nccf/com/gfs/prod/gfs.20210622/00/atmos/gfs.t00z.pgrb2.0p25.f000
$ mv gfs.t00z.pgrb2.0p25.f000 GFS_00h
$ wget ftp://ftpprd.ncep.noaa.gov/pub/data/nccf/com/gfs/prod/gfs.20210622/00/atmos/gfs.t00z.pgrb2.0p25.f006
$ mv gfs.t00z.pgrb2.0p25.f006 GFS_06h
$ wget ftp://ftpprd.ncep.noaa.gov/pub/data/nccf/com/gfs/prod/gfs.20210622/00/atmos/gfs.t00z.pgrb2.0p25.f012
$ mv gfs.t00z.pgrb2.0p25.f012 GFS_12h
3. Run geogrid
Go to the WPS directory and run geogrid
$ cd /fsx/wrf-arm/WPS-4.2
$ ./geogrid.exe>&log.geogrid
The sign of success for this step is the creation of the geo_em.* files, in this case geo_em.d01.nc and geo_em.d02.nc.
4. Run ungrib
To run ungrib, first link the GFS data and the correct Vtable:
$ ./link_grib.csh /fsx/data/weather_data/
$ ln -sf ungrib/Variable_Tables/Vtable.GFS Vtable
Then modify the start_date and end_date of the namelist.wps file to match the real-time data
start_date = '2021-06-22_00:00:00','2021-06-22_00:00:00',
end_date = '2021-06-22_12:00:00','2021-06-22_12:00:00'
Then run ungrib:
$ ./ungrib.exe
The sign of success for this step is the creation of FILE:* files, in this case FILE:2021-06-22_00, FILE:2021-06-22_06, and FILE:2021-06-22_12.
5. Run metgrid
$ ./metgrid.exe>&log.metgrid
The sign of success for this step is that the met_em.* files are created.
6. Copy data to the WRF working directory
Go to the WRF directory and copy the met_em.* files to the working directory
$ cd ${WRF_INSTALL}/WRF-4.2.2
$ cp ${WRF_INSTALL}/WPS-4.2/met_em* ${WRF_INSTALL}/WRF-4.2.2/run/
7. Modify the namelist.input file
Modify the start and end times in the namelist.input file (in the run directory); set the three values on each line to the same time and keep the start and end times consistent with the real-time data. Also set the num_metgrid_levels parameter to 34 to match the real-time data.
start_year = 2021, 2021, 2021,
start_month = 06, 06, 06,
start_day = 22, 22, 22,
start_hour = 00, 00, 00,
end_year = 2021, 2021, 2021,
end_month = 06, 06, 06,
end_day = 22, 22, 22,
end_hour = 12, 12, 12,
num_metgrid_levels = 34,
8. Run the initializer real
$ cd ${WRF_INSTALL}/WRF-4.2.2/run
$ mpirun -np 1 ./real.exe
Check the output to make sure the run was successful; you should see the wrfbdy_d01 file and a wrfinput_d0* file for each domain. If there is an error, modify the parameters in namelist.input according to the messages in the log file.
$ tail rsl.error.0000
9. Run WRF
You can adjust the np parameter yourself, but it should not exceed the number of physical cores of the master node instance.
$ mpirun -np 8 ./wrf.exe
The sign of a successful run is that SUCCESS appears in the rsl.out.0000 file and the wrfout* files are generated.
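For example, a quick way to check for the success message:
$ grep SUCCESS rsl.out.0000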
Submitting a WRF parallel computing job
1. Download the test dataset
$ cd /fsx/data
$ wget https://dcn1tgfn79vvj.cloudfront.net/conus_2.5km_v4.tar.gz
$ tar -xzvf conus_2.5km_v4.tar.gz
To make running the wrf.exe test easier, create a soft link to wrf.exe in the data directory:
$ ln -s ${WRF_INSTALL}/WRF-4.2.2/main/wrf.exe /fsx/data/conus_2.5km_v4/wrf.exe
2. Write and save the test script
$ vi wrf.sbatch
#!/bin/bash
#SBATCH --wait-all-nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=8
#SBATCH --nodes=2
#SBATCH --ntasks-per-core=1
#SBATCH --export=ALL
#SBATCH --partition=od-queue
#SBATCH --exclusive
#SBATCH -o /fsx/slurm.out
#ENV VARIABLES#
#---------------------Run-time env-----------------------------------------
ulimit -s unlimited
export OMP_STACKSIZE=12G
export OMP_NUM_THREADS=8
export KMP_AFFINITY=scatter,verbose
#WRF ENV
export WRF_INSTALL=/fsx/wrf-arm
export GCC_VERSION=10.2.0
export OPENMPI_VERSION=4.1.0
export PATH=${WRF_INSTALL}/gcc-${GCC_VERSION}/bin:$PATH
export LD_LIBRARY_PATH=${WRF_INSTALL}/gcc-${GCC_VERSION}/lib64:$LD_LIBRARY_PATH
export PATH=${WRF_INSTALL}/openmpi-${OPENMPI_VERSION}/bin:$PATH
export LD_LIBRARY_PATH=${WRF_INSTALL}/openmpi-${OPENMPI_VERSION}/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${WRF_INSTALL}/netcdf/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${WRF_INSTALL}/pnetcdf/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${WRF_INSTALL}/hdf5/lib:$LD_LIBRARY_PATH
#--------------------------------------------------------------------------
echo "Running WRF on $(date)"
cd /fsx/data/conus_2.5km_v4/
mpirun --report-bindings ./wrf.exe &>> wrf.out
echo nstasks=$SLURM_NTASKS
date -u +%Y-%m-%d_%H:%M:%S >> wrf.times
3. Submit the job
Submit your job with the following command on the master node:
$ sbatch wrf.sbatch
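Since the cluster configuration defines both an On-Demand queue and a Spot queue, you can also send the same job to the Spot queue by overriding the partition at submission time, for example:
$ sbatch -p spot-queue wrf.sbatch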
Use the squeue command to check the state of the queue. The job will first be marked as pending (PD state) while the compute resources are being created (or while nodes are in the down/drained state). If you check the EC2 dashboard, you should see the nodes starting up.
$ squeue
You can also use the sinfo command to check the number of available nodes in the cluster.
$ sinfo
You can also use the scontrol command to view detailed job information.
$ scontrol show jobid -dd
The WRF calculation results are saved in the dataset directory /fsx/data/conus_2.5km_v4 by default.
4. View the WRF running process and output results
$ cd /fsx/data/conus_2.5km_v4/
$ tail -f rsl.out.0000
After the run is complete, view the output result file:
$ ls -lh wrfout*
-rw-rw-r-- 1 ec2-user ec2-user 2.1G Apr 1 14:01 wrfout_d01_2018-06-17_00:00:00
Use software such as ncview or Panoply to inspect and visualize the variables in the output file. The following figure shows the 10 m wind speed after three hours of simulation:
- ncview
https://cirrus.ucsd.edu/ncview/
- Panoply
https://www.giss.nasa.gov/tools/panoply/
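For a quick look at a result file with ncview (this requires an X11 session or a remote desktop such as DCV on the node where you run it), for example:
$ ncview wrfout_d01_2018-06-17_00:00:00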
After a job is submitted, ParallelCluster automatically starts compute instances according to the job's requirements, adds them to the cluster, and runs the job in parallel. After the job completes, if no jobs run on a compute node for a period of time, ParallelCluster terminates the node to save cost.
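The idle time before compute nodes are released is controlled by the scaledown_idletime setting (in minutes) in the [cluster] section of the ParallelCluster 2.x configuration; for example, to release idle nodes after 5 minutes you could add:
scaledown_idletime = 5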
Summary
To sum up, all components of the WRF modeling system can run on Arm-based Amazon Graviton2 instances, and the flexible management provided by ParallelCluster makes running WRF jobs both efficient and economical. Using WRF for meteorological analysis and prediction on the cloud not only greatly improves efficiency, but also keeps costs flexible and controllable, and makes it easier to share and use the results.
References
1. Amazon Graviton official website:
https://aws.amazon.com/cn/ec2/graviton/
2. Amazon ParallelCluster:
https://docs.aws.amazon.com/zh_cn/parallelcluster/latest/ug/what-is-aws-parallelcluster.html
3. WRF User Manual:
https://www2.mmm.ucar.edu/wrf/users/docs/user_guide_v4/v4.2/contents.html
4. Geographic data download from WRF official website:
http://www2.mmm.ucar.edu/wrf/users/download/get_sources_wps_geog.html
5. NCEP meteorological real-time data download:
ftp://ftpprd.ncep.noaa.gov/pub/data/nccf/com/gfs/prod
Author of this article
Yang
Amazon Cloud Technology Solutions Architect
Responsible for the consulting and architecture design of Amazon-based cloud computing solutions, currently focusing on the new energy power industry. Committed to promoting the application of HPC and IoT technology in wind power, photovoltaic and other new energy power industries.
Amazon Cloud Technology Hybrid Cloud Solution Architect
Responsible for consulting and design of hybrid cloud solution architecture based on Amazon cloud technology. Before joining Amazon Cloud Technology, he worked in a large group company. Responsible for the design and construction of private cloud data centers, and has many years of experience in data center infrastructure, virtualization, high-performance computing and hybrid cloud.