AMATH 483 / 583 (roche) - HW6
Due Friday May 31, 11:59pm PT
May 24, 2024
Homework 6 (80 points, 0 EC points)

  1. (+20) Complex double linear system solver. Plot both the log of the residual and the log of the
    normalized error, $\frac{\|b - Az\|_2}{\|A\|_\infty \, \|z\|_2 \, \epsilon_{\text{machine}}}$, versus the square matrix dimensions 16, 32, 64, ..., 8192 for the following
    LAPACK routine. It is supported in the OpenBLAS build on Hyak. Submit your plot, and label it
    accordingly.
    lapack_int LAPACKE_zgesv(int matrix_order,
                             lapack_int n,
                             lapack_int nrhs,
                             lapack_complex_double *a,
                             lapack_int lda,
                             lapack_int *ipiv,
                             lapack_complex_double *b,
                             lapack_int ldb);
    Use the following code snippet to initialize your matrices and rhs vectors, and note the headers I use:

    #include <iostream>
    #include <complex>
    #include <cstdlib>
    #include <cstring>
    #include <cmath>
    #include <vector>
    #include <chrono>
    #include <limits>
    #include <cblas.h>
    #include <lapacke.h>
    . . .
    int main() {
        . . .
        a = (std::complex<double> *)malloc(sizeof(std::complex<double>) * ma * na);
        b = (std::complex<double> *)malloc(sizeof(std::complex<double>) * ma);
        z = (std::complex<double> *)malloc(sizeof(std::complex<double>) * na);
        . . .
        srand(0);
        int k = 0;
        for (int j = 0; j < na; j++) {
            for (int i = 0; i < ma; i++) {
                a[k] = 0.5 - (double)rand() / (double)RAND_MAX
                       + std::complex<double>(0, 1)
                             * (0.5 - (double)rand() / (double)RAND_MAX);
                if (i == j) a[k] *= static_cast<double>(ma);
                k++;
            }
        }
        srand(1);
        for (int i = 0; i < ma; i++) {
            b[i] = 0.5 - (double)rand() / (double)RAND_MAX
                   + std::complex<double>(0, 1)
                         * (0.5 - (double)rand() / (double)RAND_MAX);
        }
        . . .
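    As a rough illustration of the two quantities to plot in problem 1, the sketch below computes the residual and the normalized error for a column-major matrix. It is only a sketch under stated assumptions: the matrix norm is taken to be the infinity norm (maximum absolute row sum), and the helper names `norm2`, `norm_inf`, `residual_norm`, and `normalized_error` are hypothetical, not part of LAPACK or of the snippet above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <limits>
#include <vector>

using cplx = std::complex<double>;

// 2-norm of a complex vector.
double norm2(const std::vector<cplx>& v) {
    double s = 0.0;
    for (const cplx& x : v) s += std::norm(x);  // std::norm returns |x|^2
    return std::sqrt(s);
}

// Infinity norm (maximum absolute row sum) of an n x n column-major matrix.
// Assumption: the matrix norm in the normalized error is the infinity norm.
double norm_inf(const std::vector<cplx>& a, int n) {
    assert(a.size() == static_cast<std::size_t>(n) * n);
    double best = 0.0;
    for (int i = 0; i < n; ++i) {
        double row = 0.0;
        for (int j = 0; j < n; ++j) row += std::abs(a[i + j * n]);
        best = std::max(best, row);
    }
    return best;
}

// Residual ||b - A z||_2 for a column-major A.
double residual_norm(const std::vector<cplx>& a, const std::vector<cplx>& z,
                     const std::vector<cplx>& b, int n) {
    assert(a.size() == static_cast<std::size_t>(n) * n);
    std::vector<cplx> r(b);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i) r[i] -= a[i + j * n] * z[j];
    return norm2(r);
}

// Normalized error: residual / (||A||_inf * ||z||_2 * machine epsilon).
double normalized_error(const std::vector<cplx>& a, const std::vector<cplx>& z,
                        const std::vector<cplx>& b, int n) {
    const double eps = std::numeric_limits<double>::epsilon();
    return residual_norm(a, z, b, n) / (norm_inf(a, n) * norm2(z) * eps);
}
```

    Note that `LAPACKE_zgesv` overwrites `a` with the LU factors and `b` with the solution, so copies of the original matrix and right-hand side need to be kept before the call if the residual is to be evaluated afterwards.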
  2. (+20) CPU-GPU data copy speed on HYAK. Write a C++ code to measure the data copy performance
    between the host CPU and GPU (host to device), and between the GPU and the host CPU (device to host). Copy
    [...] bytes to 256 MB, increasing in multiples of 2. Plot the bandwidth for both directions: bandwidth (bytes per second) on the
    y-axis and the buffer size in bytes on the x-axis. Submit your plot and test code.
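    The measurement loop for problem 2 might be organized as in the sketch below. It is a host-only illustration: `std::memcpy` stands in for the actual transfer so that the example runs without a GPU, and the helper names `doubling_sizes` and `bandwidth_bytes_per_s`, as well as the starting size passed to `doubling_sizes`, are hypothetical choices rather than anything specified in the assignment.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Buffer sizes doubling from first_bytes up to last_bytes (inclusive).
// first_bytes is a caller-chosen (hypothetical) starting size.
std::vector<std::size_t> doubling_sizes(std::size_t first_bytes,
                                        std::size_t last_bytes) {
    assert(first_bytes > 0);
    std::vector<std::size_t> sizes;
    for (std::size_t s = first_bytes; s <= last_bytes; s *= 2) sizes.push_back(s);
    return sizes;
}

// Run `copy` ntrial times and return the average bandwidth in bytes per second.
// `copy` is any callable that moves `bytes` bytes once.
template <class Copy>
double bandwidth_bytes_per_s(std::size_t bytes, int ntrial, Copy copy) {
    using clock = std::chrono::steady_clock;
    copy();  // untimed warm-up call
    const auto t0 = clock::now();
    for (int t = 0; t < ntrial; ++t) copy();
    const std::chrono::duration<double> dt = clock::now() - t0;
    return static_cast<double>(bytes) * ntrial / dt.count();
}
```

    In a CUDA build, the callable would presumably wrap `cudaMemcpy(dst, src, bytes, cudaMemcpyHostToDevice)` or the `cudaMemcpyDeviceToHost` direction instead of `std::memcpy`. For very small buffers the elapsed time can be close to the clock resolution, so a larger `ntrial` may be needed to get a stable number.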
  3. (+20) Compare FFTW to CUFFT on HYAK. Measure and plot the performance of calculating the gradient
    of a 3D double complex plane wave defined on cubic lattices of dimension $n^3$ from $16^3$ to $256^3$, stride n *= 2,
    for both the FFTW and CUDA FFT (CUFFT) implementations on HYAK. Let each n be measured ntrial times
    and plot the average performance for each case versus n, with ntrial ≥ 3. Submit your performance plot, which should
    have 'FLOPs' on the y-axis (or some appropriate unit of FLOPs) and the dimension of the cubic lattices (n) on
    the x-axis. You will need to estimate the operation count of computing the derivative using FFT on a lattice.
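    One possible way to estimate the operation count for problem 3 is sketched below. It rests on assumptions that are not stated in the assignment: each complex 3D FFT on N = n^3 points is counted as 5 N log2 N flops (a commonly used convention, not an exact count), the gradient is taken as one forward FFT, three pointwise multiplications by i*k, and three inverse FFTs, and each pointwise complex multiply is counted as 6 flops. Normalization of the inverse transforms is not counted. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Estimated flops for one complex 3D FFT on an n^3 lattice, using the
// conventional 5 N log2 N count with N = n^3 (an estimate, not an exact count).
double fft3d_flops(int n) {
    assert(n > 0);
    const double N = static_cast<double>(n) * n * n;
    return 5.0 * N * std::log2(N);
}

// Estimated flops for a spectral gradient (assumed algorithm): one forward
// FFT, three pointwise multiplications by i*k, and three inverse FFTs.
double gradient_flops(int n) {
    const double N = static_cast<double>(n) * n * n;
    const double multiply = 3.0 * 6.0 * N;  // assumed 6 flops per complex multiply
    return 4.0 * fft3d_flops(n) + multiply;
}

// Flop rate (flops per second) from the average time of one gradient evaluation.
double flop_rate(int n, double avg_seconds) {
    assert(avg_seconds > 0.0);
    return gradient_flops(n) / avg_seconds;
}
```

    With this model the same `gradient_flops(n)` is divided by the measured average time for both the FFTW and CUFFT runs, so the two curves are compared on an equal operation count.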
  4. (+20) Fourier transforms. Evaluate the Fourier transform of the following functions by hand. Use the definitions
    I provided (these include the factor $\frac{1}{\sqrt{2\pi}}$, which is common in physics and is now also the default used in WolframAlpha, a powerful
    math AI tool) as well as the definition for the Dirac delta I used in lecture if needed.