AMATH 483 / 583 linear system solver

AMATH 483 / 583 (roche) - HW6
Due Friday May 31, 11:59pm PT
May 24, 2024
Homework 6 (80 points, 0 EC points)

(+20) Complex double linear system solver. Plot both the log of the residual and the log of the
normalized error, $\|b - Az\|_2 / (\|A\|_1 \, \|z\|_2 \, \epsilon_{\text{machine}})$, versus the square
matrix dimensions 16, 32, 64, ..., 8192 for the following LAPACK routine. It is supported in the
OpenBLAS build on Hyak. Submit your plot, and label it accordingly.
lapack_int LAPACKE_zgesv(int matrix_order,
                         lapack_int n,
                         lapack_int nrhs,
                         lapack_complex_double *a,
                         lapack_int lda,
                         lapack_int *ipiv,
                         lapack_complex_double *b,
                         lapack_int ldb);
Use the following code snippet to initialize your matrices and right-hand-side vectors, and note the headers I use:
#include <iostream>
#include <complex>
#include <cstdlib>
#include <cstring>
#include <cmath>
#include <vector>
#include <chrono>
#include <limits>
#include <cblas.h>
#include <lapacke.h>
...
int main() {
    ...
    a = (std::complex<double> *)malloc(sizeof(std::complex<double>) * ma * na);
    b = (std::complex<double> *)malloc(sizeof(std::complex<double>) * ma);
    z = (std::complex<double> *)malloc(sizeof(std::complex<double>) * na);
    ...
    srand(0);
    int k = 0;
    for (int j = 0; j < na; j++) {
        for (int i = 0; i < ma; i++) {
            a[k] = 0.5 - (double)rand() / (double)RAND_MAX
                 + std::complex<double>(0, 1)
                 * (0.5 - (double)rand() / (double)RAND_MAX);
            if (i == j) a[k] *= static_cast<double>(ma);
            k++;
        }
    }
    srand(1);
    for (int i = 0; i < ma; i++) {
        b[i] = 0.5 - (double)rand() / (double)RAND_MAX
             + std::complex<double>(0, 1)
             * (0.5 - (double)rand() / (double)RAND_MAX);
    }
    ...
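The two plotted quantities can be computed with a few small helpers. The following is a minimal sketch, not the course's reference solution. It assumes A is stored column-major (which matches the initialization loop above, where the row index i varies fastest) and that the matrix norm in the normalized error is the 1-norm (maximum absolute column sum). The names `norm2`, `norm1_colmajor`, `residual`, `ErrorReport`, and `check_solution` are hypothetical and do not come from the assignment.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <limits>
#include <vector>

using cplx = std::complex<double>;

// Euclidean (2-) norm of a complex vector.
double norm2(const std::vector<cplx>& v) {
    double sum = 0.0;
    for (const cplx& x : v) sum += std::norm(x);  // std::norm returns |x|^2
    return std::sqrt(sum);
}

// Matrix 1-norm (maximum absolute column sum) of an m x n column-major matrix.
// Assumption: the "1" subscript in the assignment means the 1-norm.
double norm1_colmajor(const std::vector<cplx>& a, std::size_t m, std::size_t n) {
    double best = 0.0;
    for (std::size_t j = 0; j < n; ++j) {
        double col = 0.0;
        for (std::size_t i = 0; i < m; ++i) col += std::abs(a[i + j * m]);
        best = std::max(best, col);
    }
    return best;
}

// r = b - A z for an m x n column-major matrix A.
std::vector<cplx> residual(const std::vector<cplx>& a, const std::vector<cplx>& b,
                           const std::vector<cplx>& z, std::size_t m, std::size_t n) {
    std::vector<cplx> r(b);
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < m; ++i)
            r[i] -= a[i + j * m] * z[j];
    return r;
}

struct ErrorReport {
    double residual;          // ||b - A z||_2
    double normalized_error;  // residual / (||A||_1 * ||z||_2 * eps)
};

// Hypothetical helper: evaluates both quantities for a computed solution z.
ErrorReport check_solution(const std::vector<cplx>& a, const std::vector<cplx>& b,
                           const std::vector<cplx>& z, std::size_t m, std::size_t n) {
    const double eps = std::numeric_limits<double>::epsilon();
    const double res = norm2(residual(a, b, z, m, n));
    const double denom = norm1_colmajor(a, m, n) * norm2(z) * eps;
    return {res, res / denom};
}
```

Note that LAPACKE_zgesv overwrites its `a` argument with the LU factors and its `b` argument with the solution, so the original A and b need to be copied before the call if they are to be reused in a check like this one.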
(+20) CPU-GPU data copy speed on HYAK. Write a C++ code to measure the data copy performance
between the host CPU and GPU (host to device), and between the GPU and the host CPU (device to host).
Copy buffers of increasing size, doubling each time, up to 256 MB. Plot the bandwidth for both directions,
with bytes per second on the y-axis and the buffer size in bytes on the x-axis. Submit your plot and test code.
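The timing loop for such a sweep might be organized as below. This is a host-only sketch under stated assumptions: `std::memcpy` stands in for the actual transfer so the code runs without a GPU, and the names `CopyFn`, `BandwidthSample`, `host_copy`, `measure_bandwidth`, and `sweep` are hypothetical. On Hyak the copy callback would presumably wrap the CUDA runtime call `cudaMemcpy(dst, src, bytes, cudaMemcpyHostToDevice)` (or `cudaMemcpyDeviceToHost`), with one of the two buffers allocated by `cudaMalloc`. The starting buffer size is a parameter here because the assignment text does not state it.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <functional>
#include <vector>

// Signature of a copy routine: copy(dst, src, bytes).
using CopyFn = std::function<void(void*, const void*, std::size_t)>;

struct BandwidthSample {
    std::size_t bytes;        // buffer size in bytes (x-axis)
    double bytes_per_second;  // measured bandwidth (y-axis)
};

// Stand-in for the real transfer (assumption: replaced by cudaMemcpy on the GPU node).
void host_copy(void* dst, const void* src, std::size_t bytes) {
    std::memcpy(dst, src, bytes);
}

// Average bandwidth of `repeats` timed copies, after one untimed warm-up copy.
double measure_bandwidth(const CopyFn& copy, void* dst, const void* src,
                         std::size_t bytes, int repeats) {
    copy(dst, src, bytes);  // warm-up, not timed
    const auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) copy(dst, src, bytes);
    const auto t1 = std::chrono::steady_clock::now();
    const double seconds = std::chrono::duration<double>(t1 - t0).count();
    return static_cast<double>(bytes) * repeats / seconds;
}

// Sweep buffer sizes from min_bytes to max_bytes, doubling each step.
std::vector<BandwidthSample> sweep(const CopyFn& copy, std::size_t min_bytes,
                                   std::size_t max_bytes, int repeats) {
    std::vector<BandwidthSample> samples;
    if (min_bytes == 0) return samples;  // doubling from zero would never terminate
    std::vector<unsigned char> src(max_bytes, 1), dst(max_bytes, 0);
    for (std::size_t bytes = min_bytes; bytes <= max_bytes; bytes *= 2) {
        samples.push_back(
            {bytes, measure_bandwidth(copy, dst.data(), src.data(), bytes, repeats)});
    }
    return samples;
}
```

A call such as `sweep(host_copy, 1024, 256u * 1024u * 1024u, 5)` would cover sizes up to 256 MB. Very small buffers can finish faster than the clock resolution, so a larger repeat count is advisable at the low end.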
(+20) Compare FFTW to CUFFT on HYAK. Measure and plot the performance of calculating the gradient
of a 3D double complex plane wave defined on cubic lattices of dimension $n^3$ from $16^3$ to $256^3$, stride n *= 2,
for both the FFTW and CUDA FFT (CUFFT) implementations on HYAK. Let each n be measured ntrial times
and plot the average performance for each case versus n, with ntrial ≥ 3. Submit your performance plot, which should
have 'FLOPs' on the y-axis (or some appropriate unit of FLOPs) and the dimension of the cubic lattices (n) on
the x-axis. You will need to estimate the operation count of computing the derivative using FFT on a lattice.
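One possible way to estimate the operation count is sketched below. It is an assumed cost model, not the one prescribed by the course: each complex FFT of N points is charged 5 N log2(N) flops (the convention used in FFTW's own benchmarks), and the gradient is assumed to take one forward 3D FFT, three pointwise multiplications by i k (two real multiplications per point each), and three inverse 3D FFTs. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Assumed cost of one complex FFT over `npoints` points: 5 N log2(N) flops.
double fft_flops(double npoints) {
    return 5.0 * npoints * std::log2(npoints);
}

// Assumed cost of a spectral gradient on an n x n x n lattice:
//   1 forward FFT + 3 inverse FFTs, plus 3 pointwise multiplies by i*k,
//   each costing 2 real multiplications per lattice point.
double gradient_flops(std::size_t n) {
    const double npoints = static_cast<double>(n) * n * n;
    return 4.0 * fft_flops(npoints) + 3.0 * 2.0 * npoints;
}

// Convert a measured (average) wall time into a FLOP rate for the plot.
double flops_per_second(std::size_t n, double seconds) {
    return gradient_flops(n) / seconds;
}
```

For n = 16 this model gives N = 4096 points and 4 * 5 * 4096 * 12 + 6 * 4096 = 1,007,616 flops. Any normalization by 1/N after the inverse transforms is not counted separately here, since it can be folded into the i k factors.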
(+20) Fourier transforms. Evaluate the Fourier transform of the following functions by hand. Use the definitions
I provided (includes $1/\sqrt{2\pi}$; this is common in physics but also now the default used in WolframAlpha, a powerful
math AI tool) as well as the definition for Dirac delta I used in lecture if needed.
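As an illustration of how the $1/\sqrt{2\pi}$ prefactor enters, here is the symmetric convention written out and applied to the Dirac delta. The sign of the exponent is an assumption and should be checked against the lecture definition. The delta example does not depend on that sign, since it only uses the sifting property $\int \delta(x)\, g(x)\, dx = g(0)$.

```latex
\hat{f}(k) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(x)\, e^{-ikx}\, dx
% assumed sign convention; some references use e^{+ikx}

\hat{\delta}(k) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \delta(x)\, e^{-ikx}\, dx
                = \frac{1}{\sqrt{2\pi}}\, e^{-ik \cdot 0}
                = \frac{1}{\sqrt{2\pi}}
```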