Assignment #1 STA437H1S/2005H1S
due Friday February 3, 2023

Instructions: Solutions to problems 1–4 are to be submitted on Quercus (PDF files only).

1. Suppose that $X = (X_1, \dots, X_5)^T \sim N_5(\mu, C)$ where
$\mu =$
(a) What are the marginal distributions of $X_1, \dots, X_5$?
(b) What is the conditional distribution of $(X_1, X_2)$ given $X_3 = 2$, $X_4 = 3$ and $X_5 = -1$? (You should use R or some other software to compute whatever matrix inverses you need.)
(c) Using the inverse of $C$, give the graph structure of the dependence in $X$. (Again use R to compute $C^{-1}$ but note that the numerical computation is subject to roundoff error!)

2. The file marks.txt on Quercus contains the exam marks data considered in lecture. In that analysis, we assumed that the data came from a multivariate Normal distribution. The data can be read into R as follows:
> exam <- scan("marks.txt",what=list(0,0,0,0,0))
> mec <- exam[[1]]
> vec <- exam[[2]]
> alg <- exam[[3]]
> ana <- exam[[4]]
> sta <- exam[[5]]
(a) Look at Normal quantile-quantile plots of the 5 variables separately using qqnorm. You can judge the “goodness-of-fit” using the Shapiro-Wilk test, which can be implemented in R using shapiro.test. Comment on the results.
(b) Use the function qqmultinorm (which is in the file qqmultinorm.txt on Quercus) to assess the multivariate normality of the data. The following R code will look at 100 Normal quantile-quantile plots of 100 randomly chosen projections and compute p-values for the Shapiro-Wilk test for each projection:
> r <- qqmultinorm(cbind(mec,vec,alg,ana,sta),nproj=100,plot.edf=T)
The plot produced will be the empirical distribution function of the 100 p-values compared to the distribution function of a uniform distribution on [0, 1]. Based on this plot, do the data seem to be (at least approximately) multivariate Normal?
(c) The other approach to testing multivariate normality given in lecture is to compare the empirical distribution of the Mahalanobis distances $\{(x_i - \bar{x})^T S^{-1} (x_i - \bar{x}) : i = 1, \dots, n\}$ to a $\chi^2(5)$ distribution. If the data are contained in an $n \times p$ matrix, the Mahalanobis distances can be computed in R as follows:
> mdist <- mahalanobis(x,colMeans(x),var(x))
Does this plot confirm your conclusion from part (b)?

3. The file crabs.txt on Quercus contains data on two species of rock crabs, which are distinguished by their colour (blue or orange); the columns of the file are species (B or O), sex (M or F), index (1-50 within each species-sex combination), width of the frontal lip (FL), the rear width of the shell (RW), length along the midline of the shell (CL), the maximum width of the shell (CW), and the body depth (BD).
Ultimately, we would like to use the latter 5 variables to classify the species and sex of a crab but at this stage, we will simply look at the structure of the data to see which variables might be useful in classifying the species and sex of a rock crab.
The data can be read into R using the following code:
> x <- scan("crabs.txt",skip=1,what=list("c","c",0,0,0,0,0,0))
> colour <- ifelse(x[[1]]=="B","blue","orange")
> sex <- x[[2]]
> FL <- x[[4]]
> RW <- x[[5]]
> CL <- x[[6]]
> CW <- x[[7]]
> BD <- x[[8]]
Use the following code to look at pairwise scatterplots of the 5 variables:
> pairs(cbind(FL,RW,CL,CW,BD),pch=sex,col=colour)
The males and females are indicated on the plots by M and F respectively with the two species being indicated by the colour of the points.
(a) Which pairwise scatterplots are particularly effective for “separating” the two species?
(b) Which pairwise scatterplots are effective for “separating” the two sexes?

4. In class, we stated that we can assess whether multivariate data can be modeled by a multivariate Normal distribution by checking the normality of a collection of one dimensional projections (for example, by using normal quantile-quantile plots). When p is very large, this procedure breaks down: almost every one dimensional projection appears to be Normal.
(a) Consider n = 100 observations of a 1000-variate distribution whose components are independent exponential random variables with mean 1; the joint density of this distribution is
$f(x_1, \dots, x_{1000}) = \exp(-(x_1 + \cdots + x_{1000}))$ for $x_1, \dots, x_{1000} \ge 0$
We can simulate the 100 observations in R as follows:
> x <- matrix(rgamma(100000,1),ncol=1000)
Now use the function qqmultinorm (available on Quercus) to look at normal quantile-quantile plots of 20 one dimensional projections:
> r <- qqmultinorm(x,nproj=20,plot.qq=T,plot.edf=T)
How do these quantile-quantile plots compare to the quantile-quantile plots of each variable (for example, qqnorm(x[,1]))?
(b) Can you explain this phenomenon? (Hint: What is the approximate distribution of $a^T X = \sum_{j=1}^{p}$
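
The comparison asked for in 4(a) can be tried out without qqmultinorm, whose source is not reproduced in this handout. What follows is a minimal sketch, not the course's function: it assumes a projection means the values a^T x_i for a random direction a scaled to unit length, which may differ from how qqmultinorm picks its directions. The names a, proj, p_coord and p_proj are illustrative, the seed is arbitrary, and rexp is used in place of rgamma with shape 1 (both draw from the mean-1 exponential distribution).

```r
# Sketch only: a hand-rolled stand-in for one projection, not qqmultinorm itself.
set.seed(437)                      # arbitrary seed, for reproducibility
n <- 100                           # number of observations
p <- 1000                          # dimension
x <- matrix(rexp(n * p, rate = 1), ncol = p)   # independent Exp(1) components

# Assumed projection scheme: a random direction scaled to unit length.
a <- rnorm(p)
a <- a / sqrt(sum(a^2))
proj <- as.vector(x %*% a)         # the n projected values a^T x_i

# Shapiro-Wilk p-values for one raw coordinate and for the projection.
p_coord <- shapiro.test(x[, 1])$p.value
p_proj  <- shapiro.test(proj)$p.value
cat("coordinate 1:", p_coord, "\n")
cat("projection:  ", p_proj, "\n")

# Uncomment to view the two quantile-quantile plots side by side.
# par(mfrow = c(1, 2)); qqnorm(x[, 1]); qqnorm(proj)
```

Rerunning with other seeds or directions gives a feel for how often the projected values pass the test compared with the raw coordinates.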