Description
Multivariate Analysis
Comprehensive Reference Work on Multivariate Analysis and its Applications
The first edition of this book, by Mardia, Kent and Bibby, has been used globally for over 40 years. This second edition brings many topics up to date, with a special emphasis on recent developments.
A wide range of material in multivariate analysis is covered, including the classical themes of multivariate normal theory, multivariate regression, inference, multidimensional scaling, factor analysis, cluster analysis and principal component analysis. The book now also covers modern developments such as graphical models, robust estimation, statistical learning, and high-dimensional methods. It expertly blends theory and application, providing numerous worked examples and exercises at the end of each chapter. The reader is assumed to have a basic knowledge of mathematical statistics at undergraduate level, together with an elementary understanding of linear algebra. Appendices provide a background in matrix algebra, a summary of univariate statistics, a collection of statistical tables and a discussion of computational aspects. The work includes coverage of:
- Basic properties of random vectors, copulas, normal distribution theory, and estimation
- Hypothesis testing, multivariate regression, and analysis of variance
- Principal component analysis, factor analysis, and canonical correlation analysis
- Discriminant analysis, cluster analysis, and multidimensional scaling
- New advances and techniques, including supervised and unsupervised statistical learning, graphical models and regularization methods for high-dimensional data
Although primarily designed as a textbook for final year undergraduates and postgraduate students in mathematics and statistics, the book will also be of interest to research workers and applied scientists.
Table of Contents
Epigraph
Preface xiii
Preface to the Second Edition xiii
Preface to the First Edition xv
Acknowledgements from First Edition xviii
Notation, abbreviations and key ideas xix
1 Introduction 1
1.1 Objects and Variables 1
1.2 Some Multivariate Problems and Techniques 2
1.2.1 Generalizations of univariate techniques 2
1.2.2 Dependence and regression 2
1.2.3 Linear combinations 2
1.2.4 Assignment and dissection 5
1.2.5 Building configurations 6
1.3 The Data Matrix 6
1.4 Summary Statistics 7
1.4.1 The mean vector and covariance matrix 8
1.4.2 Measures of multivariate scatter 11
1.5 Linear Combinations 11
1.5.1 The scaling transformation 12
1.5.2 Mahalanobis transformation 12
1.5.3 Principal component transformation 12
1.6 Geometrical Ideas 13
1.7 Graphical Representation 14
1.7.1 Univariate scatters 14
1.7.2 Bivariate scatters 16
1.7.3 Harmonic curves 16
1.7.4 Parallel coordinates plot 19
1.8 Measures of Multivariate Skewness and Kurtosis 19
2 Basic Properties of Random Vectors 27
2.1 Cumulative Distribution Functions and Probability Density Functions 27
2.2 Population Moments 29
2.2.1 Expectation and correlation 29
2.2.2 Population mean vector and covariance matrix 29
2.2.3 Mahalanobis space 31
2.2.4 Higher moments 31
2.2.5 Conditional moments 33
2.3 Characteristic Functions 33
2.4 Transformations 35
2.5 The Multivariate Normal Distribution 36
2.5.1 Definition 36
2.5.2 Geometry 38
2.5.3 Properties 38
2.5.4 Singular multivariate normal distribution 43
2.5.5 The matrix normal distribution 44
2.6 Random Samples 45
2.7 Limit Theorems 47
3 Non-normal Distributions 53
3.1 Introduction 53
3.2 Some Multivariate Generalizations of Univariate Distributions 53
3.2.1 Direct generalizations 53
3.2.2 Common components 54
3.2.3 Stochastic generalizations 55
3.3 Families of Distributions 56
3.3.1 The exponential family 56
3.3.2 The spherical family 57
3.3.3 Elliptical distributions 60
3.3.4 Stable distributions 62
3.4 Insights into Skewness and Kurtosis 62
3.5 Copulas 64
3.5.1 The Gaussian copula 66
3.5.2 The Clayton-Mardia copula 67
3.5.3 Archimedean copulas 68
3.5.4 Fréchet-Höffding Bounds 69
4 Normal Distribution Theory 77
4.1 Characterization and Properties 77
4.1.1 The central role of multivariate normal theory 77
4.1.2 A definition by characterization 78
4.2 Linear Forms 79
4.3 Transformations of Normal Data Matrices 81
4.4 The Wishart Distribution 83
4.4.1 Introduction 83
4.4.2 Properties of Wishart matrices 83
4.4.3 Partitioned Wishart matrices 86
4.5 The Hotelling T² Distribution 89
4.6 Mahalanobis Distance 92
4.6.1 The two-sample Hotelling T² statistic 92
4.6.2 A decomposition of Mahalanobis distance 93
4.7 Statistics Based on the Wishart Distribution 95
4.8 Other Distributions Related to the Multivariate Normal 99
5 Estimation 111
5.1 Likelihood and Sufficiency 111
5.1.1 The likelihood function 111
5.1.2 Efficient scores and Fisher’s information 112
5.1.3 The Cramér-Rao lower bound 114
5.1.4 Sufficiency 115
5.2 Maximum Likelihood Estimation 116
5.2.1 General case 116
5.2.2 Multivariate normal case 117
5.2.3 Matrix normal distribution 122
5.3 Robust Estimation of Location and Dispersion for Multivariate Distributions 123
5.3.1 M-Estimates of location 123
5.3.2 Minimum covariance determinant 124
5.3.3 Multivariate trimming 124
5.3.4 Stahel-Donoho estimator 125
5.3.5 Minimum volume estimator 125
5.3.6 Tyler’s estimate of scatter 127
5.4 Bayesian Inference 127
6 Hypothesis Testing 137
6.1 Introduction 137
6.2 The Techniques Introduced 139
6.2.1 The likelihood ratio test (LRT) 139
6.2.2 The union intersection test (UIT) 143
6.3 The Techniques Further Illustrated 146
6.3.1 One-sample hypotheses on μ
6.3.2 One-sample hypotheses on Σ 148
6.3.3 Multi-sample hypotheses 152
6.4 Simultaneous Confidence Intervals 156
6.4.1 The one-sample Hotelling T² case 156
6.4.2 The two-sample Hotelling T² case 157
6.4.3 Other examples 157
6.5 The Behrens-Fisher Problem 157
6.6 Multivariate Hypothesis Testing: Some General Points 158
6.7 Non-normal Data 159
6.8 Mardia’s Non-parametric Test for the Bivariate Two-sample Problem 162
7 Multivariate Regression Analysis 169
7.1 Introduction 169
7.2 Maximum Likelihood Estimation 170
7.2.1 Maximum likelihood estimators for B and Σ 170
7.2.2 The distribution of B̂ and Σ̂ 172
7.3 The General Linear Hypothesis 173
7.3.1 The likelihood ratio test (LRT) 173
7.3.2 The union intersection test (UIT) 175
7.3.3 Simultaneous confidence intervals 175
7.4 Design Matrices of Degenerate Rank 176
7.5 Multiple Correlation 178
7.5.1 The effect of the mean 178
7.5.2 Multiple correlation coefficient 178
7.5.3 Partial correlation coefficient 180
7.5.4 Measures of correlation between vectors 181
7.6 Least Squares Estimation 182
7.6.1 Ordinary least squares (OLS) estimation 182
7.6.2 Generalized least squares 183
7.6.3 Application to multivariate regression 183
7.6.4 Asymptotic consistency of least squares estimators 184
7.7 Discarding of Variables 184
7.7.1 Dependence analysis 184
7.7.2 Interdependence analysis 186
8 Graphical Models 195
8.1 Introduction 195
8.2 Graphs and Conditional Independence 196
8.3 Gaussian Graphical Models 201
8.3.1 Estimation 202
8.3.2 Model selection 207
8.4 Log-linear Graphical Models 208
8.4.1 Notation 209
8.4.2 Log-linear models 210
8.4.3 Log-linear models with a graphical interpretation 213
8.5 Directed and Mixed Graphs 215
9 Principal Component Analysis 221
9.1 Introduction 221
9.2 Definition and Properties of Principal Components 221
9.2.1 Population principal components 221
9.2.2 Sample principal components 224
9.2.3 Further properties of principal components 225
9.2.4 Correlation structure 229
9.2.5 The effect of ignoring some components 229
9.2.6 Graphical representation of principal components 232
9.2.7 Biplots 232
9.3 Sampling Properties of Principal Components 236
9.3.1 Maximum likelihood estimation for normal data 236
9.3.2 Asymptotic distributions for normal data 239
9.4 Testing Hypotheses about Principal Components 242
9.4.1 Introduction 242
9.4.2 The hypothesis that (λ1 + · · · + λk)/(λ1 + · · · + λp) = ψ 244
9.4.3 The hypothesis that (p − k) eigenvalues of Σ are equal 245
9.4.4 Hypotheses concerning correlation matrices 246
9.5 Correspondence Analysis 247
9.5.1 Contingency tables 247
9.5.2 Gradient analysis 253
9.6 Allometry: The Measurement of Size and Shape 255
9.7 Discarding of Variables 258
9.8 Principal Component Regression 259
9.9 Projection Pursuit and Independent Component Analysis 261
9.9.1 Projection pursuit 261
9.9.2 Independent component analysis 263
9.10 PCA in High Dimensions 266
10 Factor Analysis 277
10.1 Introduction 277
10.2 The Factor Model 278
10.2.1 Definition 278
10.2.2 Scale invariance 279
10.2.3 Non-uniqueness of factor loadings 279
10.2.4 Estimation of the parameters in factor analysis 280
10.2.5 Use of the correlation matrix R in estimation 281
10.3 Principal Factor Analysis 282
10.4 Maximum Likelihood Factor Analysis 284
10.5 Goodness of Fit Test 287
10.6 Rotation of Factors 288
10.6.1 Interpretation of factors 288
10.6.2 Varimax rotation 289
10.7 Factor Scores 293
10.8 Relationships Between Factor Analysis and Principal Component Analysis 294
10.9 Analysis of Covariance Structures 295
11 Canonical Correlation Analysis 299
11.1 Introduction 299
11.2 Mathematical Development 300
11.2.1 Population canonical correlation analysis 300
11.2.2 Sample canonical correlation analysis 304
11.2.3 Sampling properties and tests 305
11.2.4 Scoring and prediction 306
11.3 Qualitative Data and Dummy Variables 307
11.4 Qualitative and Quantitative Data 309
12 Discriminant Analysis and Statistical Learning 317
12.1 Introduction 317
12.2 Bayes’ Discriminant Rule 319
12.3 The Error Rate 320
12.3.1 Probabilities of misclassification 320
12.3.2 Estimation of error rate 323
12.3.3 Confusion matrix 324
12.4 Discrimination Using the Normal Distribution 324
12.4.1 Population discriminant rules 324
12.4.2 The sample discriminant rules 326
12.4.3 Is discrimination worthwhile? 334
12.5 Discarding of Variables 334
12.6 Fisher’s Linear Discriminant Function 336
12.7 Nonparametric Distance-based Methods 339
12.7.1 Nearest neighbor classifier 339
12.7.2 Large sample behavior of the nearest neighbor classifier 341
12.7.3 Kernel classifiers 344
12.8 Classification Trees 346
12.8.1 Splitting criteria 348
12.8.2 Pruning 351
12.9 Logistic Discrimination 354
12.9.1 Logistic regression model 354
12.9.2 Estimation and inference 356
12.9.3 Interpretation of the parameter estimates 356
12.9.4 Extensions 360
12.10 Neural Networks 360
12.10.1 Motivation 360
12.10.2 Multi-layer perceptron 361
12.10.3 Radial basis functions 363
12.10.4 Support Vector Machines 366
13 Multivariate Analysis of Variance 379
13.1 Introduction 379
13.2 Formulation of Multivariate One-way Classification 379
13.3 The Likelihood Ratio Principle 380
13.4 Testing Fixed Contrasts 382
13.5 Canonical Variables and a Test of Dimensionality 383
13.5.1 The problem 383
13.5.2 The LR test (Σ known) 383
13.5.3 Asymptotic distribution of the likelihood ratio criterion 385
13.5.4 The estimated plane 386
13.5.5 The LR test (unknown Σ) 387
13.5.6 The estimated plane (unknown Σ) 387
13.5.7 Profile analysis 393
13.6 The Union Intersection Approach 394
13.7 Two-way Classification 395
14 Cluster Analysis and Unsupervised Learning 405
14.1 Introduction 405
14.2 Probabilistic Membership Models 406
14.3 Parametric Mixture Models 410
14.4 Partitioning Methods 412
14.5 Hierarchical Methods 418
14.5.1 Agglomerative algorithms 418
14.5.2 Minimum spanning tree and single linkage 421
14.5.3 Properties of different agglomerative algorithms 423
14.6 Distances and Similarities 425
14.6.1 Distances 425
14.6.2 Similarity coefficients 430
14.7 Grouped Data 432
14.8 Mode Seeking 434
14.9 Measures of Agreement 436
15 Multidimensional Scaling 449
15.1 Introduction 449
15.2 Classical Solution 451
15.2.1 Some theoretical results 451
15.2.2 An algorithm for the classical MDS solution 454
15.2.3 Similarities 456
15.3 Duality Between Principal Coordinate Analysis and Principal Component Analysis 459
15.4 Optimal Properties of the Classical Solution and Goodness of Fit 460
15.5 Seriation 467
15.5.1 Description 467
15.5.2 Horseshoe effect 468
15.6 Non-metric Methods 469
15.7 Goodness of Fit Measure: Procrustes Rotation 472
15.8 Multi-sample Problem and Canonical Variates 475
16 High-dimensional Data 481
16.1 Introduction 481
16.2 Shrinkage Methods in Regression 483
16.2.1 The multiple linear regression model 483
16.2.2 Ridge regression 484
16.2.3 Least absolute selection and shrinkage operator (LASSO) 486
16.3 Principal Component Regression 488
16.4 Partial Least Squares Regression 490
16.4.1 Overview 490
16.4.2 The PLS1 algorithm to construct the PLS loading matrix for p = 1 response variable 490
16.4.3 The PLS2 algorithm to construct the PLS loading matrix for p > 1 response variables 493
16.4.4 The predictor envelope model 494
16.4.5 PLS regression 494
16.4.6 Joint envelope models 496
16.5 Functional Data 498
16.5.1 Functional principal component analysis 499
16.5.2 Functional linear regression models 503
A Matrix Algebra 509
A.1 Introduction 509
A.2 Matrix Operations 512
A.2.1 Transpose 512
A.2.2 Trace 513
A.2.3 Determinants and cofactors 513
A.2.4 Inverse 515
A.2.5 Kronecker products 516
A.3 Further Particular Matrices and Types of Matrix 517
A.3.1 Orthogonal matrices 517
A.3.2 Equicorrelation matrix 518
A.3.3 Centering matrix 519
A.4 Vector Spaces, Rank, and Linear Equations 519
A.4.1 Vector spaces 519
A.4.2 Rank 521
A.4.3 Linear equations 522
A.5 Linear Transformations 523
A.6 Eigenvalues and Eigenvectors 523
A.6.1 General results 523
A.6.2 Symmetric matrices 525
A.7 Quadratic Forms and Definiteness 531
A.8 Generalized Inverse 533
A.9 Matrix Differentiation and Maximization Problems 535
A.10 Geometrical Ideas 538
A.10.1 n-dimensional geometry 538
A.10.2 Orthogonal transformations 538
A.10.3 Projections 539
A.10.4 Ellipsoids 539
B Univariate Statistics 543
B.1 Introduction 543
B.2 Normal Distribution 543
B.3 Chi-squared Distribution 544
B.4 F and Beta Variables 544
B.5 t Distribution 545
B.6 Poisson Distribution 546
C R commands and data 547
C.1 Basic R Commands Related to Matrices 547
C.2 R Libraries and Commands Used in Exercises and Figures 548
C.3 Data Availability 549
D Tables 551
References 560
Index