Rutgers University
School of Arts and Sciences
Department of Statistics
Department of Statistics, Rutgers University, New Brunswick, NJ
Author Email: dtyler@stat.rutgers.edu
Abstract: When sampling from a multivariate normal distribution, the sample mean vector and sample covariance matrix provide a sufficient summary of the data set. To protect against nonnormality, and in particular against longer tailed distributions and outliers, one can replace the sample mean vector and covariance matrix with robust estimates of multivariate location and scatter. Outliers can often be detected by examining the corresponding robust Mahalanobis distances. Such an approach is appropriate if the bulk of the data arises from a multivariate normal distribution or more generally from an elliptically symmetric distribution. However, if the data arises otherwise, then different location/scatter estimates do not estimate the same population quantities, but rather reflect different aspects of the underlying distribution.
Invariant Coordinate Selection [1] is a general multivariate method based on the idea that examining differences between scatter estimates may uncover interesting structures in multivariate data, ones which may not be apparent from a plot of robust Mahalanobis distances. ICS is based on the eigenvalue-eigenvector decomposition of one estimate of scatter relative to another. An important property of this decomposition is that the corresponding eigenvectors generate an affine invariant coordinate system (ICS) for the multivariate data. This leads to new affine equivariant multivariate statistical and graphical methods. By plotting the data with respect to this new invariant coordinate system, various data structures can be revealed. For example, under certain independent component analysis models, which are popular within computer science and engineering disciplines, the invariant coordinates correspond to the independent components. Also, if the data arises from a mixture of elliptical distributions, then a subset of the invariant coordinates corresponds to Fisher's linear discriminant subspace, even though the class identifications of the data points are unknown.
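As a minimal sketch of the decomposition described above, the following Python code computes invariant coordinates from the eigen-decomposition of one scatter matrix relative to another. The choice of scatter pair is an assumption made only for illustration: the sample covariance matrix and a fourth-moment scatter matrix, neither of which is robust. The function names `cov4` and `ics` and the simulated data are hypothetical and do not come from the abstract.

```python
import numpy as np


def cov4(X):
    """Fourth-moment scatter matrix (assumed second scatter, not robust)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S1 = np.cov(X, rowvar=False)
    # Squared Mahalanobis distances with respect to the sample covariance.
    d2 = np.einsum("ij,jk,ik->i", Xc, np.linalg.inv(S1), Xc)
    # The factor p + 2 makes this scatter agree with the covariance
    # at the multivariate normal distribution.
    return (Xc * d2[:, None]).T @ Xc / (n * (p + 2))


def ics(X):
    """Return (rho, B, Z): generalized eigenvalues, directions, coordinates."""
    S1 = np.cov(X, rowvar=False)
    S2 = cov4(X)
    # Whiten with S1^{-1/2} so that the problem S2 b = rho S1 b
    # becomes an ordinary symmetric eigenproblem.
    w, V = np.linalg.eigh(S1)
    S1_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T
    M = S1_inv_sqrt @ S2 @ S1_inv_sqrt
    rho, U = np.linalg.eigh((M + M.T) / 2)
    order = np.argsort(rho)[::-1]          # sort eigenvalues in decreasing order
    rho, U = rho[order], U[:, order]
    B = S1_inv_sqrt @ U                    # columns satisfy B^T S1 B = I
    Z = (X - X.mean(axis=0)) @ B           # invariant coordinates
    return rho, B, Z


# Simulated two-component mixture (hypothetical data for illustration).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(size=(150, 3)),
    rng.normal(size=(50, 3)) + np.array([4.0, 0.0, 0.0]),
])

# An arbitrary nonsingular affine transformation of the same data.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, -1.0],
              [1.0, 0.0, 3.0]])
Y = X @ A + np.array([1.0, -2.0, 3.0])

rho_x, B_x, Z_x = ics(X)
rho_y, B_y, Z_y = ics(Y)

# When the eigenvalues are distinct, the coordinates of X and Y should
# agree up to the sign of each column.
max_diff = np.max(np.abs(np.abs(Z_x) - np.abs(Z_y)))
```

Under these assumptions, `max_diff` should be at the level of numerical round-off, which illustrates the affine invariance of the coordinate system. A robust analysis would replace one or both scatter matrices with robust estimates of scatter.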
The goal of this talk is to review ICS, to discuss its robustness features, and to explain why the method is resistant to outliers. In particular, it is noted that ICS passes the “wheelbarrow” test for multivariate robustness.
Keywords: Clustering, Discriminant analysis, ICA, ICS, M-estimation, Unmixing
References: [1] Tyler, D.E., Critchley, F., Dümbgen, L. and Oja, H. (2009). Invariant co-ordinate selection (with discussion). JRSS-B, 71(3), pp. 549–592.
