Geometric quantile-based measures of multivariate distributional characteristics
Abstract
Several new geometric quantile-based measures for multivariate dispersion, skewness, kurtosis, and spherical asymmetry are defined. Unlike existing measures, which rely on volumes, these measures are easy to calculate. Some theoretical justification is given, followed by experiments which illustrate that they are reasonable measures of these distributional characteristics and that confidence regions computed from them have the desired coverage.
Keywords: Geometric quantiles, multivariate kurtosis, multivariate skewness, spherical symmetry
1 Introduction
Denote the open ball in with center and radius by . Chaudhuri (1996) defined a geometric notion of quantiles for multidimensional data by defining the -quantile, , to be the value of that minimizes , which generalizes the univariate quantile loss function. However, a common criticism of these geometric quantiles is that the isoquantile contours need not follow the shape of the distribution very well when it is not spherically symmetric. This is especially true of extreme quantiles, which need not even be contained in the convex hull of the distribution, as shown by Girard and Stupfler (2017). To mitigate this problem, Chakraborty (2001) devised a procedure to transform the data to have roughly isotropic covariance.
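As a minimal sketch of how such a quantile can be computed for a finite sample, assume the loss takes the form used by Chaudhuri (1996), namely the sum over observations of ||x_i - q|| + <u, x_i - q> with ||u|| < 1. The function names, the Weiszfeld-type fixed-point iteration, and the iteration count below are illustrative assumptions, not the implementation used for the experiments in this paper.

```python
import math

def _dist(p, q):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def geometric_quantile(points, u, iters=500, eps=1e-12):
    """Approximate the minimizer q of sum_i (||x_i - q|| + <u, x_i - q>)."""
    n, dim = len(points), len(u)
    if math.sqrt(sum(c * c for c in u)) >= 1.0:
        raise ValueError("u must lie in the open unit ball")
    # Start from the coordinate-wise mean.
    q = [sum(p[j] for p in points) / n for j in range(dim)]
    for _ in range(iters):
        # Inverse distances, floored by eps to avoid dividing by zero at a data point.
        w = [1.0 / max(_dist(p, q), eps) for p in points]
        total = sum(w)
        # Stationarity condition sum_i w_i (x_i - q) + n u = 0, solved for q
        # with the weights held fixed.
        q = [(sum(wi * p[j] for wi, p in zip(w, points)) + n * u[j]) / total
             for j in range(dim)]
    return q

def mean_direction(points, q):
    """Average unit vector pointing from q to the data; equals -u at a u-quantile
    that does not coincide with a data point."""
    acc = [0.0] * len(q)
    for p in points:
        d = _dist(p, q)
        for j in range(len(q)):
            acc[j] += (p[j] - q[j]) / d
    return [a / len(points) for a in acc]

# Corners of a square; u = 0 would give the geometric median at the origin.
square = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
q = geometric_quantile(square, (0.5, 0.0))
```

The first-order condition checked by `mean_direction` is a convenient diagnostic: at the returned point, the average unit vector toward the data should be close to -u.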
A potential use of geometric quantiles that does not need this procedure is in defining robust measures of multivariate centrality, dispersion, skewness, and kurtosis. For real-valued data, the first four moments are used to define measures of centrality, dispersion, skewness, and kurtosis, respectively. Meanwhile, some quantile-based measures of these quantities also exist in the literature. Denoting the -quantile for univariate data by , , the median is a centrality measure, and
for measures dispersion; this is the interquartile range when . A standard quantile-based measure of skewness is
where is typically set to (Groeneveld and Meeden, 1984), and kurtosis (or tailedness) can be measured by
for (Balanda and MacGillivray, 1988).
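The univariate measures above can be sketched in code. The version below assumes the common parametrization by a probability p in (0, 1/2) applied to the quantile function, which may differ from the indexing used in the text by a reparametrization. The function names, the linear-interpolation sample quantile, and the particular form of the kurtosis ratio (a tail range divided by a central range) are illustrative assumptions.

```python
def quantile(sorted_x, p):
    """Linear-interpolation sample quantile of sorted data, p in [0, 1]."""
    pos = p * (len(sorted_x) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (pos - lo) * (sorted_x[hi] - sorted_x[lo])

def dispersion(x, p=0.25):
    """Interquantile range; the interquartile range when p = 0.25."""
    s = sorted(x)
    return quantile(s, 1 - p) - quantile(s, p)

def skewness(x, p=0.25):
    """Groeneveld-Meeden type skewness: signed imbalance of the two
    quantiles around the median, scaled by the interquantile range."""
    s = sorted(x)
    upper, lower, med = quantile(s, 1 - p), quantile(s, p), quantile(s, 0.5)
    return (upper + lower - 2 * med) / (upper - lower)

def kurtosis(x, p=0.25, p_tail=0.025):
    """Ratio of a tail interquantile range to a central one (p_tail < p)."""
    return dispersion(x, p_tail) / dispersion(x, p)

data = list(range(1, 10))            # 1, 2, ..., 9
iqr = dispersion(data)               # quartiles are 3 and 7
skew = skewness(data)                # symmetric data
kurt = kurtosis(data, p_tail=0.125)  # tail quantiles are 2 and 8
```

Because the skewness measure is a ratio of quantile differences, it is invariant to positive affine transformations of the data and lies in [-1, 1].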
For multivariate data, Chaudhuri (1996) defined measures of multivariate dispersion, skewness, and kurtosis using the volume of a region enclosed by an isoquantile contour, by which we mean the set of all -quantiles for fixed . However, these measures are only briefly mentioned in that paper and are computationally complex for two reasons. First, since geometric quantiles cannot, in general, be found analytically, it is not feasible to have an exact description of an entire isoquantile contour as that would require numerically calculating the -quantile for every of a given size. Second, computing the volume of the region enclosed by an arbitrary hypersurface is non-trivial even in just two dimensions, especially considering that the enclosed region may not be convex.
In Section 2, we define two new measures for each of these characteristics, one based on averages and the other on suprema, which bypass the second issue altogether by avoiding volumes, while circumventing the first one. Furthermore, we introduce a measure of spherical asymmetry and investigate some theoretical underpinnings of these measures. Section 3 describes the use of these measures and calculates confidence regions and coverage.
2 Quantile-based measures of multivariate dispersion, skewness, kurtosis, and spherical asymmetry
For an -dimensional random vector with unique -quantiles for all , define a map that sends to the -quantile of , and denote by . For such an , we introduce two measures of dispersion as follows
where is the surface area of the unit -sphere. For skewness, we suggest the following two measures
and for kurtosis,
for . When , it is clear that , , and . Unlike and , can be positive or negative, so its generalization to higher dimensions should be a vector as is. However, we have also included , which is a scalar and thus, strictly speaking, generalizes rather than itself.
Under the following assumption, Fact 2.1.1 of Chaudhuri (1996) guarantees that a unique -quantile of exists for all .
Assumption 2.1.
The support of the -dimensional random vector , , is not contained on a single line.
Proposition 6.1 in Konen and Paindaveine (2022) shows that for such an , is continuous. In the rest of this section, we assume that satisfies Assumption 2.1.
We now consider spherical asymmetry, a property that is specific to multivariate distributions. We propose measuring the spherical asymmetry of a distribution by
By transforming the data as in Chakraborty (2001) to have roughly isotropic covariance, can also be used to measure elliptical asymmetry. All of our measures can be used to test properties of distributions, such as whether a distribution is skewed or spherically symmetric or whether one distribution is more spread out than another.
Note that all of the integrands used to define these measures are continuous thanks to Assumption 2.1 and thus have finite integrals because the domain of integration is the compact . Next, consider the following two assumptions.
Assumption 2.2.
There exists some for which .
Assumption 2.3.
For all , .
Clearly and are defined in if and only if Assumption 2.2 holds. In fact, and are defined in under the same condition and only under this condition since the continuity of implies that the map is positive on a non-null set (according to the standard volume measure on ). On the other hand, is defined in if and only if Assumption 2.3 holds. Both of these assumptions can be guaranteed when has a non-atomic distribution not supported on a single line, in which case is a homeomorphism and hence injective; see Theorem 6.2 in Konen and Paindaveine (2022).
is said to have a non-skewed distribution if and have identical distributions and a spherically symmetric distribution if and have identical distributions for all orthogonal matrices . Then, we can show that the skewness and spherical asymmetry measures behave as desired when the distribution of is non-skewed or spherically symmetric, respectively. We can also show that our measures have desirable invariance and equivariance properties with respect to scaling, translation, and orthogonal transformations like rotation and reflection.
Proposition 2.1.
Let and Assumption 2.1 hold for .
Proof.
(a) Fact 2.2.1 of Chaudhuri (1996) details the equivariance of geometric quantiles to translation and orthogonal transformations. The result follows immediately from
where the five equalities follow from translation equivariance, non-skewedness, orthogonal equivariance, non-skewedness, and translation equivariance, respectively.
(b) For any , there exists some orthogonal for which . Then,
where the five equalities follow from translation equivariance, spherical symmetry, orthogonal equivariance, spherical symmetry, and translation equivariance, respectively. Thus, the orthogonality of implies that for all , from which the result follows.
(c) Note that Assumption 2.1 holds for . By the aforementioned equivariance of geometric quantiles with respect to translation and orthogonal transformations, in addition to equivariance with respect to scaling,
(1) |
see Facts 2.2.1 and 2.2.2 of Chaudhuri (1996). This implies that Assumption 2.2 holds for whenever it does so for , and similarly for Assumption 2.3. Given , is in if and only if for some orthogonal . The results are then easily derived from this fact and (1). ∎
For each positive integer , define a set of cardinality . Then, our measures can be approximated by
(2) |
For each , define to be the empirical measure corresponding to ; that is, the probability measure that assigns a mass of to each point in . Define to be the uniform probability measure on , or equivalently the normalized volume measure of ; that is, for any measurable , is the -dimensional volume of according to the standard volume measure on divided by . The sequence of sets is called equidistributed if the corresponding sequence of empirical measures converges weakly to .
Proposition 2.2.
Proof.
For a continuous function , there exists some in the compact set for which . By continuity, for any , there is some open for which implies . Then, by the Portmanteau theorem, as , implying that is non-empty for sufficiently large . Therefore, for sufficiently large , while since . This can be done for any , so . The analogous result replacing with and with can be shown similarly, and because is continuous, , , , and converge to the appropriate terms as .
The aforementioned continuous is also bounded because its domain is compact, and so the Portmanteau theorem also implies that . Then, the rest of the statement immediately follows from the continuity of and each of its component functions. ∎
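The two convergence statements used in this proof, that the maximum over an equidistributed set approaches the supremum and that the empirical average approaches the normalized integral, can be illustrated numerically. The sketch below works on the unit circle with equally spaced directions, an assumed choice for the planar case, and uses a hypothetical test function whose average and supremum are known in closed form.

```python
import math

def circle_directions(k):
    """k equally spaced unit vectors on the unit circle."""
    return [(math.cos(2 * math.pi * j / k), math.sin(2 * math.pi * j / k))
            for j in range(k)]

def average_and_max(f, directions):
    """Finite-set stand-ins for the normalized integral and the supremum of f."""
    values = [f(xi) for xi in directions]
    return sum(values) / len(values), max(values)

# Hypothetical continuous test function: the squared first coordinate,
# whose average over the circle is 1/2 and whose supremum is 1.
avg, sup = average_and_max(lambda xi: xi[0] ** 2, circle_directions(360))
```

Any continuous function of the direction, such as one built from numerically computed quantiles, can be passed in place of the test function.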
Remark 2.1.
When , letting clearly results in an equidistributed sequence. The situation is more complicated when . However, by letting , where are independent and identically distributed random elements drawn from the uniform distribution on for each positive integer , we can guarantee that is almost surely equidistributed (and therefore that the convergences in Proposition 2.2 happen almost surely) because is separable; see for example Problem 3.1 on page 38 of Billingsley (1999). One can easily draw an element from the uniform distribution on by generating a vector from an -variate normal distribution with non-zero isotropic covariance and dividing the vector by its norm.
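The sampling recipe at the end of this remark can be sketched as follows. The function name, the seed argument, and the use of a standard normal (identity covariance) are illustrative choices; any non-zero isotropic covariance gives the same distribution after normalization.

```python
import math
import random

def sample_sphere(dim, k, seed=0):
    """Draw k points uniformly from the unit sphere in R^dim by normalizing
    independent standard normal vectors."""
    rng = random.Random(seed)
    out = []
    while len(out) < k:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:  # a zero vector has probability zero; skip it if it occurs
            out.append([c / norm for c in v])
    return out

directions = sample_sphere(dim=3, k=5000)
```

The rotational invariance of the isotropic normal distribution is what makes the normalized vectors uniform on the sphere.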
3 Numerical experiments
The code for the experiments in this section, implemented in Python with PyTorch, can be found at https://github.com/hayoungshin1/Quantile-based-measures.
3.1 Illustration of these measures
In this section, we explore the performance of each of the seven measures. To do so, we generated 16 datasets, four for each of the four properties (dispersion, skewness, kurtosis, and spherical asymmetry) in the Euclidean plane. For a given property, each dataset has an associated level , and the datasets are generated in such a way that a plausible measure of that property should decrease as increases. Next, we describe precisely how the data were generated.
First, we generated vectors from a bivariate normal distribution , and for each of the 16 datasets, we created a new vector from . To be specific, for the dispersion datasets,
for the skewness datasets,
for the kurtosis datasets,
and for the spherical asymmetry datasets,
Defining to be the Euclidean geometric median of , the set of vectors was centered by letting for each . Doing so ensures that the Euclidean geometric median of the centered vectors is 0. This makes it easier to directly compare isoquantile contours defined by for some for different datasets. The 16 datasets are shown in Figure 1. For a given dataset, the whose distributional characteristics we are measuring follows an empirical distribution, placing a mass of at each , .
Because of the concerns about geometric quantiles for non-isotropic distributions, for the dispersion datasets we decreased dispersion in only the second coordinate of to show that and work well even when the data do not have isotropic covariance. For the same reason, we divided the second coordinate of by 2 for the skewness and kurtosis datasets. In contrast, we did not do so for the spherical asymmetry datasets, in order to show that can detect changes in spherical asymmetry even when the covariance remains roughly isotropic.
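The centering step can be sketched with the classical Weiszfeld iteration for the Euclidean geometric median. The function names, the fixed iteration count, and the small example are assumptions made for illustration and need not match the repository code.

```python
import math

def _dist(p, q):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def geometric_median(points, iters=500, eps=1e-12):
    """Weiszfeld iteration: repeatedly take the inverse-distance weighted mean."""
    dim = len(points[0])
    m = [sum(p[j] for p in points) / len(points) for j in range(dim)]
    for _ in range(iters):
        # Floor the distances by eps to avoid dividing by zero at a data point.
        w = [1.0 / max(_dist(p, m), eps) for p in points]
        total = sum(w)
        m = [sum(wi * p[j] for wi, p in zip(w, points)) / total
             for j in range(dim)]
    return m

def center_at_median(points):
    """Translate the data so that its geometric median sits at the origin."""
    m = geometric_median(points)
    return [tuple(c - mc for c, mc in zip(p, m)) for p in points]

# A triangle whose angles are all below 120 degrees, so the median is interior.
triangle = [(0.0, 0.0), (4.0, 0.0), (1.0, 3.0)]
centered = center_at_median(triangle)
median_after = geometric_median(centered)
```

Since the geometric median is translation equivariant, recomputing it on the centered data should return a point numerically indistinguishable from the origin.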
The dispersion datasets become less dispersed as increases. The rationale for generating the skewness and kurtosis datasets is that the element-wise square of a non-skewed centered real dataset is positively skewed, and the element-wise cube of a centered real dataset has fatter tails than the original dataset. Finally, cubing each coordinate for a spherically symmetric dataset gives a dataset that is no longer spherically symmetric but still has isotropic covariance.
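The rationale for squaring and cubing can be checked on simulated univariate data. The sketch below uses quartile-based skewness and a ratio of interquantile ranges as stand-ins for the univariate measures. The sample size, seed, quantile levels, and function names are illustrative assumptions rather than the settings used for the tables.

```python
import random
import statistics

def quartile_skewness(x):
    """(Q3 + Q1 - 2 * median) / (Q3 - Q1)."""
    q1, q2, q3 = statistics.quantiles(x, n=4)
    return (q3 + q1 - 2 * q2) / (q3 - q1)

def tail_ratio(x):
    """Range between the 2.5% and 97.5% quantiles over the interquartile range."""
    cuts = statistics.quantiles(x, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return (cuts[-1] - cuts[0]) / (cuts[29] - cuts[9])

rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(5000)]   # centered, non-skewed sample

skew_squared = quartile_skewness([v * v for v in z])  # squaring induces skewness
ratio_original = tail_ratio(z)
ratio_cubed = tail_ratio([v ** 3 for v in z])         # cubing fattens the tails
```

Cubing is monotone, so it maps quantiles to quantiles and stretches the tail range far more than the central range, which is why the ratio grows.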
Table 1 lists the univariate quantile-based measures of dispersion, skewness, and kurtosis defined in the introduction for each coordinate of from the appropriate datasets. It shows this decrease in the properties more precisely. These depend on the coordinate system and are imperfect as measures for multivariate data. Still, they do give a rough sense of the dispersion, skewness, and kurtosis of their respective data sets. For the kurtosis measures , and otherwise . We approximated our seven measures using (2) with for as described in Remark 2.1. Table 2 lists the results for these measures according to . We used for the kurtosis measures and for the others. Each of the measures decreases as increases, and moreover, the sizes of the drops generally comport with what we would expect from Table 1.
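Table 1: Univariate quantile-based measures for each coordinate of the appropriate datasets. Each cell lists the values for the first and second coordinates.

Level | Dispersion | Skewness | Kurtosis |
---|---|---|---|
0 | 0.6719, 2.6402 | 0.3357, 0.4266 | 44.4432, 112.7463 |
1 | 0.6719, 1.3201 | 0.2358, 0.4034 | 6.5622, 8.4033 |
2 | 0.6719, 0.6601 | 0.2510, 0.1990 | 4.6026, 5.7469 |
3 | 0.6719, 0.3300 | 0.0915, 0.0320 | 3.9035, 4.8311 |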
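Table 2: Approximations of the seven measures on the datasets for the corresponding property, by level.

Level | Dispersion (1st measure) | Dispersion (2nd measure) | Skewness (1st measure) | Skewness (2nd measure) | Kurtosis (1st measure) | Kurtosis (2nd measure) | Spherical asymmetry |
---|---|---|---|---|---|---|---|
0 | 2.7163 | 2.1863 | 0.2973 | 0.0555 | 24.7410 | 26.2747 | 0.3234 |
1 | 1.4354 | 1.2754 | 0.2779 | 0.0501 | 7.1674 | 7.9259 | 0.1421 |
2 | 0.8747 | 0.8524 | 0.1599 | 0.0168 | 5.3794 | 5.9430 | 0.1092 |
3 | 0.7567 | 0.6622 | 0.0669 | 0.0015 | 4.6571 | 5.1550 | 0.1025 |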
To alleviate concerns about extreme geometric quantiles, we also performed the same experiments when for the kurtosis measures and otherwise. The results are shown in Table 4, and Table 3 is analogous to Table 1. The concerns about (extreme) geometric quantiles do not have any bearing on how our measures perform. For example, suppose a dataset is transformed so that it maintains its median but becomes less dispersed. In that case, it is sufficient, though not necessary, for the contours to be pulled inward toward the median for our dispersion measures to decrease; the specific shapes of the isoquantile contours do not matter. This is observed for our dispersion datasets: see the first column of Figure 2, which shows how the isoquantile contours change with . The second column of Figure 2 clearly shows reductions in skewness as increases. The other two columns are more challenging to decipher visually because some contours are too dense, but they are included for completeness.
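Table 3: Univariate quantile-based measures for each coordinate, analogous to Table 1, for the more extreme quantile levels. Each cell lists the values for the first and second coordinates.

Level | Dispersion | Skewness | Kurtosis |
---|---|---|---|
0 | 2.3028, 9.2975 | 0.8502, 0.8561 | 269.4612, 702.7808 |
1 | 2.3028, 4.6488 | 0.7933, 0.8541 | 24.3179, 31.8565 |
2 | 2.3028, 2.3244 | 0.5423, 0.5602 | 11.6365, 14.7756 |
3 | 2.3028, 1.1622 | -0.0414, -0.0276 | 7.1126, 8.8870 |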
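Table 4: Approximations of the seven measures, analogous to Table 2, for the more extreme quantile levels.

Level | Dispersion (1st measure) | Dispersion (2nd measure) | Skewness (1st measure) | Skewness (2nd measure) | Kurtosis (1st measure) | Kurtosis (2nd measure) | Spherical asymmetry |
---|---|---|---|---|---|---|---|
0 | 18.5634 | 14.4095 | 0.4242 | 0.0711 | 152.8021 | 162.3169 | 0.3463 |
1 | 9.3980 | 7.8012 | 0.3387 | 0.0513 | 31.0124 | 35.2529 | 0.2456 |
2 | 5.1021 | 5.0053 | 0.1856 | 0.0165 | 19.9837 | 21.1333 | 0.1376 |
3 | 4.8817 | 4.0176 | 0.0298 | 0.0002 | 15.7785 | 15.4955 | 0.0640 |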
3.2 Confidence regions
We now use bootstrapping to compute confidence regions and hence to carry out tests. Recall that for a parameter , an estimate , and bootstrap estimates , a possible % bootstrap confidence region is , where is some region containing about % of the values of , . Here, we let with chosen for appropriate coverage.
Setting and , for following the standard bivariate normal distribution, we calculated a confidence region for (precisely, for the same as in the previous section) based on draws from this distribution. For this , for all thanks to Proposition 2.1(a), so we checked whether 0 was contained in the confidence region. To estimate confidence region coverage, this process was repeated 1000 times for and 0.98. Our confidence regions had excellent coverage, with 96% and 94.8% of confidence regions containing 0 when and 0.98, respectively.
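A minimal sketch of such a bootstrap region follows, under the assumption that the region is a Euclidean ball whose radius is an empirical quantile of the distances between the bootstrap estimates and the original estimate. The function names, the number of resamples, and the use of the coordinate-wise mean as a stand-in estimator are illustrative assumptions; in practice the estimator would be one of the vector-valued measures.

```python
import math
import random

def _dist(p, q):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def bootstrap_ball(data, estimator, level=0.95, n_boot=200, seed=0):
    """Return (center, radius) of a ball-shaped bootstrap confidence region."""
    rng = random.Random(seed)
    theta_hat = estimator(data)
    dists = []
    for _ in range(n_boot):
        resample = rng.choices(data, k=len(data))  # sample with replacement
        dists.append(_dist(estimator(resample), theta_hat))
    dists.sort()
    # Smallest radius that covers at least `level` of the bootstrap deviations.
    idx = min(len(dists) - 1, max(0, math.ceil(level * len(dists)) - 1))
    return theta_hat, dists[idx]

def contains(center, radius, point):
    """Whether `point` lies in the closed ball around `center`."""
    return _dist(center, point) <= radius

def coordinate_mean(points):
    """Stand-in estimator used only for this illustration."""
    return tuple(sum(p[j] for p in points) / len(points)
                 for j in range(len(points[0])))

rng = random.Random(1)
sample = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
center, radius = bootstrap_ball(sample, coordinate_mean)
covers_origin = contains(center, radius, (0.0, 0.0))
```

Because a ball is symmetric about its center, reflecting the bootstrap deviations through the estimate yields the same region, so no separate reflection step is needed under this assumption.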
Because