-
Nonlinear dispersion relation of dust acoustic waves using the Korteweg-de Vries model
Authors:
Farida Batool,
Ajaz Mir,
Sanat Tiwari,
Abhijit Sen
Abstract:
In this brief communication, we present an exact analytic nonlinear dispersion relation (NLDR) for the dust acoustic waves using the Korteweg-de Vries (KdV) model. The NLDR agrees with the spectrum of spatio-temporal evolution obtained from an exact solution as in Mir et al. [Phys. Plasmas 27, 113701 (2020)]. The NLDR also shows a reasonable match with the experimental data of Thompson et al. [Phys. Plasmas 4, 2331 (1997)] in the long wavelength limit ($k \lambda_D \ll 1$). We suggest that such nonlinear corrections should be incorporated in the dispersion relation along with damping, streaming, and correlation effects in order to provide a more realistic interpretation of experimental data.
Submitted 30 January, 2024;
originally announced January 2024.
-
Disentangling the effects of traits with shared clustered genetic predictors using multivariable Mendelian randomization
Authors:
Fatima Batool,
Ashish Patel,
Dipender Gill,
Stephen Burgess
Abstract:
When genetic variants in a gene cluster are associated with a disease outcome, the causal pathway from the variants to the outcome can be difficult to disentangle. For example, the chemokine receptor gene cluster contains genetic variants associated with various cytokines. Associations between variants in this cluster and stroke risk may be driven by any of these cytokines. Multivariable Mendelian randomization is an extension of standard univariable Mendelian randomization to estimate the direct effects of related exposures with shared genetic predictors. However, when genetic variants are clustered, a Goldilocks dilemma arises: including too many highly correlated variants in the analysis can lead to ill-conditioning, but pruning variants too aggressively can lead to imprecise estimates or even lack of identification. We propose multivariable methods that use principal component analysis to reduce many correlated genetic variants into a smaller number of orthogonal components that are used as instrumental variables. We show in simulations that these methods result in more precise estimates that are less sensitive to numerical instability due to both strong correlations and small changes in the input data. We apply the methods to demonstrate that the most likely causal risk factor for stroke at the chemokine gene cluster is monocyte chemoattractant protein-1.
Submitted 2 October, 2021; v1 submitted 25 September, 2021;
originally announced September 2021.
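The dimension-reduction step described in this abstract can be illustrated with a minimal sketch. It assumes an individual-level genotype matrix `G` (rows are individuals, columns are variants) and a 99% variance threshold; both are illustrative assumptions, as are the function name and the simulated data. It is not the authors' estimator, which may operate on summary statistics and use a different weighting.

```python
import numpy as np

def principal_component_instruments(G, var_explained=0.99):
    """Reduce correlated variants to orthogonal principal components.

    G is a hypothetical (n_individuals, n_variants) genotype matrix.
    Returns the component scores and the loadings that produced them.
    """
    G = np.asarray(G, dtype=float)
    Gc = G - G.mean(axis=0)                      # centre each variant
    _, s, Vt = np.linalg.svd(Gc, full_matrices=False)
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)
    # Smallest number of components reaching the variance threshold
    k = min(int(np.searchsorted(explained, var_explained)) + 1, len(s))
    loadings = Vt[:k]
    scores = Gc @ loadings.T                     # mutually orthogonal columns
    return scores, loadings

# Simulated cluster of 20 highly correlated variants driven by 2 latent signals
rng = np.random.default_rng(0)
n, p = 500, 20
latent = rng.normal(size=(n, 2))
G = latent @ rng.normal(size=(2, p)) + 0.05 * rng.normal(size=(n, p))

Z, loadings = principal_component_instruments(G)
gram = Z.T @ Z
off_diagonal = gram - np.diag(np.diag(gram))
```

Because the scores are orthogonal, their cross-product matrix is diagonal, which avoids the ill-conditioning that arises when the raw correlated variants are used directly as instruments.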
-
Clustering with the Average Silhouette Width
Authors:
Fatima Batool,
Christian Hennig
Abstract:
The Average Silhouette Width (ASW; Rousseeuw (1987)) is a popular cluster validation index to estimate the number of clusters. Here we address the question of whether it is also suitable as a general objective function to be optimized for finding a clustering. We will propose two algorithms (the standard version OSil and a fast version FOSil) and compare them with existing clustering methods in an extensive simulation study covering the cases of a known and unknown number of clusters. Real data sets are also analysed, partly exploring the use of the new methods with non-Euclidean distances. We will also show that the ASW satisfies some axioms that have been proposed for cluster quality functions (Ackerman and Ben-David (2009)). The new methods prove useful and sensible in many cases, but some weaknesses are also highlighted. These also concern the use of the ASW for estimating the number of clusters together with other methods, which is of general interest due to the popularity of the ASW for this task.
Submitted 21 November, 2020; v1 submitted 24 October, 2019;
originally announced October 2019.
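For readers unfamiliar with the index, the following is a minimal sketch of how the ASW of a given clustering can be computed. It assumes Euclidean distances and the usual convention that a point in a singleton cluster has silhouette 0; the function name and the toy data are illustrative. It only evaluates the index and is not the OSil or FOSil optimization algorithm.

```python
import numpy as np

def average_silhouette_width(X, labels):
    """Average silhouette width of a clustering, using Euclidean distances."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("the ASW needs at least two clusters")

    # Full pairwise Euclidean distance matrix
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))

    widths = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue                              # singleton: s(i) = 0
        # a: mean distance to the other members of the point's own cluster
        a = D[i, own].sum() / (n_own - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        if denom > 0:
            widths[i] = (b - a) / denom
    return widths.mean()

# Two well-separated pairs of points
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
asw_good = average_silhouette_width(X, [0, 0, 1, 1])   # matches the structure
asw_bad = average_silhouette_width(X, [0, 1, 0, 1])    # cuts across it
```

A labelling that matches the visible structure scores close to 1, while one that cuts across it scores below 0. This contrast is what makes the index usable both for validation and as an objective function.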
-
Initialization methods for optimum average silhouette width clustering
Authors:
Fatima Batool
Abstract:
A unified clustering approach is proposed that simultaneously estimates the number of clusters and produces a clustering with that number. The average silhouette width (ASW) is a widely used standard cluster quality index. A distance-based objective function that optimizes the ASW for clustering is defined. The proposed algorithm, named OSil, needs only the data observations as input, without any prior knowledge of the number of clusters. This work presents a thorough investigation of the proposed methodology, its usefulness, and its limitations. A wide spectrum of clustering structures was generated, and several well-known clustering methods, including partitioning, hierarchical, density-based, and spatial methods, were considered as competitors to the proposed methodology. Simulations reveal that the OSil algorithm showed superior clustering quality compared with all clustering methods included in the study. OSil can find well-separated, compact clusters and showed better performance in estimating the number of clusters than several other methods. Apart from proposing the new methodology and investigating it, the paper offers a systematic analysis of the estimation of cluster indices, some of which have never appeared together in a comparative simulation setup before. The study offers many insightful findings useful for selecting clustering methods and indices for clustering quality assessment.
Submitted 26 February, 2021; v1 submitted 18 October, 2019;
originally announced October 2019.
-
An agglomerative hierarchical clustering method by optimizing the average silhouette width
Authors:
Fatima Batool
Abstract:
An agglomerative hierarchical clustering (AHC) framework and algorithm named HOSil, based on a new linkage metric optimized by the average silhouette width (ASW) index, is proposed. A conscientious investigation of various clustering methods and estimation indices is conducted across a diverse variety of data structures for three aims: a) clustering quality, b) clustering recovery, and c) estimation of the number of clusters. HOSil has shown better clustering quality for a range of artificial and real-world data structures compared to k-means, PAM, single, complete, average, Ward, McQuitty, spectral, model-based, and several estimation methods. It can identify clusters of various shapes, including spherical, elongated, and relatively small clusters, as well as clusters drawn from different distributions, including uniform, t, gamma, and others. HOSil has shown good recovery of the correct number of clusters. For some data structures, only HOSil was able to identify the correct number of clusters.
Submitted 26 September, 2019;
originally announced September 2019.