-
Gini-stable Lorenz curves and their relation to the generalised Pareto distribution
Authors:
Lucio Bertoli-Barsotti,
Marek Gagolewski,
Grzegorz Siudem,
Barbara Żogała-Siudem
Abstract:
We introduce an iterative discrete information production process where we can extend ordered normalised vectors by new elements based on a simple affine transformation, while preserving the predefined level of inequality, G, as measured by the Gini index.
Then, we derive the family of empirical Lorenz curves of the corresponding vectors and prove that it is stochastically ordered with respect to both the sample size and G, which plays the role of the uncertainty parameter. We prove that asymptotically, we obtain all, and only, Lorenz curves generated by a new, intuitive parametrisation of the finite-mean Pickands' Generalised Pareto Distribution (GPD) that unifies three other families, namely: the Pareto Type II, exponential, and scaled beta distributions. The family is not only totally ordered with respect to the parameter G, but also, thanks to our derivations, has a nice underlying interpretation. Our result may thus shed new light on the genesis of this family of distributions.
Our model fits bibliometric, informetric, socioeconomic, and environmental data reasonably well. It is quite user-friendly for it only depends on the sample size and its Gini index.
Submitted 15 January, 2024; v1 submitted 15 April, 2023;
originally announced April 2023.
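As a reader's aid for the entry above, the following is a minimal sketch of how a sample Gini index can be computed. It assumes the common rank-based formula with an n denominator, so its maximum for n items is (n - 1)/n; the paper may use a different normalisation. The function name `gini` is illustrative and is not taken from the paper.

```python
def gini(values):
    """Sample Gini index of non-negative values (assumed n-denominator form)."""
    xs = sorted(values)  # ascending order, ranks i = 1..n
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need a non-empty sample with a positive sum")
    # G = 2 * sum_i(i * x_(i)) / (n * sum_i x_(i)) - (n + 1) / n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n
```

A perfectly equal sample gives 0, and a sample in which one element holds everything gives (n - 1)/n under this convention.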
-
Equivalence of inequality indices: Three dimensions of impact revisited
Authors:
Lucio Bertoli-Barsotti,
Marek Gagolewski,
Grzegorz Siudem,
Barbara Żogała-Siudem
Abstract:
Inequality is an inherent part of our lives: we see it in the distribution of incomes, talents, resources, and citations, amongst many others. Its intensity varies across different environments: from relatively evenly distributed ones, to where a small group of stakeholders controls the majority of the available resources. We would like to understand why inequality naturally arises as a consequence of the natural evolution of any system. Studying simple mathematical models governed by intuitive assumptions can bring many insights into this problem. In particular, we recently observed (Siudem et al., PNAS 117:13896-13900, 2020) that impact distribution might be modelled accurately by a time-dependent agent-based model involving a mixture of the rich-get-richer and sheer chance components. Here we point out its relationship to an iterative process that generates rank distributions of any length and a predefined level of inequality, as measured by the Gini index.
Many indices quantifying the degree of inequality have been proposed. Which of them is the most informative? We show that, under our model, indices such as the Bonferroni, De Vergottini, and Hoover ones are equivalent. Given one of them, we can recreate the value of any other measure using the derived functional relationships. Also, thanks to the obtained formulae, we can understand how they depend on the sample size. An empirical analysis of a large sample of citation records in economics (RePEc) as well as countrywise family income data, confirms our theoretical observations. Therefore, we can safely and effectively remain faithful to the simplest measure: the Gini index.
Submitted 15 April, 2023;
originally announced April 2023.
-
Community detection in complex networks via node similarity, graph representation learning, and hierarchical clustering
Authors:
Łukasz Brzozowski,
Grzegorz Siudem,
Marek Gagolewski
Abstract:
Community detection is a critical challenge in analysing real graphs, including social, transportation, citation, cybersecurity, and many other networks. This article proposes three new, general, hierarchical frameworks to deal with this task. The introduced approach supports various linkage-based clustering algorithms, vertex proximity matrices, and graph representation learning models. We compare over a hundred module combinations on the Stochastic Block Model graphs and real-life datasets. We observe that our best pipelines (Wasserman-Faust and the mutual information-based PPMI proximity, as well as the deep learning-based DNGR representations) perform competitively with the state-of-the-art Leiden and Louvain algorithms. At the same time, unlike the latter, they remain hierarchical. Thus, they output a series of nested partitions of all possible cardinalities which are compatible with each other. This feature is crucial when the number of correct partitions is unknown in advance.
Submitted 23 May, 2023; v1 submitted 21 March, 2023;
originally announced March 2023.
-
Power Laws, the Price Model, and the Pareto type-2 Distribution
Authors:
Grzegorz Siudem,
Przemysław Nowak,
Marek Gagolewski
Abstract:
We consider a version of D. Price's model for the growth of a bibliographic network, where in each iteration a constant number of citations is randomly allocated according to a weighted combination of accidental (uniformly distributed) and preferential (rich-get-richer) rules. Instead of relying on the typical master equation approach, we formulate and solve this problem in terms of the rank-size distribution. We show that, asymptotically, such a process leads to a Pareto-type 2 distribution with an appealingly interpretable parametrisation. We prove that the solution to the Price model expressed in terms of the rank-size distribution coincides with the expected values of order statistics in an independent Paretian sample. We study the bias and the mean squared error of three well-behaving estimators of the underlying model parameters. An empirical analysis of a large repository of academic papers yields a good fit not only in the tail of the distribution (as is usually the case in the power law-like framework), but also across the whole domain. Interestingly, the estimated models indicate a higher degree of preferentially attached citations and a smaller share of randomness than previous studies.
Submitted 23 August, 2022; v1 submitted 27 January, 2022;
originally announced January 2022.
-
Modelling railway delay propagation as diffusion-like spreading
Authors:
Mark M. Dekker,
Alexey N. Medvedev,
Jan Rombouts,
Grzegorz Siudem,
Liubov Tupikina
Abstract:
Railway systems form an important means of transport across the world. However, congestion or disruptions may significantly decrease these systems' efficiency, making predicting and understanding the resulting train delays a priority for railway organisations. Delays are studied in a wide variety of models, which usually simulate trains as discrete agents carrying delays. In contrast, in this paper, we define a novel model for studying delays, where they spread across the railway network via a diffusion-like process. This type of modelling has various advantages such as quick computation and ease of applying various statistical tools like spectral methods, but it also comes with limitations related to the directional and discrete nature of delays and the trains carrying them. We apply the model to the Belgian railways and study its performance in simulating the delay propagation in severely disrupted railway situations. In particular, we discuss the role of spatial aggregation by proposing to cluster the Belgian railway system into sets of stations and adapt the model accordingly. We find that such aggregation significantly increases the model's performance. For some particular situations, a non-trivial optimal level of spatial resolution is found on which the model performs best. Our results show the potential of this type of delay modelling to understand large-scale properties of railway systems.
Submitted 13 May, 2021;
originally announced May 2021.
-
Combinatorial origins of the canonical ensemble
Authors:
Kornelia Ufniarz,
Grzegorz Siudem
Abstract:
The Darwin-Fowler method in combination with the steepest descent approach is a common tool in the asymptotic description of many models arising from statistical physics. In this work, we focus rather on the non-asymptotic behavior of the Darwin-Fowler procedure. By using a combinatorial approach based on Bell polynomials, we solve it exactly. Due to that approach, we also show relationships of typical models with combinatorial Lah and Stirling numbers.
Submitted 1 August, 2020;
originally announced August 2020.
-
Bell polynomials in the series expansions of the Ising model
Authors:
Grzegorz Siudem,
Agata Fronczak
Abstract:
By applying Bell polynomials to the integral representation of the free energy of the Ising model for the triangular and hexagonal lattices, we obtain the exact combinatorial formulas for the number of spin configurations at a given energy (i.e. low-temperature series expansion of the partition function or, alternatively, the number of states). We also generalize this approach to the wider class of the (chequered) Utiyama graphs. Apart from the presented exact formulas, our technique allows one to establish the correspondence between the perfect gas of clusters and the Ising model on the lattices which have positive coefficients in the low-temperature expansion (e.g. square lattice, hexagonal lattice). However, this is not always the case: we show that for the triangular lattice the coefficients can be negative, which makes the perfect gas of clusters interpretation problematic.
Submitted 31 July, 2020;
originally announced July 2020.
-
Partial equivalence of statistical ensembles in a simple spin model with discontinuous phase transitions
Authors:
Agata Fronczak,
Piotr Fronczak,
Grzegorz Siudem
Abstract:
In this paper, we draw attention to the problem of phase transitions in systems with locally affine microcanonical entropy, in which partial equivalence of (microcanonical and canonical) ensembles is observed. We focus on a very simple spin model that was shown to be an equilibrium statistical mechanics representation of the biased random walk. The model exhibits interesting discontinuous phase transitions that are simultaneously observed in the microcanonical, canonical, and grand canonical ensembles, although in each of these ensembles the transition occurs in a slightly different way. The differences are related to fluctuations accompanying the discontinuous change of the number of positive spins. In the microcanonical ensemble, there is no fluctuation at all. In the canonical ensemble, one observes power-law fluctuations, which are, however, size-dependent and disappear in the thermodynamic limit. Finally, in the grand canonical ensemble, the discontinuous transition is of mixed-order (hybrid) kind with diverging (critical-like) fluctuations. In general, this paper consists of many small results, which together make up an interesting example of phase transitions that are not covered by the known classifications of these phenomena.
Submitted 1 November, 2019;
originally announced November 2019.
-
Area-width scaling in generalised Motzkin paths
Authors:
Nils Haug,
Thomas Prellberg,
Grzegorz Siudem
Abstract:
We consider a generalised version of Motzkin paths, where horizontal steps have length $\ell$, with $\ell$ being a fixed positive integer. We first give the general functional equation for the area-length generating function of this model. Using a heuristic ansatz, we derive the area-length scaling behaviour in terms of a scaling function in one variable for the special cases of Dyck, (standard) Motzkin and Schröder paths, before generalising our approach to arbitrary $\ell$. We then derive an expression for the generating function of Schröder paths and analyse the scaling behaviour of this function rigorously in the vicinity of the tri-critical point of the model by applying the method of steepest descents for the case of two coalescing saddle points. Our results show that for Dyck and Schröder paths, the heuristic scaling ansatz reproduces the rigorous results.
Submitted 3 May, 2017; v1 submitted 31 May, 2016;
originally announced May 2016.
-
Agent-based model for the h-index - Exact solution
Authors:
Barbara Żogała-Siudem,
Grzegorz Siudem,
Anna Cena,
Marek Gagolewski
Abstract:
Hirsch's $h$-index is perhaps the most popular citation-based measure of scientific excellence. In 2013, G. Ionescu and B. Chopard proposed an agent-based model for this index to describe a publications and citations generation process in an abstract scientific community. With such an approach one can simulate a single scientist's activity, and by extension investigate the whole community of researchers. Even though this approach predicts the $h$-index quite well from bibliometric data, only a solution based on simulations was given. In this paper, we complete their results with exact, analytic formulas. What is more, thanks to our exact solution we are able to simplify the Ionescu-Chopard model, which allows us to obtain a compact formula for the $h$-index. Moreover, a simulation study designed to compare the approximated and exact solutions is included. The last part of this paper presents an evaluation of the obtained results on a real-world data set.
Submitted 23 November, 2015; v1 submitted 18 September, 2015;
originally announced September 2015.
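For readers unfamiliar with the index discussed in the entry above, here is a minimal sketch of the standard definition: the $h$-index is the largest h such that at least h papers have at least h citations each. It illustrates the definition only and is not the Ionescu-Chopard agent-based model or the paper's exact solution; the function name `h_index` is illustrative.

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)  # most cited paper first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have >= rank citations
        else:
            break  # counts only decrease from here, so h cannot grow
    return h
```

For example, citation counts of 10, 8, 5, 4 and 3 give an $h$-index of 4, since four papers have at least four citations each but there are not five papers with at least five.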
-
Exact low-temperature series expansion for the partition function of the two-dimensional zero-field s=1/2 Ising model on the infinite square lattice
Authors:
Grzegorz Siudem,
Agata Fronczak,
Piotr Fronczak
Abstract:
In this paper, we provide the exact expression for the coefficients in the low-temperature series expansion of the partition function of the two-dimensional Ising model on the infinite square lattice. This is equivalent to exact determination of the number of spin configurations at a given energy. With these coefficients, we show that the ferromagnetic-to-paramagnetic phase transition in the square lattice Ising model can be explained through equivalence between the model and the perfect gas of energy clusters model, in which the passage through the critical point is related to the complete change in the thermodynamic preferences on the size of clusters. The combinatorial approach reported in this article is very general and can be easily applied to other lattice models.
Submitted 14 April, 2015; v1 submitted 29 October, 2014;
originally announced October 2014.
-
Diffusion on hierarchical systems of weakly-coupled networks
Authors:
Grzegorz Siudem,
Janusz A. Hołyst
Abstract:
We analyse diffusion dynamics on weakly-coupled networks (interconnected networks) by means of separation of time scales. Using an adiabatic approximation, we reduce the system dynamics to a Markov chain with aggregated variables and derive a transport equation that is analogous to Fick's First Law and includes a driving force. Entropy production is a sum of microscopic entropy transport, which results from the particle's migration between networks of different topologies, and macroscopic entropy production of the Markov chain. The equilibrium partition of particles between different sub-networks depends only on internal sub-network parameters. Our framework, confirmed by numerical simulations, is also useful for considering diffusion in nested systems corresponding to hierarchical networks with several different time scales; thus, it can serve to uncover hidden hierarchy levels from observations of diffusion processes.
Submitted 13 December, 2018; v1 submitted 10 March, 2013;
originally announced March 2013.
-
Analytical approach to model of scientific revolutions
Authors:
Paweł Kondratiuk,
Grzegorz Siudem,
Janusz A. Hołyst
Abstract:
The model of scientific paradigms spreading throughout the community of agents with memory is analyzed using the master equation. The case of two competing ideas is considered for various networks of interactions, including agents placed on Erdős-Rényi graphs or complete graphs. The pace of adopting a new idea by a community is analyzed, along with the distribution of periods after which a new idea replaces the old one. For the chain topology, the approach is extended to the more general case in which more than two ideas compete. Our analytical results are in agreement with numerical simulations.
Submitted 8 September, 2011; v1 submitted 2 June, 2011;
originally announced June 2011.
-
External bias in the model of isolation of communities
Authors:
Julian Sienkiewicz,
Grzegorz Siudem,
Janusz A. Holyst
Abstract:
We extend a model of community isolation in the d-dimensional lattice to the case with an imposed imbalance between birth rates of competing communities. We give analytical and numerical evidence that in the asymmetric two-species model there exists a well-defined value of the asymmetry parameter at which the emergence of the isolated (blocked) subgroups is the fastest, i.e. the characteristic time $t_c$ is minimal. This critical value of the parameter depends only on the lattice dimensionality and is independent of the system size. A similar phenomenon was observed in the multi-species case with a geometric distribution of the birth rates. We also show that blocked subgroups in the multi-species case are absent or very rare either when there is a strictly dominant species that outnumbers the others or when there is a large diversity of species. The number of blocked species of different kinds decreases with the dimension of the multi-species system.
Submitted 23 August, 2010;
originally announced August 2010.