Economics
A simple method for estimating the Lorenz curve
T. Sitthiyot and K. Holasut
The study addresses the challenge of estimating the Lorenz curve, especially when only grouped or summary statistics are available rather than microdata. It highlights that interpolation underestimates inequality and no single income distribution fits the entire range, motivating parametric Lorenz curve approaches. The research question is whether one can reverse-engineer a parametric Lorenz curve directly from observed summary indicators—specifically the Gini index and selected tail income shares—without error minimization, and achieve practical accuracy across diverse countries. The purpose is to introduce a simple, computationally convenient method that uses the observed Gini index and income shares of the bottom and top population groups to estimate a parametric Lorenz curve that possesses a closed-form Gini expression. This is important for contexts with limited data availability and for broad applications across disciplines that use Lorenz curves and Gini indices.
Three broad approaches exist for estimating Lorenz curves from grouped data: (1) interpolation, which assumes within-group homogeneity and tends to underestimate inequality; (2) assuming an income distribution (e.g., power-law in the upper tail; gamma or log-normal for the lower tail), which risks misspecification; and (3) specifying parametric Lorenz functional forms. Numerous functional forms have been proposed (e.g., Kakwani and Podder, Kakwani, Rasche et al., Aggarwal, Gupta, Arnold, Rao and Tam, Villaseñor and Arnold, Basmann et al., Ortega et al., Chotikapanich, Ogwang and Rao, Ryu and Slottje, Sarabia and various coauthors, Rohde, Helene, Wang and Smyth, Fellman, Tanak et al., Sitthiyot et al.). Many widely used forms lack closed-form Gini expressions (requiring beta or confluent hypergeometric functions), complicating computation. Even when explicit Gini expressions exist, prior work typically estimates functional-form parameters first and then derives the Gini, rather than starting from the observed Gini to infer parameters. Kakwani (1980) has been reported among the best performers for fit in prior comparisons, but its Gini index requires special functions. The gap identified is the absence of methods that exploit the observed Gini to directly estimate the parameters of a Lorenz functional form and recover the curve without optimization.
The paper proposes a two-parameter Lorenz functional form constructed as a weighted average of two well-known Lorenz curve components: a convex power function y=x^P (e.g., y=x^2 when P=2) and the Pareto-implied Lorenz component y=1−(1−x)^(1/P). The combined specification, y=(1−k)·x^P + k·(1−(1−x)^(1/P)), is controlled by parameters P (inequality level) and k (weight shaping curvature), with 0≤k≤1 and P≥1. Because the area under each component equals 1/(P+1), the Gini index derived from this specification has a closed-form expression depending only on P: Gini = (P−1)/(P+1), and thus P = (1+Gini)/(1−Gini). Therefore, P can be computed directly from the observed Gini.
To identify k, the method uses the observed income shares of the bottom m% (Bm) and top m% (Tm), forming the ratio Rm = Bm/Tm. Under the specified functional form, with m written as a population fraction and n=1−m, Bm = y(m) and Tm = 1−y(n), so the ratio can be expressed in terms of k and the known quantities a=m^P, b=(1−m)^(1/P), c=n^P, d=(1−n)^(1/P). Solving for k gives the closed form k = (a − Rm + c·Rm)/(c·Rm − Rm + d·Rm + a + b − 1). With P (from the Gini) and k (from the tail share ratio), the entire Lorenz curve is retrieved without any error minimization or specialized software. When more grouped data are available (e.g., decile shares), the same functional form can also be fitted via least-squares minimization to estimate P and k directly, and the Gini then follows from its closed form.
Model performance is evaluated using R^2, mean squared error (MSE), mean absolute error (MAE), maximum absolute error (MAS), the information inaccuracy measure (IIM), and the Kolmogorov–Smirnov test comparing estimated and actual decile shares. The method is demonstrated on four countries (Malta 2018, Taiwan 2016, USA 2016, Côte d'Ivoire 2015) using UNU-WIID data, with two variants of the simple method: using bottom/top 10% shares and using bottom/top 5% shares.
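The two closed-form steps described above (P from the Gini, then k from the tail-share ratio) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the functional form y=(1−k)·x^P + k·(1−(1−x)^(1/P)), where the exponent 1/P on the Pareto-type component is what makes the Gini depend on P alone, and all function names are hypothetical. Because the observed tail shares are not listed here, the example uses synthetic shares generated from the curve itself (with the reported USA values Gini=0.411 and k=0.31) and checks that the formula for k recovers the weight.

```python
def lorenz(x, P, k):
    """Assumed form: weighted average of x^P and 1 - (1 - x)^(1/P)."""
    return (1.0 - k) * x ** P + k * (1.0 - (1.0 - x) ** (1.0 / P))


def p_from_gini(gini):
    """Invert Gini = (P - 1) / (P + 1) to obtain P."""
    return (1.0 + gini) / (1.0 - gini)


def k_from_tail_shares(P, bottom_share, top_share, m):
    """Closed-form k from the bottom/top m shares (m as a fraction, e.g. 0.10)."""
    n = 1.0 - m
    a = m ** P
    b = (1.0 - m) ** (1.0 / P)
    c = n ** P
    d = (1.0 - n) ** (1.0 / P)
    R = bottom_share / top_share  # ratio Rm = Bm / Tm
    return (a - R + c * R) / (c * R - R + d * R + a + b - 1.0)


def decile_shares(P, k):
    """Income share of each decile implied by the estimated curve."""
    cum = [lorenz(i / 10.0, P, k) for i in range(11)]
    return [cum[i + 1] - cum[i] for i in range(10)]


# Step 1: P from the observed Gini (0.411 is the USA 2016 value quoted in the text).
P = p_from_gini(0.411)  # roughly 2.40

# Synthetic tail shares (NOT observed data): generated from the curve with k = 0.31.
m = 0.10
bottom = lorenz(m, P, 0.31)
top = 1.0 - lorenz(1.0 - m, P, 0.31)

# Step 2: recover k from the tail-share ratio, then rebuild the decile shares.
k = k_from_tail_shares(P, bottom, top, m)
print("P =", round(P, 4), "k =", round(k, 4))
print("decile shares:", [round(s, 4) for s in decile_shares(P, k)])
```

With real data, `bottom` and `top` would be replaced by the observed bottom and top m% income shares; the recovered k should then be checked against the constraint 0≤k≤1.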
- Simple method with bottom/top 10% shares: Estimated Lorenz curves fit well. Reported R^2 values: Malta 0.9970 (P=1.81, k=0.47), Taiwan 0.9929 (P=1.92, k=0.49), USA 0.9713 (P=2.40, k=0.31), Côte d'Ivoire 0.9095 (P=3.88, k=0.22). Goodness-of-fit measures (MSE, MAE, MAS, IIM) are small; K-S tests show no significant differences (p-values 0.975–1.000). Lower inequality (lower Gini) is associated with better fit.
- Simple method with bottom/top 5% shares: Also performs well but slightly worse than the 10% case, especially for high-inequality contexts. R^2 ranges from 0.8520 (Côte d'Ivoire) to 0.9980 (Malta). K-S tests again indicate no significant differences (p-values 0.975–1.000).
- Error minimization using all decile shares: The specified functional form achieves near-perfect fits: R^2 between 0.9995 and 1.0000. Estimated decile shares closely match actual; K-S test p=1.000 in all cases. Estimated Gini indices are virtually identical to observed values (e.g., Malta observed 0.287 vs estimated 0.287; Taiwan 0.315 vs 0.316; USA 0.411 vs 0.411; Côte d'Ivoire 0.590 vs 0.589).
- Comparison with Kakwani (1980): Kakwani’s form yields slightly better goodness-of-fit statistics (lower MSE/MAE/MAS/IIM) for decile shares, but the proposed alternative provides Gini estimates closer to observed values (e.g., Malta: observed 0.287, alternative 0.287 vs Kakwani 0.281; Côte d'Ivoire: observed 0.590, alternative 0.589 vs Kakwani 0.569). The proposed form is also more parsimonious (2 parameters vs 3) and offers a closed-form Gini, avoiding special functions or numerical integration.
- Practical insight: The simple method tends to perform better with bottom/top 10% shares than with 5% shares, and performance improves as inequality decreases.
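The error-minimization variant and some of the goodness-of-fit measures in the list above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' procedure: it assumes the form y=(1−k)·x^P + k·(1−(1−x)^(1/P)), fits cumulative shares at the nine interior decile points, searches P on a grid while solving the least-squares k analytically for each P (the curve is linear in k), and defines R^2 as 1 − SSE/SST. The decile data are synthetic, generated from hypothetical values P=2.0 and k=0.4; all function names are hypothetical, and IIM and the K-S test are omitted.

```python
def lorenz(x, P, k):
    """Assumed form: weighted average of x^P and 1 - (1 - x)^(1/P)."""
    return (1.0 - k) * x ** P + k * (1.0 - (1.0 - x) ** (1.0 / P))


def best_k_for_p(xs, ys, P):
    """Least-squares k for a fixed P, clipped to the admissible range [0, 1]."""
    num = den = 0.0
    for x, y in zip(xs, ys):
        f = x ** P                          # power component
        g = 1.0 - (1.0 - x) ** (1.0 / P)    # Pareto-type component
        num += (y - f) * (g - f)
        den += (g - f) ** 2
    k = num / den if den > 0.0 else 0.0     # den == 0 only when P == 1
    return min(1.0, max(0.0, k))


def fit(xs, ys, p_max=10.0, step=0.001):
    """Grid search over P in [1, p_max]; returns the (P, k) with the smallest SSE."""
    best = None
    for i in range(int(round((p_max - 1.0) / step)) + 1):
        P = 1.0 + i * step
        k = best_k_for_p(xs, ys, P)
        err = sum((y - lorenz(x, P, k)) ** 2 for x, y in zip(xs, ys))
        if best is None or err < best[0]:
            best = (err, P, k)
    return best[1], best[2]


def fit_metrics(actual, estimated):
    """MSE, MAE, MAS (maximum absolute error) and R^2 = 1 - SSE/SST."""
    n = len(actual)
    errs = [a - e for a, e in zip(actual, estimated)]
    sse = sum(e * e for e in errs)
    mean_a = sum(actual) / n
    sst = sum((a - mean_a) ** 2 for a in actual)
    return {
        "MSE": sse / n,
        "MAE": sum(abs(e) for e in errs) / n,
        "MAS": max(abs(e) for e in errs),
        "R2": 1.0 - sse / sst,
    }


def to_decile_shares(cum_interior):
    """Convert cumulative shares at 0.1..0.9 into ten decile shares."""
    cum = [0.0] + list(cum_interior) + [1.0]
    return [cum[i + 1] - cum[i] for i in range(10)]


# Synthetic cumulative shares (NOT observed data) from hypothetical P = 2.0, k = 0.4.
xs = [i / 10.0 for i in range(1, 10)]
ys = [lorenz(x, 2.0, 0.4) for x in xs]

P_hat, k_hat = fit(xs, ys)
gini_hat = (P_hat - 1.0) / (P_hat + 1.0)    # closed-form Gini from the fitted P
est = to_decile_shares([lorenz(x, P_hat, k_hat) for x in xs])
print("P =", P_hat, "k =", k_hat, "Gini =", gini_hat)
print(fit_metrics(to_decile_shares(ys), est))
```

Solving k analytically reduces the fit to a one-dimensional search, which keeps the sketch free of external optimizers; a general-purpose least-squares routine would serve equally well.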
The findings demonstrate that a Lorenz curve can be accurately recovered from minimal information (an observed Gini index and two tail income shares) by leveraging a parametric specification with a closed-form Gini. This addresses the practical challenge of limited data availability and the theoretical gap of reverse-engineering parametric Lorenz curves from the Gini. The method produces close approximations across countries with diverse inequality levels and contexts, with stronger performance when inequality is lower and when using bottom/top 10% shares rather than more extreme tails. When richer grouped data are available, directly fitting the same functional form yields near-perfect reconstruction of the observed decile shares and Gini. Compared with the widely regarded Kakwani (1980) functional form, the proposed specification yields comparable decile-share fits and closer agreement with observed Gini values, while being computationally simpler and more parsimonious. These results support the method’s relevance for policy analysis and cross-disciplinary applications where Lorenz curves and Gini indices are employed, particularly in low-data settings.
The paper introduces a simple, computationally convenient method to estimate Lorenz curves using only three indicators: the observed Gini index and the income shares of the bottom and top population groups. The underlying parametric functional form, a weighted average of a convex power function and a Pareto-implied Lorenz component, admits a closed-form Gini, enabling direct recovery of the inequality parameter P from the observed Gini and calculation of the weight k from the tail share ratio. Empirically, the method fits decile shares well across four countries and performs better with bottom/top 10% shares than with 5% shares, especially at lower inequality. With full decile data, direct fitting of the same functional form yields near-perfect fits and Gini values essentially identical to those observed. Compared with Kakwani (1980), the proposed form is more parsimonious, computationally simpler for Gini evaluation, and yields Gini estimates closer to observations while achieving comparable goodness-of-fit to decile shares. Future work could extend the approach to other datasets and contexts, assess robustness to different tail share choices (m), and explore applications in other scientific domains where Lorenz curves and Gini indices are used.
The simple method’s accuracy depends on the degree of inequality and the informativeness of tail shares. Using only extreme tails (e.g., bottom/top 5%) in high-inequality contexts (e.g., Côte d'Ivoire, Gini 0.590) yields a cruder approximation. The method relies on grouped data and assumes the specified functional form adequately represents the true Lorenz curve; misspecification may affect fit at distribution tails. Performance improves when more grouped data are available or when using less extreme tail shares (e.g., 10%).