A simple method for estimating the Lorenz curve

Economics

T. Sitthiyot and K. Holasut

This paper presents a method for estimating the Lorenz curve from the Gini index and the income shares at the bottom and top of the distribution, and tests it on data from four countries with different levels of inequality. The study, by Thitithep Sitthiyot and Kanyarat Holasut, offers a straightforward technique that avoids conventional error-minimization approaches, which makes it especially useful when income distribution data are limited.

Introduction
The Lorenz curve, introduced by Max O. Lorenz in 1905, plots the cumulative share of income or wealth held against the cumulative share of the population ranked from poorest to richest. For more than a century it has been a central tool for illustrating income and wealth distribution and inequality. Ideally, researchers would work with individual income records, but these are often unavailable, so they rely instead on summary statistics from databases such as the UNU-WIID, PovcalNet, and WID. These databases typically report grouped income data, which yield only a handful of points on the Lorenz curve. Existing methods for estimating the curve from such grouped data either interpolate between the points, assume a specific income distribution, or fit a parametric functional form. Interpolation methods, however, tend to underestimate inequality: linear interpolation, for example, joins the observed points with chords that lie on or above the convex Lorenz curve, so the implied Gini index is too low. Likewise, no single statistical distribution fits the entire income distribution well; power-law distributions may describe the upper tail, but the appropriate model for the lower tail remains debated (gamma vs. log-normal distributions). Misspecifying the distribution can substantially distort estimates of income shares and inequality measures. Numerous studies have instead proposed parametric functional forms that estimate the Lorenz curve directly; however, many of them lack a closed-form expression for the Gini index, which makes computation complex. The forms that do have closed-form solutions often do not allow their parameters to be recovered directly from an observed Gini index. This paper addresses these limitations with a simple, straightforward method that uses readily available data (the Gini index, the bottom income share, and the top income share) to estimate Lorenz curves. The method takes a weighted average of two well-known functional forms, providing a simple yet robust approach that is especially useful when data are scarce.
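The claim about interpolation can be checked numerically. The sketch below treats $y(x) = x^{3}$ as a hypothetical "true" Lorenz curve (an arbitrary illustrative choice, not data from the paper), samples it at the deciles as grouped data would, and compares the exact Gini index with the one implied by joining the points with straight lines (the trapezoid rule). The function names are illustrative only.

```python
"""Linear interpolation of grouped Lorenz points understates the Gini index.

Illustrative only: the "true" curve L(x) = x**p is a hypothetical choice,
not data from the paper.
"""


def lorenz_power(x: float, p: float) -> float:
    """Hypothetical 'true' Lorenz curve."""
    return x ** p


def gini_exact_power(p: float) -> float:
    """Exact Gini for x**p: the area under the curve is 1/(p+1)."""
    return 1.0 - 2.0 / (p + 1.0)


def gini_linear_interp(xs: list[float], ys: list[float]) -> float:
    """Gini implied by joining (xs, ys) with straight lines (trapezoid rule)."""
    area = sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])
    )
    return 1.0 - 2.0 * area


p = 3.0
deciles = [i / 10 for i in range(11)]            # grouped data: 11 points
points = [lorenz_power(x, p) for x in deciles]

exact = gini_exact_power(p)
interp = gini_linear_interp(deciles, points)
print(f"exact Gini:        {exact:.4f}")   # 0.5000
print(f"interpolated Gini: {interp:.4f}")  # 0.4950
```

Adding more points shrinks the gap, but for a convex curve the straight-line estimate never exceeds the true Gini index, so interpolation alone biases measured inequality downward.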
Literature Review
The paper reviews a substantial body of literature on parametric functional forms for the Lorenz curve, including work by Kakwani and Podder (1973, 1976), Kakwani (1980), Rasche et al. (1980), and many others. The authors point out the limitations of many widely used forms, in particular the lack of a closed-form expression for the Gini index, which must then be computed from special functions such as the beta function or the confluent hypergeometric function. They also note that although some existing forms do have closed-form Gini indices, the reverse problem, estimating the parameters from an observed Gini index, has not been explored. The study acknowledges earlier use of linear convex combinations of functional forms to overcome the limitations of individual forms, citing Sarabia (1997) and Ogwang and Rao (2000). Against this background, the authors emphasize the comparative advantage of their method: it is simple and relies only on easily accessible summary statistics to estimate the Lorenz curve.
Methodology
The authors propose a new functional form for the Lorenz curve, built as a weighted average of two established forms: the exponential function $y(x) = x^{P}$ and the Pareto distribution implied form $y(x) = 1 - (1-x)^{1/P}$. The weighted average is

$$y(x) = (1-k)\,x^{P} + k\left[1 - (1-x)^{1/P}\right], \qquad 0 \le k \le 1,\; P \ge 1.$$

This form satisfies the conditions for a Lorenz curve: $y(0) = 0$, $y(1) = 1$, and non-negative first and second derivatives (so the curve is increasing and convex). The parameter $P$ captures the degree of inequality measured by the Gini index, while $k$ controls the curve's shape, allowing its curvature to be adjusted without changing the Gini index. This works because both components enclose the same area: $\int_0^1 x^{P}\,dx = \int_0^1 \left[1 - (1-x)^{1/P}\right] dx = 1/(P+1)$. The authors therefore obtain closed-form expressions for the area under the Lorenz curve and for the Gini index: area $= 1/(P+1)$ and $G = 1 - 2/(P+1) = (P-1)/(P+1)$.

To estimate the parameters, the authors use the observed Gini index (from the UNU-WIID) and the income shares of the bottom ($B_m$) and top ($T_m$) fraction $m$ of the population (e.g., $m = 0.10$ for 10%). Inverting the Gini expression gives $P = (1+G)/(1-G)$. Since $B_m = y(m)$ and $T_m = 1 - y(1-m)$, the ratio $B_m/T_m$ can be expressed in terms of $k$ and $P$; with $P$ fixed by the Gini index, this equation can be solved for $k$ from the observed $B_m$ and $T_m$. Once $P$ and $k$ are known, the entire Lorenz curve is determined. The authors stress the simplicity of this procedure: unlike other methods, it requires no error-minimization routine or specialized software package.

To validate the method, the study uses data on the Gini index and the bottom and top income shares (at both 10% and 5%) from the UNU-WIID for four countries (Malta, Taiwan, the USA, and Côte d'Ivoire), chosen for their differing levels of income inequality and socioeconomic backgrounds. Goodness-of-fit statistics (R², MSE, MAE, MAS, IIM) and the Kolmogorov-Smirnov (K-S) test are used to assess how well the model estimates decile income shares. In addition to the simple method, the authors fit Lorenz curves directly by curve fitting (minimizing the sum of squared errors) using all available grouped data on decile income shares, and compare the results with those of their simple method and of Kakwani's (1980) model.
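The estimation steps translate almost directly into code. The sketch below is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the closed-form solution for $k$ is derived here from $B_m = y(m)$ and $T_m = 1 - y(1-m)$ by cross-multiplying $B_m/T_m = r$ (which is linear in $k$), the inputs are made-up numbers rather than UNU-WIID data, and clipping $k$ to $[0, 1]$ is an assumed way of handling out-of-range solutions.

```python
"""Sketch of the simple estimation method (hypothetical helper names)."""


def lorenz(x: float, k: float, p: float) -> float:
    """Weighted form y(x) = (1-k) x**p + k (1 - (1-x)**(1/p))."""
    return (1.0 - k) * x ** p + k * (1.0 - (1.0 - x) ** (1.0 / p))


def p_from_gini(g: float) -> float:
    """Invert G = (P - 1)/(P + 1), giving P = (1 + G)/(1 - G)."""
    return (1.0 + g) / (1.0 - g)


def k_from_shares(p: float, m: float, bottom: float, top: float) -> float:
    """Solve bottom/top = y(m) / (1 - y(1-m)) for k, given P.

    Assumption: k is clipped to [0, 1] if the observed ratio falls outside
    the range the functional form can reproduce.
    """
    r = bottom / top
    a = m ** p                          # bottom share under x**p
    b = 1.0 - (1.0 - m) ** (1.0 / p)    # bottom share under the Pareto form
    c = 1.0 - (1.0 - m) ** p            # top share under x**p
    d = m ** (1.0 / p)                  # top share under the Pareto form
    # r = ((1-k) a + k b) / ((1-k) c + k d); cross-multiplying is linear in k.
    k = (a - r * c) / (a - b + r * (d - c))
    return min(max(k, 0.0), 1.0)


def gini_numeric(k: float, p: float, n: int = 100_000) -> float:
    """1 - 2 * (area under the curve), with the area from the midpoint rule."""
    h = 1.0 / n
    area = h * sum(lorenz((i + 0.5) * h, k, p) for i in range(n))
    return 1.0 - 2.0 * area


# Hypothetical inputs: Gini of 0.40 and bottom/top 10% shares generated
# from the form itself with k = 0.3, so the round trip can be checked.
g, m = 0.40, 0.10
p = p_from_gini(g)
bottom = lorenz(m, 0.3, p)
top = 1.0 - lorenz(1.0 - m, 0.3, p)

k = k_from_shares(p, m, bottom, top)
print(f"P = {p:.4f}, k = {k:.4f}")   # P = 2.3333, k = 0.3000

# k changes the shape of the curve but not its Gini index.
for kk in (0.0, 0.5, 1.0):
    print(kk, round(gini_numeric(kk, p), 4))
```

In this sketch only the ratio $B_m/T_m$ pins down $k$, so with real data the fitted curve reproduces the Gini index and that ratio exactly, but not necessarily each tail share on its own.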
Key Findings
The study presents findings from two estimation methods: the simple method, which uses the Gini index and the income shares of either the bottom/top 10% or the bottom/top 5%, and the error-minimization method, which uses all available decile income share data.

**Simple Method (10% and 5%):** Using the Gini index together with either the 10% or the 5% income shares, the estimated Lorenz curves fit the actual observations well (R² between 0.9095 and 0.9970 for the 10% shares and between 0.8520 and 0.9980 for the 5% shares), with higher R² values for countries with lower income inequality. The goodness-of-fit statistics (MSE, MAE, MAS, IIM) and K-S tests showed no significant differences between estimated and actual decile income shares (p-values between 0.975 and 1.000). The 10% version slightly outperformed the 5% version, particularly for countries with high income inequality.

**Error Minimization Method:** Using all decile income shares, the error-minimization method yielded even better fits (R² between 0.9995 and 1.0000). Goodness-of-fit statistics and K-S tests again indicated no significant differences between estimated and actual decile income shares (p-value = 1.000 in all cases). The Gini indices calculated from the fitted Lorenz curves were nearly identical to the observed Gini indices.

**Comparison with Kakwani (1980):** The study compared the proposed functional form with that of Kakwani (1980), which is considered a high-performing model. Although Kakwani's model achieved slightly higher R² values, the goodness-of-fit statistics showed only small differences in the accuracy of the estimated decile income shares. The Gini indices implied by the proposed form, however, were considerably closer to the observed values than those from Kakwani's model. The proposed form is also simpler and more parsimonious (two parameters vs. three).
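The error-minimization variant and the fit statistics can be sketched in a few lines. This is a hedged illustration, not the authors' procedure: a brute-force grid search over $(k, P)$ stands in for whatever optimizer they used, the decile shares are hypothetical numbers rather than data for any of the four countries, and only R², MSE, and MAE are computed, using their textbook definitions (MAS, IIM, and the K-S test are omitted).

```python
"""Least-squares fit of the weighted Lorenz form to decile data (sketch)."""


def lorenz(x: float, k: float, p: float) -> float:
    """Weighted form y(x) = (1-k) x**p + k (1 - (1-x)**(1/p))."""
    return (1.0 - k) * x ** p + k * (1.0 - (1.0 - x) ** (1.0 / p))


# Hypothetical decile income shares (poorest to richest); they sum to 1.
observed = [0.02, 0.035, 0.045, 0.055, 0.07, 0.08, 0.095, 0.12, 0.16, 0.32]

xs = [i / 10 for i in range(1, 10)]              # interior points 0.1 .. 0.9
cum = [sum(observed[:i]) for i in range(1, 10)]  # observed cumulative shares


def sse(k: float, p: float) -> float:
    """Sum of squared errors between fitted and observed cumulative shares."""
    return sum((lorenz(x, k, p) - y) ** 2 for x, y in zip(xs, cum))


# Brute-force grid search over 0 <= k <= 1 and 1 <= P <= 6 (step 0.01).
best_k, best_p = min(
    ((i / 100, 1.0 + j / 100) for i in range(101) for j in range(501)),
    key=lambda kp: sse(*kp),
)

# Decile shares implied by the fitted curve: differences of y at the deciles.
grid = [i / 10 for i in range(11)]
curve = [lorenz(x, best_k, best_p) for x in grid]
estimated = [hi - lo for lo, hi in zip(curve, curve[1:])]

# Textbook fit statistics on the decile shares.
residuals = [e - o for e, o in zip(estimated, observed)]
mean_obs = sum(observed) / len(observed)
ss_tot = sum((o - mean_obs) ** 2 for o in observed)
mse = sum(r ** 2 for r in residuals) / len(residuals)
mae = sum(abs(r) for r in residuals) / len(residuals)
r2 = 1.0 - sum(r ** 2 for r in residuals) / ss_tot

gini_fit = (best_p - 1.0) / (best_p + 1.0)
print(f"k = {best_k:.2f}, P = {best_p:.2f}, implied Gini = {gini_fit:.4f}")
print(f"R^2 = {r2:.4f}, MSE = {mse:.2e}, MAE = {mae:.4f}")
```

Because the Gini index of this form is $(P-1)/(P+1)$, the implied Gini follows directly from the fitted $P$, with no numerical integration.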
Discussion
The findings demonstrate the effectiveness of the proposed simple method and the new functional form for estimating the Lorenz curve. The simple method is a valuable tool when data are scarce, allowing researchers to estimate the Lorenz curve accurately from just a few summary statistics (the Gini index and the income shares at the two tails of the distribution). Its accuracy depends on the level of income inequality and on which income shares are available (10% vs. 5%). The error-minimization method, which uses more complete data, gives even more precise estimates. The comparison with Kakwani's (1980) model underscores the robustness and parsimony of the proposed form: it estimates decile income shares with comparable accuracy, reproduces the Gini index more closely, and is computationally simpler. The method thus fills a gap in the literature by offering a practical, computationally efficient way to estimate Lorenz curves, especially where detailed income distribution data are limited.
Conclusion
This study presents a novel simple method and functional form for estimating Lorenz curves. The method's strength lies in its simplicity and ability to provide accurate estimates using limited data (Gini index and extreme income shares). The proposed functional form offers comparable performance to existing leading models but provides advantages in terms of computational convenience and parameter parsimony. Future research could explore the method's application to other types of size distributions and the development of even more efficient estimation techniques. The method’s robustness and accessibility make it a valuable addition to income inequality research.
Limitations
While the proposed method performs well, there are limitations. The accuracy of the simple method relies on the availability and precision of the Gini index and income share data. In cases with high income inequality and only limited income share information (e.g., only the bottom and top 5%), the accuracy may be somewhat reduced. Furthermore, the study's findings are based on a relatively small sample of four countries. More extensive testing across diverse countries and contexts would further strengthen the generalizability of the method's performance. The choice of the weighting scheme for the combination of the two functional forms could be further explored and optimized.