In my latest paper “Gender Gap Through Time and Space: A Journey Through Wikipedia Biographies and the ‘WIGI’ Index” (blog post and on arxiv.org), my co-author Piotr Konieczny and I proposed a gender index. WIGI, the Wikipedia Gender Inequality Index, is composed of many indicators, but one in particular, the “nation-WIGI”, was designed to be comparable with other well-known indices. The nation-WIGI ranks each nation by the share of biography articles about its citizens that are about women. Because it is designed this way, WIGI can be correlated with other indexes. And potentially, we thought, given enough indexes and high enough correlations, we could get a sense of what WIGI is measuring in terms of those other indices.
Due to word-count limits, we were unable to submit this research question with the rest of the paper, so it is included here. Formally, we posed it as follows:
RQ4: Of the other gender indices that are also divided by nation, which index is Wikipedia most closely related to?
First let’s recap the four other nation-level indices we are inspecting (see section 3 of our paper for more detail).
- The UNDP’s Gender-related Development Index (GDI), introduced in 1995.
  - A gender-focused extension of the Human Development Index. GDI’s primary focus lies in gender gaps in life expectancy, education, and incomes.
- The Gender Equity Index (GEI), introduced by Social Watch in 2005.
  - Developed to measure all situations that are unfavourable to women, it ranks countries on three dimensions: education, economic participation and empowerment.
- The Global Gender Gap Index (GGGI), developed by the World Economic Forum in 2006.
  - Intended to allow comparison of gender gaps across different countries and years, it focuses on four areas: economic participation and opportunity, educational attainment, political empowerment, and health and survival.
- The Social Institutions and Gender Index (SIGI) of the OECD Development Centre, from 2007.
  - A composite indicator of gender equality that focuses solely on social institutions (norms, values and attitudes), along the four dimensions of family code, physical integrity, ownership rights and civil liberties.
Each of the four external indices above gives us a ranking that assigns every nation (sometimes referred to as an economy) an ordinal position. To measure how close two indices are, we use the Spearman rank correlation coefficient. Two further technical points need addressing. First, we must use only the intersection of nations covered by both indices, to avoid missing-data problems. Second, we run a calibration step to find the start decade of the Wikidata data (the earliest decade from which biographies are counted) that maximises the correlation in question.
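As a rough illustration of the procedure just described, here is a minimal pure-Python sketch. It is not the code from the repository: the function names (`ranks`, `spearman`, `calibrate`), the nation labels and all the numbers are hypothetical, and it assumes each index is available as a mapping from nation to score. It also leaves out the significance test.

```python
from math import sqrt
from statistics import mean


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(index_a, index_b):
    """Spearman's rho, computed only over nations present in both indices."""
    shared = sorted(set(index_a) & set(index_b))
    a = [index_a[nation] for nation in shared]
    b = [index_b[nation] for nation in shared]
    return pearson(ranks(a), ranks(b))


def calibrate(wigi_by_decade, other_index):
    """Return the (start_decade, rho) pair with the highest correlation."""
    candidates = (
        (decade, spearman(scores, other_index))
        for decade, scores in wigi_by_decade.items()
    )
    return max(candidates, key=lambda pair: pair[1])


# Hypothetical toy data: female biography ratio per nation, per start decade.
wigi_by_decade = {
    1800: {"A": 0.20, "B": 0.10, "C": 0.15, "D": 0.05},
    1900: {"A": 0.12, "B": 0.25, "C": 0.18, "D": 0.30},
}
# Hypothetical external index; it lacks nation "D" and adds nation "E".
other_index = {"A": 0.60, "B": 0.80, "C": 0.70, "E": 0.90}

best_decade, best_rho = calibrate(wigi_by_decade, other_index)
print(best_decade, best_rho)
```

Taking the intersection inside `spearman` means nations missing from either index never enter the ranking, so each pairwise comparison may rest on a different set of nations.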
The full source code of this calculation is available on GitHub. As an aside, I have another blog post on a functional-programming solution to joining many dataframes at once, which was useful in computing these results.
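To give a flavour of what joining many tables at once in a functional style can look like, here is a small sketch that folds an inner join over a list of tables with `functools.reduce`. It uses plain dictionaries as a stand-in for dataframes, and the names `inner_join` and `join_all` as well as the values are hypothetical; it is not taken from that blog post or from the repository.

```python
from functools import reduce


def inner_join(left, right):
    """Keep only nations present in both tables and concatenate their rows."""
    return {key: left[key] + right[key] for key in left if key in right}


def join_all(tables):
    """Fold inner_join over the tables; needs at least one table."""
    return reduce(inner_join, tables)


# Hypothetical tables: nation -> tuple of column values.
wigi = {"B": (0.2,), "A": (0.1,), "D": (0.3,)}
gdi = {"A": (0.9,), "B": (0.8,), "C": (0.7,)}
gei = {"A": (0.6,), "B": (0.5,)}

joined = join_all([wigi, gdi, gei])
print(joined)
```

With pandas, the same fold would presumably run over DataFrames, using a merge on the nation column as the joining function.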
Finally we produced a comparison table of indices, their correlation, the correlation significance, and the maximizing start decade. We present it ordered by correlation:
Table: National-WIGI compared to Alternative Indexes (columns: index, correlation, significance, calibrated start decade)
Each alternative index shows a statistically significant, moderate correlation with our nation-WIGI index. This suggests that the female ratio of Wikidata humans associated with a country is, at minimum, a legitimate addition to the landscape of gender inequality indexes.
Additionally, the fact that each alternative index correlates most highly when we consider only biographies starting around 1900 is a positive sanity check for our data. Intuitively this makes sense, since traditional indexes cover only modern history.
Still, how should we interpret the fact that our nation-WIGI is most highly correlated with GEI, and least with GDI? What do GEI and GDI measure that could tell us what WIGI is measuring? We dig further into the methodologies of these indices.
Social Watch describes its GEI as follows:
“In Education, GEI looks at the gender gap in enrolment at all levels and in literacy; economic participation computes the gaps in income and employment and empowerment measures the gaps in highly qualified jobs, parliament and senior executive positions.”
And the UN describes its GDI as follows:
“The new GDI measures gender gap in human development achievements in three basic dimensions of human development: health, measured by female and male life expectancy at birth; education, measured by female and male expected years of schooling for children and female and male mean years of schooling for adults ages 25 and older; and command over economic resources, measured by female and male estimated earned income.”
So we find that both indexes use indicators connected to education and economic activity. The differing factor, ultimately, is that the GEI additionally measures empowerment through positions of power, whereas the GDI additionally measures life expectancy. This suggests that the ratio of female biographies by nation in Wikidata is more highly correlated with women’s positions of power by country than with life expectancy by country. That, at first glance, is consistent with Wikipedia’s notability policies. Notability in Wikipedia essentially defers to inclusion in, or absence from, the journalistic and scholarly record. That means that humans in positions of power, which GEI covers, would tend to appear in Wikipedias in greater proportion. As for life expectancy, the component unique to GDI, there is no obvious reason why those with greater life expectancy would be better covered in Wikipedia.
Clearly this is a very rough investigation, and our conclusions can only be limited. Yet we still have some evidence of Wikipedia’s notability policy affecting gender representation. That link might be clear with some feminist reasoning, but the data also supports the notion. Surely this is a nice fact to know for those who criticize the notability criterion as it stands.
For questions or suggestions, contact me on Twitter – @notconfusing.