In my latest paper, "Gender Gap Through Time and Space: A Journey Through Wikipedia Biographies and the 'WIGI' Index" (blog post and on arxiv.org), my co-author Piotr Konieczny and I proposed a gender index. WIGI, the Wikipedia Gender Inequality Index, is composed of many indicators, but one in particular, the "nation-WIGI", was designed to be comparable with other well-known indices. The nation-WIGI ranks each nation by the ratio of female biography articles among the biographies of that nation's citizens. Designed this way, WIGI can be correlated with other indices. And potentially, we thought, given enough indices with high enough correlations, we could get a sense of what WIGI is measuring in terms of those other indices.
Due to word-count limits, we were unable to submit this research question with the rest of the paper, so it is included here. Formally, we formulated it thus:
First, let's recap the four other per-nation indices we are inspecting (see section 3 of our paper for more detail).
Each of the above four foreign indices gives us a ranking that associates a nation (sometimes referred to as an economy) with an ordinal position. To understand how close two indices are, we use the Spearman rank correlation coefficient. Two other technical points need addressing. First, we must use the intersection of the nations covered by each index to avoid missing-data problems. Second, we perform a calibration step to find the start decade of the Wikidata data that maximizes the correlation in question.
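To make the procedure concrete, here is a minimal sketch in plain Python. It is not the code from our repository: the function names, the toy nations, and the scores are all made up for illustration. It also assumes that "maximizing" means taking the largest signed correlation, and it leaves out the significance test.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman_on_shared_nations(index_a, index_b):
    """Spearman's rho, computed only over nations present in both indices."""
    shared = sorted(set(index_a) & set(index_b))
    a = [index_a[nation] for nation in shared]
    b = [index_b[nation] for nation in shared]
    return pearson(average_ranks(a), average_ranks(b))


def best_start_decade(wigi_by_decade, other_index):
    """Return (decade, rho) for the start decade with the highest correlation.

    Assumption: we maximize the signed rho, not its absolute value.
    """
    scored = {
        decade: spearman_on_shared_nations(scores, other_index)
        for decade, scores in wigi_by_decade.items()
    }
    best = max(scored, key=scored.get)
    return best, scored[best]


# Hypothetical data: nation-WIGI scores recomputed for two candidate start decades.
wigi_by_decade = {
    1900: {"A": 0.10, "B": 0.20, "C": 0.15, "D": 0.30},
    1950: {"A": 0.12, "B": 0.25, "C": 0.30, "D": 0.40},
}
# Hypothetical foreign index; it lacks nation D and adds nation E.
other_index = {"A": 1.0, "B": 2.0, "C": 3.0, "E": 9.0}

decade, rho = best_start_decade(wigi_by_decade, other_index)
print(decade, rho)
```

Only nations A, B, and C enter the comparison, since D and E each appear in just one of the two indices. An index on which a lower score is better would correlate negatively with WIGI, in which case one would flip its sign or maximize the absolute value instead.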
The full source code of this calculation is available on GitHub. As an aside, I have another blog post on a functional-programming solution to joining many dataframes at once, which was useful in computing these results.
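For readers who do not want to click through, one common way to join many dataframes functionally is to fold `pandas.merge` over a list with `functools.reduce`. The sketch below illustrates that general idea and is not necessarily the exact approach from the other post. The helper name `join_all`, the column names, and the data are invented.

```python
from functools import reduce

import pandas as pd


def join_all(frames, on="nation", how="inner"):
    """Fold a list of dataframes into one by merging them pairwise on a shared key."""
    return reduce(lambda left, right: pd.merge(left, right, on=on, how=how), frames)


# Hypothetical per-nation tables; each covers a slightly different set of nations.
wigi = pd.DataFrame({"nation": ["A", "B", "C"], "wigi": [0.12, 0.25, 0.30]})
index_x = pd.DataFrame({"nation": ["A", "B", "D"], "index_x": [1.0, 2.0, 4.0]})
index_y = pd.DataFrame({"nation": ["B", "A", "C"], "index_y": [7.0, 8.0, 9.0]})

# An inner join keeps only the nations that every table covers (here A and B).
joined = join_all([wigi, index_x, index_y])
print(joined)
```

With `how="inner"`, the result contains exactly the intersection of nations across all tables, which is the property needed to avoid missing data when correlating indices.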
Finally, we produced a comparison table of the indices, their correlations, the significance of each correlation, and the maximizing start decade. We present it ordered by correlation:
National-WIGI compared to Alternative Indexes