Data Management: The Plan

Thanks to the prodding of Daniel Mietchen, I have created a data management plan for my open-PhD adventure, which is not something I would have done on my own. What is a data management plan (DMP), you might ask? Now that I’m up to speed, I can tell you that it’s a document in which you set out the parameters for how you will create, share, and store the outcomes of a project. It’s also the sort of thing you go through in order to pose detailed questions to yourself and make rigorous your otherwise slightly sloppy thinking.

Questions of the license to use for data produced were quite easy for me: I’m dedicated to using as open a license as possible while requiring attribution. Other topics, however, gave me pause in considering how I will handle the spew of data exhaust I produce all the time. Take documentation, for instance: how will I keep track of what those bits represent?

There is a large amount of hubris to sidestep with documentation. Thinking that I could keep a grand folder structure, or a complete list, seems like myopic optimism. I think there is some wisdom to draw from the casual observation that even as organizational tools online improve, most coordination still gets done by plaintext emails. That’s why I have no plan to keep a megalist sort of card-catalogue of documentation, but to include it as files alongside the data. I will aim for IPython/Jupyterized inline documentation of data handling when possible, resorting to standard files in the directory otherwise. That is, I’ll be relying on search as an organisational principle, so the key will be making the barrier to searchable documentation as low as possible – like writing quick files.

On to the question of archiving: how will I keep the data around for a long time? This posed a very difficult question for me because I wasn’t sure exactly how much longevity I want from my data. 2 years, 5 years, 20 years? Starting with needs-constraints rather than desires, I thought that storing my data should be easy and free (as in beer). That’s why I’m opting for using GitHub in the DMP. But there are two worries with GitHub. The first is that it limits files to 100 MB, so perhaps it’s not suitable for all possible data. The second is that GitHub is a company, and like any other swashbuckler with venture-capitalism-driven bravado it could disappear easily. So then I thought that I might rely on some HTTP-accessible servers at my University: no filesize limits, corporate independence, and tape backup storage. I am rather happy with that combination, but if I wanted to invest a lot more effort for not a lot more benefit then I could nitpick at both of them being centrally managed regardless of profit motive. The only way to get around this would be to what? Create torrents of my data and seed them from personal servers. The idealist in me is tempted, but the time-scheduler sees the headache.

If you haven’t had to make a data management plan, I could understand that it seems rather abstruse and time consuming. However, you should do it anyway because a) it forces you to think more closely about your data on an abstract level, and b) it makes the whole thing 50% less scary than it needs to be. Oh, and in the interest of openness, my DMP is here: Open-PhD data management plan.pdf.

Dear NSF: Part 1

In what will surely not be the last time I ask the U.S. government for money, I made the plea to be a fellow of theirs today. The National Science Foundation Graduate Research Fellowship Program (NSFGRFP) supports new PhD students in their research aims for 3 years. And I am told, and also suppose, that I want that. The way in which they ask you to prostrate is a standard multi-essay plus recommendations mode. The essays, when viewed as “papers for which you haven’t done the work,” were useful writing and thought exercises. I am quite energized by firming up a proposal about what I might do in the next three years, even if it seems incomprehensibly difficult and moon-shot-ish right now.

The climate code suggested that I ought to show how my research could have “multiple returns on investment.” I would like it if my application did as well. Perhaps publishing the essays in full here will have some good consequence in the future.


Graduate Research Plan Statement

Blindspot: A Passive Implicit Bias Test From Digital Footprint

Gender, race, sexuality, nationality, social-class, native-language, weight, etc. are causes of implicit and explicit social biases that affect human relationships, and in the worst case make life difficult for many. For example, identical academic resumés with men’s names get offered more jobs with higher starting salaries than women’s (Moss-Racusin et al.). Similarly, physicians show a bias to attribute patients’ symptoms to coronary disease for blacks more than whites. Inevitably, social technologies transfer these biases (Friedman et al.), but their design can mitigate or exacerbate the transfer. In a negative example, photography tools have been optimized for taking pictures of white people (Dyer). Meanwhile the internet has been a boon to the Gay Liberation movement by connecting people without fear of homophobia (Weinrich). However, the Human-Computer Interaction literature does not address these issues much (Kannabiran et al.). That is a problem because social technology is an ever-growing component of our lives, and implicit biases are subtext to every transaction. Implicit bias is difficult to recognize because it cannot be seen with introspection (Kang et al.). Without taking an implicit bias test one may remain unaware of the problem, and currently those tests require time and effort. I propose to build easier, more integrated, passive implicit bias tests by utilizing a person’s digital footprint. This proposal involves three parts: to operationalize bias from passive data, to create tools using that operationalization, and finally to evaluate those tools with a field study.

Step 1: Operationalize bias from passive data

The first issue to tackle with this general approach is to be able to compute the implicit bias of a behaviour. The state of the art is the Implicit Association Test, or IAT (Greenwald et al. 1998), an isolated test-taking activity. My plan is to improve on the test by applying the notion to a person’s passive behaviour. I ask: can we treat the time spent reading combined topics as a passive test? For instance, if we know the content of a browsing history, and the time spent on each article, perhaps we already have the results of an implicit bias test. To evaluate the merit of our passive test we will correlate results with the IAT.
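As a minimal sketch of that evaluation step, the snippet below computes a Pearson correlation between a passive score and an IAT score for the same participants. The proposal does not say which correlation measure would be used, so Pearson is an assumption here, and the score lists are made-up placeholders rather than results.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Placeholder scores for five hypothetical participants (not real data).
passive_scores = [0.10, 0.35, 0.20, 0.60, 0.45]  # from browsing behaviour
iat_scores = [0.05, 0.40, 0.25, 0.55, 0.50]      # from the standard test

r = pearson(passive_scores, iat_scores)
print(f"correlation between passive score and IAT: {r:.2f}")
```

A value near 1 would suggest the passive score tracks the IAT; a value near 0 would suggest it measures something else.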

In comparing user and global biases I have been making inroads. In a recent project I built gender inequality indexes from the biographies of all Wikipedias by nationality. One result from this is that there is a high correlation between the nationality-index and the United Nations’ inequality index. Since the UN measures inequality by positions of power and education levels in countries, the correlation suggests that Wikipedia editors do too. I presented this finding at OpenSym ’15 (Klein), and continuation of the work is now supported by a Wikimedia Foundation grant.

I am also experienced in finding novel measures for human behaviour. Borrowing from the economics literature I have suggested a measure of “collaborativeness” of a group of Wikipedia editors based on the articles they edit (Klein et al.). This is also an implicit bias check of sorts: it looks at the underlying propensity for users to collaborate, based on how different they are from a global norm. My familiarity with social bias data and repurposing methods will be key in building a theory of passive implicit bias.

Step 2: Create tools using that operationalization

Once I develop a way to compute an implicit bias from a set of web pages, I would build a tool to allow any web user to monitor themselves. I envision this taking the form of a browser plug-in that will, with permission, read and track your browser history. I will use the linked open data of the web – Wikidata, Freebase etc. – which provides semantic information on the internet at large for a “ground truth”. For instance, if you are reading New York Times Opinion Pages, we will infer the amount of time spent reading about U.S. prison reform (domestic-interest & Police Chief Garry McCarthy, male, age 56) vs. Canada’s Muslims in the upcoming election (foreign-interest, Muslim-interest, & political elections). Over the scale of an entire browsing history, we can sum the time spent on and between different dimensions of bias.
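The summing step can be sketched roughly as follows. This is only an illustration: the record format, the URLs, the reading times, and the tag names are hypothetical, and in the proposed tool the tags would come from linked open data rather than being written by hand.

```python
from collections import defaultdict

# Hypothetical browsing records: (url, seconds spent reading, topic tags).
history = [
    ("nytimes.com/opinion/prison-reform", 240, {"domestic-interest", "male"}),
    ("nytimes.com/opinion/canada-election", 90, {"foreign-interest", "muslim-interest"}),
    ("example.org/local-news", 60, {"domestic-interest"}),
]

def time_by_tag(records):
    """Sum the reading time attributed to each tag over a browsing history."""
    totals = defaultdict(int)
    for _url, seconds, tags in records:
        for tag in tags:
            totals[tag] += seconds
    return dict(totals)

def share(totals, tag_a, tag_b):
    """Fraction of the time on tag_a out of the time on tag_a plus tag_b."""
    a = totals.get(tag_a, 0)
    b = totals.get(tag_b, 0)
    return a / (a + b) if (a + b) else None

totals = time_by_tag(history)
domestic_share = share(totals, "domestic-interest", "foreign-interest")
print(totals)
print(domestic_share)
```

A share far from 0.5 along one dimension is the kind of raw signal that would then have to be validated against the IAT before being called a bias score.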

To quell the issue of potential privacy violations this tool will be entirely open source, something with which I’m well acquainted. For instance, I created a monitoring service which watches every edit on Wikipedia in realtime for citations, pings the author of the citation, and uploads any open access articles for link reliability. Seeing the benefit in this, Crossref offered financial support to expand and maintain this service (Bilder). Thus my skills in building realtime tools for social good are already advanced.

Step 3: Evaluate those tools with a field study

The last stage of this proposal would be to conduct a field study of how well this implicit bias tool works. We would seek a stratified sample to install and use the browser plugin. Next, we would monitor participants at intervals, determining if being aware of their bias scores affects their future bias score. We must control for factors such as having browsing history monitored, and the frequency of seeing implicit bias scores. To test the method’s efficacy we would correlate results with standard implicit bias scores. I am at the beginning of my training in conducting field studies. This is why I chose my advisor Dr. Haiyi Zhu, a computer scientist with ample experience in the area. At the moment we are embarking on an interview study on Couchsurfing and AirBnB, in which I am learning the interview skills and interactive methods crucial to field studies.

Intellectual merit

This work put together – successful or not – would advance the fields of human-centered computer science and implicit social bias research. Our first step, to create a passive implicit bias test based on a digital footprint, will contribute by improving implicit bias measurement methods. Our tool-building will provide another return on investment by enriching the linked-open-data community with an open bias dataset, a dimension that is at the moment entirely missing. In total, the project successfully completed would mean a novel, unprecedentedly easy implicit bias check, usable by anyone.

Broader impact

This proposed work can help unlock technology’s potential to reduce implicit social bias (e.g., gender, race, sexuality, nationality) and equalize society. We are attempting to build a “one-click solution”, so that time and technical barriers to test one’s own implication are as low as possible. Thus the broadest swathe of society will be able to look into a mirror of their own bias. Yet another aspect of creating a more equal society lies in access to research. As someone who was an unaffiliated researcher before re-entering the academy, I know the value of Open Access work, as it was all I had for years. I have a data management plan in place to publish all papers and non-privacy-sensitive documents (including this one) under open licenses. I am already an attender of Open Knowledge Festival, and was a Featured Speaker at Wikimania 2015. By advocating for open access research I’m promoting the spread of knowledge above prestige and profit, and eventually a more egalitarian world.


Moss-Racusin, Corinne A., John F. Dovidio, Victoria L. Brescoll, Mark J. Graham, and Jo Handelsman. “Science Faculty’s Subtle Gender Biases Favor Male Students.” Proceedings of the National Academy of Sciences 109, no. 41 (October 9, 2012): 16474–79.

Steinpreis, Rhea E., Katie A. Anders, and Dawn Ritzke. “The Impact of Gender on the Review of the Curricula Vitae of Job Applicants and Tenure Candidates: A National Empirical Study.” Sex Roles 41, no. 7–8 (October 1999):

Friedman, Batya, and Helen Nissenbaum. “Bias in Computer Systems.” ACM Trans. Inf. Syst. 14, no. 3 (July 1996)

Kannabiran, Gopinaath, Jeffrey Bardzell, and Shaowen Bardzell. “How HCI Talks About Sexuality: Discursive Strategies, Blind Spots, and Opportunities for Future Research.” CHI ’11. New York, NY, USA: ACM, 2011.

Reagle, Joseph. “‘Free as in Sexist?’ Free Culture and the Gender Gap.” First Monday 18, no. 1.

Dyer, Richard. “Making ‘white’ people white.” The social shaping of technology (1999): 134-140.

Greenwald, Anthony G., Debbie E. McGhee, and Jordan L. K. Schwartz. “Measuring Individual Differences in Implicit Cognition: The Implicit Association Test.” Journal of Personality and Social Psychology 74, no. 6 (1998): 1464–80.

Klein, Maximilian. “Wikipedia in the World of Global Gender Inequality Indices: What The Biography Gender Gap Is Measuring.” In Proceedings of the 11th International Symposium on Open Collaboration. San Francisco: ACM, 2015.

Klein, Maximilian, Thomas Maillart, and John Chuang. “The Virtuous Circle of Wikipedia: Recursive Measures of Collaboration Structures.” In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing, 1106–15. CSCW ’15. New York, NY, USA: ACM, 2015.

Geoffrey Bilder. “Citation Needed | Crossref Blog.”

Green, Alexander R., Dana R. Carney, Daniel J. Pallin, Long H. Ngo, et al. “Implicit Bias among Physicians and Its Prediction of Thrombolysis Decisions for Black and White Patients.” Journal of General Internal Medicine 22, no. 9


Personal, Relevant Background and Future Goals Statement

An Attack On My Belief-System

The discovery of the reality of social biases was a turning point in my life. The first time I awoke to my implicit, internalised racism, a mix of discomfort and amazement overcame me. It was during a protest on the civil-rights-famous steps of Sproul Hall in Berkeley, where after a few rabble-rousing speeches a black woman came to the stage and started delivering activist poetry. I had been brought up to think of myself as not racist, and yet in a rare moment of self-awareness I saw myself dismissing her content because of how she spoke.


After the world-shattering realization of how we can hide prejudices from ourselves, subsequent prejudices came to light more rapidly. My own misogyny became very real upon a reading of favourite academic Joseph Reagle’s “Free as in Sexist?” (Reagle), a deconstruction of sexism in Open Culture. (Of course it took a man to show me that.) I lost religious dogma at the holocaust memorials in Berlin and Auschwitz, when I saw that accepting any unquestioned message is dangerous. Only last month the blog of a woman I met at a wedding introduced me to the “fat stigma” I had unwittingly been harbouring (“Talkin’ Reckless”). These continuous epiphanies fuel my wonder at just how many unidentified stigmas I’m still holding.


The feeling I get from a solid attack on my belief-system is so powerful that chasing after it has become the driving force in my life. Now I want it to be my career. I want to use research methods to promote a more equitable society by uncovering and addressing implicit stigmas. I foresee my future as working with an organization or think-tank, likely non-profit, that also focuses on equality issues. Whether by grant or employment, it is of paramount importance that I work with a social justice-oriented team. My past experience in working in that sector has been overwhelmingly positive. Yet owing to the fact that this line of work is more self-directed and typically outside large corporate structures, advancing my career will mean I need to become more independent as a researcher.


After having worked as a self-employed developer and researcher for the last two years, I see graduate studies as the best path to enabling my personal development. Past experience has shown me that I need to learn a larger swathe of research methods. For my particular project – building a tool to unearth implicit bias from browsing habits – I need to learn the ways in which people are best convinced. And I am also looking for graduate school to partner me with others doing similar things. Together, mastering technical methods, delving into psychology, and working within the landscape will give my goal of exposing implicit bias the broadest possible impact.


I learned how to both borrow and re-use theory from other fields, and to take initiative on projects during my work on “The Virtuous Circle of Wikipedia” (Klein et al.). While volunteering at the Wikipedia booth during a poster session at UC Berkeley, a conversation quickly turned into a collaboration. An economist asked me to provide Wikipedia data for testing economic complexity theory in that domain. Being mentored by my coauthor Maillart – who would describe himself as an “econo-physicist” – I learned what it means to generalize findings and theories across domains. That has been an inspiration, as I now see the rich potential of exploring and reading outside computer science. The story continues as I also understood more about research collaboration dynamics. When the project started slipping behind schedule, I took the initiative to create a plan to finish on time by managing my superiors, assigning them task-lists, as well as plowing on with the analysis myself. Thus they awarded me with first-authorship on the paper. That has brought me to the next challenge of completely leading a research effort, which I think graduate school can teach me.


I am prepared to manage a team, but see my relative paucity in methods of persuasion from “Wikipedia Indicators of Gender Inequality (WIGI)” (Klein and Konieczny), a grant I won from Wikimedia Foundation. The aim of WIGI is to provide a series of inequality indexes like the United Nations Gender Inequality Index, based on Wikipedia and including time, ethnicity, and occupation dimensions. When correlated to other indexes it can tell us both about the world and Wikipedia (Klein). On the back of that poster and paper, I won the grant to make the dataset available as a service. As the principal grantee, I manage a paid team of 4, which has taught me the lessons of how to lead without pushing. At first I erred in being too laissez-faire in trusting employees to be self-directed, and later settled on a more accountable weekly-homework model.

We’ve produced a prototype, which will make it much easier for other researchers to include inequality as a dimension in their projects. Yet even though we have made this dataset, I am still at quite a loss as to how to announce it and persuasively “sell” the research – another skill that graduate school can teach me.


Having been engaged in exemplary models of collaboration during my time as Research Assistant at OCLC, and a grantee of Creative Commons, I know the value of community.

“If you want to go fast you go alone; if you want to go far you go together,” I recall the CEO of Online Computer Library Center (OCLC) saying in a company address. Founded in 1967, they are ancient by tech standards, but their long-term vision taught me what it meant to be a community member. During my time there as a research assistant I found an opportunity to use their bibliographic data to enrich Wikipedia’s articles. Having racked up over 2 million edits, I published about the process with help from the tight-knit librarian community (Klein and Kyrios). My experience from OCLC in thinking about multiple stakeholders and industry standards is how I came to win a grant from Creative Commons to create a plug-in to include the technology of the Learning Resource Metadata Initiative (LRMI) in MediaWiki sites. Working closely with the LRMI board and MediaWiki developers, we came to a win-win implementation that will make Open Educational Resources highly searchable in Google results (Campbell). It was a big lesson to work with many people, and to see that a sometimes slower pace pays off in the longevity of results. With my graduate studies I hope to find a network of colleagues with whom “to go far.”


Keeping sight of my overall goal to spread awareness of implicit bias, it is easy to explain how the trajectory of my experience will make a broad impact in society. Being privy to your own blind-spots is difficult, if not paradoxical by definition. Yet it is that difficulty which makes it such a worthy project. In fact, to the western rationalist mind, paradox is often an indicator of faulty underpinnings. Those who are least affected by implicit bias are, not by coincidence, the most privileged in society. And the most privileged in society have an opportunity to change it. As someone who, more than not, is privileged, I feel it my responsibility to use my position, potential, and energy to continue bringing the issue of inequality to light. There is no need for me to translate how my work and history will broadly impact society, because my work and history are about the broad unequal impacts in society.



Reagle, Joseph. “‘Free as in Sexist?’ Free Culture and the Gender Gap.” First Monday 18, no. 1 (December 30, 2012).

“Talkin’ Reckless.” Accessed October 23, 2015.

Klein, Maximilian, Thomas Maillart, and John Chuang. “The Virtuous Circle of Wikipedia: Recursive Measures of Collaboration Structures.” In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing, 1106–15. CSCW ’15. New York, NY, USA: ACM, 2015. doi:10.1145/2675133.2675286.

Klein, Maximilian. “Wikipedia in the World of Global Gender Inequality Indices: What The Biography Gender Gap Is Measuring.” In Proceedings of the 11th International Symposium on Open Collaboration. San Francisco: ACM, 2015.

Klein, Maximilian, and Piotr Konieczny. “Gender Gap Through Time and Space: A Journey Through Wikipedia Biographies and the ‘WIGI’ Index.” arXiv:1502.03086 [cs], February 10, 2015.

Klein, Maximilian, and Alex Kyrios. “VIAFbot and the Integration of Library Data on Wikipedia.” The Code4Lib Journal, no. 22 (October 14, 2013).

Campbell, Lorna. “LRMI Implementation Cases Study: Untrikiwiki | Open World.” Accessed October 23, 2015.

Am I Doing My PhD In The Open?

Yesterday I clicked on a JSTOR link, and a full text PDF popped up – it wasn’t an Aaron Swartz liberation plan. I was in the academy reading closed access research. I’m in the academy, and I’m on the precipice of taking it for granted.

Last November I asked if I should do my PhD in the open and answered in the affirmative, but at that point I hadn’t even been accepted by any PhD-granting institutions. Only one month into ensconcing myself in the GroupLens lab, at the Univ. of Minnesota, I somehow forgot about my commitment to openness. It is time to right that.

Dropping onto the conveyor belt of academia has been surprisingly strengthening. The factor that I expected to aid my moonshot research goals – being in a group of like-minded peers – is working just as planned. I wasn’t even put off by the PhD “retreat” this week, a Chipotle-catered 4 hour meeting, wherein they told us that the name of the game was publish or perish, to work hard to aggrandize the legacies of our advisors, and that we had to keep up with CMU. I believed the faculty when they said that our success was their success and vice-versa. Perhaps I’m absorbing the Minnesota nice, or maybe it’s just that reassuring to finally be part of a team. While the scare-tactics are minimal and the amiability and cooperative nature of the group remains as authentically high as it is – I’ve found what I was looking for in a cohort.

The knock-on factor that I didn’t anticipate being as useful as it is, is being taught. Since my last employment in 2013, I’ve dedicated my efforts to teacherless autodidacticism. But now I’m re-exploring the benefits and detriments of external tutelage. My few courses, Human-Computer Interaction Methods, Machine Learning, and Introduction to Research, are spanning the gamut of how a classroom affects knowledge.

The HCI Methods course is precisely the high-level, just-a-bit-of-guidance I was anticipating from graduate study. It has been eye-opening to receive a systematic overview of ‘methods’, of the many “Ways of Knowing in HCI,” as our textbook is titled. I’m shown exactly how to start walking, as a crawling being.

Particularly in the realm of so-called qualitative research, the questions I didn’t know I had about how to do it well are rising and getting answered. One pain point for me is treating a quality-focus as an intellectually rigorous path. I am having difficulty with the challenge of unlearning the hard/soft, hard/easy, good/bad, false spectra and associations. I must overcome my bias, and I think that just might happen in time because if I look at empirical evidence, in the lab there is no disparaging of anyone’s work, only louder and louder applauding.

On the other hand this Machine Learning class is a curse of recidivist pedagogy. All of a sudden the pointless constructs of midterms, final exams and grace days return. The content the class is supposed to teach is perfectly delicious. The way it’s brought to the table is all the wrong parts of the confusion of education and schooling. I’m not too old for it; I have simply lost all my appetite for people-pleasing. My advisor reminds me that I must only appease to the minimum degree. I less and less see the point in even playing the game half-heartedly.

Upon turning in my first homework I remarked inwardly that I hadn’t cared so little about a program I’d written since I quit computer science in undergrad studies. And then I saw a theme recurring. It goes deep. I failed Math in high school, but it became my major in college. I failed computer science in college, but it became the subject of my graduate studies. Well, that’s what an administrator would say. I prefer the frame that in both cases I didn’t fail the subject, but failed to be subjected by the schooling. Today I’m facing that conundrum once again, and hope to develop a better coping mechanism than my previous pyrrhic stalemates.

My third class, which I might rename “Learn You a Research”, is a seminar of sorts, emphasising my independence as a researcher. It has the right aims of assistance, and solidifying fundamentals. The semester-long project is to write a literature review to be proud of, and that’s something that (a) I would be doing anyway, and (b) am pleased to think a bit more intentionally about. In fact, the topic of that review is due today, and what I shall be submitting, in the name of an Open PhD, is:

Survey Topic: Trust and Discrimination in the Sharing Economy.

Focus: My goal is to review online systems and mechanisms used for establishing trust, and using the lens of design, consider how they affect discrimination.

That needs a bit of work in its precise formulation, but it correctly surfaces my personal fascinations with implicit bias and peer-to-peer technologies. It is also meant to marry those ideas with those of my advisor, Haiyi Zhu, who has intrigued me with the new-to-me domain of the sharing economy and a more formalized notion of trust.

Pursuing her and collaborator Mako Hill’s very simpatico agenda, we’ve set out on a project that I wouldn’t have considered on my own. It is an interview-based inquiry into hosts and guests that use both AirBnB and CouchSurfing (and other “network hospitality” sites); about when they would use one and not the other, and what that means for the co-existence of the sites, and their underlying ideologies.

This project has me hyped and nervous at the same time. Hyped because it homes in on interrogating this “Capitalism kills love” lens with which I’ve been inculcated – and I’m addicted to inspecting my own assumptions. Nervous because it represents the start of my doing purely “qualitative” work (if you believe the binary). The world is fresh and new – I am filing my first IRB request – which has involved learning what an IRB even is. It’s got me saying “I’ve never done that before, but I’ll do my best,” a very good phrase to be saying at this juncture.

More so, it has me nervous on another level, trying to conduct this research in the open. As much as I have been enjoying my new comrades, I have not heard much discussion of Open Access research. Conducting this one with one of the co-founders of Ubuntu, we will use a private git repo for collaboration – which is a start. I’m still embarrassed to make a fuss in my lab, but at the same time I want to be clear that I’m as open-source, open-data, and open-access as possible. I still feel a fear of being a nagging, naïve newbie. That I suppose is just baby step #2 of doing my PhD in the open.

So there you have it, all I have done in my PhD so far – just my thoughts – opened. While I’m being confessional, there is one more thing. I wrote a letter which lives in my desk drawer that’s titled “Failure Plan”. It says to go on a long walk, and then reinvestigate ways to try again.

OpenSym 2015 – Wikipedia in the World of Global Gender Inequality Indices

In a previous post I outlined the process of correlating the gender bias from Wikipedia with other gender inequality indices. Tomorrow I will present a poster on the same topic at OpenSym 2015. I’ll be explicating how Wikipedia’s biographical bias is closer to the gender bias in highly-qualified jobs than to that in longevity. It’s part of what I’ve been discovering during my Grant with the Wikimedia Foundation. You can read more in the preprint and poster.


Wikipedia in the World of Global Gender Inequality Indices: What The Biography Gender Gap Is Measuring


OpenSym 2015 Poster

The Universal Empathy Machine: Nonviolent Communication Explained with Mathematics and Computer Science

0. The Universal Empathy Machine

Empathy is not sympathy. What’s the difference? Think of the Universal Turing Machine. It is a machine that accepts a program and data, and runs that program on that data. In this way it can simulate all programs on all data. Let us think of a human as a program and human experience as data. Sympathy then, is running your program on someone else’s data. Empathy is running their program on their data. As you can see the results of the sympathy and empathy computations are not guaranteed to be identical. In a nutshell Nonviolent Communication is about becoming the Universal Empathy Machine, to be able to emulate the architecture of an arbitrary person given an arbitrary experience.
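The analogy can be made concrete with a small sketch. Each person is modelled as a function (a "program") from an experience to a reaction; the two people, the experience, and the reactions below are invented purely for illustration.

```python
from typing import Callable

# A "person" is modelled as a program: it maps an experience to a reaction.
Person = Callable[[str], str]

def alice(experience: str) -> str:
    """Hypothetical person who enjoys surprises."""
    return "energised" if experience == "surprise party" else "neutral"

def bob(experience: str) -> str:
    """Hypothetical person who finds surprises stressful."""
    return "overwhelmed" if experience == "surprise party" else "neutral"

def sympathy(me: Person, their_experience: str) -> str:
    """Run MY program on THEIR data."""
    return me(their_experience)

def empathy(them: Person, their_experience: str) -> str:
    """Run THEIR program on THEIR data."""
    return them(their_experience)

bobs_experience = "surprise party"
print(sympathy(alice, bobs_experience))  # alice's reaction to bob's experience
print(empathy(bob, bobs_experience))     # bob's own reaction
```

Here Alice's sympathy says "energised" while empathy with Bob says "overwhelmed". The hard part, which the sketch skips entirely, is that in real life you never get handed the other person's source code.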



Cover of Nonviolent Communication, replete with sunflower

Nonviolent Communication (abbrv’d NVC), is a theory by Marshall Rosenberg and the title of a book which has an unfortunate cover. Dressed up in a sunflower, you would associate it with self-help pseudoscience and may not allow it to surprise you. I only popped it open because a) it could be pirated on The Pirate Bay, and b) it was the reciprocating recommendation to me after I had been proselytizing my then-favourite-read to a friend, and so I felt obliged. As you can see neither of those reasons should really have you running to the library.

Its insight-olives are sparse in its ciabatta. But however rare they are, those morsels were escape plans for decade-long arguments. I felt so resourceful having a theory of dealing with conflict where I never had one before. The only problem was I couldn’t chat to my friends about it, let alone recommend it. (Update: which now turns out to be a common phenomenon). It’s not intended for anyone that would use the terms logical, or reasonable to describe themselves. They’d be seeking different analogies, examples, and want it to be quite a bit shorter.

Well this is that version, your very short introduction to Nonviolent Communication, abridged and explained through mathematics and computer science analogies. I’ll translate it into the realm of motivations, axioms, communication protocols, and finally foundational flaws.

1. The Intention of Nonviolent Communication is Connection

Any good sceptic should immediately be asking the purpose-question. What we are interested in is the family of problems characterized by the set of disharmonies, disagreements, and arguments.

Now it must be noted that nonviolent communication admits to there being intractable arguments. Not every argument is solvable, and like the halting problem, there is no way of deciding whether an argument will run forever without just trying to solve it.

The main approach we take to arguments is finding connections. A connection is a relation aRb between persons a and b, not necessarily distinct, such that a and b are ready to resolve the disagreement. (What is it when one person is not ready to resolve? We address that later).

Argument resolution often never starts because it does not aim to find connection. In some cases we are talking past each other, and we need to connect on what the main topics are. In other cases we are discussing the same topic but cannot connect on a mutually agreeable answer; here NVC says to connect on observations, feelings, and needs.

Protip: Notice connection is not always with “others”, because often we want to change abusive self-dialogue.

2. Axioms of Nonviolent communication

There are three strong axioms; sorry Wittgenstein fans.

2.1 Feelings Are Connected to Needs

We suppose a Connection map, which maps feelings to needs.

C \colon F \to N

Where F is the set of all feelings, and N is the set of all needs. Note that C is not necessarily injective but is surjective. The intuition here is that whatever feelings are observed can be mapped to a need – probably unmet. This need usually becomes our focus.
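As a rough sketch, the Connection map can be modelled as a Python dictionary. The feelings and needs below are hypothetical samples picked only to show the shape of C; NVC does not supply this particular table.

```python
# Hypothetical feelings and needs, chosen only to illustrate the shape of C.
C = {
    "lonely": "connection",
    "hurt": "connection",      # two feelings share one need: C is not injective
    "exhausted": "rest",
    "anxious": "security",
}
N = {"connection", "rest", "security"}

def is_surjective(mapping, codomain):
    """Every need in the codomain is reached by at least one feeling."""
    return set(codomain) <= set(mapping.values())

def is_injective(mapping):
    """No two feelings map to the same need."""
    values = list(mapping.values())
    return len(values) == len(set(values))

print(is_surjective(C, N), is_injective(C))
```

Surjectivity is what makes the map useful in practice: no need is unreachable from the feelings we can observe.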

2.2 All Needs Matter

The set of needs and important needs are exactly equal.

\forall n \colon n \in \{ \text{needs} \} \iff n \in \{ \text{important needs} \}

Taking as an axiom that all needs are important needs allows participants to declare needs without fear. The Universal Empathy Machine is a system for accepting as important those needs which do not appear important to you.

2.3 There Is Always a Choice

The empty set is not contained in the set of all choices.

S := \{ s \in \mathcal{P}(\text{choices}) \mid s \neq \emptyset \}

Let’s consider this our “Axiom of Choice” – we always have one. NVC asks us to accept a strong theory of free will. We have a nonempty, and possibly infinite set of reactions for all interpersonal interaction.

3. Communication Protocols

In trying to connect we will have to in some way communicate with each other, let’s call this messaging. NVC says that it’s important to do this in a specific way. The messages that we pass between objects, probably humans, not necessarily distinct, are a 4-tuple containing:

messaging tuple := (observations, feelings, needs, requests)

This quadruple, through the unveiling of each element, produces a flow from empiricism, to emotion, to humanism, and finally to action.

Not every message needs to contain all four parts, for brevity often they can be omitted. When starting though it can be useful to be exhaustive for practice.
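A minimal sketch of such a message in Python, assuming each part is just a string and an omitted part is `None`. The `Message` class, its `render` method, and the example wording are all illustrative inventions, not part of NVC itself.

```python
from typing import NamedTuple, Optional

class Message(NamedTuple):
    """The NVC 4-tuple; any part may be omitted for brevity."""
    observations: Optional[str] = None
    feelings: Optional[str] = None
    needs: Optional[str] = None
    requests: Optional[str] = None

    def render(self) -> str:
        """Join whichever parts are present, in protocol order."""
        return " ".join(part for part in self if part is not None)

full = Message(
    observations="We agreed to meet at 7:30 and I saw you arrive at 7:45.",
    feelings="I feel anxious,",
    needs="because I need reliability.",
    requests="Would you text me when you are running late?",
)
brief = Message(feelings="I feel anxious.", requests="Would you text me?")

print(full.render())
print(brief.render())
```

Keeping the field order fixed preserves the flow from observation to request even when some parts are left out.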

3.1 Observations

"Observing without Evaluation" 
~ NVC Chapter Title 3

In the canonical specification of Nonviolent Communication, this chapter is literally titled “Observing without Evaluation”. Little do they know just how apt that is. In this messaging block we are transmitting our observations rather than our opinions. The assumption NVC operates with is that when viewing the world, we sense our observable universe and then evaluate it to return opinions. But it is not clear that those opinions are useful yet, so let’s not naturally default to eager evaluation. (Well you may not, depending on how much of a functional purist you are.) Taking a cue from the lazy evaluation model, we don’t have to return opinions until they are necessary. In fact, evaluation is not necessary until we send feelings.
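For readers who have not met the two evaluation strategies, here is a small Python sketch. The `evaluate` function is a hypothetical stand-in for forming an opinion, and the list `evaluations_run` exists only to record when that work actually happens.

```python
evaluations_run = []

def evaluate(observation: str) -> str:
    """A hypothetical step that turns an observation into an opinion."""
    evaluations_run.append(observation)
    return f"opinion about: {observation}"

observations = ["arrived at 7:45 on Monday", "arrived at 8:05 on Tuesday"]

# Eager: every observation is evaluated immediately, needed or not.
eager_opinions = [evaluate(o) for o in observations]
eager_count = len(evaluations_run)

evaluations_run.clear()

# Lazy: the generator describes the evaluation but runs nothing yet.
lazy_opinions = (evaluate(o) for o in observations)
lazy_count_before = len(evaluations_run)

first = next(lazy_opinions)                # evaluate only when demanded
lazy_count_after = len(evaluations_run)

print(eager_count, lazy_count_before, lazy_count_after)
```

In the lazy version the observations are held intact until someone actually asks for an opinion, which is the habit this section recommends.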

If you are not a fan of fixed evaluation strategies, another way to think about the observation section is that it’s where we make our imports. Here we are providing the populated namespaces, libraries, Connection API and constants that we will reference in the rest of our communication. The context. Since sensory perception varies from human to human, we cannot rely on the exterior universe to be observed equally, thus we pass along our context. The point is that we do not want any of our further statements to be received ambiguously, so our definitions must be precise. We make only natural-philosophy-style remarks that evaluate to exactly true. “You’re always late,” has a truth value in the open interval (0,1) depending on which human is observing. “We agreed to meet at 7:30, and I saw you arrive on Monday at 7:45, and on Tuesday at 8:05” has a truth value of just 1. Now the rest of our program, or proof – whichever side of the isomorphism you prefer – can refer to lateness with no misgivings.

Exercises: Observation or Opinion?

1. “Your email signature is 41 lines long, rendering for me as over 4 screenfuls, whereas your last 5 messages to the list were each less than 41 lines long.”
2. “Dante often does not wash his dishes in the hackerspace.”
3. “Allesandra told me that I was not good at identifying contrapositives.”
4. “Our group facilitator controls the meetings.”


1. This is an observation, which is entirely verifiable.
2. This is an opinion because “often” is not defined.
3. This is an observation if Allesandra literally said so, but not if Allesandra was only referring to a specific time the speaker did not identify a contrapositive, in that case the speaker would be making an evaluation.
4. This is an opinion because “controls” is open to interpretation.

Protip: when you are having difficulty finding observations on which to base what you want to say, and your communication is a reply, it is OK, and even encouraged, to literally repeat what your partners have said. $echo what-they-said. More about this in the 4. Receiving section.

3.2 Feelings

There is the counter-intuitive Rosenberg law: "Expressing our vulnerabilities can help resolve conflict."

After having carefully preserved the pre-evaluation observation, it is finally time to also give the results of our evaluations – feelings. To explain what feelings are we explore a classic gotcha – pseudofeelings. Pseudofeelings unfortunately do pass duck-typing tests. The key difference is that, with respect to the feeler, feelings are internal, and pseudofeelings are external.

Some examples of pseudofeelings are “I feel unimportant”, “I feel misunderstood”, or “I feel ignored”. Re-expressed as feelings these would be, respectively: “I feel discouraged because I observed I was not part of [important decision]”, “I feel anxious because you doing [action] doesn’t reflect that you understood me”, and “I feel hurt, because I perceive I am being ignored”. These re-expressions take an external feeling and talk about what external event made you feel something internally. This is important, because a statement about yourself can never be blaming, and it allows others to see your perspective on your observations.

Exercises: Feeling or Pseudofeeling?

1. “I feel scared when you talk about forking”.
2. “When you don’t cite me, I feel neglected.”
3. “I’m happy that you found time to come to Wikimania.”
4. “I feel disappointed by the fact that you did not publish your dataset, because I had to recreate it.”


1. Feeling. Scared describes the internal state of the feeler.
2. Pseudo-feeling. Neglect is a thought about the exterior world. Feeler is probably depressed about not having their work recognized.
3. Feeling. The user is happy, and said so.
4. Pseudo-feeling. Despite very clear reasoning, disappointment is not a feeling, but a pseudofeeling. User is probably feeling aggravated because of needless extra work.

3.3 Needs

"God gave us the universal needs, man created the rest" 

We have described the outside world, and stated how we feel about it, but we’ll require one more step to expose our “Connection API”. Needs are those sufficient and necessary conditions for your life. According to NVC all humans come preloaded with immutable natural feelings which are factory defaults. Because needs are very low-level, primitive objects, it’s likely that the communicators will have some of these in common. And with common needs, connection can be found.

Needs are typically very basic, like: autonomy, celebration, creativity, appreciation, love, respect, play, peace, food, rest, sex. Communicating these needs may seem weak, irrational, and impossible to admit out loud, but the whole point is to open up. This point is opensourcing ourselves to the very lowest machine-level. There are two ways in which this Richard Stallman doctrine aids us with emotion. First, the open code of ourselves is a signal for our partners to work with us, and the mystery of how we work disappears. Secondly, when we disclose our code, all bugs are shallow. It is scary that others will be delving into our innermost code, but like Heartbleed, it is the only route to long term security. Remember, in order to further our opening we have Axiom 2.2, the safety mechanism that all needs are important.

Exercises: When Are Needs Being Expressed?

  1. “I feel angry when you talk about transhumanists that way, because I am wanting respect for my own destiny and I hear your words as an insult.”
  2. “I’m discouraged because I would have liked to have progressed further in my work by now.”
  3. “I feel disappointed because you assigned yourself to those bugs, but didn’t squash them.”
  4. “I’m sad that you won’t be meeting me at the vegan restaurant for dinner because I was hoping we could chat about anarchism together.”


  1. Wanting respect for way of life is a basic need, whatever it may be.
  2. This is close enough to a need. It is implied that the need is for the speaker to feel fulfilment from progressing through work. This is actually an exercise verbatim out of the original NVC book.
  3. No need is being expressed here. Perhaps the speaker needs the mental comfort of having no outstanding issues, or needs the security that comes with trustworthy friends – we don’t know and it isn’t clear.
  4. Human contact is a need. Maybe they also need a tempeh gyro.

3.4 Requests

Max Ogden analogizes callbacks to the numbers given to you at restaurants that tell waitrons what to do with your food after it has been cooked. In our case, it is more like the waitron telling you that their job has many harsh realities, which makes them feel very oppressed, crushing their need for economic freedom, and finally telling you to help them smash the capitalist wage-labour sociopolitical complex.

At last we can try to alter the world with requests. Requests are callbacks we send to our communicating partner. They indicate what we’d like our partner to do, and when. Like asynchronous JavaScript, there are a lot of security issues. Co-communicators won’t want to run malware. That’s why the best request-callbacks are verified non-malicious by being the conclusion of an observation-feeling-needs-request syllogism.
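To make the callback analogy concrete, here is a toy Python sketch. The functions `send_request` and `receive`, and the rule that a callback only runs when all three premises are present, are assumptions made up for this illustration rather than anything NVC specifies.

```python
def send_request(observation, feeling, need, callback):
    """Bundle a callback with the three premises that justify it."""
    return {
        "observation": observation,
        "feeling": feeling,
        "need": need,
        "callback": callback,
    }

def receive(message):
    """Run the callback only if all three premises are present (a toy check)."""
    premises = (message["observation"], message["feeling"], message["need"])
    if all(premises):
        return message["callback"]()
    return None  # an unjustified callback is simply not run

def acknowledge():
    """A hypothetical, specific action the receiver can perform."""
    return "I hear that you are frustrated."

justified = send_request(
    "The project is two weeks behind schedule.",
    "I feel frustrated.",
    "I need reliability.",
    acknowledge,
)
bare = send_request(None, None, None, acknowledge)

print(receive(justified))
print(receive(bare))
```

The receiver stays in control of what gets executed, which is the point of attaching the premises to the request.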

Protip: By Axiom 2.3 we always have a choice, and so it is impossible to be in a situation where only the other party can break a stalemate.

Issuing precise requests clarifies what we want from our partner. If it feels difficult to articulate what we want from others, that’s typically because it is not an action. NVC says that more specific actions make better requests, otherwise we’re issuing requests that the receiver can’t know whether they’ve fulfilled. If we shout at our colleague that a project is behind schedule, and we know they can’t speed it up – we’re not asking them to speed it up, but merely to acknowledge our anger. In this case the callback request might be “give receipt of my frustration”. Colloquially this would be known as venting. It is nice to have no ambiguity about when the roles are just to listen, or to actually address a behavioural pattern.

Exercises: Request or No Request:

  1. “I want you to grok me.”
  2. “I’d like for you to indicate one moment in my presentation that you appreciated.”
  3. “I would like you to walk more slowly in the airport and tell me where you’re going before you walk off.”
  4. “I want you to be proud of your organizing work.”


  1. Not a request, because grok is not a specific action. It could however be illustrated by asking for the receiver to paraphrase the speaker. (See how to do this well in 4. Receiving)
  2. This request is asking for something concrete, empathetic reception. These kinds of requests are made to seem ridiculous in the modern era, but that is just the long term cultural effect of “guess culture”.
  3. A little bit exasperated, but quite clear actioning in the request. This isn’t not not one of my pet peeves.
  4. How is the speaker going to know when the receiver is being proud? “I want you to tell my friends about your organizing work,” is more direct if that’s what they’re looking for.

4. Receiving

NVC’s messaging protocol is two-way, and now that we’ve covered “expressing honestly” there’s still “receiving empathetically”. Receiving empathetically can be understood as the process of parsing unstructured conversation text into the formal grammar of NVC. The conversation text is what the other person is telling you and our target grammar is the observation-feelings-needs-request 4-tuple. We are not guaranteed to get all of the components, and not in any specific order. We’ve just got a really difficult parsing problem on our hands.

The reason that people are offended when they are asked, “Did you hear me? What did I say?” is because it’s actually difficult to paraphrase what may already be a hard message to hear. We will attempt to do better, to translate their communication into an NVC object. Once we can be sure we have their experience (data), and how they are dealing with that experience (program), we can become the Universal Empathy Machine (section 0), and be fully empathetic. When done right the empathetic affect will come alive. Firstly, and very clearly their need to be just-listened-to will be fulfilled. More subtly, our partners can fine-tune their thinking by seeing how closely our Empathy Machine mirrors their Identity function. That is, how well our reflection matches their intended expression. This lets them know if we are missing any of their points, or they haven’t stressed what they would like to.

Protip: Even if someone starts trying to brute force attack us with volume and vitriol, we can still receive empathetically. Intimidating messages are also people asking us to meet their needs. Try for instance, “It seems like you’re really angry about my deleting the private key; because you need more security about what’s happening in your life.” Likewise, if we are engaged with someone not ready to resolve, see if an application of empathy towards them helps.

Counter-intuitively this paraphrasing saves time, even though it takes time. A typical pitfall in a time-saving mentality of receiving is the bad habit of trying to short circuit the conversation by offering unsolicited advice to people. Offering unsolicited advice would be as if a parser (a) took input, (b) maybe did or didn’t parse the input, (c) did not verify the meaning of the maybe-parsed result, and then (d) returned advice based on exogenous heuristics. Returning that computation to the speaker would understandably be nonplussing if not absolutely frustrating as it is devoid of any indication that it related to what they said. Giving advice is only useful iff advice is what the speaker is asking for. By assuming they want a “fix-it” response, we are only engaging in the folly of mansplaining.

Receiving empathetically is to parse our partner’s messages and run them through our Universal Empathy Machine.

Exercises: Empathetic Reception (Y/N)?

  1. Person A: Counting error in Ultimate Street Fighter IV finals? How could I do something so stupid? Person B: Nobody’s perfect, don’t be too hard on yourself.
  2. Person A: You’re a delusional utopian.
    Person B: Are you feeling frustrated because you would like me to admit that there could be other ways of interpreting the Black Lives Matter movement?
  3. Person A: Oh I’m being SO BAD! I NEVER eat cupcakes! Person B: Maybe exercising more would help you.
  4. Person A: When friends of a friend of a friend join our camp without showing commitment, I feel encroached on. It’s like how fraccing companies squeeze me with anti-protest tactics. Person B: I know how you feel. I used to feel that way too.
  5. Person A: I’m unhappy with the grant’s status because you should have made more impact by now. Person B: I know you’re unhappy, but we’ve been slowed by bureaucratic process.


  1. B is giving advice to A, which is not an empathetic response. “You sound like you’re enraged by your lapse of concentration,” is more along the NVC lines.
  2. Empathetic response since B is trying to ascertain from A’s perspective why A might be lashing out.
  3. Again B is advising A, even though the tone is lighter. B might want to try and understand what feelings are behind A’s not being neutral about food.
  4. Not an empathetic response, a sympathetic response. Same data, but whose program is being applied?
  5. Trick empathy. Just saying you understand is not the same as demonstrating you understand. B left A’s comment about impact on the floor, which B could have used to empathise with.

5. Criticisms of NVC

How many different input methods can we use to write an email? Maybe with a physical keyboard, or virtual one on a phone, different auto-complete schemes, speech-to-text, and maybe we’ve even had the pleasure of tapping one out T9 stylee. Even though we may aim to transcribe the same thoughts, based on the technique used, the final text will be different. If the text of our emails is altered, so too are the conversations. Now, different formats of email will benefit specific ways of writing. We might be happy arranging dinner plans tapping on glass, but conforming to the standards of a formal letter begs for the old clickety clack.

As input methods change a conversation, so does NVC. Since it is very literally a theory of discourse, using NVC will necessarily bring with it prediscursive bias. The format of the discussion is not a variable that is discussed. But fair enough, any communication strategy would come attached with its own biases. The question then really becomes, since NVC is a prescribed conversation format, which speakers does it benefit?

NVC’s founding theorist was an American white man born in 1930 as the son of Russian Jews. What does that mean specifically for who NVC benefits in conversation? My reading turned up no mention of how Rosenberg’s personal background might affect his theorizing. In my opinion – I am myself a white man with similar citizenry and ancestry – it imports notions of classical logic, a Maslowian need hierarchy, and western rationality.

To expand, the technique has an orderly system to follow. This system is static and procedural, where it could be more goal-directed. The ontology presupposes the universality of basic needs. This could be interpreted as the hubris of someone currently with privilege assuming that others are like them. And lastly it does not make large mention of how it would fit in a multiplicity of different communication strategies, as a pluralist might.


NVC has a grand concept, which works at times and is undermined by its flaws at others. It was useful for me because it was the first conflict strategy I’d heard of, and it “made sense” to me. It turned out not to be a persuasion-hack, but it did teach me the concept of empathy. Understanding empathy for the first time was truly a dose of mind expansion. I’ve kept the format of NVC’s exercises at the end of each chapter, because as contrived as it seems, the questions are hard, empathy is not intuitive, and practice is vital. It’s really more praxis than theory. In fact, practising empathy has been the inroad to new ideologies for me like: feminism, anti-racism, LGBTQ-allyship, and other social movements for which I am not the affected demographic. I hope you, person who likes mathematical analogies, can glean something from studying it too.

Notconfusing rules for conversation: 2 rules and a jumpstart.

Meeting people can be a slog. “Hello, what’s your name?”, “Where are you from?”, “What do you do?”, “How do you yawn?”. Yawn? Sorry I was nodding off just writing about how repetitive and tiresome modern meeting and greeting can be. Owing to the way that social networks store information about us, we’re used to thinking about people in a list of attributes “forms” structure. Trans-inclusive feminism has already laid out how select-a-value gender is problematic for self-determination, and it has even subtler consequences in meeting people. We’ve come to assume the next person we meet is some combinatoric permutation of drop-down menus. How are we supposed to meet that person that is our life long friend, but at the moment just looks like one more INTJ or Virgo?

In fact the disillusionment from these gruelling social interactions is exactly the motivation for having friends, as a commiserating shelter. How do we let humans do the human thing and wow us with their outstanding creative expression of self from the moment we first meet? I submit notconfusing’s two rules for conversation.

  1. Ask questions that reflect choices people have or could make.
  2. Ask questions that have never been asked before.

Asking questions that reflect choices or decisions is a way to understand a person’s values and principles, which is more informative than part of their current happenstance. Even though this point is supposed to cause a deeper understanding, the questions need not be heavy. “When you’re sleeping on your favourite side, are you facing towards your alarm clock?” might tell you a bit about how much someone wants to combat their own habits without asking “how cognisant are you of your habits and how do you want to combat them?” The analysis of their choices can be done together out loud or both parties can be trusted to do so internally. In either case the point is to revel in the complexity of your partner, while gifting them a bit of Rogerian psychology.

Notice that just Rule 1 by itself could still allow for a “What are your hobbies?” variant, so Rule 2 is brought in to stem the tedium. At first it might seem impossible to ask an entirely unique question to every person, but – as I will prove – there really are an infinite number of these types of questions. Here are a few strategies.

The first strategy is analogous to an infinite game I learned called “Uses for…” where you try to come up with as many uses as you can for a specific item. The example I recall reading about is a bed sheet. So let’s play: It can be used as a tablecloth, as an escape rope for climbing out windows, as a substitute for an all-white painting, as a shooting target for short-sighted people, as a stencil for papier-mâché bed etc. etc. Try and come up with 5 more.

Now apply creative riffing to the things you notice about your partner. For instance these are the topics I brought up from the last ice-breaking conversations I’ve had: reminiscing over video rental returns (standing near a letterbox), a comparison of how different tapes will tear when you don’t have scissors (electrical taped wallet), how often I think about life from a bird’s-eye view (standing at different levels), and the history of the vulcanization of rubber (rode with a flat tyre). Going off-script and generating questions based on the partner and surroundings guarantees freshness. The way your associate engages gives you some understanding of their gestalt person-ness.

Even if you are feeling like you filled out pointless forms all day at work so that you are sapped of your free-associativity, there is always the abstraction “meta” trick. Assume that you have racked your brain, and “where are you from?” is the absolute best question you can come up with because you are only meeting people out of some hateful obligation. You can apply question-abstraction to ask them “what does a person’s answer to <absolute best question I can muster> mean about a person’s personality?” Yes, use your own staleness as a weapon. Since the result of the question-abstraction is also a question, it can be infinitely applied to itself to yield infinitely many unique questions. QED. (If you think this a sad proof, then I encourage you to really try it. I imagine you’ll become loopy enough by the hypnotic repetition of speaking that your co-discusser will either join in with you in your recursion – great fun – or they will have walked away, which is just as well.)
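For the sceptical, the proof can be sketched as a Python generator. The names `abstract` and `questions` are my own, and the exact wording of the wrapper is just one way to phrase the meta trick.

```python
from itertools import islice

def abstract(question: str) -> str:
    """Wrap a question in one more layer of 'meta'."""
    return f"What does a person's answer to '{question}' mean about their personality?"

def questions(seed: str):
    """Yield the seed, then its abstraction, then the abstraction of that, forever."""
    q = seed
    while True:
        yield q
        q = abstract(q)

first_three = list(islice(questions("Where are you from?"), 3))
for q in first_three:
    print(q)
```

Each application makes the question strictly longer, so no two questions in the sequence are ever the same.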

A last technique, if you want to borrow a bit, is to use my growing list of ice-breakers. I’ve created them as group introductions when I was facilitating Sudo Room hackerspace meetings. As they are targeted to a tech-y crowd you may need to customize a bit – exactly the point that I’m trying to champion.

With the application of these 2 rules you begin to transgress social mores for great good. You ought to explode small talk to eschew complacency. Then you can make more and better friends. Although ironically making this kind of conversation may have the effect of pinning you as a weirdo. Yet disobey the laziness of phone alienation as Saul Williams does in Talk to Strangers: “… that square box don’t represent the sphere that we live in. The earth is not a flat screen, I aint trying to fit in.”

List of Yoga Quotations

Here is a list of Yoga quotations that I’ve compiled from my 200-hour yoga teacher training, other classes I’ve attended, and various yoga books.

Jon Isaacs

  • “The pose begins once you want to leave it”.
  • “Who went to their first yoga class because their life was going really well‽”
  • “I had a hedgefund guy fire me once because I was talking about greed in class.”

Sean Feit

  • On taking non-harming literally, “I take an antibiotic – genocide.”

Jean Mazzei

  • “You can have peace or mind, but not peace of mind because the mind’s purpose is to think.”

Cora Wen

  • “You probably think you have a knee, there’s nothing there. There’s no knee.”
  • “The knee is the prisoner of the hip and the ankle.”

Stacey Swan

  • On being a good teacher, “It’s not about putting your foot behind your head, but keeping it out of your mouth.”
  • “The american way is ‘no pain, no gain’, but yoga is ‘no pain, no pain'”
  • “A good yoga class should be like a Seinfeld episode,” (in that it should come full circle at the end).

Karen Macklin

  • “Vinyasa can also mean how you sequence your life.”

Adrianna Webster

  • “On an inhale, breathe out”.

Leslie Kaminoff – Yoga Anatomy

  • On the spine, “The full glory of nature’s ingenuity is apparent in the human spine…From an engineering perspective it is clear that we have the smallest base of support, the highest center of gravity, and the heaviest cranium (proportional to our body weight) of any other mammal.  As the only true bipeds on the planet, we are also earth’s least mechanically stable creatures.”
  • On breathing, “The energy expended in breathing produces a shape change that lowers the pressure in the chest cavity and permits the air to be pushed into the body by the weight of the planet’s atmosphere. In other words, you create the space and the universe fills it.”
  • Paraphrase on hand balances, “4/5th of the foot is dedicated to weight-bearing  and 1/5th is dedicated to dexterity. The hand (on the other hand) is 1/5th weight-bearing, 4/5th dexterous.”

Rudolf von Laban

  • “Each bodily movement is embedded in a chain of infinite happenings from which we distinguish only the immediate steps and, occasionally, those which immediately follow… In every trace form created by the body, both infinity and eternity are hidden.”

Joel Kramer – Yoga as Self-Transformation

  • “The essence of yoga is not attainment, but how awarely you work with your limits.”
  • “If you’re running from the feeling, it’s pain.” (Otherwise it’s just intensity.)
  • “Yesterday’s Level of Flexibility”. The (unhelpful) concept which I call YLF.

Desikachar –  The Heart of Yoga

  • Yoga defined, “attempting to do something you haven’t before.”


On headstand, “It’s like Wu-Tang says, you gotta ‘protect ya neck.'”

On stepping onto your mat, “Let’s go for a magic carpet ride.”

Travis Judd

  • “Make a conscious choice about what kind of practitioner you want to be right now.”


I can’t recall the provenance of these quotes sadly. Let me know if you can.

  • “The idea that we are ever not moving is an illusion.”
  • “asana is a process not a product otherwise we could say ‘not in a pose’ if head isn’t touching knee, but that is false.”
  • ‘Yoga’ has the root ‘Yuj’ which is the root for the English word ‘Yoke.’
  • Like humans, “water is transparent and reflective but don’t see those properties when in motion.”
  • “If you feel like you’re being inauthentic start telling the truth.”



WIGI, an Inspire Grantee

WIGI, the Wikipedia Gender Index, my project which looks at the gender representation in Wikipedia Biography articles, has won an Inspire Grant.

Over the last six months along with fellow Wikipedians we prototyped and extended this research into a paper “Gender Gap Through Time and Space: A Journey Through Wikipedia Biographies and the ‘WIGI’ Index”. One aspect of the biography gender gap we were not able to observe, however, was the trend of female and nonbinary biography. We were only ever looking at a single point in time because it’s too computationally complex to compare all the histories of the Wikipedias together at once. Now, with $22,500 and a small team, our aim is to sample this data weekly thereby gathering some longitudinal data on the way that Wikipedians are representing biographies.

Our project’s form is to create a data portal which will display the visualisations of the state of gender in biographies. The underlying data which associates biography gender with Wikipedia language, date of birth/death, citizenship, profession, and celebrity status, will be purposefully published under an open license. We hope that other researchers can make use of this social indicator, much in the same way one can use the United Nations’ Gender Inequality Index.

The project will be managed entirely on GitHub, and should be completed in about 6 months.

It promises to be,



Asking Ever Bigger Questions With Wikidata

This is a guest blog post I wrote for Wikimedia Deutschland, copied here:

German summary (translated): Maximilian Klein uses Wikidata as a data pool for statistical analyses of the world’s knowledge. In his article he describes how he searches Wikidata for answers to the big questions.

Asking Ever Bigger Questions with Wikidata

Guest post by Maximilian Klein

A New Era

Simultaneous discovery can sometimes be considered an indication for a paradigm shift in knowledge, and last month Magnus Manske and I seemed to have both had a very similar idea at the same time. Our ideas were to look at gender statistics in Wikidata and to slice them up by date of birth, citizenship, and langauge. (Magnus’ blog post, and my own.) At first it seems like quite elementary and naïve analysis, especially 14 years into Wikipedia, but only within the last year has this type of research become feasible. Like a baby taking its first steps, Wikidata and its tools ecosystem are maturing. That challenges us to creatively use the data in front of us.

Describing 5 stages of Wikidata, Markus Krötsch foresaw this analyis in his presentation at Wikimania 2014. The stages which range fromKnow to Understand are: Read, Browse, Query, Display, and Analyse (see image). Most likey you may have read Wikidata, and perhaps even have browsed with Reasonator, queried with autolist, or displayed with histropedia. I care to focus on analyse – the most understand-y of the stages. In fact the example given for analyse was my first exploration of gender and language, where I analysed the ratio of female biographies by Wikipedia Language: English and German are around 15% and Japanese, Chinese and Korean are each closer to 25%.

Doing biography analysis before Wikidata was much harder. To know the gender of an article’s subject you’d resort to natural language processing or hacks like counting gendered categories and guessing based on first name. Even more, the effort had to be duplicated for each language, each of which had to be translated. Now the promise of language-free semantic data, and tools like Wikidata Query and Wikidata Toolkit, are here. The process is easier because it is more database-like: select, group by, apply, and combine.
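As a rough illustration of that select, group by, apply, and combine pattern, here is a minimal Python sketch. The records are made-up toy data, not real Wikidata figures, and the field names (`language`, `gender`) and the function `female_ratio_by_language` are hypothetical names chosen only for this example.

```python
from collections import defaultdict

# Toy biography records (invented for illustration, not real Wikidata data).
biographies = [
    {"language": "en", "gender": "female"},
    {"language": "en", "gender": "male"},
    {"language": "en", "gender": "male"},
    {"language": "en", "gender": "male"},
    {"language": "ja", "gender": "female"},
    {"language": "ja", "gender": "male"},
    {"language": "ja", "gender": None},  # gender not recorded
]


def female_ratio_by_language(records):
    """Return {language: share of female biographies among those with a known gender}."""
    # Select: keep only records that have a recorded gender.
    selected = [r for r in records if r["gender"] is not None]

    # Group by: bucket the gender values by language.
    groups = defaultdict(list)
    for r in selected:
        groups[r["language"]].append(r["gender"])

    # Apply and combine: compute one ratio per group, collected into a dict.
    return {
        lang: sum(g == "female" for g in genders) / len(genders)
        for lang, genders in groups.items()
    }


ratios = female_ratio_by_language(biographies)
print(ratios)
```

The same four steps would apply to real data; only the loading step, which is left out here, would differ.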

With this new simplicity, let’s review what we have imagined so far. Here’s a non-exhaustive introduction to the state of creative question-asking:

Pushing Ourselves to Think Even Bigger

Can we think even bigger if we use more of the available data? Thinking about the fact that every claim may have an attached reference, Markus Krötzsch always wants to know: for a given set of claims, what references must be believed in order to believe the set of claims? With that notion we could look at all the claims associated with all the items of a given language, and thus the required belief system of that language. At this point we could ask: what are the differences in the belief systems of any two languages?
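To make the idea a little more concrete, here is a minimal Python sketch under heavy simplifying assumptions: every reference attached to a claim counts as one that must be believed, a language’s “belief system” is just the union of references over the items that language covers, and two languages are compared with plain set operations. All the data and names (`claims_by_item`, `items_by_language`, `required_references`) are invented for illustration.

```python
# Toy data: item -> list of (claim, set of reference ids). Invented values.
claims_by_item = {
    "Q1": [("born 1900", {"ref_a"}), ("citizen of X", {"ref_b"})],
    "Q2": [("occupation: poet", {"ref_b", "ref_c"})],
    "Q3": [("born 1950", {"ref_d"})],
}

# Toy data: which items have an article in which language. Invented values.
items_by_language = {
    "en": {"Q1", "Q2"},
    "de": {"Q2", "Q3"},
}


def required_references(language):
    """Union of all references attached to claims on the language's items."""
    refs = set()
    for item in items_by_language[language]:
        for _claim, claim_refs in claims_by_item[item]:
            refs |= claim_refs
    return refs


en_refs = required_references("en")
de_refs = required_references("de")

only_en = en_refs - de_refs   # believed only via English-covered items
only_de = de_refs - en_refs   # believed only via German-covered items
shared = en_refs & de_refs    # common ground between the two

print(sorted(only_en), sorted(only_de), sorted(shared))
```

A more careful version might treat several references on one claim as alternatives rather than as all being required; this sketch does not attempt that.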

Another way we could test the fundamental principles of knowledge and culture is to consider the chains made by the subclass of, instance of, or cause of properties. Every language is present at different links of each chain. So we can look at the differences in ways in which languages organize a hierarchy of concepts – or if they think it’s a hierarchy at all.
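A small Python sketch of what following such a chain might look like. It assumes, for simplicity, that each concept has at most one “subclass of” parent (real Wikidata items can have several), and the concepts, edges, and language coverage below are all invented for illustration.

```python
# Toy "subclass of" edges, child -> parent. Invented values.
subclass_of = {
    "poet": "writer",
    "writer": "artist",
    "artist": "person",
}

# Toy data: which concepts have an article in which language. Invented values.
languages_with_article = {
    "poet": {"en", "de"},
    "writer": {"en"},
    "artist": {"en", "de"},
    "person": {"en", "de"},
}


def chain_from(concept):
    """Follow 'subclass of' links upward from a concept until they run out."""
    chain = [concept]
    seen = {concept}
    while chain[-1] in subclass_of:
        parent = subclass_of[chain[-1]]
        if parent in seen:  # stop if the data contains a cycle
            break
        chain.append(parent)
        seen.add(parent)
    return chain


def chain_in_language(concept, language):
    """The links of the chain at which the given language is present."""
    return [
        c for c in chain_from(concept)
        if language in languages_with_article.get(c, set())
    ]


full_chain = chain_from("poet")
de_chain = chain_in_language("poet", "de")
print(full_chain, de_chain)
```

Comparing `full_chain` with the per-language chains shows which links a language skips, which is one crude way to see how its hierarchy differs.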

Much fun for logicians and epistemologists. But we can also ask more socially important questions, questions about how language and society relate. What biases do we have that we aren’t even aware of? The method, for which I’ve proposed a PhD, could be conducted as follows. We’re aware of sexism in our societies, and as you’ve seen we’ve started to build a statistical profile of how it manifests in Wikidata. Likewise we’re cognizant of racism and homophobia. We might next look at the rates at which people appear in Wikidata by race and desire. Let’s assume we could train a model to say that these kinds of distributions are types of social biases. Next we could search every property in Wikidata to see if it indicated social bias. If successful, we may find overlooked stigmas and phobias in society.

I claim that our theoretical question-answering ability has paradigmatically shifted with the growing up of Wikidata. Soon enough you won’t even need to be a sophisticated programmer to whisper your questions into the system. So next time you’re reading, browsing, querying or displaying Wikidata, challenge yourself to think about how to analyse it too.