Category Archives: Lesser Thoughts

Suggestions of Fake Profiles in Couchsurfing

I have been investigating profiles of users of Airbnb and Couchsurfing this year as research into personality differences between users of market- and socially-based network hospitality websites. Along the way I have uncovered some suggestive data supporting a rumor that Couchsurfing may have been manipulating the size of its user-base through fake profiles.

After I had assembled datasets of these users’ publicly viewable data, I started to take a look at the sign-up dates of each profile to gauge the ages of the user bases. In inspecting the Couchsurfing set, I found an unusual spike in sign-ups in 2013.

Searching the web for reasons why this would be, I queried “what happened to Couchsurfing in 2013”.… Read the rest

3 Ways To Access Wikidata Data Until It Can Be Done Properly

Note: This post is quite old. In fact, Wikidata can now be accessed “properly” via the Wikidata Query Service (WDQS). However, the techniques outlined below still have their advantages.

The inaugural Wiki Research Hackathon went very well, and it affirmed that I feel best when I’m conducting Wiki Research. I was asked to give one of the tech talks of the day about accessing Wikidata data programmatically. Here is an outline of the talk:

Purpose:

We’ll be viewing Wikidata as a file in its own right for research, not in its canonical use case of being used in various Wikipedias.

Native format:

Wikidata is a mostly standard MediaWiki instance, except that pages don’t store “Wikitext”; they store JSON blobs.… Read the rest
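To make the “JSON blobs” point concrete, here is a minimal sketch of reading one such blob in Python. The sample document is hand-trimmed for illustration and is not a full or current copy of any live item; the helper names `get_label` and `get_item_values` are hypothetical, not part of any Wikidata library.

```python
import json

# Hand-trimmed sample shaped like a Wikidata entity document.
# Assumption: illustrative only; real entities carry many more fields.
SAMPLE_ENTITY_JSON = """
{
  "id": "Q42",
  "type": "item",
  "labels": {"en": {"language": "en", "value": "Douglas Adams"}},
  "claims": {
    "P31": [
      {
        "mainsnak": {
          "snaktype": "value",
          "property": "P31",
          "datavalue": {
            "type": "wikibase-entityid",
            "value": {"entity-type": "item", "id": "Q5"}
          }
        },
        "rank": "normal"
      }
    ]
  }
}
"""


def get_label(entity, lang="en"):
    """Return the entity's label in the given language, or None if absent."""
    label = entity.get("labels", {}).get(lang)
    return label["value"] if label else None


def get_item_values(entity, prop):
    """Return the item ids (e.g. 'Q5') that the entity states for a property."""
    values = []
    for statement in entity.get("claims", {}).get(prop, []):
        snak = statement["mainsnak"]
        # Skip "novalue" / "somevalue" snaks, which carry no datavalue.
        if snak.get("snaktype") != "value":
            continue
        datavalue = snak["datavalue"]
        if datavalue["type"] == "wikibase-entityid":
            values.append(datavalue["value"]["id"])
    return values


entity = json.loads(SAMPLE_ENTITY_JSON)
print(get_label(entity))
print(get_item_values(entity, "P31"))
```

Because the page content is plain JSON, the standard library is enough to pull out labels and statements; no Wikitext parser is needed.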

How A Small Bug I Wrote Started Helping Holocaust Deniers

In my early software education, I’d been taught about how untested software could result in deadly radiation-therapy machines. But since I never planned to be in the medical devices industry, these sorts of warnings didn’t apply to me; after all, I was only writing Wikipedia bots. But this week I was proved wrong when another Wikipedian messaged me with a query unlike any I’d received before (emphasis mine):

Hi Max, I’ve pinged you a couple of times, but in case you’re not getting them, would you mind commenting?

It’s about an edit your bot made to Wikidata that changed the infobox of a featured article about a book about the Holocaust, Night.

Read the rest

U and Why?: Part 2: Snow Bike and the Seven Identities

While Ceephax has exposed me to a hyperactive, ultrajoyful side of myself, Why? has shown me how to cope with the inexorable loss of impermanent identities. Just today I was going to go to an event I’d been looking forward to for over a year: the Stupor Bowl, a 50-mile bike ride in Minneapolis, stopping at 9 bars, on the coldest day of the year. But after ceding a pool table yesterday to a man who my friends told me looked very much like myself, due to his moustache and cycling jacket, I became paralyzed by the self-awareness of my new identity.… Read the rest

Against the Grain: Influencing Factors of Opinion Change in Wikipedia’s Article for Deletion Process

This is my final project from my Machine Learning course this past semester. My collaborators and I attempted to find out when, and why, users at English Wikipedia’s article for deletion forum voted against their tendencies. That is, what makes a “deletionist” vote “keep” or an “inclusionist” vote “delete”? In the end we found that basic machine learning techniques could not perform much better than random, but the intelligence that did emerge came from using information about group herding behaviour and appeals to the local bureaucratic process.

Against the Grain: Influencing Factors of Opinion Change in Wikipedia’s Article for Deletion Process

Zhiyi Li, Cheng Peng, and Max Klein

Introduction

On 1 November 2015, English Wikipedia hit 5,000,000 articles; but while article creation is much celebrated, deleting an article is a lesser-known process.… Read the rest

WIGI Update: Results from Usability Testing

Since the beta version of wigi.wmflabs.org, our site dedicated to the biography gender gap on Wikipedia, is progressing on the technical side, we decided to conduct a usability study to help improve its interaction aspects. It was fantastic to hear people say “this is the tool we’ve been waiting for,” and we also want to address the issues that led people to describe their experience as “burdensome”. Thanks to Masssly for compiling the report. I repost his summary here, and the full report can be read on Wikimedia Commons:

Executive Summary

During the week of November 17 – December 1, 2015, WIGI was tested among participants pulled from the Wikimedia community, loosely described as anyone who edits Wikipedia-the-encyclopedia, or is a potential reader of any of its language versions.… Read the rest

Data Management: The Plan

Perhaps because it’s not something I would have done on my own, thanks to the prodding of Daniel Mietchen, I have created a data management plan for my open-PhD adventure. What is a data management plan (DMP), you might ask? Now that I’m up to speed, I can tell you that it’s a document in which you set out the parameters for how you will create, share, and store the outcomes of a project. It’s also the sort of thing you go through in order to pose detailed questions to yourself and make rigorous your otherwise slightly sloppy thinking.… Read the rest

Dear NSF: Part 1

In what will surely not be the last time I ask the U.S. government for money, I made the plea to be a fellow of theirs today. The National Science Foundation Graduate Research Fellowship Program (NSFGRFP) supports new PhD students in their research aims for 3 years. And I am told, and also suppose, that I want that. The way in which they ask you to prostrate yourself is a standard multi-essay plus recommendations mode. The essays, when viewed as “papers for which you haven’t done the work,” were useful writing and thought exercises. I am quite energized by firming up a proposal about what I might do in the next three years, even if it seems incomprehensibly difficult and moon-shot-ish right now.… Read the rest

Am I Doing My PhD In The Open?

Yesterday I clicked on a JSTOR link, and a full-text PDF popped up. It wasn’t an Aaron Swartz liberation plan. I was in the academy reading closed-access research. I’m in the academy, and I’m on the precipice of taking it for granted.

Last November I asked if I should do my PhD in the open and answered in the affirmative, but at that point I hadn’t even been accepted by any PhD-granting institutions. Only one month into ensconcing myself in the GroupLens lab, at the Univ. of Minnesota, I somehow forgot about my commitment to openness.… Read the rest

OpenSym 2015 – Wikipedia in the World of Global Gender Inequality Indices

In a previous post I outlined the process of correlating the gender bias from Wikipedia with other gender inequality indices. Tomorrow I will present a poster on the same topic at OpenSym 2015. I’ll be explicating how Wikipedia’s biographical bias is closer to the gender bias in highly-qualified jobs than to that in longevity. It’s part of what I’ve been discovering during my grant with the Wikimedia Foundation. You can read more in the preprint and poster.

Preprint:

Wikipedia in the World of Global Gender Inequality Indices: What The Biography Gender Gap Is Measuring

Poster:

OpenSym 2015 Poster

Read the rest