Category Archives: Research Notes

House Rules


“I was on holiday, and I saw the same people advertising their room on Couchsurfing and Airbnb.” Two years ago my friend and co-author Benjamin Mako Hill pitched me the idea of investigating why the exact same accommodation shows up on both the paid site Airbnb and the totally free site Couchsurfing. My advisor Haiyi Zhu turned to me and asked “what do you think?” I’m not sure how long I took to respond, but in that moment two memories came flooding back. The first was having Couchsurfed with someone who also on occasion Airbnb’d their room.… Read the rest

Synthetic Training Data For Rare Object Detection in Computer Vision

My internship this summer at Lawrence Livermore National Lab was nothing if not practical. The exceedingly real objectives of government science gave my first foray into deep learning gravitas and purpose.

The domain was satellite imagery, and the research question was whether computer vision could be trained to identify objects for which we have no training data. This presents a problem for the family of convolutional neural network algorithms (CNNs), which on the contrary require lots of examples to learn from. The idea that I implemented to address this conundrum was to “synthetically” manufacture training data with renders from CAD software, and then test its performance on “natural” (i.e.… Read the rest

So you want to upload an image to the cloud with Node.js

So you want to upload an image to the cloud with Node.js?

Maybe you want a small Raspberry Pi webcam to take timelapse footage and send it to a server every hour because its hard drive is small. Maybe you want to build a social network for swapping images of Lizard People, and your server can’t handle all the image traffic. Maybe you want to back up your irreplaceable collection of Dead Sea Scroll fragments; it’s irreplaceable. You might want to keep images or files around for many different reasons, and having them publicly accessible in the cloud is better than trying to manage them yourself, for storage and network reasons. … Read the rest

Suggestions of Fake Profiles in Couchsurfing

I have been investigating profiles of users of Airbnb and Couchsurfing this year as research into personality differences between users of market- and socially-based network hospitality websites. Along the way I have uncovered some suggestive data supporting a rumor that Couchsurfing may have been manipulating the size of its user-base through fake profiles.

After I had assembled datasets of these users’ publicly viewable data, I started to look at the sign-up dates of each profile to gauge the ages of the user bases. In inspecting the Couchsurfing set, I found an unusual spike in sign-ups in 2013.

To find out why this would be, I searched the web for “what happened to Couchsurfing in 2013”.… Read the rest

Design for Doulas

I have typically avoided the realm of UI design, as I view it as fraught with cults of personality and nonstop bikeshedding, but this semester I decided to try my hand and separate the theory from the style posing as theory. The course I am taking is centered around a large project to design an application that helps a population of people with a need they have. This coincides nicely with a dream I have harbored to make technology for doulas: providers of nonmedical, practical and emotional support for pregnancy. My partner is a doula and leader in a doula organization, so I have been somewhat privy to the way they use tech to run their program.… Read the rest

Against the Grain: Influencing Factors of Opinion Change in Wikipedia’s Article for Deletion Process


This is my final project from my Machine Learning course this past semester. My collaborators and I attempted to find out when and why users at English Wikipedia’s article for deletion forum voted against their tendencies. That is, what makes a “deletionist” vote “keep” and an “inclusionist” vote “delete”? In the end we found that basic machine learning techniques could not perform much better than random, but the intelligence that did emerge came from using information about group herding behaviour and appeals to the local bureaucratic process.

Against the Grain: Influencing Factors of Opinion Change in Wikipedia’s Article for Deletion Process

Zhiyi Li, Cheng Peng, and Max Klein


On 1 November 2015, English Wikipedia hit 5,000,000 articles; but while article creation is much celebrated, deleting an article is a lesser-known process.… Read the rest

WIGI Update: Results from Usability Testing

Since the beta version of WIGI, our site dedicated to the biography gender gap on Wikipedia, is progressing on the technical side, we decided to conduct a usability study to help improve its interaction aspects. It was fantastic to hear people say “this is the tool we’ve been waiting for,” and we also want to address issues that lead people to describe their experience as “burdensome”. Thanks to Masssly for compiling the report. I repost his summary here, and the full report can be read on Wikimedia Commons:

Executive Summary

During the week of November 17 – December 1, 2015, WIGI was tested among participants pulled from the Wikimedia community, loosely described as anyone who edits Wikipedia-the-encyclopedia, or is a potential reader of any of its language versions.… Read the rest

Data Management: The Plan

It’s not something I would have done on my own, but thanks to the prodding of Daniel Mietchen, I have created a data management plan for my open-PhD adventure. What is a data management plan (DMP), you might ask? Now that I’m up to speed, I can tell you that it’s a document in which you set out the parameters for how you will create, share, and store the outcomes of a project. It’s also the sort of thing you go through in order to pose detailed questions to yourself and make rigorous your otherwise slightly sloppy thinking.… Read the rest

Am I Doing My PhD In The Open?

Yesterday I clicked on a JSTOR link, and a full text PDF popped up – it wasn’t an Aaron Swartz liberation plan. I was in the academy reading closed access research. I’m in the academy, and I’m on the precipice of taking it for granted.

Last November I asked if I should do my PhD in the open and answered in the affirmative, but at that point I hadn’t even been accepted by any PhD-granting institutions. Only one month into ensconcing myself in the GroupLens lab, at the Univ. of Minnesota, I somehow forgot about my commitment to openness.… Read the rest

OpenSym 2015 – Wikipedia in the World of Global Gender Inequality Indices

In a previous post I outlined the process of correlating the gender bias from Wikipedia with other gender inequality indices. Tomorrow I will present a poster on the same topic at OpenSym 2015. I’ll be explicating how Wikipedia’s biographical bias is closer to the gender bias in highly qualified jobs than to the gender bias in longevity. It’s part of what I’ve been discovering during my grant with the Wikimedia Foundation. You can read more in the preprint and poster.


Wikipedia in the World of Global Gender Inequality Indices: What The Biography Gender Gap Is Measuring


OpenSym 2015 Poster

Read the rest