“High Speed Rail: The Board Game” Review: The Nonviolent Utopian Interactive Senate Simulation

Board games do not attract me by default. The closest I have come to enjoying board games is playing Risk, single-player on my phone, and reading a lot about Nomic. But when Alfred Twu speaks, I listen, so when he announced his own fresh take on board gaming, my ears perked up and my credit card edged ever so slightly out of my pocket. High Speed Rail: The Board Game emerged from High Speed Rail: The Map, of Guardian fame. Neither the map nor the game is rooted in any material claim to reality, but most things that inflame your imagination aren’t.

High Speed Rail: The Map


The $15 version, printed on overhead transparency, arrived at my house a few days ago (versions go up to $100 with glass pieces). And last night six of us maxed out the player count and sat down to squabble.

Foxy Pigeon Cottage Navigating the Fraught Ways of High Speed Rail Voters

Except there was rather little squabbling, and that completely changed my mind about what board games can be. The key thing to know about HSR is that you can’t move.

The situation is that you are given 3 objectives, of the type “Logistics Industry: Connect 2 of the 3 Cities – Detroit, Houston, San Diego”. Then you have to place tiles to realize your rail network – except, as I mentioned, you’re not allowed to place tiles. Rather, when it’s your turn, every other player will place a tile as a proposal, and you will select another player’s proposal (or 2 proposals at a time in a 5+ player game).

This means that all your strategy comes by way of surveying other players’ objectives and finding the most mutual ground with them. This is slightly complicated by the fact that only 2 of the objective cards are publicly shown; the 3rd is kept secret. Barring this small niggle, the game is not competitive at all – you are constantly trying to find common ground – in some ways a very realistic, and in others a very unrealistic, version of democracy in the US.

Unlike most games, I didn’t feel frustrated at having lost to unfair rules or scheming opponents. It wasn’t too long, only about 45 minutes (and it was everyone’s first time playing). To play is to softly massage an unfolding system. The game has CC-BY-SA logos on all the pieces, and everyone is mostly cooperating. High Speed Rail is the nobly amateur, egalitarian pastime that would be played in Ecotopia.

You can download a free print-it-yourself version here.

Häskell und Grepl: Data Hacking Wikimedia Projects Exampled With The Open Access Signalling Project

In what could easily become a recurring annual trip, Matt Senate and I came to Berlin this week to participate in Open Knowledge Festival. We spoke at csv,conf, a fringe event in its first year, ostensibly about comma-separated values but more so about unusual data hacking. On behalf of the WikiProject Open Access – Signalling OA-ness team, we generalized our experience in data-munging with Wikimedia projects for the new user. We were asked to make the talk more story-oriented than technical; and since we were in Germany, we decided to use that famous narrative of Häskell and Grepl. In broad strokes we go through: how Wikimedia projects work; the history of wiki data-hacking, from “Ignore All Rules” to calcification; Wikidata told as Hänsel and Gretel; signalling OA-ness; and how you could do it too.

These are the full slides (although the slide host does not seem to like our Open Office document so much):

And a crowdsourced recording of the session:

The queue of questions extended past our session and we missed half of lunch – it was fabulous to see such interest. We found a particular affinity with the Content Mine initiative, which wants to programmatically extract facts from papers. Since we are finding and uploading mine-able papers, you could imagine some sort of suggestion system that says to an editor “you cited [fact x] from this paper; do you also want to cite [extracted facts] in the Wikipedia article too?”. Let’s work to make that system a fact in itself.

Wiki-Class Set-up Guide and Exploration

Best viewed with IPython Notebook Viewer



Wiki-Class is a Python package that can determine the quality of a Wikipedia page using machine learning. It is the open-sourcing of the Random Forest algorithm used by SuggestBot. SuggestBot is an opt-in recommender for Wikipedia editors, offering pages that need work and that look like pages they’ve worked on before. Similarly, with this package you get a function that accepts a string of wikitext and returns a Wikipedia class (‘Stub’, ‘C-Class’, ‘Featured Article’, etc.). Wiki-Class is currently in alpha according to its packager and developer [@halfak](https://twitter.com/halfak), and although I had to make a few patches to get some examples to work, it’s ready to start classifying your wikitext.


  1. Setting it up on Ubuntu.
  2. Testing the batteries-included model.
  3. Using the output by introducing a closeness measure.
  4. Testing making our own model.


At first you may be frustrated to learn that Wiki-Class is Python 3 only. You won’t be able to mix it with pywikibot, which is Python 2.7 only, and that can also mean upgrading some of your other tools. However, just try to recall these upgrade gripes next time you encounter a UnicodeError in Python 2.x, and then be thankful to Halfak for making us give Python 3 a try. I outline getting the environment running on Ubuntu 14.04 here.
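As a minimal illustration of what you gain (nothing Wiki-Class-specific here): in Python 3, `str` is Unicode by default and `bytes` is a separate type, so the encode/decode boundary is explicit rather than a runtime surprise:

```python
# Python 3: str is Unicode text; bytes are a distinct type.
# Implicitly mixing them (the classic Python 2 UnicodeError) can't happen.
title = "Häskell und Grepl"          # a str, Unicode by default
raw = title.encode("utf-8")          # explicit str -> bytes
assert isinstance(raw, bytes)
assert raw.decode("utf-8") == title  # explicit bytes -> str round-trips
```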

Firstly, if you want to use the IPython notebook with Python 3 you can do so with apt-get. And while we’re at it, for convenience we’ll also install a version of pip for Python 3.

In [95]:
!sudo apt-get install ipython3-notebook python3-pip
[sudo] password for notconfusing: 

Some requirements of Wiki-Class, including sklearn and nltk, are a pain with Python 3 since they haven’t been properly packaged for it yet. These you’ll have to get from source:

In [1]:
!pip3 install git+https://github.com/scikit-learn/scikit-learn.git
!pip3 install git+https://github.com/nltk/nltk/#

Making some random pages for a test dataset

We’ll need to get some wikitext, with associated classifications, to start testing. I elected to make a random dataset in pywikibot, which, as already stated, is Python 2.7 only and thus needs to be in a separate notebook; you can still view it on nbviewer. Its output is a file test_class_data.json (github link of the bzip), which is just a dictionary associating qualities and page-texts.

Warning: this dataset has some examples that can cause a ZeroDivisionError, because some of these pages have 0 non-mark-up text. I wrote this patch which fixes the issue.
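For orientation, the file’s shape (inferred from how the cells below consume it; the page texts here are made-up stand-ins) is a JSON object mapping each quality class to a list of wikitext strings:

```python
import json

# A hypothetical miniature of test_class_data.json:
# {quality class -> list of page wikitexts}
sample = {"Stub": ["'''Smalltown''' is a town."],
          "FA": ["'''Bigtopic''' is a featured-quality article."]}

with open("sample_class_data.json", "w") as f:
    json.dump(sample, f)

# loaded and counted the same way as in the cells below
classed_items = json.load(open("sample_class_data.json"))
print(sum(len(texts) for texts in classed_items.values()))  # 2
```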

Testing the Pre-built Model

In [3]:
import json
import pandas as pd
from wikiclass.models import RFTextModel
/usr/local/lib/python3.4/dist-packages/pandas/io/excel.py:626: UserWarning: Installed openpyxl is not supported at this time. Use >=1.6.1 and <2.0.0.
  .format(openpyxl_compat.start_ver, openpyxl_compat.stop_ver))

Each model is stored in a .model file. A default one is included in the github repo.

In []:
!wget https://github.com/halfak/Wiki-Class/blob/master/models/enwiki.rf_text.model?raw=true
In [35]:
!mv enwiki.rf_text.model\?raw\=true enwiki.rf_text.model

Now we load the model.

In [4]:
model = RFTextModel.from_file(open("enwiki.rf_text.model",'rb'))
In [5]:
classed_items = json.load(open('test_class_data.json','r'))
print(sum([len(l) for l in classed_items.values()]))

The Wiki-Class-provided model only deals with the ‘Stub’, ‘Start’, ‘B’, ‘C’, ‘Good Article’, and ‘Featured Article’ classifications. It does not include ‘List’, ‘Featured List’, or ‘Disambig’ class pages. So we have to sort the standard classes out of our 38,000 test articles.

In [6]:
standards = {actual: text for actual, text in classed_items.items() if actual in ['Stub', 'Start', 'C', 'B', 'GA', 'FA'] }
In [5]:
print(sum([len(l) for l in standards.values()]))

Now we iterate over our 36,000 standard-class pages, and put their Wiki-Class assessments into a DataFrame.

In [6]:
accuracy_df = pd.DataFrame(index=classed_items.keys(), columns=['actual','correct', 'model_prob', 'actual_prob'])
for actual, text_list in standards.items():
    for text in text_list:
        try:
            assessment, probabilities = model.classify(text)
        except ZeroDivisionError:
            # skip pages with 0 non-mark-up text
            continue
        accuracy_df = accuracy_df.append({'actual': actual,
                                          'correct': int(assessment == actual),
                                          'model_prob': probabilities[assessment],
                                          'actual_prob': probabilities[actual]}, ignore_index=True)

What you see here is that the output of an assessment is really two things: the ‘assessment’, which is simply the class the algorithm predicts best, and secondly a dictionary of probabilities of how likely the text is to belong to each class.

In our DataFrame we record four columns: the ‘actual’ class as Wikipedia classes it; whether the actual class matches the model prediction; the probability (read: “confidence”) of the model prediction; and lastly the probability of the actual class. Note that in the “correct” case, model_prob and actual_prob are the same.

In [7]:
df  = accuracy_df.dropna(how='all')
actual correct model_prob actual_prob
18 Start 0 0.4 0.0
19 Start 1 0.8 0.8
20 Start 0 0.4 0.0
21 Start 0 1.0 0.0
22 Start 1 0.7 0.7

If we look at the mean of the ‘correct’ column per class, we should hopefully see something above 1/6th, which would be the performance of just guessing. And we do.

In [8]:
groups = df.groupby(by='actual')
B         0.247391
C         0.278138
FA        0.854167
GA        0.444444
Start     0.387334
Stub      0.698394
Name: correct, dtype: float64

See how “close” predictions are when they are not correct.

Now we hack on the output. The Random Forest is really just binning text into different classes; it doesn’t know that some of the classes are closer to each other than others. Therefore we define a distance metric on the standard wiki classes. I call this order the “Classic Order”. To get an intuition, consider this example: if an article is a Good Article and the model prediction is also Good Article, then it is off by 0; if the model prediction is Featured Article, it is off by 1; if the model prediction is Start, it is off by 3.

In [7]:
classic_order = ['Stub', 'Start', 'C', 'B', 'GA', 'FA']
enum_classic = enumerate(classic_order)

for enum, classic in dict(enum_classic).items():
    print(enum, classic)
0 Stub
1 Start
2 C
3 B
4 GA
5 FA

Now we are going to iterate over the same dataset as above, but instead of recording “correctness”, we record the closeness in a DataFrame.

In [8]:
classic_order = ['Stub', 'Start', 'C', 'B', 'GA', 'FA']
classic_dict = dict(zip(classic_order, range(len(classic_order))))

off_by_df = pd.DataFrame(index=classed_items.keys(), columns=['actual','off_by'])

for classic in classic_order:
    for text in standards[classic]:
        try:
            assessment, probabilities = model.classify(text)
        except ZeroDivisionError:
            # skip pages with 0 non-mark-up text
            continue
        off_by_df = off_by_df.append({'actual': classic,
                                      'off_by': abs(classic_dict[assessment] - classic_dict[classic])}, ignore_index=True)

So it should look something like this as a table:

In [9]:
off_by  = off_by_df.dropna(how='all')
actual off_by
18 Stub 2
19 Stub 1
20 Stub 0
21 Stub 0
22 Stub 0

And as a chart.

In [10]:
%pylab inline
Populating the interactive namespace from numpy and matplotlib

WARNING: pylab import has clobbered these variables: ['text']
`%pylab --no-import-all` prevents importing * from pylab and numpy

We can see that the middle classes are harder to predict, whereas the ends are easier. This corroborates our expectations: the quality spectrum bleeds past these rather arbitrary cut-off points, so more of the spectrum lies inside the end intervals, making them easier to bin.

In [11]:
ax = off_by.groupby(by='actual',sort=False).mean().plot(title='Prediction Closeness by Quality Class', kind='bar', legend=False)
ax.set_ylabel('''Prediction Closeness (lower is more accurate)''')
ax.set_xlabel('''Quality Class''')
<matplotlib.text.Text at 0x7fc089810550>

Making a model

Now we test the model-making feature. We will use our dataset of ‘standards’ from above, using a random 80% for training and 20% for testing.

In [27]:
from wikiclass.models import RFTextModel
from wikiclass import assessments

Divvying up our data into two lists.

In [28]:
import random

train_set = list()
test_set = list()
for actual, text_list in standards.items():
    for text in text_list:
        # roughly 20% of pages go to the test set, the rest to training
        if random.randint(0, 9) >= 8:
            test_set.append((text, actual))
        else:
            train_set.append((text, actual))


And the next step is quite simple: we train by supplying our train_set list, and test by supplying our test_set list. The package also conveniently supplies a saving function for us to store our model for later use.

In [29]:
# Train a model
model = RFTextModel.train(train_set)

# Run the test set & print the results
results = model.test(test_set)

# Write the model to disk for reuse.
model.to_file(open("36K_random_enwiki.rf_text.model", "wb"))
pred assessment    B    C  FA  GA  Start  Stub
real assessment                               
B                130   29   1   5    105    40
C                 34  112   0   2    151    33
FA                 7    3   4   0      1     0
GA                 8    8   0  11      9     1
Start             80   87   0   2   1420   525
Stub              40   32   0   0    547  3973

Now to look at accuracy, we norm the DataFrame row-wise.

In [30]:
norm_results = results.apply(lambda col: col / col.sum(), axis=1)
pred assessment B C FA GA Start Stub
real assessment
B 0.419355 0.093548 0.003226 0.016129 0.338710 0.129032
C 0.102410 0.337349 0.000000 0.006024 0.454819 0.099398
FA 0.466667 0.200000 0.266667 0.000000 0.066667 0.000000
GA 0.216216 0.216216 0.000000 0.297297 0.243243 0.027027
Start 0.037843 0.041154 0.000000 0.000946 0.671712 0.248344
Stub 0.008711 0.006969 0.000000 0.000000 0.119120 0.865200

And finally we can view the performance by class, which intriguingly seems to be better than what we got with the batteries-included model.

In [35]:
for c in classic_order:
    print(c, norm_results.loc[c][c])
Stub 0.865200348432
Start 0.671712393567
C 0.33734939759
B 0.41935483871
GA 0.297297297297
FA 0.266666666667

We can see that having a large number of stubs to train on really gives us high precision in classifying them.

So there you have it – a brief play with Wiki-Class, an easy way to get rough quality estimates out of your data. If you extend these examples, I’d be intrigued to see and collaborate on them.



What Part of “School” Don’t You Understand?

I received an apologetic email from HackerSchool an hour ago, which was sorry to tell me they couldn’t admit me this fall – yet, puzzlingly, I was not gutted. HackerSchool is part of the wave of “Hacker Education,” where you exchange something with a company for programming education. HackerSchool differentiates itself in that you don’t pay them upfront, or necessarily at all – they just want a cut of a potential recruiting bonus when they pawn you off to another company. They also have good perspectives on lightweight social rules and gender equality, which piqued my interest. Still, let us not mince words: this is private education. A more dedicated hacker might call it a co-option of DIY, gift-economy culture.

Although this might seem a bitter and fruitless retaliation in response to a rejection letter, that is only the first lily-pad. In fact, there is something more that made HackerSchool particularly attractive to me, which only became apparent in retrospect. As I wrote in my application (which is copied in its entirety below), and previously about dogs, a central conundrum for me is not knowing how to work for myself. I can work to impress authorities, appear clever for narcissistic purposes, or for fear of failure – but not because I want to.

HackerSchool’s “everyone determines their own lesson plan” philosophy could be a vital stepping stone to a dreamed-of autodidacticism. On the face of it, going to HackerSchool even looks like genuine autodidacticism. But closer inspection would reveal that you have an authority (the HackerSchool institution) that instructs you to teach yourself. The outer loop, the most meta-level, is still a deference to a force that isn’t your own. It’s virtualizing self-ownership, which is really just “bluepilling” yourself.

It’s virtualizing self-ownership, which is really just “bluepilling” yourself.


This conflict arose during my 14-minute interview with the organizers, in the question of how I would learn my favoured topics. “I suppose I would use textbooks, as in college,” I replied without confidence, and later amended to “but typically it’s been project-needs plus Stack Overflow.” In both cases I now see I pointed to historic examples where the main motivator was either a professor or a boss. Unsurprisingly this was a lacking answer, both to myself and to an interviewer not looking to be my professor or boss.

Looking for positive cases of escaping this servitude, there is obviously one classical logic: accomplishment-desire drives non-authority-pleasing mechanisms of work. But what if we allow that natural curiosity is not the only way to exit the teach-yourself-to-teach-yourself paradox? What could be alternatives? We could use as a starting point the goal vs. process attitude dichotomy. In this framework the ravenous prodigy sits neatly on the “goal” side. And on the other side?

There isn’t a prominent model to represent the unspurred, successful process-worshipper. The best examples I can offer are probably something like Aaron Swartz, Grigori Perelman, or a stereotypical monk. Such a dearth of role models is probably because process-oriented people aren’t highly lauded in our prize-counting society, and are thus non-notable. This is a dead-end I feel I’ve been running into frequently.

The conclusive feeling here is not directed, but is still a redoubling of effort. It’s a large, and still partially free internet out there. There’s Open Access research to read and write, and Open Source code to execute and develop. Even without the promise of coming to an epiphany of how not to get depressed about the fact that I do it alone in my bedroom, that unlit corridor still calls to me as the one with light at the end.

Should it help anyone fulfil their dreams, and for the sake of radical transparency, this was my HackerSchool Application.

HackerSchool Application


Please include any that you have: GitHub, LinkedIn, personal web site, etc.



https://www.linkedin.com/pub/maximilian-klein/4/b1b/63 Any tips for updating?

http://notconfusing.com (trying to fix width issue on category subpages; may need to switch theme)

Code CracklePop

Write a program that prints out the numbers 1 to 100 (inclusive). If the number is divisible by 3, print Crackle instead of the number. If it’s divisible by 5, print Pop. If it’s divisible by both 3 and 5, print CracklePop. You can use any language.


rice = [3, 5]
crispies = ['Crackle', 'Pop']
rice_crispies = dict(zip(rice, crispies))

for i in range(1, 101):
    line = ''
    for flake in rice:
        if i % flake == 0:
            line += rice_crispies[flake]
    # numbers divisible by neither 3 nor 5 are printed as themselves
    print(line if line else i)


Please link to a program you’ve written from scratch.

You can use something like GitHub’s gist to host your code. It doesn’t need to be long, but it should be something you’ve written yourself, and not using a framework (e.g., Rails). If you don’t have anything to submit, code something small, like a game of tic-tac-toe.


This tutorial translates an economic algorithm into Python. In short, it does some matrix calculations, statistical analysis, and some plotting. Its most advanced language feature is a Python “generator.”


What is the most fascinating thing you’ve learned in the past month?

This doesn’t have to be about programming.


We all know an averaged crowd of fair-goers can guess the weight of a heifer more accurately than any of the individual simpletons among them. Science shows the principle extends to marbles and encyclopedias as well. But what I picked up this month at the Network Science Conference ‘14 was that it can be applied to stock trading too. Diversification strategies work – but they can also be diversified. A team of researchers explained a technique that they simulated: if you followed x traders, mimicking exactly the trades they perform but with 1/x of your money each, then for sufficiently high x the return is higher than any of the individuals’.


The network science bit comes in because you don’t want anyone you follow to be following each other. For the highest return on investment, those who you follow should have “no common ancestors,” in network parlance.


More so than stock trading, the “wisdom of the crowds” theory appeals to me. Trying to make clever stock decisions is a huge industry, and this intuitive simple mechanism can compete with more complex ones. What’s fascinating here to me is how theories can unexpectedly translate between domains.

What do you want to be doing in two years?


Two years from now I would like to be swimming through the gooey centre of a large research project at a think tank or in research and development. Stemming from my previous employment at OCLC Research (a library think tank), I enjoy the freedom of blue-sky thinking. Therefore the employers that have a large enough budget for pure research (Microsoft Research is a good example) are the competitive waters I want to swim in. Having such lofty dreams is never regrettable in my experience, because there are always failsafes. In this case one can always sell oneself as a Data Analyst for business intelligence.


To enjoy any future work, however, it would be crucial for me to be in a team of stellar collaborators. My personal adage (which I stole from a guy who works in a copy shop) is “life is the water cooler, the water cooler is life.” Being around people ignites my mind (even at the copy shop), and I want to continue fuelling that fire. I will continue to invite uncomfortable differences in perspective. Therefore in two years I want to be in a team that values learning over goals. Goals inevitably follow learning – but not vice versa.

Why do you want to do Hacker School?

I see Hacker School as the centre of a Venn diagram of my desires, which are (1) learning self-directedly, (2) being part of a supporting group, and (3) boosting employment opportunity afterwards.


Last year when I went to my boss and asked her to crack the whip on me harder, my own actions perplexed me. I quit my job to attack the problem of relying on authority to motivate my work. But next came the paradox: how can one self-direct one’s self to autodidactically become self-directed? Recursion without a base case.


From a pragmatic perspective, I still go to my local hackerspace because I enjoy what could be termed “co-learning.” The social environment drives me. Being conscious of your impression on others can psychologically push you to work. It’s not self-actualizing alone in your bedroom, but it’s effective.


Hacker School seems like a realpolitik compromise between bootstrapping self-ownership and well-proved social dynamics. Given that Hacker School can also help with the personally-dreaded task of a job search afterwards, I see a trifecta being won.


What would you like to work on at Hacker School?

E.g., things you want to learn or understand better, projects you want to build or contribute to, etc.


While there are a few pet projects that jump to mind, none are as important as the process of the work I might do. Rather than pronounce any work in detail, I would describe my desires declaratively. There are two main criteria. Firstly, like a carrot just beyond the horse-mouth’s reach, I want to find a project that is harder than I expect, simply to level up. Secondly, I want to overcome the folly of the lone-inventor myth. Working horizontally with a partner is as important as being the bringer of the techno-revolution. So the thing I would like to work on is a new idea I would receive while I’m at Hacker School.


That being said, in the absence of any external input, a few of the topics I want to understand better are machine learning, genetic algorithms, and pattern recognition. These corners of computer science are somehow just cool. Pursuing them, since they are substantially complex, seems commensurate with the Hacker School motto of “get dramatically better.”


Also I want to be able to make my phone turn off silent mode by sending it a secret text for those hidden-under-couch situations.

Programming background

This information will not disqualify your application. We use it to better get to know our applicants and where they currently are. If you’re worried that you won’t fit into Hacker School, you can read about some of our alumni.

Describe your programming background in a few sentences.

2006. Failed Java in Community College.

2009. Discovered I enjoyed programming Turing Machines on paper in “Computability Theory” in my Pure Math major.

2010. Enrolled in – and revelled in – the purity of the Berkeley/MIT Scheme tradition.

2011. Failed Java again. Tech career funeral and wake.

2012. Phoenixed with Python + Stackoverflow, to write Wikipedia bots.

2013. Welcome to the FOSS movement. Linux and git start unpacking in my brain.

2014. Hacker School


Have you worked professionally as a programmer?

If so, please describe your experience.


Working in programming and working hard at reinventing the idea of a “professional” programmer have been the last three years of my life. When I was “Wikipedian-in-Residence” I turned my job into programming by convincing management of my proposed bot-writing projects. In my own business I’ve won contracts to deliver reports that were the result of custom programs. So although I’ve never worked as a typical professional programmer, I like my life to be about delivering code for pay.


Do you have a Computer Science degree or are you seeking one?


I have a Bachelor’s degree in Mathematics from the University of California, Berkeley, and have applied and furthered my Computer Science knowledge outside of academia. In the far future I have considered enrolling in a Master’s or PhD program. My draw towards a heavily mathematical emphasis looms, and from my work with Wikipedia a more human and social element has nestled in my head. Therefore it’s possible that my interests would converge in a Computer Science degree.


Prerequisite-free Set Theory – Just The Intuition

Logicomix, page 162.

My favourite hackerspace, Sudo Room, is very close to the Bay Area Public School, whose concept of an anti-capitalist university intrigues me very much. In chatting about their plans for math education, they expounded on the need for a primer to Set Theory, as they had been learning the philosophy of Alain Badiou, who utilizes those foundations. Their request was for a softer, more intuitive introduction. And just a short 18 months after that casual chat, this last Saturday, June 14th 2014, I held that public class, and it went brilliantly. 2 very curious minds showed up and we had fun reading the comic examples aloud. The comic we used as a launching point is Logicomix: An Epic Search for Truth.

Continue reading


Method of Reflections: Explained and Exampled in Python

The introduction of the post is mirrored here, but the full tutorial is on IPython Notebook Viewer.



See how the Method of Reflections evolves as a recursive process.

The Method of Reflections (MOR) is an algorithm, first coming out of macroeconomics, that ranks nodes in a bipartite network. This notebook should hopefully help you implement the Method of Reflections in Python. To be precise, it is the modified algorithm proposed by Caldarelli et al., which solves some problems with the original Hidalgo-Hausmann (HH) algorithm (doi:10.1073/pnas.0900943106). The main problem with HH is that all values converge to a single fixed point after sufficiently many iterations. The Caldarelli version solves this by adding a new term to the recursive equation – what they call a biased random walker (function G) (doi:10.1371/journal.pone.0047278). I hadn’t seen any open-source implementations of this algorithm, so I thought I’d share my naïve approach.
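To make the recursion concrete before clicking through, here is a toy sketch of my own of the original HH iteration (not the Caldarelli-modified version the notebook implements): each side’s score at step N is the average of its neighbours’ scores at step N−1, seeded with degree (diversification for countries, ubiquity for products):

```python
import numpy as np

def hh_reflections(M, iterations=4):
    """Original Hidalgo-Hausmann Method of Reflections on a bipartite
    adjacency matrix M (rows: countries, columns: products)."""
    k_c = M.sum(axis=1).astype(float)  # k_c,0: diversification
    k_p = M.sum(axis=0).astype(float)  # k_p,0: ubiquity
    for _ in range(iterations):
        # k_c,N = (1/k_c,0) * sum_p M_cp * k_p,N-1  (and symmetrically)
        k_c_next = (M @ k_p) / M.sum(axis=1)
        k_p_next = (M.T @ k_c) / M.sum(axis=0)
        k_c, k_p = k_c_next, k_p_next
    return k_c, k_p

# toy network: country 0 exports everything, country 2 only one niche product
M = np.array([[1, 1, 1],
              [1, 1, 0],
              [0, 0, 1]])
k_c, k_p = hh_reflections(M)
```

Run long enough, the scores drift toward a common fixed point – exactly the degeneracy the biased random walker term is designed to avoid.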

Read on at http://nbviewer.ipython.org/github/notconfusing/wiki_econ_capability/blob/master/Method%20of%20Reflections%20Explained%20and%20Exampled.ipynb

Continue reading

Morten’s Rule of Airports

This is Morten’s Rule of Airports, its history, and some of its benefits. The rule states:

If [the length of your layover] minus [the time it would take to comfortably get to the city centre and back] is greater than or equal to one hour, then you should exit the airport.

Or for those of you who read nerd:
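The original formula image is lost, so here is one possible reading of the rule as a tiny Python predicate (the function and variable names are my own):

```python
def should_exit_airport(layover_hours, round_trip_hours):
    """Morten's Rule of Airports: exit the airport if the layover,
    minus a comfortable round trip to the city centre, leaves at
    least one hour to spend there."""
    return layover_hours - round_trip_hours >= 1

# e.g. a 6-hour window with 2 hours of travel and a 2-hour
# check-in buffer folded into the round trip: 2 free hours.
print(should_exit_airport(6, 2 + 2))  # True
```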



During an expedition to see my friend Morten in British Columbia, during which I was struggling with travel stress, I was struck by a story of enlightened travelling genius. Morten told a tale in which a flight of his was delayed at Charles de Gaulle Airport in Paris by several hours. It was an inconvenience because he was missing appointments for work and with friends in Denmark. He was getting stressed when he overheard a couple vocally calculating the time it would take to get to the city centre and back. Even with reasonable train-delay buffers the journey would still leave them with 1 hour and 5 minutes in Paris. While it seemed a bit ludicrous to go for just 1 hour and 5 minutes, there wasn’t much else to do, so he boarded the shuttle and set off mostly aimlessly. In his 65 minutes, he bought a croissant and a coffee and sat and pondered. A state of relaxation and bliss came over him, he relates. It was just atmosphere soaking, but it was real – especially compared to the departure lounge. On returning to the airport with the planned buffer intact, he wrote to all his missed appointments that he was really sorry to miss their meetings. To his friend he recounted his hour of cosmically-displaced croissant munching and a restored inner peace.

Impressed very much by this story at the time, I’ve decided to honour it: I have abstracted its principle and named it after its originator. You can help me codify this rule by implementing it in your life.

I walked this talk when I last stopped over in London. Landing at 8am and taking off again at 2pm, I counted an hour’s travel in each direction and the need to be two hours early. I was still left with 14-8-1-1-2 = 2 hours of freedom. At first I expected to do nothing besides my normal coffee and journalling ritual. Yet, stepping out of Victoria station on my way to hunting down a Costa, I encountered a City Cycle Hire. My plans changed in that instant, and after the nuisance of accepting all liability for my own dangerous cycling behaviour, out clicked a 3-speed indestructible 2-wheeler. Then, with a buzzing smile across my face, I intrepidly raced to Buckingham Palace, Nelson’s Column, St. James’s Park, and the Houses of Parliament with the goal of trying to get a fast-forward visual effect through speed. Later I topped up a SIM, called an old friend for 13 minutes and 36 seconds, and guzzled a Strongbow at the 11am opening time. It was a brilliant interlude, providing adventure, exercise, and chemical euphoria in a condensed movie-length wander. Far better than purposelessly zooming in and out on the display cameras at the unbranded electronics shop (which I still did for 20 minutes anyway, but not for 60). I hope you are as convinced as I was to get out and enjoy the only-marginally but still better activities that are available in the open-air tourist traps.

In fact I had great joy playing the fool as I asked someone to take a picture of my luggaged self, and received this appalling shot.


It did, however, yield this piece of poetry:

The picture was cropped
by the tourist-photographer.
The subject was photographed
by Nelson’s trunk.

Profiles of Inspiring Wikimedians I Met at Wikiconference USA 2014

Wikiconference USA 2014, in New York, just finished, and more than usual this conference instilled in me a lot of motivating social energy. Yes, I did present there, twice, on "Answering Big Questions With Wikidata" and "Signalling Open Access References," but more than usual I enjoyed attending other people's presentations. On reflecting on why that was, I came to realize it was the earnest, authentic effort of other Wikimedians that shone so brightly. These are some of the more inspiring characters from the conference, though by no means a complete list.

Sumana Harihareswara

Sumana gave the opening keynote, wherein she talked about implicit versus explicit exclusion. To introduce the subject she told of her positive experience at Hacker School, which does actively exclude some people (there's an application process), but as a result makes a more intentionally inviting space, because only inviting, inclusive individuals are selected. As she related this to Wikipedia, the shortcomings of our emphasis on liberty were highlighted: perhaps it doesn't ensure a safe learning space. A key quote that summed this up was "in the Wikimedia community, since we don't exclude anyone explicitly, we exclude others implicitly [sic]." A strong free-speech defence means not muting some overbearing voices. A full transcript of the talk is available.

I particularly took note of Hacker School's "no well-actually's" rule that Sumana presented. Many times during the conference, when someone was doing something in Python, there was a technical side-note I wanted to slot into the conversation; but being newly aware of how disruptive this is, I simply allowed the real learning to continue. Sumana very much practices this too, as I witnessed when someone came up to her to talk of her discomfort about someone wearing Google Glass at the conference. She jumped to help without pause, offering to go with the privacy advocate to find a conference organizer, without any judgement on the camera controversy itself.

Sumana also gave an impromptu Gender Diversity training, which came from the Ada Initiative. Actually this was offered twice and I attended both sessions (my third time overall, since I'd already watched it online). Sumana's rapid-fire style resonates with my personality and preferred learning style very well. This allows me to really sync up with the lesson and download the content at high mental bandwidth. In general, Sumana is an over-clocked but liquid-cooled processor, which is brilliant if you are too, and have a fibre-optic connection.

In Zürich we were going to work on a Python project together, because we had both been talking on Twitter about wanting to pair-program Python. Then we realized we didn't want to work on the same project. I was really impressed with her straightforward, unpretentious communication when she said "It doesn't seem like we want to work on the same project – so perhaps another time." The combination of directness and openness is liberating. Often we see one without the other, but having both is a fantastic combination.

And she is the mentor of:

Frances Hocutt

Frances gave a walkthrough and workshop on the MediaWiki API. I actually interact with the API a lot, but never directly – only through pywikibot – so I was much enlightened by this lesson. In fact Frances explained with great care and deliberateness, from step zero, what an API is, all the way to the specifics of the MediaWiki API and how to use a client library. Frances' teaching style is methodical. The pace is never frantic, taking time to get every word right, never needing to buy herself extra time with "er" or "um". Learning from Frances is like having an immaculate syllogism patiently unfurl in front of you.

She also did the brave thing of giving a live demo of mwclient, starting from pip installation, which was great to learn because I am only familiar with its not-quite-competitor pywikibot. She neither assumed any technical knowledge nor left experienced programmers bored, which is a hard balance to strike. Her blog post about the presentation includes links to her slides.
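For the curious, the shape of a raw MediaWiki API call is simple enough to sketch with the standard library alone – client libraries like mwclient and pywikibot wrap exactly this kind of request. The parameters below follow the public API's `action=query` convention, but treat this as my own illustrative sketch, not Frances' demo:

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"

def build_query_url(titles):
    """Build a MediaWiki API query URL for the given page titles."""
    params = {
        "action": "query",           # the workhorse read action
        "format": "json",
        "prop": "info",              # ask for basic page metadata
        "titles": "|".join(titles),  # the API takes pipe-separated titles
    }
    return API_ENDPOINT + "?" + urlencode(params)

# The actual fetch is one stdlib call away (requires network):
# import json, urllib.request
# data = json.load(urllib.request.urlopen(build_query_url(["Ntozake Shange"])))
```

A client library saves you from hand-rolling this URL-building, pagination, and login – which is precisely why her walkthrough went from "what is an API" up to mwclient.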

Frances also taught the same Gender Diversity Training aimed at cis-men, which I attended. It was in this reprise that I most caught the proverbial advice – "to follow your discomfort."

Finally I'd like to credit her Chemistry knowledge and quick wit. In my previous blog post about sex ratios, I mentioned I'd found an occurrence of "sodium" for a sex. Frances quickly Sherlock Holmes'd that this was likely because someone had tried to enter not applicable – "na" – and probably accepted the auto-suggested chemical element.

Joelle Fleurantin

Joelle reflecting. Attribution: http://fleurantin.cc/

At the conference Joelle gave a lightning talk, which I enjoyed, about her involvement in improving the Mozilla wiki, where she has an internship. One day, she said, she became curious about the wiki's usage statistics, but could not find anything beyond the minimal information contained in maintenance reports. So she has started building scripts to analyse and visualize the Mozilla wiki's usage.
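Her scripts weren't shown in detail, but the core of such an analysis is simple to sketch: pull edit records (for example from the API's recentchanges list) and tally them per user or per page. The record shape below is a stripped-down assumption of mine, purely for illustration:

```python
from collections import Counter

def edits_per_user(recent_changes):
    """Tally edit counts per user from a list of recent-change records."""
    return Counter(rc["user"] for rc in recent_changes)

# A stripped-down stand-in for what a recentchanges query returns:
sample = [
    {"user": "Alice", "title": "Main Page"},
    {"user": "Bob", "title": "Sandbox"},
    {"user": "Alice", "title": "Help:Editing"},
]

print(edits_per_user(sample).most_common())  # → [('Alice', 2), ('Bob', 1)]
```

From a Counter like this, a bar chart of the most active editors or most edited pages is one plotting call away.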

Later in her lightning talk she also discussed her own autodidactic learning techniques, telling of being a big watcher of recorded conference videos, with a particular penchant for linux.com. Her being part of Gnome's Outreach Program for Women should therefore come as no surprise. As part of her Gnome involvement, Joelle fuzzes 0MQ – fuzzing being a stress-testing debugging technique, as she patiently explained to me.
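Fuzzing, for anyone who hasn't met it, means hammering a program with random or malformed input and watching for crashes. A toy input generator – nothing to do with Joelle's actual 0MQ harness, just my sketch of the idea – might look like:

```python
import random

def fuzz_inputs(n, max_len=64, seed=None):
    """Yield n random byte strings to throw at a parser or socket."""
    rng = random.Random(seed)  # seedable, so a crashing input is reproducible
    for _ in range(n):
        length = rng.randint(0, max_len)
        yield bytes(rng.randrange(256) for _ in range(length))

# Each payload would then be sent to the system under test
# (e.g. over a 0MQ socket) while watching for crashes or hangs.
```

The seed matters: when a random payload does trigger a bug, you want to be able to generate the exact same payload again.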

Over a beer in Brooklyn later on, she was coaxed to shed some modesty and recap a tech-art piece she'd made. It is an interactive installation in which one wears headphones listening to a monologue of a woman talking about her inner thoughts; as you approach a video screen, a proximity sensor tracks you, and the closer you get, the more the video changes to reveal increasingly intimate footage.

Megan Wacha

Megan on the left. Attribution: https://twitter.com/Museocat/status/472387217298825216/photo/1

Megan is the Research and Instruction Librarian for the Performing Arts at Barnard College of Columbia University. Her presentation at the conference was about the multiplicity of roles for librarians in Wikipedia. Regretfully, because of scheduling, I couldn't attend it. She however attended mine and Wrought's Signalling Open Access talk, and amazed during the Q and A. In a debate about whether it is overcomplicating to import Open Access articles to Wikisource – as there may be corrections or retractions published later – she noticed the more general problem. This was the first time I heard someone say "I'm going to bring this up with the MLA." Her reasonable position is that "we should really be citing the used source and not the original publication." I didn't even know you could take issue with the MLA.

During a lightning talk about there not being enough video in Wikipedia, a list of high-profile articles without videos was cited: "Racing, Soccer, Dance." On the word dance, with great hacker-confidence, she leaned over to me and said "we're going to fix that." What an assertion – and I believe it, because of her other on-wiki work. Do you know the Ntozake Shange article? Well, its existence is owed to the inclusion of particularly hard-to-find sources – which is her speciality.


Two last special mentions for people I didn't get enough time to know well, but want to hat-tip.

Dorothy Howard

Dorothy is currently working at the Metropolitan Library Council as Wikipedian in Residence. Endearingly to me, she promotes the Wikipedia Authority Control project, which is easy for me to enjoy since it aggrandizes the work done with VIAFbot. But this is also part of a holistic effort of hers to be a sort of techno-evangelist for a lot of wiki-library projects – and anyone in that space knows it is hotting up.

Jennifer Baek

Jennifer has been involved in SFC for a long time; Wrought told me he remembers meeting her in Berkeley in 2008. Apparently she has not let up since. She was the main conference organizer and fire-putter-outer. When I was accidentally double-booked (to speak in two places at once), she coolly helped straighten out the logistics. Thank you for making the conference happen.

The Virtuous Circle of Wikipedia: The Poster

It may seem like a small piece of work, but I wanted to commemorate this moment – my first poster. I had never before needed to manufacture one. Today I presented it at NetSci (Network Science) 2014, and received many useful comments on the research. We found a few others who are, like ourselves, translating the 'method of reflections' into new domains. The paper related to this poster is in review, but you can also access a preprint on GitHub.

On the art side I'd like to thank unluckylion for encouraging me to make a bold statement. I think it paid off, and I'm only mildly guilty about the blatant copyvio of the Wikipedia logo. Although I'll use that point to show the necessity of the new attribution logos.

Skeuomorph anyone?


Sneak Peek at Wikimedia’s New Bold, High Concept Iconography

Wikimedia's User Experience team invited me and a few others into the office to be part of a focus group concerning a proposed new iconography.

There are two proposed new design languages, and an icon or "mark" for each Wikimedia project.
Penchant for selfies. Myself (left) with two of the fabulous design team, May (centre) and Yufei (right).

With free pizza proffered, the UX team – Jared Zimmerman, May Galloway, and Yufei Liu (pictured) – launched right into the need for this new set of icons, or "marks" as they are calling them.

  • The current logos don't scale to 16 pixels square, and don't overlay well.
  • To distinguish links to Wikimedia sites on non-Wikimedia sites.
    • Other sites have "social media" icons which, if a brand is big enough, replace a text link. Think Facebook's "f" or Twitter's "t".
    • Also, there was an intriguing mock-up in which Twitter displayed a special preview of a link to Wikipedia, much like it treats YouTube links specially.
  • Attribution of Wikimedia content is verbose and cumbersome, and could be wrapped into an iconic link.

I'm convinced. Just like there are "post to Facebook" buttons polluting the internet, there may as well be "read on Wikipedia" icons to restore some balance to the universe. Even though it's minor, the attribution point is also valid. When I want to attribute Commons – as I do on other parts of this blog – all that copypasta is half of my repetitive strain injury.

Before continuing to show you what these marks actually look like, allow me to appease the User Experience team by passing on these disclaimers about the designs you are about to see.

  1. Not replacements. The marks are not meant to replace the current logos (don't call these logos). They are in addition to what we already have, and for others to use when linking to or mentioning Wikimedia.
  2. Not final. The marks shown here are not final; they are open for community review and scrutiny. I trust them because they sat quietly as I bombasted about how the Wikipedia mark looks like it's from M*A*S*H.
  3. Not forced. The marks will not be forced on the community. There will be a Request for Comment, and the outcome of that RfC will decide the fate of this project. Wikimedia Foundation is not making anyone do anything.
Wikipedia. While the tangram looks like it should be sent back to the army’s crate-stenciling department, the path is pure Ikea, self-explaining simplicity.
Commons. *shutter sounds* There was some quibbling that commons is not just photos, so a camera doesn’t represent it well. But I don’t think you can beat it for recognizability. Notice in the Path, the lower semi circle motif turns into hand adjusting the lens.

With that said, we can proceed to analysing the design languages, of which there are two.


The first of the two languages – the upper row in these images – is called "Tangram". A tangram (oh look, there's a link to Wikipedia, which WordPress could render with a small mark next to it) is a Chinese puzzle that consists of rearranging certain primitive shapes. All the tangram marks can be made by rearranging four shapes (sadly not pictured here). The tangram series is more "metaphoric", to use the UX team's words – although the Wikipedia mark, still a "W", is a notable non-metaphoric exception. It's also the simpler of the two series. Oftentimes the meaning is a bit oblique, but easier to see once it's pointed out, which I do in the captions.


Path, shown in the lower row, is the more complicated of the two sets. The UX team still says these will work at 16 by 16 px. They are described as having a sketchier feel, and as preserving the circular nature of the current logos. Path's meanings are more literal, and thus easier to decipher at first glance, which Jared Zimmerman said, almost regrettably, will bias people to like them better.

I'm sure you have many comments, because this is close to a bikeshedding sort of discussion, but that is good because the UX team wants your feedback. So make sure you send it to the right people.

Now you may enjoy your sneak peek.

Wiktionary. Do you know what a catchword is? Well, that is how we used to access dictionaries, and the UX team is not afraid of a little skeuomorph.
Wikivoyage. You’ll kick yourself for not getting this tangram – it’s the sun setting behind a mountain range over a wavy sea. I’m being serious.
Wikiversity. The tangram shows people coming together (although in my experience that doesn't equate to learning). The path riffs on the classic graduation cap.
Wikispecies. The official explanation of the tangram is that it is the silhouette of a twisted double helix. And the path ‘fingerprint’ is more than endearing.
Wikisource. The tangram big-stack-of-papers is a stroke of genius IMHO.
Wikiquote. Both are loud and clear. There was discussion as to whether the displayed quotation marks were international enough, as Germans, the French, and shockingly even non-Europeans do it differently.
Wikinews. The tangram is supposed to be a person reading the paper (see it now‽). The path is as obvious as it can get.
Wikidata. If you don't know, the current logo says "Wiki" (in Morse code, I believe). The tangram explanation here was to design something that said "input–output". The path tries to show that the data (lines) are connected, and could be circumscribed in a hexagon.
Wikibooks. Books must predate graphic design or something.
Meta. The tangram is supposed to give a talk-bubble conversation feel, although it was also pointed out that it looks like two laptops interfacing through a mirror. I hope the Path is replaced, because it is much too much "live chat" on GoDaddy or somesuch.
Mediawiki. The sunflower disappears, but the brackets remain. Curly brace fans have a lot to be happy about.
Labs. The tangram gives you a sort of walkie-talkie upload-download feel, which is appropriate. And of course, there was not a lot you could do with a unicorn.
Incubator. The path was commented by a focus group member to look like an avocado, to which the UX team’s only response was that avocados don’t have circular yolks.
Foundation. There are many tangrams because, the UX team said, perhaps there should not be just one mark for the Foundation, in its many roles.