In what could easily become a recurring annual trip, Matt Senate and I came to Berlin this week to participate in the Open Knowledge Festival. We spoke at csv,conf, a fringe event in its first year, ostensibly about comma-separated values but more so about unusual data hacking. On behalf of the WikiProject Open Access – Signalling OA-ness team, we generalized our experience of data-munging with Wikimedia projects for the new user. We were asked to make the talk more story-oriented than technical, and since we were in Germany, we decided to use that famous narrative of Häskell and Grepl.
Wiki-Class Set-up Guide and Exploration
Wiki-Class is a Python package that uses machine learning to determine the quality of a Wikipedia page. It is an open-source release of the Random Forest algorithm used by SuggestBot. SuggestBot is an opt-in recommender for Wikipedia editors, offering pages that need work and that look like pages they’ve worked on before. Similarly, with this package, you get a function that accepts a string of wikitext and returns a Wikipedia Class (‘Stub’, ‘C-Class’, ‘Featured Article’, etc.). Wiki-Class is currently in alpha according to its packager and developer [@halfak](https://twitter.com/halfak), and although I had to make a few patches to get some examples to work, it’s ready to start classifying your wikitext.
The introduction of the post is mirrored here, but the full tutorial is on the IPython Notebook Viewer.
Method of Reflections Explained and Exampled in Python
The Method of Reflections (MOR) is an algorithm, originally from macroeconomics, that ranks the nodes of a bipartite network. This notebook should help you implement the method of reflections in Python. To be precise, it is the modified algorithm proposed by Caldarelli et al., which solves some problems with the original Hidalgo-Hausmann (HH) algorithm (doi:10.1073/pnas.0900943106). The main problem with HH is that all values converge to a single fixed point after sufficiently many iterations.
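To make the convergence problem concrete, here is a minimal sketch of the original HH iteration only, not the Caldarelli et al. modification. Each node's score at step N is the average of its neighbours' scores at step N-1, starting from plain degrees. The function name `hh_reflections` and the toy matrix are my own illustrative assumptions, and the code is plain Python so that it runs without extra packages.

```python
def hh_reflections(M, iterations):
    """Original Hidalgo-Hausmann iteration on a binary bipartite matrix.

    M is a list of rows (e.g. countries) over columns (e.g. products).
    Assumes every row and every column has at least one nonzero entry.
    Returns the row scores and column scores after `iterations` steps.
    """
    n_rows, n_cols = len(M), len(M[0])
    # Zeroth order: the degree of each node.
    k_row0 = [sum(row) for row in M]
    k_col0 = [sum(M[r][c] for r in range(n_rows)) for c in range(n_cols)]
    k_row = [float(k) for k in k_row0]
    k_col = [float(k) for k in k_col0]
    for _ in range(iterations):
        # Each score becomes the average of the neighbours' previous scores.
        new_row = [
            sum(M[r][c] * k_col[c] for c in range(n_cols)) / k_row0[r]
            for r in range(n_rows)
        ]
        new_col = [
            sum(M[r][c] * k_row[r] for r in range(n_rows)) / k_col0[c]
            for c in range(n_cols)
        ]
        k_row, k_col = new_row, new_col
    return k_row, k_col


# Hypothetical toy network: 3 row-nodes, 3 column-nodes.
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
]
first_rows, first_cols = hh_reflections(toy, 1)
late_rows, late_cols = hh_reflections(toy, 200)
spread = max(late_rows) - min(late_rows)  # shrinks towards zero
```

On this toy network the spread between row scores after many iterations is effectively zero, which is the fixed-point behaviour described above: the ranking information gets squeezed into ever smaller differences.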
It may seem like a small piece of work, but I wanted to commemorate this moment: my first poster. I had never needed to make one before. Today I presented it at NetSci (Network Science) 2014 and received many useful comments on the research. We found a few others who are, like ourselves, translating the ‘method of reflections’ into new domains. The paper related to this poster is in review, but you can also access the preprint files on GitHub.
On the art side I’d like to thank unluckylion for encouraging me to make a bold statement. I think it paid off, and I feel only mildly guilty about the blatant copyvio of the Wikipedia logo.
The Listiness of Wikipedia
Although it was only an aside, an answer to the question "What is a Reference work?" caught my attention at the UC Berkeley iSchool's March 21st Friday Afternoon Seminar by Michael Buckland. One possible answer suggested was: works that are over 80% list.
That definition, although it seems a bit short, was actually a serious suggestion published by Marcia Bates in 1986. [Bates, Marcia J. "What Is a Reference Book: A Theoretical and Empirical Analysis." RQ 26 (Fall 1986): 37-57] In my opinion this is an elegant way to define reference works because, although heuristic, it is entirely quantitative.
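As a rough illustration of how such a threshold could be applied to Wikipedia, here is a minimal sketch. It assumes "percent list" can be approximated as the share of non-blank wikitext lines that begin with a list marker (`*` or `#`). That measure, the function names, and the sample text are my own assumptions, not Bates's procedure.

```python
def listiness(wikitext):
    """Share of non-blank lines that are wikitext list items.

    Assumption: a list item is a line starting with '*' or '#'.
    This ignores tables, templates, and lines such as '#REDIRECT'.
    """
    lines = [line for line in wikitext.splitlines() if line.strip()]
    if not lines:
        return 0.0
    list_lines = [line for line in lines if line.lstrip().startswith(("*", "#"))]
    return len(list_lines) / len(lines)


def is_reference_work(wikitext, threshold=0.8):
    """Apply the 'over 80% list' heuristic to a page."""
    return listiness(wikitext) > threshold


# Hypothetical sample pages.
mostly_prose = "Intro sentence.\nAnother sentence.\n* one item"
mostly_list = "Intro\n* a\n* b\n* c\n* d\n* e"
print(listiness(mostly_prose), is_reference_work(mostly_prose))
print(listiness(mostly_list), is_reference_work(mostly_list))
```

Counting lines rather than characters or words is a design choice made for simplicity; a different unit would shift which pages cross the 80% mark.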
You’re surfing a topic of great interest to you on Wikipedia, so interesting that you actually click through to the references. You’re excited to read the original material, but all of a sudden you are foiled—you’ve hit a paywall! And $35 to read an article is just too steep.
(A Japanese review of the paper, by the National Diet Library, is also available.)
This article presents a case study of a project, led by Wikipedians in Residence at OCLC and the British Library, to integrate authority data from the Virtual International Authority File (VIAF) with biographical Wikipedia articles. This linking of data represents an opportunity for libraries to present their traditionally siloed data, such as catalog and authority records, in more openly accessible web platforms.
You might be perusing the latest issue of Refer Journal and come across my latest article, Wikipedia in the Library. Andrew Gray of the British Library and I focus on the need and opportunity of bringing library data into Wikipedia. From the introduction:
Wikipedia has traditionally been a divisive topic among librarians and academics. Its goal is undeniably positive and almost utopian – access to all of human knowledge, in every language, offered freely to the world. In practice, however, it can typify “the problem of the internet” – a morass of disorganised information, of dubious accuracy and reliability, offered up without authority or control.
If you read Wikipedia in more than one language, you’ll have noticed the sidebar that is sometimes ready to link you to the topic of the current article in one or more other languages. If you’ve been following the trends, you’ll know that Wikidata is now in charge of keeping these language links in order. (To understand more about how Wikidata works, watch my YouTube tutorial starting at 5:15.) One upshot is that we can easily count these links and understand more about the Wikipedia projects, such as how “unique” different Wikipedias are. I define a unique Wikidata Item of a language X to be a Wikidata Item that has only one language link, where that language link is in language X.
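The definition translates directly into a count. The sketch below assumes the language links have already been extracted into a plain dictionary mapping each item ID to its language codes. That input shape, the function name, and the sample IDs are hypothetical and stand in for a real Wikidata dump.

```python
from collections import Counter


def count_unique_items(language_links):
    """Count, per language, the items whose only language link is that language.

    language_links: dict mapping an item ID to an iterable of language codes.
    """
    unique = Counter()
    for item_id, languages in language_links.items():
        distinct = set(languages)
        if len(distinct) == 1:
            # Exactly one language link: the item is unique to that language.
            unique[next(iter(distinct))] += 1
    return unique


# Hypothetical sample data, not real Wikidata items.
sample = {
    "item1": ["en", "de"],  # linked in two languages, so not unique
    "item2": ["de"],
    "item3": ["en"],
    "item4": ["en"],
    "item5": [],            # no language links at all
}
unique_counts = count_unique_items(sample)
print(unique_counts)
```

Dividing each count by the total number of items linked in that language would give a relative "uniqueness" that is comparable across Wikipedias of different sizes.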