Pop-Tarts and Gutisk

One of the most attractive things about Wikipedia is its magmatic nature. Here is Nicholson Baker discussing the vicissitudes of just one of its pages:

The Pop-Tarts page is often aflutter. Pop-Tarts, it says as of today (February 8, 2008), were discontinued in Australia in 2005. Maybe that’s true. Before that it said that Pop-Tarts were discontinued in Korea. Before that Australia. Several days ago it said: “Pop-Tarts is german for Little Iced Pastry O’ Germany.” Other things I learned from earlier versions: More than two trillion Pop-Tarts are sold each year. George Washington invented them. They were developed in the early 1960s in China. Popular flavors are “frosted strawberry, frosted brown sugar cinnamon, and semen.” Pop-Tarts are a “flat Cookie.” No: “Pop-Tarts are a flat Pastry, KEVIN MCCORMICK is a FRIGGIN LOSER notto mention a queer inch.” No: “A Pop-Tart is a flat condom.” Once last fall the whole page was replaced with “NIPPLES AND BROCCOLI!!!!!”

Another reason, among the hundreds there are, why I love Wikipedia is the amazing number of languages it has embraced, including some like Bishnupriya Manipuri (বিষ্ণুপ্রিযা় মণিপুরী) or Gutisk, which I see as a row of empty boxes.

If you go to Wikipedia’s front page (http://www.wikipedia.org/), there is a list of languages, which starts with the top ten in terms of number of entries. I was surprised to see Polish there and even more surprised that there were more articles in Polish than in Russian (766,000 vs. 650,000 when I looked). I thought it might be interesting to try to develop a Wikipedia Linguistic Productivity Index and see how many native speakers it took to write one Wikipedia article.

Having analysed the figures from some of the languages at the top of the table, I can tell you that while it takes only 32 native Dutch speakers and 52 Polish speakers to put together a Wikipedia entry, you need 221 people for Russian, 266 for Portuguese and 468 for Spanish! If you are a native Spanish speaker you should submit an article now: you have a lot of ground to make up.

The list goes down to Tshivenḓa • isiXhosa • Zeêuws • isiZulu, all with more than 100 entries, but in actual fact, if you click on “other languages”, you get a new page which lists all of the 278 languages used and ends with almost perfect neatness with Afar (6 entries), Kuanyama (5), Hiri Motu (3), Muscogee (2), Kanuri (1) and Herero (0). I was interested in finding out what the article in Kanuri was about but I couldn’t find my way there. But it must have been hotly discussed. The information provided tells us “1 article, 4,388 edits, 123 active users.” Even more interesting is what the non-existent article in Herero might have been about (4,314 edits).

I haven’t worked out a Linguistic Productivity Index for all the languages yet, but I found that on Wikipedia they have already made a similar calculation entitled Wikipedia articles per population. It works on the basis of the total number of speakers, not just native speakers. I am not sure whether that includes people who have read books like Learn isiXhosa overnight or not. At the top of this table is the amazing result of the artificial language Volapük, which has produced 118,799 articles with only 25 total speakers. Something is not right in that, surely. Volapük was invented in 1880; can it really have only 25 people who are able to speak it? And that is almost the same number of articles as are listed for Arabic (125,000). That would mean that on average each Volapükist has written about 4,752 articles. If they have that much energy, surely they should be doing something else, for example knocking on people’s doors like Mormons or Jehovah’s Witnesses and converting the world to Volapük.

If you discount Volapük, as well as Ido and Interlingua, two other international languages which come in second and third, then the first natural language is Aragonese, with its own amazing result of 22,947 articles from a speakership of 21,000 obviously hard-working people. That is over one each.

A famous saying goes: “A language is a dialect with an army” and I was about to attribute the prize for a language with an army to Icelandic, except that Wikipedia tells me that Iceland doesn’t have a standing army. It will have to make do with winning in the “Language with a Coastguard” category. With 300,000 speakers Icelandic has produced 28,220 articles, which means on my scale that it takes 10.6 Icelandic speakers to produce a Wikipedia article.
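For anyone who wants to check my sums, here is a minimal sketch of the arithmetic in Python. It uses only the article and speaker counts quoted in this post; the function names and the FIGURES table are my own invention for illustration, and nothing is fetched from Wikipedia itself.

```python
# Figures quoted in this post as (articles, speakers).
# They are a snapshot from when the post was written, not live data.
FIGURES = {
    "Volapük": (118_799, 25),
    "Aragonese": (22_947, 21_000),
    "Icelandic": (28_220, 300_000),
}


def speakers_per_article(articles: int, speakers: int) -> float:
    """How many speakers it takes, on average, to produce one article."""
    if articles <= 0:
        # A language with no articles (Herero, say) has no defined index.
        raise ValueError("the index is undefined when there are no articles")
    return speakers / articles


def articles_per_speaker(articles: int, speakers: int) -> float:
    """The inverse view: how many articles each speaker has produced."""
    if speakers <= 0:
        raise ValueError("the ratio is undefined when there are no speakers")
    return articles / speakers


if __name__ == "__main__":
    for language, (articles, speakers) in FIGURES.items():
        print(
            f"{language}: "
            f"{speakers_per_article(articles, speakers):.1f} speakers per article, "
            f"{articles_per_speaker(articles, speakers):.2f} articles per speaker"
        )
```

The two functions are just each other's inverse. I find speakers per article easier to read for big languages like Icelandic, while articles per speaker suits the overachievers like Volapük and Aragonese.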

When I complete my research I will give you my full results. Anyway, the purpose of all this was really to tell you that, if you like Wikipedia as much as I do, you should read the complete article Nicholson Baker wrote about it for the New York Review of Books in 2008 here. It is truly excellent.

