@Wikidata is no relational #database

10:18, Tuesday, 09 2019 April UTC
When you consider the functionality of Wikidata, it is important to appreciate that it is not a relational database. As a consequence, there is no implicit way to enforce restrictions. Emulating relational restrictions fails because it is not possible to check in real time what it is that is to be restricted.

An example: a process creates a new item whenever no item with a given external identifier is available. A query indicates that no such item exists, so a new item is created. A few moments later, a query checks again for an item with the same external identifier. Because of the time lag, what is known to be in the database differs from what actually is in the database; the query again indicates that there is no item, and a new but duplicate item is created.

Implications are important.

Wikidata is a wiki. The implications are quite different. In a wiki things need not be perfect, and the restrictions of a relational model are in essence recommendations only. In such a model, duplicate items as described above are not a real problem; batch jobs may merge these items when they occur often enough. Processes may keep an array of the items they created earlier, thereby minimising the issue.
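The idea of a process remembering the items it created earlier can be sketched as follows. This is a minimal illustration, not how any particular tool works: the class name ItemCreator, the two callables and the "Q1"-style return values are all hypothetical stand-ins for a lagging query and an item-creation call.

```python
from typing import Callable, Dict, Optional


class ItemCreator:
    """Create at most one item per external identifier within a single run.

    `lagging_lookup` stands in for a query that may not yet see recent edits;
    `create_item` stands in for whatever call creates the item. Both are
    hypothetical callables supplied by the caller.
    """

    def __init__(
        self,
        lagging_lookup: Callable[[str], Optional[str]],
        create_item: Callable[[str], str],
    ) -> None:
        self._lookup = lagging_lookup
        self._create = create_item
        # Local memory of what this process created: external id -> item id.
        self._created: Dict[str, str] = {}

    def get_or_create(self, external_id: str) -> str:
        # 1. Trust our own memory first; it has no time lag.
        if external_id in self._created:
            return self._created[external_id]
        # 2. Fall back to the (possibly stale) query.
        found = self._lookup(external_id)
        if found is not None:
            return found
        # 3. Nothing known anywhere: create the item and remember it.
        item_id = self._create(external_id)
        self._created[external_id] = item_id
        return item_id
```

With a lookup that always answers "no item" (the worst case of the time lag), two calls for the same identifier still create only one item. Duplicates created by other processes are not prevented by this; those remain a job for later merging.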

What is important is that we do not blame people for what Wikidata is not, and that we accept its limitations. Functionality like SourceMD enables what Wikidata may become: a link to all knowledge. Never mind if it is knowledge in Wikipedia articles, scholarly articles or in sources used to prove whatever point.
Thanks,
      GerardM

This Month in GLAM: March 2019

20:54, Monday, 08 2019 April UTC
  • Albania report: WikiFilmat SQ – new articles about the Albanian movie industry!
  • Armenia report: Art+Feminism+GLAM, Collaboration with Hovhannes Toumanian museum
  • Australia report: Art+Feminism 2019 in Australia
  • Brazil report: The GLAM at USP Museum of Veterinary Anatomy: a history of learnings and improvements
  • Colombia report: Moving GLAM institutions inside and outside Colombia
  • Czech Republic report: Edit-a-thon Prachatice
  • France report: Wiki day at the Institut national d’histoire de l’art; Age of wiki at the Musée Saint-Raymond
  • India report: Gujarat Vishw Kosh Trust content donation to Wikimedia
  • Italy report: Italian librarians in Milan
  • Macedonia report: WikiLeague: Edit-a-thon on German Literature
  • Netherlands report: WikiconNL, International Womens Day and working together with Amnesty, Field study Dutch Libraries and Wikimedia
  • Serbia report: Spring residences and a wiki competition
  • Sweden report: UNESCO; Working life museums; Swedish Performing Arts Agency shares historic music; Upload of glass plates photographs
  • UK report: Wiki-people and Wiki-museum-data
  • USA report: Women’s History Month and The Met has two Wikimedians in the house
  • Wikidata report: Go Siobhan!
  • WMF GLAM report: Structured Data on Wikimedia Commons; Bengali Wikisource case study
  • Calendar: April’s GLAM events

The Historian’s Craft

18:44, Monday, 08 2019 April UTC

One of the first things that a budding historian learns is the value of good research skills. The second is how to take those results and pull them together into a comprehensive and readable work that can be shared with others. It is no coincidence that these are two skills that Wikipedia volunteers also quickly discover are of invaluable worth when creating or improving a Wikipedia article. As such, it should be of no surprise that Oregon State University instructor Dr. Stacey Smith chose to have her students in her course practice their research and writing skills by contributing content to Wikipedia during the fall of 2018, where their work on African American abolitionists can be read by the entire world. Their work resulted in the creation of multiple new articles on people who lacked articles and the improvement of several that already existed on Wikipedia.

One of the new articles is about William Lambert, a prominent African-American citizen and abolitionist in Detroit, Michigan during the mid to late 19th century. He was born free and was educated by a Quaker schoolmaster, who not only gave him an excellent education but also introduced Lambert to the abolitionist movement. In his twenties Lambert was living in Detroit and working in a tailor shop. It was here that he met George DeBaptiste, with whom he would collaborate on abolitionist matters and on the Underground Railroad. Lambert is perhaps best known for assisting the fugitive slave Robert Cromwell, who escaped his owner John Dun and fled to Canada, where he could live in freedom. Lambert used his influence to have Dun placed in jail, giving Cromwell the chance to reach Canada. This wasn’t without repercussion, as these actions influenced politicians to pass the Fugitive Slave Act in 1850, which greatly reduced the ability of slaves to escape the cruelty of slavery.

Louisa Matilda Jacobs, public domain via Wikimedia Commons. Photo uploaded by a student at Oregon State University.

Another Wikipedia article that students created was about Louisa Matilda Jacobs, an African American abolitionist and civil rights activist and the daughter of famed fugitive slave and author Harriet Ann Jacobs. Her mother was a mistress to congressman and newspaper editor Samuel Tredwell Sawyer, Louisa’s father. Harriet was the slave of Dr. James Norcom, who tried to force her into a sexual relationship by threatening her children. She fled, expecting that Norcom would sell her children. This expectation was correct, as Sawyer purchased the children and helped them make their way to safety and freedom. Jacobs was eventually reunited with her mother and the two fled to Boston, where she was educated at home until her father paid for her to attend a seminary school in New York. She returned to Boston, where she received training to become a teacher. With her mother, Jacobs founded Jacobs Free School, a Freedmen’s School in Alexandria, Virginia. In 1866 she opened a second one in Georgia called the Lincoln School. She was also an activist and spoke about women’s suffrage on an American Equal Rights Association lecture tour alongside Susan B. Anthony and Charles Lenox Remond. She also worked as a matron of the National Home for the Relief of Destitute Colored Women and Children and at Howard University.


Interested in adapting a Wikipedia writing assignment to fit your course? Visit teach.wikiedu.org for all you need to know to get started.

Tech News issue #15, 2019 (April 8, 2019)

00:00, Monday, 08 2019 April UTC

The European Union (EU) Commission’s proposal for a Regulation on preventing the dissemination of terrorist content online runs the risk of repeating many of the mistakes written into the copyright directive, envisioning technological solutions to a complex problem that could bring significant damage to user rights. The proposal includes a number of prescriptive rules that will create frameworks for censorship and potentially harm important documentation about terrorism online. It would further enshrine the rule and power of private entities over people’s right to discuss their ideas.

However, there are still ways to shape this proposal to further its objectives and promote accountability. The report on the proposal will be up for a vote in the Civil Liberties and Justice Committee (LIBE) in the European Parliament on 8 April, and Wikimedia urges the committee to consider the following advice:

1. Stop treating the internet like one giant, private social media platform

According to the draft, any platform that hosts third party content—from social media to Wikimedia projects like Wikipedia and potentially to services hosting private files—needs to describe how it deals with content that may be related to terrorism in its own terms of service. While people can talk about terrorism in different forms and for different purposes, such as for research, awareness raising, and news reporting, the regulation would force platforms to decide what is and what is not an acceptable way to have these conversations.

Yet, the law should not mirror websites’ approach to curbing illegal content by applying their terms of service because this would remove incentives to do better. The proposed regulation would oblige all platforms to act in a similar manner, regardless of the content they host or their operational model. That includes Wikipedia, where a rigorous set of top-down policies may interfere with its robust and effective system of transparent community dispute resolution over content.

Instead, legislators should clearly define illegal terrorist content and leave hosting service providers little room for interpretation.

2. Let courts decide, not machines

Similarly to the new copyright directive, the regulation envisions the use of automated tools to proactively detect, identify and disable access to terrorist content. Deciding what is and what is not expression that condones terrorism is a complicated matter and context is crucial in deciding whether content is illegal under anti-terrorist laws. Such decisions need to be made by courts, not by algorithms which may or may not be subject to human oversight.

Where law enforcement relies on code, the code becomes the law. That goes against how our free knowledge projects operate, with vibrant and open deliberation on what should have its place on Wikipedia, and what shouldn’t. Platforms’ content moderation should build on a proper framework that involves well-prepared people, not only machines.

3. Do not overturn the principles of free expression

Freedom of expression is a right that can only be exercised by the practice of expressing one’s thoughts, ideas, or opinion. Boundaries are only applied when that expression is deemed unacceptable. Content filtering works on exactly the opposite premises—prematurely stifling expression before it has a chance to be heard and assessed.

Any reference to measures that may lead to proactive content filtering should be removed from the proposal. Upload filters overturn jurisprudence and legal practices in all jurisdictions that recognize freedom of expression as a human right. They operate in secrecy and their decisions are shrouded in trade secrets of companies running them. Relying on these technologies may stop some of the communication we don’t want, but it is not worth the price of undermining the foundation of free expression.

4. Do not force websites to remove legal content

The proposal envisions that, in addition to content removal orders, the competent authority can issue a referral to request a company check whether content violates their terms of service. Platforms will face penalties if they do not speedily address these referrals, which creates a strong incentive to act non-transparently and remove content that may in fact be legal.

The measure should be removed from the proposal. Instead, authorities tasked with tackling terrorist content should be required to focus on cases where the terrorist context is evident and issue an order to remove the piece of content in question. Lawmakers need to leave room for the less evident cases to be discussed as acceptable freedom of expression.

The LIBE Committee will vote on a few good changes that have been proposed, most notably the removal of proactive measures and referrals. Providing an exclusion for content disseminated for educational, artistic, journalistic or research purposes is a good idea. However, if the dissemination of terrorist content does not need to be intentional to be removed (as in calling for aiding and abetting terrorist activities), a lot of important information may still get caught up in a surge of removals. We hope that the committee responsible for ensuring respect for civil liberties in EU legislation will rise to the occasion. We will continue to monitor the legislative process for this regulation and remain committed to defending and promoting free knowledge.

Anna Mazgal, EU Policy Adviser, Wikimedia Germany (Deutschland)
Jan Gerlach, Senior Public Policy Manager, Wikimedia Foundation

weeklyOSM 454

10:09, Sunday, 07 2019 April UTC

26/03/2019-01/04/2019


The Magic Roundabout – in OSM with some “mapping for the renderer” 1 | © Alby (CC BY-SA 2.0)

Mapping

  • User yumean1119 announced that voting has started for the improved proposal (automatic translation) on the use of highway=cycleway in Japan. The vote ends 14 April.
  • Silent Spike proposed leisure=inflatable_park for playgrounds with inflatable equipment and asks for comments on his proposal.
  • If you’ve ever wanted to add a tag to show that a railway track is not used regularly for scheduled services, including trams and subways, you might be interested in SelfishSeahorse’s proposal. The Request for Comments period has just started and feedback is much appreciated.
  • Warin proposes a new way to mark areas of steps such as the Spanish Steps in Rome. His proposal is now up for discussion.
  • The vote on the proposal that was drafted for better differentiation of various kinds of police facilities has been postponed to allow more discussion on the topic.

Community

  • A number of people have expressed concern about the way the development of iD is organised and specifically how iD is influencing tagging in OSM. This time Frederik Ramm, board member of both OSMF and the German chapter of OSM, raised (de) his concerns and asked for feedback. (automatic translation)
  • An issue on GitHub about tagging track type in the iD editor became contentious. @matkoniecz suggested that iD deliberately pushes “probably not safe for most cars” as a definition for highway=track, and Bryan Housel, iD maintainer, responded “I basically just disregard everything on the tagging mailing list and the OSM wiki”. This was used as an example in the discussion referred to in the previous item…
  • The application period for this year’s Google Summer of Code is now open and, as in previous years, OSM has been accepted as a mentoring organisation. You, as a student, have the chance of getting paid while working on open source software projects such as OSM.
  • Former OSMF board member Ilya Zverev pens a critique (automatic translation) of OSM’s “do-ocracy”.
  • OpenStreetMap Belgium published an article titled Heritage in Flanders and Crowd-Sourced Projects. They explain how to add information about heritage sites in OSM and Wikimedia projects such as Wikipedia, Wikidata and Wikimedia Commons. The article doesn’t just focus on gathering and adding information but also on how to get and use information on historical places.

OpenStreetMap Foundation

Events

  • The Albanian Open Source Conference OSCAL 2019, which will take place on 18 and 19 May 2019 in Tirana, Albania, is looking for speakers.
  • For mappers who like to plan well in advance: The FOSSGIS 2020, a geospatial conference in Germany that covers many open source, open data and OSM topics, will be held on 11 to 14 March 2020 in Freiburg in Breisgau, Germany. The article also features (de) highlights of the recent conference. (automatic translation)

Humanitarian OSM

  • Following requests received by HOT from several humanitarian partners they are providing support to assist on the ground teams already providing humanitarian aid in Mozambique, Malawi and Zimbabwe after Cyclone Idai. HOT is specifically looking for people with experience as Data specialist or Partner liaison. Apparently, HOT also allows applications from people requiring reimbursement for their activity.
  • After Cyclone Idai hit Mozambique and Zimbabwe, HOT is looking for help to support the local emergency response such as Doctors Without Borders. The article lays out different ways you can support the efforts.
  • The openrouteservice instance for disaster management, operated by the Heidelberg Institute for Geoinformation Technology, is supporting HOT’s activation following Cyclone Idai by updating the data for routing almost hourly.

Maps

  • The MapOSMaptic instance at osm-baustelle.de has implemented a hillshading overlay.
  • Andrew Harvey announced the re-launch of a modernised version of BeyondTracks. BeyondTracks started life in 2012 with the aim of helping people find enjoyable walks around Sydney, Australia. The scope of the site has expanded to cover all of Australia and people who would like to suggest a walk are encouraged to contact BeyondTracks.

switch2OSM

  • A Spanish version of the NewsHereNow website/webapp is now available. Simply click on the “Use my GPS location” button to find the physically nearest, non-chain (“local”) restaurants. If your favourite restaurant is missing, please make sure it (including its website URL) is on OpenStreetMap. There is also a direct link to OpenStreetMap centred at your nearest intersection.

Open Data

  • EldoHub and OpenStreetMap Kenya received funding from Mapbox and the UK government to take part in the 2019 Open Data Day, on 2 March 2019. The Open Knowledge Foundation blogged about the events in Kenya. These took place in EldoHub, a technology innovation hub in Uasin Gishu County; and in Nairobi, where an open mapping track was held to discuss topics such as open spatial data and crowdsourced mapping.
  • ČÚZK, the Czech national survey agency, just published some of their data for commercial and non-commercial use as open data. We are getting another great source for OpenStreetMap in the Czech Republic.
  • The map at electricitymap.org visualises CO₂ emissions and other details about countries’ mix of the production and consumption of energy. The map is currently in German only. However, the project is open source and the developers are asking for help with translation.

Software

  • Nick Whitelegg announced the release of Hikar 0.2.0, an augmented reality app for walkers and hikers. The app is available for Android and the code is open source. The app, which helps the user with footpaths and signposts on the camera feed, now covers the whole of Europe.
  • User clementroux writes (fr) (automatic translation) in his blog about his app Next2Me, which finds and displays facilities with the key amenity=* in an adjustable radius around the current location – of course based on OSM.

Releases

  • JOSM has been updated to version 14945. The validator received various improvements and the Add Tag dialog now shows the last used tags, not only the last added tags.
  • Version 0.12 of the GraphHopper Routing Engine has been released. Turn restrictions and turn costs are now supported in the speed mode, the Isochrone module was made compatible with public transit and the average speed calculation was improved.
  • User Wambacher is constantly updating his OSM Software Watchlist, which includes almost all OSM-related software along with the date of their latest release. Most recently Basecamp, Mapillary, Vespucci and many others received an update. Wambacher is highlighting one new addition: OSHDB, a high-performance data analysis platform for OSM’s full-history data.

Did you know …

  • … how to tag a bus bay where buses can stop without blocking traffic? The tag bus_bay can be added to the node of the stopping position of the bus – or it can be added to the road, representing the full length of the bay. JOSM’s PT_Assistant plugin provides support for adding them conveniently on ways with its double split map mode.
  • [1] … the “Magic Roundabout”? On OSM it looks like this.

OSM in the media

  • Le Monde describes (automatic translation) how local mappers in various African cities are using the JungleBus app to map local bus routes.
  • eepublishers in South Africa understand more about the threat to OpenStreetMap posed by the EU Copyright Directive than members of the European Parliament. Or should the EU MPs be interest-driven?

Other “geo” things

  • The deployment of “Strava Scramblers” to reduce conflicts at popular spots made local riders angry. (This April Fool’s Day article caught many readers off guard.)
  • The AGILE conference in Limassol, Cyprus (17 to 20 June 2019), which is about Geospatial Technologies for Local and Regional Development, will organise a workshop called “GEOgraphical and CULTural aspects of Geo-information: Issues and Solutions” and invites you to submit a short OSM paper.

Upcoming Events

Where | What | When | Country
Kawagoe | 川越お花見マッピングパーティ2019 | 2019-04-06 | Japan
La Riche | La Riche (37)#Ateliers d’initiation à OpenStreetMap | 2019-04-06 | France
Kyoto | お花見!オープンデータソン in 京都 | 2019-04-07 | Japan
Rennes | Réunion mensuelle | 2019-04-08 | France
Bordeaux | Réunion mensuelle | 2019-04-08 | France
Essen | Mappertreffen | 2019-04-08 | Germany
Taipei | OSM x Wikidata #3 | 2019-04-08 | Taiwan
Toronto | Toronto Mappy Hour | 2019-04-08 | Canada
Lyon | Rencontre mensuelle pour tous | 2019-04-09 | France
Munich | Münchner Stammtisch | 2019-04-09 | Germany
Salt Lake City | SLC Mappy Hour | 2019-04-09 | United States
Viersen | OSM Stammtisch Viersen | 2019-04-09 | Germany
Cologne | Stammtisch Köln | 2019-04-10 | Germany
Buenos Aires | Taller Introducción a JOSM en FOSS4G-AR 2019 | 2019-04-10 | Argentina
Leoben | Stammtisch Obersteiermark | 2019-04-11 | Austria
Zurich | OSM Stammtisch Zurich | 2019-04-11 | Switzerland
Berlin | 130. Berlin-Brandenburg Stammtisch | 2019-04-12 | Germany
Salt Lake City | University of Utah Campus Mapping Party | 2019-04-13 | United States
Biella | Incontro mensile | 2019-04-13 | Italy
Cologne Bonn Airport | Bonner Stammtisch | 2019-04-16 | Germany
Lüneburg | Lüneburger Mappertreffen | 2019-04-16 | Germany
Reutti | Stammtisch Ulmer Alb | 2019-04-16 | Germany
Toulouse | Rencontre mensuelle | 2019-04-17 | France
Karlsruhe | Stammtisch | 2019-04-17 | Germany
Bremen | Bremer Mappertreffen | 2019-04-22 | Germany
Salt Lake City | SLC Map Night | 2019-04-23 | United States
Nottingham | Nottingham pub meetup | 2019-04-23 | England
Joué-lès-Tours | Rencontre Mensuelle | 2019-04-23 | France
Barcelona | #geomobBCN | 2019-04-24 | Spain
Montpellier | Réunion mensuelle | 2019-04-24 | France
Düsseldorf | Stammtisch | 2019-04-24 | Germany
Phone/Video Conferencing | Mappy Hour US | 2019-04-24 | United States
Lübeck | Lübecker Mappertreffen | 2019-04-25 | Germany
Montpellier | State of the Map France 2019 | 2019-06-14 to 2019-06-16 | France
Angra do Heroísmo | Erasmus+ EuYoutH_OSM Meeting | 2019-06-24 to 2019-06-29 | Portugal
Minneapolis | State of the Map US 2019 | 2019-09-06 to 2019-09-08 | United States
Edinburgh | FOSS4GUK 2019 | 2019-09-18 to 2019-09-21 | United Kingdom
Heidelberg | Erasmus+ EuYoutH_OSM Meeting | 2019-09-18 to 2019-09-23 | Germany
Heidelberg | HOT Summit 2019 | 2019-09-19 to 2019-09-20 | Germany
Heidelberg | State of the Map 2019 (international conference) | 2019-09-21 to 2019-09-23 | Germany
Grand-Bassam | State of the Map Africa 2019 | 2019-11-22 to 2019-11-24 | Ivory Coast

Note: If you would like to see your event here, please put it into the calendar. Only data which is there will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.

This weeklyOSM was produced by Nakaner, NunoMASAzevedo, Polyglot, Rogehm, SK53, SunCobalt, TheSwavu, YoViajo, derFred.

One of the key mechanisms that allows Wikipedia to maintain its high quality is the use of inline citations. Through citations, readers and editors make sure that information in an article accurately reflects its source. As Wikipedia’s verifiability policy mandates, “material challenged or likely to be challenged, and all quotations, must be attributed to a reliable, published source”, and unsourced material should be removed or challenged with a citation needed flag.

However, deciding which sentences need citations may not be a trivial task. On the one hand, editors are urged to avoid adding citations for information that is obvious or common knowledge—like the fact that the sky is blue. On the other hand, sometimes the sky doesn’t actually appear blue—so perhaps we need a citation for that after all?

Scale up this problem to the size of an entire encyclopedia, and it may become intractable. Wikipedia editors’ time is limited and their expertise is valuable, so which kinds of facts, articles, and topics should they focus their citation efforts on? Also, recent estimates show that a substantial proportion of articles have only a few references, and that one out of four articles in English Wikipedia does not have any references at all. This suggests that while around 350,000 articles contain one or more citation needed flags, we are probably missing many more.

We recently designed a framework to help editors identify and prioritize which sentences need citations in Wikipedia. Through a large study that we conducted with editors from English, Italian and French Wikipedia, we first identified a set of common reasons why individual sentences in Wikipedia articles require citations. We then used the results of this study to train a machine learning classifier that can predict whether or not any given sentence needs a citation, and why, on the English Wikipedia. It will be deployed in the next 3 months to other language editions.

By improving the identification of where Wikipedia gets its information from, we can support the development of systems to help volunteer-driven verification and fact-checking, potentially increasing Wikipedia’s long-term reliability and making it more robust against biases, information quality gaps and coordinated disinformation campaigns.

Why do we cite?

To teach machines how to recognize unverified statements, we first needed to systematically classify the reasons why sentences need citations.

We started by examining policies and guidelines related to verifiability in the English, French, and Italian Wikipedias and attempted to characterize the criteria for adding (or not adding) a citation described in those policies. To verify and enrich this set of best practices, we asked 36 Wikipedia editors from all three language communities to participate in a pilot experiment. Using WikiLabels, we collected editors’ feedback on sentences from Wikipedia articles: editors were asked to decide whether a sentence needed a citation and to specify a reason for their choices in a free-text form.

Our methods and our final set of reasons for adding or not adding a citation can be found on our project page.

Reasons for adding a citation.
Reasons for not adding a citation.

Teaching a machine to discover citation gaps

Next, we trained a machine learning model to discover sentences needing citations, and characterize them with a matching reason.

We first trained a model to learn from the wisdom of the whole editor community how to identify sentences that need to be cited. We created a dataset of English Wikipedia’s “featured” articles, the encyclopedia’s designation for articles that are of the highest quality and also the most well-sourced with citations. Sentences from featured articles that contain an inline citation are considered as positives, and sentences without an inline citation are considered as negatives. With this data, we trained a Recurrent Neural Network that can predict whether the sentence is positive (should have a citation) or negative (should not have a citation) based on the sequence of words in the sentence. The resulting model can correctly classify sentences in need of citation with an accuracy of up to 90%.
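The labelling rule described above (a sentence with an inline citation is a positive, a sentence without one is a negative) can be sketched in a few lines. This is only an illustration under assumptions: it supposes that sentences are already split and that citations show up as bracketed numbers such as [3], which is not necessarily how the actual dataset was built. The function name label_sentences is hypothetical.

```python
import re
from typing import List, Tuple

# Assumption for this sketch: inline citations appear as [1], [2], ...
CITATION_MARK = re.compile(r"\[\d+\]")


def label_sentences(sentences: List[str]) -> List[Tuple[str, int]]:
    """Return (sentence text without citation marks, label) pairs.

    Label 1 = positive (the sentence carries an inline citation),
    label 0 = negative (no inline citation).
    """
    labelled = []
    for sentence in sentences:
        has_citation = bool(CITATION_MARK.search(sentence))
        # Strip the marks so a model cannot simply learn to spot "[3]".
        clean = CITATION_MARK.sub("", sentence).strip()
        labelled.append((clean, 1 if has_citation else 0))
    return labelled


examples = label_sentences([
    "The sky is blue.",
    "The population was 4,000 in 1900.[3]",
])
```

Here `examples` pairs the first sentence with 0 and the second (with its mark removed) with 1. Pairs of this shape are what a sequence model such as a recurrent network would then be trained on.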

Explaining algorithmic predictions

But why is the model up to 90% accurate? What is the algorithm looking at when deciding whether a sentence needs a citation?

To help interpret these results, we took a sample of sentences needing citations for different reasons, and highlighted words the model considered the most when it classified the sentences. In the case of “opinion” statements, for example, the model assigned the highest weight to the word “claimed”. In the “statistics” citation reason, the most important words to the model are verbs that are often used in reporting numbers. In the case of scientific citation reasons, the model pays more attention to domain-specific words like “quantum”.

Examples of sentences that need citations according to our model, with key words highlighted.

Predicting why a sentence needs a citation

Similar to the “reason” field of the [citation needed] tag, we want our model to also provide full explanations of citation reasons. Therefore we created a model that can classify statements needing citations with a reason. We first designed a crowdsourcing experiment using Amazon Mechanical Turk to collect labels about citation reasons. We randomly sampled 4,000 sentences that contain citations from Featured articles, and asked crowdworkers to label them with one of the eight citation reason categories we identified in our previous study. We found that sentences are more likely to need citations when they are related to scientific or historical facts, or when they reflect direct/indirect quotations.

We modified the neural network designed in the previous study, so that it can classify an unsourced sentence into one of the 8 citation reason categories. We retrained this network using the crowdsourced labeled data, and found that it provides reasonable accuracy (precision at 0.62) in predicting citation reasons, especially for classes with a substantial amount of training data.
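For readers unfamiliar with the metric, precision for one category is the share of sentences the model assigned to that category that really belong to it. The sketch below shows the calculation on made-up labels. The category names are illustrative only, and how the reported 0.62 was averaged across the eight categories is not stated here, so this is not a reproduction of that figure.

```python
from typing import Sequence


def precision(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    """Precision for one class: correct predictions of `label` / all predictions of `label`."""
    # True labels of every sentence the model assigned to `label`.
    truths_for_predicted = [t for t, p in zip(y_true, y_pred) if p == label]
    if not truths_for_predicted:
        return 0.0  # the model never predicted this class
    correct = sum(1 for t in truths_for_predicted if t == label)
    return correct / len(truths_for_predicted)


# Made-up example: four sentences, crowdworker labels vs. model output.
y_true = ["quotation", "statistics", "quotation", "historical"]
y_pred = ["quotation", "quotation", "quotation", "historical"]
quotation_precision = precision(y_true, y_pred, "quotation")
```

In this toy case the model predicted "quotation" three times and was right twice, so the precision for that category is two thirds.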

Next steps: predicting “citation need” across languages and topics

The next phase of this project will involve modifying our models so that they can be trained for any language available in Wikipedia. We will use these multilingual models to quantify the proportion of unverified content across Wikipedia editions, and map citation coverage across different article topics, in order to help editors identify areas where adding high quality citations is particularly important.

We plan to make the source code of these new models available soon. In the meantime, you can check out the research paper, recently accepted at The Web Conference 2019, its supplementary material with detailed analysis of the citation policies, and all the data we used to train the models.

We would love to hear your feedback and comments, so please reach out to us on our project page to help us improve it.

Miriam Redi, Research Scientist, Wikimedia Foundation
Jonathan Morgan, Senior Design Researcher, Wikimedia Foundation
Dario Taraborelli, former Director of Research, Wikimedia Foundation
Besnik Fetahu, Post-doctoral Scientist, L3S Lab Hannover

The authors would like to thank the community members of the English, French, and Italian Wikipedias, along with workers from Amazon Mechanical Turk, for helping with data labeling and for their precious suggestions.

A buggy history

05:32, Wednesday, 03 2019 April UTC
—I suppose you are an entomologist?—I said with a note of interrogation.
—Not quite so ambitious as that, sir. I should like to put my eyes on the individual entitled to that name! A society may call itself an Entomological Society, but the man who arrogates such a broad title as that to himself, in the present state of science, is a pretender, sir, a dilettante, an impostor! No man can be truly called an entomologist, sir; the subject is too vast for any single human intelligence to grasp.
The Poet at the Breakfast Table (1872) by Oliver Wendell Holmes, Sr. 
 
A collection of biographies
with surprising gaps (ex. A.D. Imms)
The history of interest in Indian insects has been approached by many writers; there are several bits and pieces available in journals, and various insights are distributed across books. There are numerous ways of looking at how people historically viewed insects. One attempt is a collection of biographies, some of which are uncited verbatim accounts from obituaries (not even within quotation marks), by B.R. Subba Rao, who also provides something of a historical thread connecting the biographies. Keeping Indian expectations in view, Subba Rao and M.A. Husain play to the crowd. Husain was writing in pre-Independence times, when there was a genuine conflict between Indian intellectuals and their colonial masters. They begin with interpretations of mentions of insects in old Indian writings. As can be expected, there are mentions of honey, shellac, bees, ants, and a few nuisance insects in old texts. Husain takes the fact that the term Satpada षट्पद or six-legs existed in the 1st century Amarakosa to suggest that Indians were far ahead of their time, because Latreille's Hexapoda, the supposed analogy, was proposed only in 1825. Such histories gloss over the structures on which science is built, and one can only assume that they failed to find the development of such structures in the ancient texts that they examined. The identification of species mentioned in old texts is often based on ambiguous translations, which should leave one wondering what the value of claiming Indian priority in identifying a few insects is. For instance, K.N. Dave translates a verse from the Atharva-veda and suggests an early date for knowledge of shellac. This interpretation looks dubious and, sure enough, Dave has been critiqued by Mahdihassan. The indragopa (Indra's cowherd) is supposedly something that appears after the rains. Sanskrit scholars have identified it variously as the cochineal insect (the species Dactylopius coccus is South American!), the lac insect, a firefly(!) and as Trombidium (red velvet mite); the last matches the blood-red colour mentioned in a text attributed to Susrutha.

To be fair, ambiguities resulting from translation are not limited to those that deal with Indian writing. Dikairon (Δικαιρον), supposedly a highly valued and potent poison from India, was mentioned in the work Indika by Ctesias (398 - 397 BC). One writer said it was the droppings of a bird. Valentine Ball thought it was derived from a scarab beetle. Jeffrey Lockwood claimed that it came from the rove beetles Paederus sp. And finally a Spanish scholar states that all this was a misunderstanding and that Dikairon was not a poison but, believe it or not, a masticated mix of betel leaves, arecanut, and lime!

One gets a far more reliable idea of ancient knowledge and traditions from practitioners, forest dwellers, the traditional honey-harvesting tribes, and similar people who have been gathering materials such as shellac and beeswax. Unfortunately, many of these traditions and their practitioners are threatened by modern laws, economics, and culture. These practitioners are being driven out of the forests where they live, and their knowledge was hardly ever captured in writing. The writers of the ancient Sanskrit texts were probably associated with temple-towns and other semi-urban clusters, and it seems that the knowledge of forest dwellers was not considered merit-worthy.

A more meaningful overview of entomology may be gained by reading and synthesizing a large number of historical bits, of which there are a growing number. The 1973 book published by Annual Reviews Inc. should be of some interest. I have appended a selection of sources that I have found useful in adding bits and pieces to form a historical view of entomology in India. It helps, however, to have a broader skeleton on which to attach these bits and minutiae. Here, there are also truly verbose and terminology-filled systems developed by historians of science (for example, see ANT). I prefer an approach that is free of jargon overload and like to look at entomology and its growth along three lines of action: cataloguing, with the main products being collections of artefacts and the assignment of names; communication and vocabulary-building, which are social actions involving groups of interested people who work together, with the products being scholarly societies and journals; and pattern-finding, where hypotheses are made and predictions tested. I like to think that anyone learning entomology also goes through these activities, often in this sequence. With professionalization there appears to be a need for people to step faster and faster into the pattern-finding way, which also means that less time is spent on the other two streams of activity. The fast stepping is often achieved by having comprehensive texts, keys, identification guides and manuals. The skills involved in the production of those works - ways to prepare specimens, observe, illustrate, or describe - are often not captured by the books themselves.

Cataloguing

The cataloguing phase of knowledge gathering, especially of the (larger and more conspicuous) insect species of India, grew rapidly thanks to the craze for natural history cabinets among the wealthy in Britain and Europe (made socially meritorious by the idea that appreciating the works of the Creator was as good as attending church) and their ability to tap into networks of collectors working within the colonial enterprise. The cataloguing phase can be divided into the non-scientific cabinet-of-curiosity style, especially followed before Darwin, and the more scientific forms. The idea that insects could be preserved by drying and kept for reference by pinning [see Barnard 2018], the system of binomial names, the idea of designating type specimens that could be inspected by anyone describing new species, and the system of priority in assigning names were some of the innovations and cultural rules created to aid cataloguing. These rules were enforced by scholarly societies, their members (which would later lead to such things as codes of nomenclature suggested by rule makers like Strickland, now dealt with by committees that oversee the ICZN Code) and their journals. It would be wrong to assume that the cataloguing phase is purely historic and no longer needed. It is a phase that is constantly involved in the creation of new knowledge. Labels, catalogues, and referencing, whether in science or librarianship, are essential for all subsequent work to be discovered and are essential to science based on building on the work of others, climbing onto the shoulders of giants to see further. Cataloguing was probably what the physicists derided as "stamp-collecting".

Communication and vocabulary building

The other phase involves social activities, the creation of specialist language, groups, and "culture". The methods and tools adopted by specialists also help in producing associations and in identifying boundaries that could spawn new associations. The formation of groups of people based on interests is something that ethnographers and sociologists have examined in the context of science. Textbooks, taxonomic monographs, and major syntheses also help in building community - they make it possible for new entrants to rapidly move on to joining the earlier formed groups of experts. Whereas some of the early learned societies were spawned by people with wealth and leisure, some of the later societies have had other economic forces in their support.

Like species, interest groups too specialize and split to cover more specific niches, such as those that deal with applied areas such as agriculture, medicine, veterinary science and forensics. There can also be interest in behaviour and evolution which, though having applications, often do not find economic support.

Pattern finding
Eleanor Ormerod, an unexpected influence in the rise of economic entomology in India

The pattern finding phase, when reached, allows a field to become professional - with paid services offered by practitioners. It is the phase in which science flexes its muscle, and specialists gain social status and are able to make livelihoods out of their interest. Lefroy (1904) cites economic entomology as starting with E.C. Cotes [Cotes' career in entomology was short; after marrying the famous Canadian journalist Sara Duncan in 1889 he too moved to writing] in the Indian Museum in 1888. But he surprisingly does not mention any earlier attempts, and one finds that Edward Balfour, that encyclopaedic surgeon of Madras, collated a list of insect pests in 1887 and drew inspiration from Eleanor Ormerod, who hints at the idea of getting government support, noting that it would cost very little given that she herself worked with no remuneration to provide a service for agriculture in England. Her letters were also forwarded to the Secretary of State for India and it is quite possible that Cotes' appointment was a result.

As can be imagined, economics, society, and the way science is supported - royal patronage, family, state, "free markets", crowd-sourcing, or mixes of these - impact the way an individual or a field progresses. Entomology was among the first fields of zoology that managed to gain economic value with the possibility of paid employment. David Lack, who later became an influential ornithologist, was wisely guided by his father to pursue entomology as it was the only field of zoology where jobs existed. Lack however found his apprenticeship (in Germany, 1929!) involving pinning specimens "extremely boring".

Indian reflections on the history of entomology

Kunhikannan died at the rather young age of 47
A rather interesting analysis of Indian science was made by the first native Indian entomologist to work with the official title of "entomologist" in the state of Mysore - K. Kunhikannan. Kunhikannan was deputed to pursue a Ph.D. at Stanford (for some unknown reason many of the pre-Independence Indian entomologists trained at Stanford rather than in England - see postscript) through his superior Leslie Coleman. At Stanford, Kunhikannan gave a talk on Science in India. He noted in his 1923 talk:

In the field of natural sciences the Hindus did not make any progress. The classifications of animals and plants are very crude. It seems to me possible that this singular lack of interest in this branch of knowledge was due to the love of animal life. It is difficult for Westerners to realise how deep it is among Indians. The observant traveller will come across people trailing sugar as they walk along streets so that ants may have a supply, and there are priests in certain sects who veil that face while reading sacred books that they may avoid drawing in with their breath and killing any small unwary insects. [Note: Salim Ali expressed a similar view ]
He then examines science sponsored by state institutions, by universities and then by individuals. About the last he writes:
Though I deal with it last it is the first in importance. Under it has to be included all the work done by individuals who are not in Government employment or who being government servants devote their leisure hours to science. A number of missionaries come under this category. They have done considerable work mainly in the natural sciences. There are also medical men who devote their leisure hours to science. The discovery of the transmission of malaria was made not during the course of Government work. These men have not received much encouragement for research or reward for research, but they deserve the highest praise. European officials in other walks of life have made signal contributions to science. The fascinating volumes of E. H. Aitken and Douglas Dewar are the result of observations made in the field of natural history in the course of official duties. Men like these have formed themselves into an association, and a journal is published by the Bombay Natural History Association[sic], in which valuable observations are recorded from time to time. That publication has been running for over a quarter of a century, and its volumes are a mine of interesting information with regard to the natural history of India.
This then is a brief survey of the work done in India. As you will see it is very little, regard being had to the extent of the country and the size of her population. I have tried to explain why Indians' contribution is as yet so little, how education has been defective and how opportunities have been few. Men do not go after scientific research when reward is so little and facilities so few. But there are those who will say that science must be pursued for its own sake. That view is narrow and does not take into account the origin and course of scientific research. Men began to pursue science for the sake of material progress. The Arab alchemists started chemistry in the hope of discovering a method of making gold. So it has been all along and even now in the 20th century the cry is often heard that scientific research is pursued with too little regard for its immediate usefulness to man. The passion for science for its own sake has developed largely as a result of the enormous growth of each of the sciences beyond the grasp of individual minds so that a division between pure and applied science has become necessary. The charge therefore that Indians have failed to pursue science for its own sake is not justified. Science flourishes where the application of its results makes possible the advancement of the individual and the community as a whole. It requires a leisured class free from anxieties of obtaining livelihood or capable of appreciating the value of scientific work. Such a class does not exist in India. The leisured classes in India are not yet educated sufficiently to honour scientific men.
It is interesting that leisure is noted as important for scientific advance. Edward Balfour, mentioned earlier, made a similar comment that Indians were too close to subsistence to reflect accurately on their environment! (This was apparently in The Vydian and the Hakim, what do they know of medicine? (1875), which unfortunately is not available online.)

Kunhikannan may be among the few Indian scientists who dabbled in cultural history and political theorizing. He wrote two rather interesting books, The West (1927) and A Civilization at Bay (1931, posthumously published), which defended Indian cultural norms while also suggesting areas for reform. While reading these works one has to remind oneself that he was working under and with Europeans and would not have been able to have many conversations on these topics with Indians. An anonymous writer who penned the memoir of his life in his posthumous work notes that he was reserved and had only a small number of people to talk to outside of his professional work.
Entomologists meeting at Pusa in 1919
Third row: C.C. Ghosh, Ram Saran, Gupta, P.V. Isaac, Y. Ramachandra Rao, Afzal Husain, Ojha, A. Haq
Second row: M. Zaharuddin, C.S. Misra, D. Naoroji, Harchand Singh, G.R. Dutt, E.S. David, K. Kunhi Kannan, Ramrao S. Kasergode, J.L.Khare, Jhaveri, V.G.Deshpande, R. Madhavan Pillai, Patel, A. Mujtaba, P.C. Sen
First row: Capt. Froilano de Mello, Robertson-Brown, S. Higginbotham, C.M. Inglis, C.F.C. Beeson, Gough, Bainbrigge Fletcher, Bentley, Senior-White, T.V. Rama Krishna Ayyar, C.M. Hutchinson, Andrews, H.L.Dutt


Entomologists meeting at Pusa in 1923
Fifth row (standing) Mukerjee, G.D.Ojha, Bashir, Torabaz Khan, D.P. Singh
Fourth row (standing) M.O.T. Iyengar, R.N. Singh, S. Sultan Ahmad, G.D. Misra, Sharma, Ahmad Mujtaba, Mohammad Shaffi
Third row (standing) Rao Sahib Y Rama Chandra Rao, D Naoroji, G.R.Dutt, Rai Bahadur C.S. Misra, SCJ Bennett (bacteriologist, Muktesar), P.V. Isaac, T.M. Timoney, Harchand Singh, S.K.Sen
Second row (seated) Mr M. Afzal Husain, Major RWG Hingston, Dr C F C Beeson, T. Bainbrigge Fletcher, P.B. Richards, J.T. Edwards, Major J.A. Sinton
First row (seated) Rai Sahib PN Das, B B Bose, Ram Saran, R.V. Pillai, M.B. Menon, V.R. Phadke (veterinary college, Bombay)

Note: As usual, these notes are spin-offs from researching and writing Wikipedia entries, in this case on several pioneering Indian entomologists. It is remarkable that even some people in high offices, such as P.V. Isaac, the last Imperial Entomologist and grandfather of noted writer Arundhati Roy, are largely unknown (except as the near-fictional Pappachi in Roy's God of Small Things).


References
An index to entomologists who worked in India or described a significant number of species from India - with links to Wikipedia entries (where possible - the gaps are huge)
(woefully incomplete - feel free to let me know of additional candidates)

Carl Linnaeus - Johan Christian Fabricius - Edward Donovan - John Gerard Koenig - John Obadiah Westwood - Frederick William Hope - George Alexander James Rothney - Thomas de Grey Walsingham - Henry John Elwes - Victor Motschulsky - Charles Swinhoe - John William Yerbury - Edward Yerbury Watson - Peter Cameron - Charles George Nurse - H.C. Tytler - Arthur Henry Eyre Mosse - W.H. Evans - Frederic Moore - John Henry Leech - Charles Augustus de Niceville - Thomas Nelson Annandale - R.C. Wroughton - T.R.D. Bell - Francis Buchanan-Hamilton - James Wood-Mason - Frederic Charles Fraser - R.W. Hingston - Auguste Forel - James Davidson - E.H. Aitken - O.C. Ollenbach - Frank Hannyngton - Martin Ephraim Mosley - Hamilton J. Druce - Thomas Vincent Campbell - Gilbert Edward James Nixon - Malcolm Cameron - G.F. Hampson - Martin Jacoby - W.F. Kirby - W.L. Distant - C.T. Bingham - G.J. Arrow - Claude Morley - Malcolm Burr - Samarendra Maulik - Guy Marshall
 
Edward Percy Stebbing - T.B. Fletcher - Edward Ernest Green - E.C. Cotes - Harold Maxwell Lefroy - Frank Milburn Howlett - S.R. Christophers - Leslie C. Coleman - T.V. Ramakrishna Ayyar - Yelsetti Ramachandra Rao - Magadi Puttarudriah - Hem Singh Pruthi - Shyam Sunder Lal Pradhan - James Molesworth Gardner - Vakittur Prabhakar Rao - D.N. Raychoudhary - C.F.W. Muesenbeck - Mithan Lal Roonwal - Ennapada S. Narayanan - M.S. Mani - T.N. Ananthakrishnan - K. Kunhikannan - Muhammad Afzal Husain

Not included by Rao - F.H. Gravely - P.V. Isaac - M. Afzal Husain - A.D. Imms - C.F.C. Beeson - C. Brooke Worth - Kumar Krishna


PS: Thanks to Prof C.A. Viraktamath, I became aware of a new book - Gunathilagaraj, K.; Chitra, N.; Kuttalam, S.; Ramaraju, K. (2018). Dr. T.V. Ramakrishna Ayyar: The Entomologist. Coimbatore: Tamil Nadu Agricultural University. This suggests that TVRA went to Stanford on the suggestion of Kunhikannan.

    It’s been quite a long time (four and a half years in fact) since I looked at the state of the African language Wiktionaries. For those new to Wiktionary, the idea is that it will describe all words of all languages using definitions and descriptions in the particular language edition. An ambitious task!

    So, how are the projects progressing?

    A portion of the Octateuch in Ethiopian

    African Language Wiktionaries

    Language 30/5/2010 15/5/2011 29/10/2014 22/3/2019 % +
    Malagasy 4,253 3,599,084 5,482,632 52.33%
    Afrikaans 14,669 14,731 15,794 20,831 31.89%
    Swahili 13,000 13,027 13,903 14,029 0.91%
    Wolof 2,689 2,693 2,310 2,312 0.09%
    Somali 1,635
    Sotho 1,389 1,398 1,343 1,343 0.00%
    Lingala 673
    Zulu 131 510 586 599 2.22%
    Igbo (incubator) 375
    Kinyarwanda 306 306 367 366
    Tsonga 359 363 92 359 290.22%
    Oromo 218 264 322 335 4.04%
    Swati 371 377 290 292 0.69%
    Amharic 319 377 206 217 5.34%
    Egyptian Arabic (incubator) 195

    In short, although it’s been so long since the last update, there’s not much to show. The only project to more than double its articles in four and a half years is Tsonga, off a minute base. Malagasy has always had a huge amount of bot activity, and is still growing from a large base, and Afrikaans shows some signs of life. But overall, the state of the African language Wiktionaries can be described as dormant.
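The "% +" column in the table above appears to be the relative change between the last two counts. A minimal sketch of that arithmetic follows; the function name is made up for illustration, and the pairing of Malagasy's two largest figures with the last two dates is an assumption (it is the pairing that reproduces the published 52.33%).

```python
def percent_growth(previous: int, current: int) -> float:
    """Relative change from previous to current, in percent, rounded to 2 places."""
    return round((current - previous) / previous * 100, 2)


# Article counts copied from the Wiktionary table (29/10/2014 -> 22/3/2019).
# Assumption: Malagasy's two largest figures belong to these two dates.
wiktionary_counts = {
    "Malagasy": (3_599_084, 5_482_632),
    "Afrikaans": (15_794, 20_831),
    "Tsonga": (92, 359),
    "Zulu": (586, 599),
}

for language, (before, after) in wiktionary_counts.items():
    print(f"{language}: {percent_growth(before, after)}%")
```

Tsonga's jump from 92 to 359 articles is what produces the outsized 290.22%, which is why a percentage taken off a tiny base says little on its own.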

    Perhaps the African language Wikipedias will fare better?

    African Language Wikipedias > 1000 articles

    Language 26/6/2015 5/9/2017 30/6/2018 2/4/2019 % +
    Malagasy 79,329 84,634 84,996 91,528 7.68%
    Afrikaans 35,856 46,824 50,275 76,965 53.11%
    Swahili 29,127 37,443 42,773 49,555 15.86%
    Yoruba 31,068 31,577 31,672 31,867 0.62%
    Egyptian Arabic 14,192 17,138 18,605 20,405 9.67%
    Amharic 12,950 13,789 14,286 14,558 1.90%
    Northern Sotho 1,000 7,823 8,050 8,018 -0.40%
    Somali 3,446 4,727 4,898 5,456 11.39%
    Shona 2,321 2,851 3,630 4,278 17.85%
    Hausa 1,345 1,525 1,856 3,494 88.25%
    Lingala 2,062 2,915 3,023 3,113 2.98%
    Kabyle 2,296 2,887 2,844 2,986 4.99%
    Kinyarwanda 1,780 1,810 1,823 1,821 -0.11%
    Kikuyu 1,349 1,357 1,358 0.07%
    Igbo 1,019 1,384 1,320 1,392 5.45%
    Kongo 1,176 1,179 1,193 1.19%
    Wolof 1,023 1,157 1,166 1,184 1.54%
    Luganda 1,153 1,162 1,169 0.60%
    Zulu 683 942 959 1,067 11.26%

    The Zulu Wikipedia is the latest addition to the 1000 club, having reached this milestone just before Wikimania last year, and progress has been steady since then.

    At first glance, Hausa looks like it’s in great shape, with an 88% increase in the number of articles. But this is misleading, as many of these are one line articles on football players, the entirety of which translates as, for example, “Kenny Allen (footballer) is an English football player.” No disrespect to Kenny Allen, but I’m not sure he and the 100s of other footballers listed there are critical components of Hausa knowledge. There’s a move to delete these articles (you can see the impressive list here while it’s up), but even if they survive, it’s not a sign of a healthy project.

    Leaving aside Hausa, it’s once again Afrikaans, growing at an impressive 53% over the period, that provides an example for the rest. At current rates, it’s on track to pass Malagasy and reclaim its position on top in about a year or so.

    Besides Afrikaans, only Shona, Swahili, Somali and Zulu show a growth rate above 10%, while quite a few sit idle.

    Moving on to the South African language editions specifically:

    South African Language Wikipedias

    Language 26/6/2015 5/9/2017 30/6/2018 2/4/2019 % +
    Afrikaans 35,856 46,824 50,275 76,965 53.11%
    Northern Sotho 1,000 7,823 8,050 8,018 -0.40%
    Zulu 683 942 959 1,067 11.26%
    Xhosa 356 708 738 789 6.91%
    Tswana 503 639 641 641 0.00%
    Tsonga 266 526 562 585 4.09%
    Sotho 223 523 539 546 1.30%
    Swati 410 432 439 467 6.38%
    Venda 151 256 256 265 3.52%
    Ndebele (incubator) 12 12 11 -8.33%

    Afrikaans remains the only project that could be described as a usable Wikipedia – the other languages are still very much in the formative stages. Zulu is also showing signs of life. Besides these two, only Xhosa and Swati see growth rates above 5%. It’s sad to see the stalling of Northern Sotho, while Ndebele shows no signs of getting out of the incubator anytime soon.

    2019 has been proclaimed the Year of Indigenous Languages by the UN, but so far there’s not much sign of a change in the status of the African language projects. Later today sees the South African Centre for Digital Language Resources, in collaboration with the Academy of African Languages and Science from the University of South Africa, present an interactive day workshop on contributing to Wikipedia in South African languages. It’s great to see this initiative, which arose with no help that I’m aware of from Wikimedia South Africa. I’m always hopeful with events like these. Generally very few people stay around to edit Wikipedia, but as projects like Northern Sotho and Swahili show, one person can make a huge difference in the early stages, and it just needs a committed editor to stick around. It’s a lonely job editing in the early stages: wondering if it’s worthwhile, with no community and no idea if your work is being read. Hopefully someone will take on the challenge!

    If you are looking to contribute, but don’t know where to start, please reach out to Wikimedia South Africa and we’d be happy to assist.

    Image from Wikimedia Commons

    How your students can counteract misinformation

    14:30, Tuesday, 02 2019 April UTC

    This April 2, on #AprilFactsDay, we’re reminded of the importance of trustworthy information. How can we equip the next generations of information consumers and producers with the skills they need to participate in our rapidly changing digital landscape?

    Wikipedia is one of the most trusted sites among the cacophony online. That’s because it’s built on the principle of verifiability; its community-made policies take a strict stance against promotion and advertising; and the volunteers that curate its content value neutrally presented and well-referenced facts. Information that doesn’t adhere to these standards is deleted as soon as one of Wikipedia’s thousands of devoted volunteers encounters it.

    But there are still gaps in information on Wikipedia, which can be harder to spot than false information. That’s where instructors and students in our Student Program are making a difference. Higher education instructors use our tools and assignment templates to teach students how to identify gaps on Wikipedia and use what they’re learning in class to correct those gaps.

    That’s what Dr. Ada Palmer did with her 32 students at the University of Chicago last fall term. Students added 90,000 words of well-researched content to Wikipedia about “how new information technologies trigger innovations in censorship and information control.” *

    Did you know, for example, that food producers can more easily sue their critics in certain states in the US because of food libel laws? The laws are often criticized as a restriction of first amendment rights.

    And newspaper theft, a form of censorship, occurs when an individual, organization, or government removes a large portion of a publication without the consent of the publisher in order to prevent others from reading it. The Wikipedia article now highlights some notable cases, as well as the strategies that various states and cities in the US employ to counteract it.

    Margaret Sullivan of the Washington Post calls April 2 (also known as Fact-Checking Day) “a global counterpunch on behalf of truth.” She also writes that it’s an opportunity to get the public more involved in the processes of informational evaluation that journalists undertake daily.

    In a Wikipedia writing assignment, students participate in fact-checking Wikipedia, looking for informational gaps, and correcting those gaps for the benefit of millions of readers. Wikipedia writing is fact-checking in action, with an added praxis of making the digital informational landscape better.

    In the case of Dr. Palmer’s students, the assignment is also an opportunity to educate others about their rights in the face of false information and censorship.

    In general, a Wikipedia writing assignment provides students with an opportunity to learn to critically evaluate information and participate in modes of knowledge creation that they typically accept passively. When Stanford Graduate School of Education found in 2016 that most students can’t tell the difference between a credible news website and a fake news site, a lot of instructors sprang into action to understand how they could help reinforce these skills in their students. Critical media literacy is an essential part of education and a skill that every instructor in higher education has the power to teach.

    The ability to access trustworthy and free information equips citizens to know and protect their rights. Access, however, is just the first step. Access plus judgement – the ability to discern reliable from unreliable information – is what truly makes a digital citizen.


    Read more about how our work at Wiki Education combats fake news and how you can help. And for more information about Dr. Palmer’s course, visit the University of Chicago’s course page here or their Youtube channel.


    Interested in adapting a Wikipedia writing assignment to fit your course? Visit teach.wikiedu.org for all you need to know.

    Help, my CI job fails with exit status -11

    08:41, Tuesday, 02 2019 April UTC

    For a few weeks, a CI job had PHPUnit tests abruptly ending with:

    returned non-zero exit status -11

    The connoisseur [ 1 ] would have recognized that the negative exit status indicates the process exited due to a signal. On Linux, 11 is the value for the SIGSEGV signal, which is usually sent by the kernel to the process as a result of an invalid memory access. The default behavior is to terminate the process (man 7 signal) and to generate a core dump file (I will come to that later).
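To make the convention concrete, here is a small sketch that turns such an exit status into a readable description. It assumes the status follows the convention of Python's subprocess module, where a negative value -N means "terminated by signal N"; the helper name is invented for this example and whether the CI tooling reports its status the same way is an assumption.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Describe an exit status where a negative value means 'killed by signal'."""
    if status < 0:
        # signal.Signals maps the signal number back to its symbolic name.
        return f"killed by {signal.Signals(-status).name}"
    return f"exited with code {status}"


print(describe_exit_status(-11))
print(describe_exit_status(1))
```

On Linux, signal number 11 resolves to SIGSEGV, which is the reading given above.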

    But why? Some PHP code ended up triggering a code path in HHVM that would eventually try to read outside of its memory range, or some similar low level fault. The kernel knows that the process completely misbehaved and thus, well, terminates it. Problem solved, you never want your program to misbehave when the kernel is in charge.

    The job had recently been switched to use a new container in order to benefit from more recent libraries and to match the OS distribution used by the Wikimedia production systems. My immediate recommendation was to roll back to the previous known state, but in the end I let the task linger and was absorbed by other work (such as updating MediaWiki on the infrastructure).

    Last week, the job suddenly began to fail constantly. We prevent code from being merged when a test fails, and thus the code stays in a quarantine zone (Gerrit) and cannot be shipped. A whole team (the Language-Team) could not ship code for one of their flagship projects (ContentTranslation). That in turn prevents end users from benefiting from new features they are eager for. The issue had to be acted on and became an unbreak now! kind of task. And so I set off on my journey.

    returned non-zero exit status -11, that is a good enough error message. A process in a Docker container is really just an isolated process and is still managed by the host kernel. First thing I did was to look at the kernel syslog facility on our instances, which yields:

    kernel: [7943146.540511] php[14610]:
      segfault at 7f1b16ffad13 ip 00007f1b64787c5e sp 00007f1b53d19d30
         error 4 in libpthread-2.24.so[7f1b64780000+18000]

    php there is just HHVM invoked via a php symbolic link. The message hints at libpthread which is where the fault is. But we need a stacktrace to better determine the problem, and ideally a reproduction case.
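The kernel line above carries more than the library name. As a rough sketch (the regular expression and the helper name are mine, not part of any tool), it can be decoded as follows, assuming the usual x86 page-fault error bits (bit 0: page was present, bit 1: write access, bit 2: user mode) and that the error code is printed in hexadecimal:

```python
import re

# Pattern for the kernel's "segfault at ..." message; written for this example only.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+)"
    r"\s+error (?P<error>[0-9a-f]+) in (?P<object>[^\[\s]+)"
    r"\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line: str) -> dict:
    """Extract the faulting object, the offset inside it, and the error bits."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        raise ValueError("not a kernel segfault line")
    error = int(match["error"], 16)
    offset = int(match["ip"], 16) - int(match["base"], 16)
    return {
        "object": match["object"],
        "offset": hex(offset),  # instruction pointer relative to the mapping
        "page_present": bool(error & 1),
        "access": "write" if error & 2 else "read",
        "mode": "user" if error & 4 else "kernel",
    }


# The message from the post, joined onto one line.
log_line = (
    "php[14610]: segfault at 7f1b16ffad13 ip 00007f1b64787c5e sp 00007f1b53d19d30 "
    "error 4 in libpthread-2.24.so[7f1b64780000+18000]"
)
print(decode_segfault(log_line))
```

Read this way, error 4 is a user-mode read of a page that was not mapped, at offset 0x7c5e inside the libpthread mapping.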

    Thus, what I am really looking for is the core dump file I alluded to earlier. The file is generated by the kernel and contains an image of the process memory at the time of the failure. Given the full copy of the program instructions, the instructions it was running at that time, and all the memory segments, a debugger can reconstruct a human readable state of the failure. That is a backtrace, and is what we rely on to find faulty code and fix bugs.

    No core file had been generated, otherwise the error message would have stated that the process had core dumped, i.e. that the kernel generated the core dump file. Our default configuration is to not generate any core file, but usually one can adjust it from the shell with ulimit -c XXX where XXX is the maximum size a core file can occupy (in kilobytes, in order to prevent filling the disk). Docker being just a fancy way to start a process, it has a setting to adjust the limit. The docker run inline help states:

    --ulimit ulimit Ulimit options (default [])

    That is about as unhelpful as it gets. In the end, the option to set is --ulimit core=2147483648, i.e. up to 2 gigabytes. I updated the CI jobs and instructed them to capture a file named core, the default file name. After a few runs, although I could confirm failures, no files got captured. Why not?
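The core size limit is an ordinary per-process resource limit, so it can also be inspected from inside a process. A minimal sketch using Python's standard resource module (Unix only); the function name is invented for this example, and it only raises the soft limit as far as the existing hard limit allows:

```python
import resource

# 2 GiB expressed in bytes, i.e. the value given to --ulimit core= above.
TWO_GIB = 2 * 1024**3


def raise_core_limit() -> tuple:
    """Raise this process's soft core-size limit to its hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit up to the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


soft, hard = raise_core_limit()
print("2 GiB in bytes:", TWO_GIB)
print("core dumps disabled" if soft == 0 else f"soft core limit: {soft}")
```

A soft limit of 0 is the "do not generate any core file" configuration described above; resource.RLIM_INFINITY means no cap at all.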

    Our machines do not use core as the default filename. It can be found in the kernel configuration:

    /proc/sys/kernel/core_pattern:
    /var/tmp/core/core.%h.%e.%p.%t
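For readers unfamiliar with the template syntax: per core(5), %h is the hostname, %e the executable name, %p the PID and %t the dump time in seconds since the epoch. The sketch below expands only those specifiers (plus %%), which is a simplification of what the kernel supports; the function name is made up, and the sample values are the ones visible in the core file name used with gdb later in this post.

```python
import re


def expand_core_pattern(pattern: str, hostname: str, executable: str,
                        pid: int, timestamp: int) -> str:
    """Expand a subset of the core_pattern specifiers (%h, %e, %p, %t, %%)."""
    values = {
        "h": hostname,
        "e": executable,
        "p": str(pid),
        "t": str(timestamp),
        "%": "%",
    }
    # Specifiers not handled by this sketch are simply dropped.
    return re.sub(r"%(.)", lambda m: values.get(m.group(1), ""), pattern)


path = expand_core_pattern(
    "/var/tmp/core/core.%h.%e.%p.%t",
    hostname="606eb29eab46",
    executable="php",
    pid=2353,
    timestamp=1552570410,
)
print(path)
```

The hostname component in that file name looks like a container ID rather than the name of a CI host, which fits where the files eventually turned up.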

    I thus went on the hosts looking for such files. There were none.

    Or maybe I mean None or NaN.

    Nada, rien.

    The void.

    The obvious next step: try to reproduce it! I ran a Docker container doing a basic while loop, and from the host I sent the SIGSEGV signal to the process. The host still had no core file. But, surprise, it was in the container. Although the kernel handles the dump using the host-wide core_pattern setting, the path is resolved against the filesystem of the crashing process, which is the container's. My quest was nearing its end: I simply mounted a host directory into the containers at the expected place:

    mkdir /tmp/coredumps
    docker run --volume /tmp/coredumps:/var/tmp/core ....
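The reproduction trick described above (a process looping forever that is sent SIGSEGV from outside) does not need Docker to try out. This is a hedged sketch using a plain child process in place of the container; the function name is invented, and whether a core file appears depends on the core limit and core_pattern of the machine it runs on.

```python
import signal
import subprocess
import sys


def crash_child_with_sigsegv() -> int:
    """Start a looping child process, send it SIGSEGV, and return its exit status."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time\nwhile True:\n    time.sleep(1)"]
    )
    # Deliver the signal from the parent, as was done from the host in the post.
    child.send_signal(signal.SIGSEGV)
    return child.wait(timeout=30)


status = crash_child_with_sigsegv()
print("child exit status:", status)
```

On Linux the reported status is expected to be the negated signal number, matching the exit status -11 seen in the failing CI job.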

    After a few builds, I had harvested enough core files. The investigation is then very straightforward:

    $ gdb /usr/bin/hhvm /coredump/core.606eb29eab46.php.2353.1552570410
    Core was generated by `php tests/phpunit/phpunit.php --debug-tests --testsuite extensions --exclude-gr'.
    Program terminated with signal SIGSEGV, Segmentation fault.
    #0  0x00007f557214ac5e in __pthread_create_2_1 (newthread=newthread@entry=0x7f55614b9e18, attr=attr@entry=0x7f5552aa62f8, 
        start_routine=start_routine@entry=0x7f556f461c20 <timer_sigev_thread>, arg=<optimized out>) at pthread_create.c:813
    813    pthread_create.c: No such file or directory.
    [Current thread is 1 (Thread 0x7f55614be3c0 (LWP 2354))]
    
    (gdb) bt
    #0  0x00007f557214ac5e in __pthread_create_2_1 (newthread=newthread@entry=0x7f55614b9e18, attr=attr@entry=0x7f5552aa62f8, 
        start_routine=start_routine@entry=0x7f556f461c20 <timer_sigev_thread>, arg=<optimized out>) at pthread_create.c:813
    #1  0x00007f556f461bb2 in timer_helper_thread (arg=<optimized out>) at ../sysdeps/unix/sysv/linux/timer_routines.c:120
    #2  0x00007f557214a494 in start_thread (arg=0x7f55614be3c0) at pthread_create.c:456
    #3  0x00007f556aeebacf in __libc_ifunc_impl_list (name=<optimized out>, array=0x7f55614be3c0, max=<optimized out>)
        at ../sysdeps/x86_64/multiarch/ifunc-impl-list.c:387
    #4  0x0000000000000000 in ?? ()

    Which @Anomie kindly pointed out is an issue solved in libc6. Once the container has been rebuilt to apply the package update, the fault disappears.

    One can now expect new changes to appear in ContentTranslation.


    [ 1 ] "connoisseur", from obsolete French, means "to know" https://en.wiktionary.org/wiki/connoisseur . I guess the English language forgot to apply the update in due time and cannot make any such change for fear of breaking backward compatibility or locution habits.

    The task has all the technical details and log leading to solving the issue: T216689: Merge blocker: quibble-vendor-mysql-hhvm-docker in gate fails for most merges (exit status -11)

    (Some light copyedits to above -- Brennen Bearnes)

    Autonomous Systems performance report

    06:49, Tuesday, 02 2019 April UTC

    Today we're publishing our first report of the performance experienced by visitors of Wikimedia websites, focused on the Autonomous Systems visitors are connecting from.

    This report will be updated monthly, with historical data made available. The goal is to watch the evolution of these metrics over time, allowing us to identify improvements and potential pain points.

    In order to make a fair assessment of the autonomous systems' performance, real user metrics collected from web browsers are normalised, to avoid differences such as average device power for a given network's users potentially skewing the results. For example, an ISP with more expensive data plans might have users with more expensive, better performing devices on average. This is why we compare data points only for similar effective device CPU power between providers. We also separate the mobile and desktop experiences, because they serve different content, with a notable difference in the median page weight, which directly impacts performance metrics. We wouldn't want the mobile/desktop mix of a given provider to influence the results.
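
    The idea of only comparing like with like can be sketched in a few lines of Python. This is an illustration under assumptions, not the report's actual pipeline: the field names (asn, platform, cpu_bucket, load_ms), the bucket labels and every number below are invented, and the median is used simply as an example statistic.

```python
from collections import defaultdict
from statistics import median


def median_by_segment(samples):
    """Group samples by (asn, platform, cpu_bucket) and take the median per group."""
    groups = defaultdict(list)
    for sample in samples:
        key = (sample["asn"], sample["platform"], sample["cpu_bucket"])
        groups[key].append(sample["load_ms"])
    return {key: median(values) for key, values in groups.items()}


def compare_providers(samples, platform, cpu_bucket):
    """Compare providers only within one platform and one CPU power bucket."""
    return {
        asn: value
        for (asn, plat, bucket), value in median_by_segment(samples).items()
        if plat == platform and bucket == cpu_bucket
    }


# Invented sample data, for illustration only.
samples = [
    {"asn": "AS-A", "platform": "mobile", "cpu_bucket": "mid", "load_ms": 1200},
    {"asn": "AS-A", "platform": "mobile", "cpu_bucket": "mid", "load_ms": 1400},
    {"asn": "AS-A", "platform": "mobile", "cpu_bucket": "mid", "load_ms": 1600},
    {"asn": "AS-A", "platform": "desktop", "cpu_bucket": "mid", "load_ms": 900},
    {"asn": "AS-B", "platform": "mobile", "cpu_bucket": "mid", "load_ms": 1000},
    {"asn": "AS-B", "platform": "mobile", "cpu_bucket": "mid", "load_ms": 1100},
    {"asn": "AS-B", "platform": "mobile", "cpu_bucket": "high", "load_ms": 600},
]

print(compare_providers(samples, platform="mobile", cpu_bucket="mid"))
```

    The fast sample from AS-B's high-powered devices and the desktop sample from AS-A are left out of the comparison, so neither the device mix nor the mobile/desktop mix of a provider leaks into the result.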

    If you look at the report, you might wonder why some autonomous systems' underlying mobile networks show up under "desktop" and some wired internet providers appear under "mobile". The explanation is that the internet providers either sell home internet devices that are effectively mobile network modems, resulting in people using their desktop computers (and as a result, the desktop websites) over a mobile network, or have mobile device users automatically connect to the same provider's WiFi routers when users are in reach of one.

    One caveat about this report is that in countries that are physically large, like the United States, the country-wide aggregation in no way reflects important regional differences there might be for a given network. The main reason why we can't look at smaller regions is that we have simply no way of knowing where mobile users are connecting from, short of collecting geolocation data. Since we care deeply about our users' privacy and their experience, it doesn't feel appropriate at this time to ask users for their precise location in order to generate this type of finer-grained data. Such a scheme would also suffer from self-selection bias. There's already a lot of work to be done with the data aggregated at the national level!

    We hope that this public report will help network operators understand their customers' real performance characteristics when it comes to browsing one of the web's largest websites. We welcome peering requests from networks that seek to improve their connectivity to our datacenters.

    Dr. Irene Chen gave her chemistry students a unique opportunity to practice science communication. She incorporated a Wikipedia writing assignment into her course at UC Santa Barbara this last fall. The course discussed major breakthroughs in nucleic acids research – information that students then channeled into relevant Wikipedia articles where details were missing. Eight students added a total of 13,600 words to Wikipedia this way, a process requiring that they synthesize research in the most concise, essentialized way possible.

    3,600 of those words were channeled into Wikipedia’s article about systematic evolution of ligands by exponential enrichment, a chemical process also known as in vitro selection. Before the student began improving the article, it was almost entirely made up of an introduction and no other informational sections. That was even noted on the article’s Talk page, where volunteers discuss their desired changes. Dr. Chen’s student responded directly to the issue by adding sections about the details of the procedure. Those new sections include information about how chemists generate a single stranded oligonucleotide library and then how that library is incubated to allow binding with the oligonucleotide-target. The student also added information about tracking the progress of the resulting reaction.

    Another student expanded Wikipedia’s article about minimum inhibitory concentration (MIC), which before was also just an introduction and a few references. 1,200 words later, the article boasts an additional 10 sources, as well as background information about the history and clinical usage of the MIC concept. As the article states, MIC “is the lowest concentration of a chemical, usually a drug, which prevents visible growth of bacterium,” a definition supported by a review paper published in 2005 about the treatment of bacterial infectious diseases. The article has been viewed almost 13,000 times since the student made changes – many more views than a typical term paper would get!

    Students tend to invest more in their work when they realize it can be accessed by millions. They understand the responsibility to represent information accurately and take their new-found role of ‘knowledge creator’ seriously. They’re a great group to do this work well, especially in the sciences. Students can translate complex course topics for a general audience who might be learning about the topic for the first time because they remember what that was like. They do a great service to the world by sharing expertise they have access to (both through their professor and their library resources) with Wikipedia’s worldwide readership. That matters, and they understand that.


    Interested in incorporating a Wikipedia writing assignment into your course? Visit teach.wikiedu.org for all you need to know to get started. Or hear from instructors who have done this here.

    weeklyOSM 453

    11:17, Monday, 01 2019 April UTC

    19/03/2019-25/03/2019

    Map for young parents | 1 | © Leaflet | Map data © OpenStreetMap contributors, CC-BY-SA, Map Tiles © OpenStreetMap

    Mapping

    • mapmeld made an interesting analysis of language clusters by using Unicode code points. They also detected outliers and odd forms of vandalism.
    • The vote on the proposal “Fashion accessory shop” has begun and runs until 8th April 2019.
    • The vote on “Substations functions” ended. The proposal did not achieve the necessary majority.

    Community

    • alexkemp describes how he resolves problems encountered with OSMTracker 0.7 when pictures aren’t georeferenced. He shows how he restores the missing data.
    • SK53 wanted to identify areas of social housing in England. He shows in his blog how he was able to create suitable useful polygons from an open data list of individual properties with PostGIS clustering. He suggests that such approaches may have broader applicability to OSM data.
    • Ilya Zverev recently started working for the taxi company Juno. He describes how GPX tracks from the Juno app are being aggregated and how they can be used to keep OpenStreetMap up-to-date with very fresh data. This data is a lot easier to work with than the GPX tracks stored by OSM, which may cover multiple changes in the road network. He also presented on this subject at FOSDEM in Brussels, and his talk was recorded.

    Events

    • FOSS4G Argentina, a conference with workshops and presentations, takes place from 9th – 12th of April 2019 in Buenos Aires. Registration is open and it includes a workshop on how to use JOSM for editing by (@Hernan)
    • The Linuxwochen 2019 tour through Austria will be in Vienna from May 2 – 4. The article contains some links to video recordings of earlier Linux weeks. (de) (automatic translation)
    • Entur and Ruter are hosting an OpenTripPlanner Summit in Oslo on 3rd April. The proceedings will be live-streamed.

    Humanitarian OSM

    • HOT has created a new development community for the Tasking Manager with further partnerships with Microsoft and Bing Maps. Over the next few months, work will begin on the design for integrating machine-learning and improving workflow performance of the Tasking Manager. Additionally open building datasets for Uganda and Tanzania will be created.
    • In order to be able to help people in crisis areas quickly, NGO helpers depend on high-quality maps. For many areas, these maps must first be produced. In doing so, Doctors Without Borders is supported by modern technology and by thousands of volunteers. On Tuesday 2nd of April 2019 from 19:00 – 22:00 interested people will meet for a Missing Maps Mapathon in Berlin.

    Maps

    • PrivateMajory tweets a 3-D effect map made using QGIS and then enhanced in Blender.
    • DJI appears to be using (in certain countries) OSM data in their Fly Safe GEO ZONE Map (several examples). A tweet and emails have been sent to DJI requesting correct attribution, but sadly there was no reply from DJI.
    • The OpenStreetMap.be website now has a brand-new “How To” page with some tutorials and example code demonstrating how to use OSM and the OSMBE base layer.
    • The first version of the Baby Map was published. It represents the tags that are relevant for parents of small children. However, the data in OSM is still a bit thin.

    Programming

    • The German forum discusses (automatic translation) the categorisation of notes on osm.org. The forest of “Notes” markers is often dense and a categorisation by urgency or keywords could have a positive effect on the processing of the messages. User miche101 arranges notes on a map assigning different symbols to the markers based on keywords.
    • The GIScience News Blog of the University of Heidelberg previews a new way to query spatial links in the OpenStreetMap history database OSHDB.

    Did you know …

    • … the 2 Minute introduction videos to help beginners get started with OpenStreetMap and mapping for HOT-projects? Vitor George reminds us of the possibility to add subtitles to these YouTube videos.
    • … how to tag an area that is covered by trees, but tagged neither as landuse=forest nor as natural=wood, such as a cemetery? Such areas can be tagged using landcover=trees along with their default landuse tags.
    • … the smartphone app ViewRanger?

    OSM in the media

    • The Free Software Foundation (FSF) recognised OpenStreetMap with the 2018 Free Software Award for Projects of Social Benefit and Deborah Nicholson with the Award for the Advancement of Free Software. Kate Chapman received the award from Richard M. Stallman at the Massachusetts Institute of Technology (MIT) and gave a speech on behalf of OSM/OSMF community.
    • The Open Mapping Group at McGill University in Montreal, Canada, hosted an indoor mapping party in the face of frosty weather.

    Other “geo” things

    • The International Telecommunication Union (ITU) published a report on reliefweb.int about the use of disruptive technologies in disaster management. The digital technologies that are supported by many volunteers around the world in the crowdsourcing projects such as OSM are highlighted in particular.
    • Nazi spies, untroubled by the Soviet authorities, made a complete map of Moscow during the late 1930s.
    • Yahoo Japan news reported that Google Maps data of Japan has deteriorated. Users assume that the data source has switched, because the copyright attribution of Zenrin has disappeared.
    • Grab Malaysia and the Department of Geoinformation of the Universiti Teknologi Malaysia organise mentoring training for mappers called Grab Geo*Star in order to increase the interest of students in the areas of mapping and OpenStreetMap.
    • Mapbox reports that it enhanced its living map platform with proprietary data from Zenrin, a Japanese map publisher. It is not clear as to whether enhanced was used synonymously for “replaced OSM”.
    • It’s not just OSM that suffers from the occasional broken polygon. Simon Poole spotted woods near Zurich which have disappeared on Here and Bing maps.
    • The book Time for mapping: Cartographic temporalities deals with the entry of the temporal into maps in the digital age and contemporary mapping practices.
    • The children’s book Lindsey, the GIS Specialist introduces the basics of GIS. This is one of a small series published by the US-based surveying firm Bolton & Menk, based on the job roles of their own employees.
    • A Guardian quiz shows population density maps of various cities. Can you identify them? Or do you have to guess?
    • Esri announces on their ArcGIS blog that the new March version (previously beta) of the OpenStreetMap Vector Basemap is now freely available in ArcGIS.
    • On the south coast of Norway the first underwater restaurant in Europe opened (automatic translation). Potential visitors have to wait half a year to get a reservation due to it being booked out. It is still only mapped as a construction site in OSM.

    Upcoming Events

    Where | What | When | Country
    Sydney | Sydney OSM get together | 2019-03-30 | australia
    UCL Louvain-la-Neuve | National Mapathon | 2019-03-30 | belgium
    ULIEGE Liège | National Mapathon | 2019-03-30 | belgium
    Bochum | Mappt die Innenstadt – Mappingtag für Einsteiger*innen und Fortgeschrittene | 2019-03-31 | germany
    Stuttgart | Stuttgarter Stammtisch | 2019-04-03 | germany
    Bochum | Mappertreffen | 2019-04-04 | germany
    Nantes | Réunion mensuelle | 2019-04-04 | france
    Dresden | Stammtisch Dresden | 2019-04-04 | germany
    Heidelberg | DisasterMappers Night of Geography | 2019-04-05 | germany
    Cayenne | Rencontre mensuelle | 2019-04-05 | france
    La Riche | La Riche (37)#Ateliers d’initiation à OpenStreetMap | 2019-04-06 | france
    Kyoto | お花見!オープンデータソン in 京都 | 2019-04-07 | japan
    Rennes | Réunion mensuelle | 2019-04-08 | france
    Bordeaux | Réunion mensuelle | 2019-04-08 | france
    Essen | Mappertreffen | 2019-04-08 | germany
    Taipei | OSM x Wikidata #3 | 2019-04-08 | taiwan
    Toronto | Toronto Mappy Hour | 2019-04-08 | canada
    Lyon | Rencontre mensuelle pour tous | 2019-04-09 | france
    Munich | Münchner Stammtisch | 2019-04-09 | germany
    Salt Lake City | SLC Mappy Hour | 2019-04-09 | united states
    Viersen | OSM Stammtisch Viersen | 2019-04-09 | germany
    Cologne | Köln Stammtisch | 2019-04-10 | germany
    Buenos Aires | Taller Introducción a JOSM en FOSS4G-AR 2019 | 2019-04-10 | argentina
    Leoben | Stammtisch Obersteiermark | 2019-04-11 | austria
    Zurich | OSM Stammtisch Zurich | 2019-04-11 | switzerland
    Berlin | 130. Berlin-Brandenburg Stammtisch | 2019-04-12 | germany
    Salt Lake City | University of Utah Campus Mapping Party | 2019-04-13 | united states
    Biella | Incontro mensile | 2019-04-13 | italia
    Salt Lake City | SLC Map Night | 2019-04-16 | united states
    Cologne Bonn Airport | Bonner Stammtisch | 2019-04-16 | germany
    Lüneburg | Lüneburger Mappertreffen | 2019-04-16 | germany
    Reutti | Stammtisch Ulmer Alb | 2019-04-16 | germany
    Toulouse | Rencontre mensuelle | 2019-04-17 | france
    Karlsruhe | Stammtisch | 2019-04-17 | germany
    Montpellier | State of the Map France 2019 | 2019-06-14 to 2019-06-16 | france
    Angra do Heroísmo | Erasmus+ EuYoutH_OSM Meeting | 2019-06-24 to 2019-06-29 | portugal
    Minneapolis | State of the Map US 2019 | 2019-09-06 to 2019-09-08 | united states
    Edinburgh | FOSS4GUK 2019 | 2019-09-18 to 2019-09-21 | united kingdom
    Heidelberg | Erasmus+ EuYoutH_OSM Meeting | 2019-09-18 to 2019-09-23 | germany
    Heidelberg | HOT Summit 2019 | 2019-09-19 to 2019-09-20 | germany
    Heidelberg | State of the Map 2019 (international conference) | 2019-09-21 to 2019-09-23 | germany
    Grand-Bassam | State of the Map Africa 2019 | 2019-11-22 to 2019-11-24 | ivory coast

    Note: If you would like to see your event here, please put it into the calendar. Only data which is there will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.

    This weeklyOSM was produced by PierZen, Polyglot, Rainero, Rogehm, SK53, SunCobalt, TheFive, TheSwavu, YoViajo, adrianxoc, derFred, jinalfoflia, k_zoar, muramototomoya.

    Tech News issue #14, 2019 (April 1, 2019)

    00:00, Monday, 01 2019 April UTC
    2019, week 14 (Monday 01 April 2019)

    Debugging production with X-Wikimedia-Debug

    20:31, Friday, 29 2019 March UTC

    In February 2018, a user reported that some topics created by users on Flow discussion boards were not appearing in the Recent Changes feeds, including EventStreams and the IRC-RC feed. Various automated patrol systems rely on EventStreams, so the bug meant a number of edits bypassed those systems on Flow-enabled wikis.

    When approaching a bug like this, there are typically three things I do:

    1. Determine the steps to reproduce the bug. That was already done by the task author (thank you @Rxy!) and then confirmed by other contributors to the task (h/t @Krinkle, @Etonkovidova)
    2. Attempt to reproduce the issue locally and set breakpoints in code to understand why the problem occurs
    3. Check the production logs to look for any messages related to the bug report

    Unfortunately the problem was not reproducible in the MediaWiki Vagrant development environment. Nor were there any relevant messages in the logs. Since reproducing the issue locally wasn't possible, we merged some diagnostic code but still had nothing. Early on, @SBisson suggested a hypothesis about the code path involved in emitting the event:

    if ( user is trusted )
      return true
    else
      load the revision from the replica, return true based on the status of the revision
      oh, it doesn't exist (yet): return false

    But we could not reproduce this, nor could we identify exactly where this might occur since the code paths for this functionality had many points where execution could stop silently.

    Enter X-Wikimedia-Debug

    One of the useful tools in our stack is the X-Wikimedia-Debug header. I knew about this header (and its browser extensions) from verifying changes that were being SWAT'ed into production but I had not thought to use it for tracking down a production bug.

    I was using the browser extension with the "Log" checkbox ticked (and still not finding anything useful in Logstash to help isolate this bug) when I realized that I could also profile the problematic request. When you check the box to profile a request, XHProf will profile the code that's executed and make the result available for viewing via XHGui.

    Typically you do this to understand performance bottlenecks in your code, as you get a complete list of all functions executed during the request, along with the time and memory usage associated with each function.

    I followed the steps to reproduce and then switched on the "Profile" option before posting a new topic on an empty Flow board. Now, I had a profiled request which provided me with information on all the methods called, including which method called another (click on a method call to see its parent and children method calls). From here I could follow the path traversed by Flow's event emitting code, and see exactly where the code execution halted.

    Reproducing the bug locally

    With this knowledge, I went back to my local environment, this time using MediaWiki-Docker-Dev, which has database replication set up as part of its stack (MediaWiki Vagrant does not). I set some breakpoints in the code I suspected was causing the problem, and then found that in RevisionActionPermissions.php#isBoardAllowed(), we had this code:

    $allowed = $this->user->isAllowedAny( ...(array)$permissions );
    if ( $allowed ) {
        return true;
    }
    return !$workflow->isDeleted();

    For a new topic on a blank Flow board, $permissions is deletedtext, which would return true for privileged users. But for unprivileged users, Flow would check !$workflow->isDeleted(), and this evaluated as false because the code was querying the database replica, and the title did not exist there yet.

    The submitted solution was to patch isDeleted() to query the master DB when in the context of a POST request, since we know the title would exist in the master DB. With this patch in place, events were once again emitted properly and the bug was fixed.
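
    The race is easy to model outside of MediaWiki. The following Python sketch is a toy, not Flow's code: the class LaggingReplicaStore, the function names and the read_from_primary flag are all invented for illustration, and replication is triggered by hand so that the lag is visible.

```python
class LaggingReplicaStore:
    """Toy primary/replica pair: writes reach the replica only on replicate()."""

    def __init__(self):
        self.primary = set()
        self.replica = set()

    def insert_title(self, title):
        self.primary.add(title)

    def replicate(self):
        self.replica = set(self.primary)

    def title_exists(self, title, use_primary=False):
        source = self.primary if use_primary else self.replica
        return title in source


def is_deleted(store, title, use_primary=False):
    # Assumption for this sketch: a title that cannot be found counts as deleted.
    return not store.title_exists(title, use_primary=use_primary)


def is_board_allowed(store, title, user_is_privileged, read_from_primary):
    # Privileged users short-circuit, as in the PHP snippet above.
    if user_is_privileged:
        return True
    return not is_deleted(store, title, use_primary=read_from_primary)


store = LaggingReplicaStore()
store.insert_title("Topic:New")  # the POST request has just written the title

# Unprivileged user, replica not caught up yet: the check fails.
print(is_board_allowed(store, "Topic:New", False, read_from_primary=False))
# Privileged user: never reaches the replica read.
print(is_board_allowed(store, "Topic:New", True, read_from_primary=False))
# Patched behaviour in a POST context: read from the primary instead.
print(is_board_allowed(store, "Topic:New", False, read_from_primary=True))
```

    This also shows why the bug never appeared in an environment without replication: once replicate() has run, or when there is only one database, the unpatched check succeeds.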

    Conclusion

    A few of my conclusions from this experience:

    • If you're having difficulty tracking down the code path, consider using the profiler in the X-Wikimedia-Debug browser extension
    • Diagnostic code is helpful (even if it didn't pinpoint the problem here) and debug level logging should be considered instead of silent returns
    • Having database replication in your local development environment can help catch issues while developing and when attempting to reproduce a production issue. One can use the MediaWiki-Docker-Dev environment for this, and see also how to adjust its database replication lag.

    Kosta Harlan
    Senior Software Engineer
    Growth Team


    Learn more about the X-Wikimedia-Debug header and browser extension on Wikitech.

    The working group to consider future CI tooling for Wikimedia has finished and produced a report. The report is at https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/CI_Futures_WG/Report and the short summary is that the release engineering team should do prototype implementations of Argo, GitLab CI/CD, and Zuul v3.

    How do we get Wikipedia to every corner of the world?

    How can we share the joy of free knowledge with people who have never heard of our website?

    In 2018, we asked Wikipedia’s volunteer editing communities all over the world to think creatively about expanding the reach of the free online encyclopedia.

    Through a crowdsourcing campaign called “Inspire New Readers”, we collected 362 ideas from more than 500 participants. From these, the Wikimedia Foundation funded eight projects through our rapid grants program.

    Half of the proposals that we received focused on using social media and videos to generate additional reach, while the other half were for offline promotion. This campaign gave us many insights on what works well for projects that focus on raising awareness and we have now launched a new rapid grant type. We hope that community organizers continue to work on bringing Wikipedia to everyone.

    Lessons from the Inspire New Readers campaign

    Of the eight projects funded, four focused on online promotion, and four focused on offline promotion.

    One example of online promotion is the video created by the Punjabi Wikimedians community. In it, a group of animals go to Lahore with their teacher, learning facts from Wikipedia along the way. The video went viral: it was seen by almost half a million people. The impact on Punjabi Wikipedia’s main page also exceeded expectations, with page views increasing by 357% during the campaign.

    An example of an offline campaign to raise awareness of Wikimedia projects is one carried out by the Maithili Wikipedia community: “The Wikipedia Rickshaw”, which had the goal of engaging new readers in Rajbiraj, a province in Nepal. For this campaign, the local community ran rickshaw announcements about Maithili Wikipedia in the city, held offline meetups, and hired billboards at various locations throughout the city. During the campaign, there was a 25% increase in siteviews for this language version of Wikipedia.

    These eight projects are a good example of why it is important to reach potential new readers in their language, and where they are: traffic can be easily impacted in smaller language versions of Wikimedia projects. You can read more about all projects funded in the campaign report.

    What other lessons stem from these awareness projects? What about the performance of videos, versus slide shows, versus images in social media campaigns? To dive deeper on the impact social media campaigns can have, we focused on two other marketing initiatives led by communities in Asia: Bengali Wikisource, and Tamil Wikipedia. The Bengali Wikisource community created 2 videos and 1 image with text, which were then A/B tested. After testing them, the videos proved to have better results, increasing traffic to the site by 86% during the campaign. The Tamil Wikipedia community created a slideshow on Facebook, using good quality images from Wikimedia Commons. They tested Facebook banners against the slideshows and learned that the slideshows worked better. Through this optimization, they were able to increase traffic to the site by 22% during the campaign.

    You can read more about these two initiatives, including lessons learned on audience segmentation through geography and language, in the community marketing experiments impact report.

    New guidelines for projects that focus on raising awareness of Wikimedia projects

    These marketing initiatives are important to the growth of Wikipedia and other wiki projects: awareness is the first step in building new users, support, and ultimately participation in Wikimedia projects. We know that low awareness of Wikipedia is associated with low usage, and without usage people will never become contributors or advocates for free knowledge.

    It is now easier to apply for grants for projects that focus on raising awareness of Wikimedia projects. With the insights from the two impact reports, we have created two sets of guidelines that focus on projects to raise awareness: one for general promotion campaigns, and one for video campaigns. This documentation is part of the rapid grants program, which now offers an extension of $1,000 over the limit of this grant program (a total of $3,000 USD) for video creation. In these guidelines, you will find details of what the grant will pay for, typical outcomes, things you’ll need, and a list of example projects for guidance.

    We hope that these new guidelines inspire community organizers to continue working to make Wikipedia visible in every corner of the world.

    María Cruz, Communication & Outreach Project Manager, Community Relations, Community Engagement
    Wikimedia Foundation

    Quibble hibernated, it is time to flourish

    11:01, Friday, 29 2019 March UTC

    Writing blog posts is neither my job nor something that I enjoy; I am thus late with the Quibble updates. The last one, Blog Post: Quibble in summer, was written in September 2018 and I forgot to publish it until now. You might want to read it first to get a glimpse of some nice changes that were implemented last summer.

    I guess personal changes that happened in October and the traditional northern hemisphere winter hibernation kind of explain the delay (see note [ 1 ]). Now that spring is finally here ({{NPOV}}), it is time for another update.

    Quibble went from 0.0.26 to 0.0.30 which I have cut just before starting this post. I wanted to highlight a few changes from an overall small change log:

    • Use stronger password in Quibble related browser tests - T204569
    • Parallelize ext/skin linter
    • Parallelize mediawiki/core linter
    • PHPUnit generates Junit results - T207841
    • readme: how to reproduce a CI build - T200991
    • doc: quibble-stretch no more has php
    • mediawiki.d: Avoid vars that look like core or wmf names
    • Drop /p from Gerrit clone URL - T218844
    • Support to clone repositories in parallel - T211701
    • Properly abort when git submodule processing fails - T198980
    • mediawiki.d: Improve docs about dev settings and combine env sections
    • mediawiki.d: Merge into one file

    Parallelism [ 2 ]

    The first inception of Quibble did not have much thought put into it with regard to speed. The main goal at the time was simply to gather all the complicated logic from CI shell scripts, Jenkins job shell snippets, and Python or JavaScript scripts into one single command. That in turn made it easier to reproduce a build, but with a serious limitation: commands are just run serially, which is far from optimal.

    Quibble now runs the lint commands in parallel for both extensions/skins and mediawiki/core. Internally, it forks to run composer test and npm test in parallel, which slightly speeds up the time it takes for the linting commands to complete.

    Another annoyance is that when testing multiple repositories together, preparing the git repositories can take several minutes. Examples are an extension depending on several other extensions, or the gated wmf-quibble-* jobs, which run tests for several Wikimedia deployed extensions. Even when using a local cache of git repositories (--git-cache), the serially run git commands take a while. Quibble 0.0.30 learned --git-parallel to run the git commands in parallel. An example speed-up using the git cache, several repositories and a DSL connection:

    git-parallel | Duration
    16 | 30 seconds
    1 | 50 seconds

    The option defaults to 1, which retains the exact same behavior / code path as before. I invite you to try --git-parallel=8, for example, and draw your own conclusions. Wikimedia CI will be updated once Quibble 0.0.30 is deployed.

    Parallelism was added by myself, @hashar, and was partly tracked in T211701.
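
    For readers curious about the general technique, here is a minimal Python sketch of running per-repository work through a pool of workers. It is not Quibble's implementation: fetch_repo and fetch_all are hypothetical names, the repository names are placeholders, and a short sleep stands in for the git commands.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_repo(name):
    # Stand-in for the git commands run for one repository.
    time.sleep(0.05)
    return name


def fetch_all(repos, workers=1):
    """Process every repository using at most `workers` threads.

    With workers=1 the repositories are handled one after another.
    Results come back in the same order as the input list.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_repo, repos))


repos = ["repo-%d" % number for number in range(8)]  # placeholder names

for workers in (1, 8):
    start = time.monotonic()
    fetch_all(repos, workers=workers)
    print("workers=%d took %.2f seconds" % (workers, time.monotonic() - start))
```

    Threads are enough here because the work is dominated by waiting on the network and on child processes rather than by Python code.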

    Documentation

    Some parts of the documentation referred to Wikimedia CI containers that were no longer suitable for running tests due to refactoring. The documentation has thus been updated to use the proper containers: docker-registry.wikimedia.org/releng/quibble-stretch-php72 or docker-registry.wikimedia.org/releng/quibble-stretch-hhvm. -- @hashar

    In August, Wikidata developers used Quibble to reproduce a test failure and they did the extra step to capture their session and document how to reproduce it. Thank you @Pablo-WMDE for leading this and @Tarrow, @Addshore, @Michael, @Ladsgroup for the reviews - T200991.

    You can read the documentation online at:

    Note: as of this writing, the CI git servers are NOT publicly reachable (git://contint1001.wikimedia.org and git://contint2001.wikimedia.org).

    Submodule failures

    Some extensions or skins might have submodules; however, we never caught errors when they failed to process, and simply kept going. That later caused tests to fail in non-obvious ways and cost several people time recently. T198980

    The reason is that Quibble simply borrowed a legacy shell script to handle submodules, and that script had been broken since its introduction in 2014. It relied on the find command, which still exits 0 even with -exec /bin/false. Although the exit code of /bin/false is 1, that simply causes find to consider the -exec predicate to be false; find thus stops processing further predicates for that file, but that is not an error.

    The logic has been ported to pure Python and now properly aborts when git submodule fails. That also drops the requirement to have the find command available, which might help on Windows. -- @hashar
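
    The difference between the two approaches can be shown with a small Python sketch. It is an illustration, not the code that went into Quibble: the helper names are hypothetical, and a Python one-liner that exits with status 1 stands in for a failing git submodule command.

```python
import os
import shutil
import subprocess
import sys
import tempfile

# A command that always fails with exit status 1, like /bin/false.
FAILING_CMD = [sys.executable, "-c", "raise SystemExit(1)"]


def find_exec_status(directory):
    """Run the failing command through find -exec and return find's exit status."""
    cmd = ["find", directory, "-type", "f", "-exec", *FAILING_CMD, "{}", ";"]
    return subprocess.run(cmd).returncode


def run_for_each_file(directory, command):
    """Walk the tree in Python and abort at the first failing command."""
    for root, _dirs, files in os.walk(directory):
        for name in files:
            # check=True raises CalledProcessError on a non-zero exit status.
            subprocess.run([*command, os.path.join(root, name)], check=True)


with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "module"), "w").close()

    if shutil.which("find"):
        # The command failed for the file, yet find itself reports success.
        print("find exit status:", find_exec_status(tmp))

    try:
        run_for_each_file(tmp, FAILING_CMD)
    except subprocess.CalledProcessError as error:
        print("python walk aborted, command exit status:", error.returncode)
```

    With the find version the failure is swallowed, while the Python walk surfaces it as an exception that the caller can turn into a failed build.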

    Miscellaneous tweaks

    The configuration injected by Quibble in LocalSettings.php is now a single file when it previously was made of several small PHP files glued together by shelling out to php. The inline comments have been improved. -- @Krinkle

    The MediaWiki installer uses a slightly stronger password (testwikijenkinspass) to accommodate a security hardening in MediaWiki core itself. -- @Reedy T204569

    The Gerrit URL to clone the canonical git repository from has been updated to catch up with a change in Gerrit. Updated r/p to simply /r. -- @Legoktm T218844

    PHPUnit generates JUnit test results in the log directory, intended to be captured and interpreted by CI. -- @hashar T207841

    NOTE: those changes have not all been deployed to Wikimedia CI as of March 28th 2019 but should be next week.

    footnotes

    [ 1 ] Seasons are location based and a cultural agreement; they are quite interesting in their own right. They are reversed in the Northern and Southern hemispheres and do not exist at the equator, while in India they define six seasons. Thus when I refer to a winter hibernation, it really just reflects my own biased point of view.

    [ 2 ] Parallelism is fun; I can never manage to write that word without mixing up the number of r's or l's for some reason. As a side note, my favorite sport to watch is parallel bars (enwiki).

    Shocking tales from ornithology

    08:03, Friday, 29 2019 March UTC
    Manipulative people have always made use of the dynamics of ingroups and outgroups to create diversions from bigger issues. The situation is made worse when misguided philosophies are peddled by governments that put economics ahead of ecology. The pursuit of easily gamed targets such as GDP is preferable to ecological amelioration since money is a man-made and controllable entity. Nationalism, pride, other forms of chauvinism, the creation of enemies and the magnification of war threats are all effective tools in the arsenal of Machiavelli for use in misdirecting the masses when things go wrong. One might imagine that the educated, especially scientists, would be smart enough not to fall into these traps, but cases from history dampen hopes for such optimism.

    There is a very interesting book in German by Eugeniusz Nowak called "Wissenschaftler in turbulenten Zeiten" (or scientists in turbulent times) that deals with the lives of ornithologists, conservationists and other naturalists during the Second World War. Preceded by a series of recollections published in various journals, the book was published in 2010 but I became aware of it only recently while translating some biographies into the English Wikipedia. I have not yet actually seen the book (it has about five pages on Salim Ali as well) and have had to go by secondary quotations in other content. Nowak was a student of Erwin Stresemann (with whom the first chapter deals) and he writes about several European (but mostly German, Polish and Russian) ornithologists and their lives during the turbulent 1930s and 40s. Although Europe is pretty far from India, there are ripples that reached afar. Incidentally, Nowak's ornithological research includes studies on the expansion in range of the collared dove (Streptopelia decaocto), which the Germans called the Türkentaube, literally the "Turkish dove", a name carrying a baggage of cultural prejudices.

    Nowak's first paper of "recollections" notes that: [he] presents the facts not as accusations or indictments, but rather as a stimulus to the younger generation of scientists to consider the issues, in particular to think “What would I have done if I had lived there or at that time?” - a thought to keep as you read on.

    A shocker from this period is a paper by Dr Günther Niethammer on the birds of Auschwitz (Birkenau). This paper (read it online here) was published when Niethammer was posted to the security at the main gate of the concentration camp. You might be forgiven if you thought he was just a victim of the war. Niethammer was a proud nationalist and volunteered to join the Nazi forces in 1937 leaving his position as a curator at the Museum Koenig at Bonn.
    The contrast provided by Niethammer who looked at the birds on one side while ignoring inhumanity on the other provided novelist Arno Surminski with a title for his 2008 novel - Die Vogelwelt von Auschwitz - i.e. the birdlife of Auschwitz.

    G. Niethammer
    Niethammer studied birds around Auschwitz and also shot ducks in numbers for himself and to supply the commandant of the camp Rudolf Höss (if the name does not mean anything please do go to the linked article / or search for the name online). Upon the death of Niethammer, an obituary (open access PDF here) was published in the Ibis of 1975 - a tribute with little mention of the war years or the fact that he rose to the rank of Obersturmführer. The Bonn museum journal had a special tribute issue noting the works and influence of Niethammer. Among the many tributes is one by Hans Kumerloeve (starts here online). A subspecies of the common jay was named as Garrulus glandarius hansguentheri by Hungarian ornithologist Andreas Keve in 1967 after the first names of Kumerloeve and Niethammer. Fortunately for the poor jay, this name is a junior synonym of G. g. anatoliae described by Seebohm in 1883.

    Meanwhile inside Auschwitz, the Polish artist Wladyslaw Siwek was making sketches of everyday life in the camp. After the war he became a zoological artist of repute. Unfortunately there is very little that is readily accessible to English readers on the internet (beyond the Wikipedia entry).
    Siwek, artist who documented life at Auschwitz before working as a wildlife artist.

    Hans Kumerloeve
    Now for Niethammer's friend Dr Kumerloeve, who also worked in the Museum Koenig at Bonn. His name was originally spelt Kummerlöwe and he was, like Niethammer, a doctoral student of Johannes Meisenheimer. Kummerlöwe and Niethammer made journeys on a small motorcycle to study the birds of Turkey. Kummerlöwe's political activities started earlier than Niethammer's: he joined the NSDAP (German: Nationalsozialistische Deutsche Arbeiterpartei = The National Socialist German Workers' Party) in 1925 and started the first student union of the party in 1933. Kummerlöwe soon became a member of the Ahnenerbe, a think tank meant to provide "scientific" support to the party's ideas on race and history. In 1939 he wrote an anthropological study on "Polish prisoners of war". At the museum in Dresden that he headed, he thought up ideas to promote politics and he published them in 1939 and 1940. After the war, it is thought that he went to all the European libraries that held copies of this journal (anyone interested in hunting it should look for copies of Abhandlungen und Berichte aus den Staatlichen Museen für Tierkunde und Völkerkunde in Dresden 20:1-15) and purged them of his article. According to Nowak, he even managed to get his hands (and scissors) on copies held in Moscow and Leningrad!

    The Dresden museum was also home to the German ornithologist Adolf Bernhard Meyer (1840–1911). In 1858, he translated the works of Charles Darwin and Alfred Russel Wallace into German and introduced evolutionary theory to a whole generation of German scientists. Among Meyer's amazing works is a series of avian osteological works which uses photography and depicts birds in nearly-life-like positions (wonder how it was done!) - a less artistic precursor to Katrina van Grouw's 2012 book The Unfeathered Bird. Meyer's skeleton images can be found here. In 1904 Meyer was eased out of the Dresden museum because of rising anti-semitism. Meyer does not find a place in Nowak's book.

    Nowak's book includes entries on the following scientists: (I keep this here partly for my reference as I intend to improve Wikipedia entries on several of them as and when time and resources permit. Would be amazing if others could pitch in!).
    In the first of his "recollection papers" (his 1998 article) he writes about the reason for writing them - the obituary for Prof. Ernst Schäfer was a whitewash that carefully avoided any mention of his wartime activities. And this brings us to India. In a recent article in Indian Birds, Sylke Frahnert and others have written about the bird collections from Sikkim in the Berlin natural history museum. In their article there is a brief statement that "The collection in Berlin has remained almost unknown due to the political circumstances of the expedition". This might be a bit cryptic for many but the best read on the topic is Himmler's Crusade: The true story of the 1939 Nazi expedition into Tibet (2009) by Christopher Hale. Hale writes about Himmler:
    He revered the ancient cultures of India and the East, or at least his own weird vision of them.
    These were not private enthusiasms, and they were certainly not harmless. Cranky pseudoscience nourished Himmler’s own murderous convictions about race and inspired ways of convincing others...
    Himmler regarded himself not as the fantasist he was but as a patron of science. He believed that most conventional wisdom was bogus and that his power gave him a unique opportunity to promulgate new thinking. He founded the Ahnenerbe specifically to advance the study of the Aryan (or Nordic or Indo-German) race and its origins
    From there Hale goes on to examine the motivations of Schäfer and his team. He looks at how much of the science was politically driven. Swastika signs dominate some of the photos from the expedition - as if it provided for a natural tie with Buddhism in Tibet. It seems that Himmler gave Schäfer the opportunity to rise within the political hierarchy. The team that went to Sikkim included Bruno Beger. Beger was a physical anthropologist but with less than innocent motivations although that would be much harder to ascribe to the team's other pursuits like botany and ornithology. One of the results from the expedition was a film made by the entomologist of the group, Ernst Krause - Geheimnis Tibet - or secret Tibet - a copy of this 1 hour and 40 minute film is on YouTube. At around 26 minutes, you can see Bruno Beger creating face casts - first as a negative in Plaster of Paris from which a positive copy was made using resin. Hale talks about how one of the Tibetans put into a cast with just straws to breathe from went into an epileptic seizure from the claustrophobia and fear induced. The real horror however is revealed when Hale quotes a May 1943 letter from an SS officer to Beger - ‘What exactly is happening with the Jewish heads? They are lying around and taking up valuable space . . . In my opinion, the most reasonable course of action is to send them to Strasbourg . . .’ Apparently Beger had to select some prisoners from Auschwitz who appeared to have Asiatic features. Hale shows that Beger knew the fate of his selection - they were gassed for research conducted by Beger and August Hirt.
    SS-Sturmbannführer Schäfer at the head of the table in Lhasa

    In all, Hale makes a clear case that the Schäfer mission had quite a bit of political activity underneath. We find that Sven Hedin (Schäfer was a big fan of his in his youth; Hedin was a Nazi sympathizer who funded and supported the mission) was in contact with fellow Nazi supporter Erica Schneider-Filchner and her father Wilhelm Filchner in India, both of whom were interned later at Satara, while Bruno Beger made contact with Subhash Chandra Bose more than once. [Two of the pictures from the Bundesarchiv show a certain Bhattacharya - who appears to be a chemist working on snake venom at the Calcutta snake park - one wonders if he is Abhinash Bhattacharya.]

    My review of Nowak's book must be uniquely flawed as I have never managed to access it beyond some online snippets and English reviews. The war had impacts on the entire region and Nowak's coverage is limited; there were many other interesting characters, including the Russian ornithologist Malchevsky, who survived German bullets thanks to a fat bird observation notebook in his pocket! In the 1950s Trofim Lysenko, the crank scientist who controlled science in the USSR, sought Malchevsky's help in proving his own pet theories - one of which was the idea that cuckoos were the result of feeding hairy caterpillars to young warblers!

    Issues arising from race and perceptions are of course not restricted to this period or region. One of the less glorious stories of the Smithsonian Institution concerns the honorary curator Robert Wilson Shufeldt (1850–1934), who in the infamous Audubon affair made his personal troubles with his second wife, a grand-daughter of Audubon, into one of race. He also wrote such books as America's Greatest Problem: The Negro (1915) in which we learn of the ideas of other scientists of the period like Edward Drinker Cope! Like many other obituaries, Shufeldt's is a classic whitewash.

    Even as recently as 2015, the University of Salzburg withdrew an honorary doctorate that they had given to the Nobel prize winning Konrad Lorenz for his support of the political setup and racial beliefs. It should not be that hard for scientists to figure out whether they are on the wrong side of history even if they are funded by the state. Perhaps salaried scientists in India would do well to look at the legal contracts they sign with their employers, especially the state, more carefully. The current rules make government employees less free than ordinary citizens, but will the educated speak out or do they prefer shackling themselves?

    Postscripts:
    • Mixing natural history with war sometimes led to tragedy for the participants as well. In the case of Dr Manfred Oberdörffer who used his cover as an expert on leprosy to visit the borders of Afghanistan with entomologist Fred Hermann Brandt (1908–1994), an exchange of gunfire with British forces killed him although Brandt lived on to tell the tale.
    • Apparently Himmler's entanglement with ornithology also led him to dream up "Storchbein Propaganda" - a plan to send pamphlets to the Boers in South Africa via migrating storks! The German ornithologist Ernst Schüz quietly (and safely) pointed out the inefficiency of it purely on the statistics of recoveries!

    From time immemorial, human beings have obsessed over knowledge. It’s one of our most precious resources, and our most jealously guarded. We’ve accumulated it gathering in circles under trees and squeaky chairs in sloped lecture halls, distributed it in etchings on stone tablets and inky letters on paper pages, even sought to hide it by blindfolding interlopers and erecting paywalls out of code.

    Today, the gates that keep knowledge in the hands of the few are coming down. As a global society, we are more literate, cooperative, and connected than ever before. By reading this right now, you are doing more than your early ancestors could have fathomed.

    How has that access to knowledge shaped your life?

    Who would you be without what you know?

    Is the right to seek knowledge freely currently protected where you live?

    What might a future with complete access to the sum of the world’s knowledge look like?

    What forms of knowledge are most important to you—and does your society recognize and value them?

    After all, knowledge isn’t just a set of random facts (“Hey Alexa, what’s the population of Budapest?”), nor is it only the theorems you memorized in school or the coding languages you use at work. It’s family recipes and whisper networks, the ways you tie your shoes and the muscles you recognize by name. It’s contested, nuanced, often inextricable from power and control. It affects the decisions we make, our health and happiness, even how we relate to each other.

    With your help, the Wikimedia Foundation wants to illustrate the expansive role of knowledge in human life. Send us your creative submissions on the theme of what open access to knowledge means to you. We’re accepting work in five categories: short films, visual art (including illustrations, photography, 3D renderings, etc.), poetry, short fiction, and creative essays. The deadline to submit is 11:59pm on 30 April 2019.

    In June, we’ll showcase the top entries under a Creative Commons license in our Heart of Knowledge digital anthology zine, share selected works through the Wikimedia Foundation’s YouTube channel and blog, and host an awards ceremony at the Wikimedia Foundation headquarters in San Francisco, California. The top submission in each category, as determined by our panel of judges, will also win a prize worth up to $350.

    Rules here.

    Submit here.*

    Adora Svitak, Communications Fellow
    Wikimedia Foundation

    *Please contact asvitak@wikimedia.org if you need an alternative method of submission.

    Joining the World Wide Web Consortium

    16:00, Thursday, 28 2019 March UTC

    We’re excited to announce that we’re becoming a member of the W3C, the main international standards organization for the World Wide Web.

    Founded by Tim Berners-Lee in 1994, W3C works with hundreds of organizations to ensure that the web’s basic building blocks—like HTML or CSS—remain consistent across browsers, platforms, and more. You can learn more about what W3C does over on Wikipedia.

    Joining the W3C fits right into our 2030 strategy, which calls on the Wikimedia movement to “become the essential infrastructure of the ecosystem of free knowledge, and [ensure that] anyone who shares our vision will be able to join us.”

    The underlying technologies and standards of the web are a core part of the infrastructure that can facilitate knowledge equity, and so to achieve our vision, we need to participate and collaborate in designing the future of the web.

    As part of working groups, we will be collaborating directly with other major stakeholders on the web. Through attending meetings, providing feedback, helping with the drafting of standards, and performing some of the technical work necessary to put standards together (as well as participating in the decision-making process of their design), we’re going to contribute to shaping a future of the web that helps everyone create and share free knowledge.

    “We are pleased to welcome the Wikimedia Foundation among our membership,” says Alan Bird, W3C’s Global Business Development Leader. “With their 2030 strategy and interests in so many of the areas we advance on the web, we anticipate that the Wikimedia Foundation’s participation will be key in building the services and structures that enable web users.”

    And we too are looking forward to collaborating with them.

    Gilles Dubuc, Senior Software Engineer (Contractor), Performance, Technology
    Wikimedia Foundation

    Quibble in summer

    10:47, Thursday, 28 2019 March UTC

    Note: this post was published on 03/28 but was originally written in September 2018, after Quibble 0.0.26, and never got published.


    The last update about Quibble is from June 1st (Blog Post: Quibble in May); this post covers the progress made over the summer.

    Since the last update, Quibble went from version 0.0.17 to 0.0.26:

    With --commands, one passes the commands as shell snippets, such as: --commands 'echo starting' 'phpunit' 'echo done'. A future version of Quibble would make it accept only a single argument, though the option can be repeated. In other words, in the future one would have to use: --command 'echo starting' --command 'phpunit' --command 'echo done'.
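
    The two command-line styles can be compared with a small Python argparse sketch. This is only an illustration of the difference between an option taking several values and a repeatable option; it is not Quibble's actual argument parser, and the parser variables below are made up for the example.

```python
import argparse

# Current style: one option followed by several shell snippets.
current = argparse.ArgumentParser()
current.add_argument("--commands", nargs="*", default=[])

# Possible future style: a repeatable option taking one snippet each time.
future = argparse.ArgumentParser()
future.add_argument("--command", action="append", default=[], dest="commands")

snippets = ["echo starting", "phpunit", "echo done"]

old_args = current.parse_args(["--commands"] + snippets)
new_args = future.parse_args(
    ["--command", "echo starting", "--command", "phpunit", "--command", "echo done"]
)
```

    Both invocations end up with the same list of snippets. The repeatable form has the advantage that the parser always knows where each command ends, so positional arguments that follow the option are not swallowed.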

    The MediaWiki PHPUnit test suite to use is determined based on ZUUL_PROJECT. --phpunit-testsuite lets one set it explicitly. One use case is to run extension tests for a change made to mediawiki/core and ensure it does not break extensions (ZUUL_PROJECT=mediawiki/core quibble --phpunit-testsuite=extensions mediawiki/extensions/BoilerPlate). On Wikimedia CI they are the wmf-quibble-* jobs.

    You can get a great speed-up by using a tmpfs for the database. Create a tmpfs and then pass --db-dir to make use of it. With a Docker container one would do: docker run --tmpfs /workspace/db:size=320M quibble:latest --db-dir=/workspace/db.

    In the future, I would like Quibble to be faster: it runs the commands serially and could be sped up by parallelizing at least some of the test commands (edit: done in 0.0.29).


    Changelog for 0.0.17 to 0.0.26

    • T196013 MediaWiki configuration injected by Quibble is now prepended at start of LocalSettings.php, that makes the configuration snippets available to wfLoadExtension() / wfLoadSkin().
    • T197687 - Fix Chrome autoplay policy which prevented QUnit tests from running for Wikispeech https://goo.gl/xX8pDD
    • T198171 - In Chrome, do not rate limit history.pushState(); the limit prevented some QUnit tests from passing since they exceeded it.
    • T195918
      • Enhance inline help for --run and --skip by grouping them in a stages argument group.
      • New --skip=all to skip all tests
    • T195084 T195918 - Support running any command inside the Quibble environment by using --commands (see below). They are run with a web server exposed (T203178).
    • T22471 T196347 - rebuildLocalisationCache after update.php, fix locking issues when doing the first page request, multiple requests were racing over generating the localization cache.
    • T200017 - Allow overriding the PHPUnit testsuite to run.
    • Do not spawn a WebServer when running PHPUnit tests, it is only needed for QUnit and Selenium tests.
    • Add a link to https://doc.wikimedia.org/quibble/ in the README.rst.
    • T192132 - Quibble is now licensed under Apache 2.0
    • T202710 - Xvfb no longer listens on a Unix socket.
    • T200991 - Passing --dump-db-postrun will dump the content of the database to the log directory (--log-dir). Thanks @Pablo-WMDE
    • Add support for Zuul cloner --branch and --project-branch, used to test MediaWiki-extensions-DonationInterface master branch against MediaWiki release branches.
    • The environment variable TMPDIR set by Quibble is no longer hardcoded to /tmp, it now follows the logic of Python tempfile.gettempdir().
    • When running under Docker, default the log directory to be under the workspace instead of /log.
    • Allow specifying database data directory with --db-dir (default is the temporary directory based on environment variable).

    Telling the whole story of US women’s suffrage

    18:21, Wednesday, 27 2019 March UTC

    To celebrate the 100th anniversary of the enactment of the 19th amendment, the National Archives and Records Administration (NARA) is opening a museum exhibit called Rightfully Hers: American Women and the Vote. It will highlight both the told and untold history of women’s suffrage in America. Consider that even before the amendment’s passage, some women could vote for certain state and national offices depending on where they lived. And still other women (often women of color) could not vote even after it was ratified. The history of a nationwide policy change extends well beyond the letter of the law. Shifts must happen culturally and locally, too. Our course is working to capture and share those rich and complex stories.

    NARA is collaborating with Wiki Education to make sure Wikipedia best represents the full history of women’s suffrage. Through our virtual course, historians, librarians, and citizen archivists have gathered together to learn the Wikipedia editing skills required to present these stories to the public where they’re looking for it.

    Suffragist history often focuses on the well-known white players: Susan B. Anthony, Elizabeth Cady Stanton, Margaret Fuller. It’s important to highlight the full story and the activists working consciously towards universal suffrage. That’s exactly what Wiki Scholars in our course did when they improved the Wikipedia biography of activist Ida B. Wells.

    Ida B. Wells, public domain.

    Wells laid the groundwork for creating change and fostering attitudes of acceptance leading up to the amendment. She founded the Alpha Suffrage Club in Chicago in 1913. The group helped young black women, who were excluded from mainstream suffrage debates, engage politically in and for their communities. The club also played a role in electing city and state officials of color. The two Wiki Scholars worked together across their respective disciplines of expertise to make sure Wells is well-represented on Wikipedia. Now, her article boasts many new sections and sources, providing a more complete picture of her life, legacy, and impact within the suffrage movement.

    Another Wiki Scholar in our NARA course expanded Wikipedia’s coverage of Native American civil rights, which remained restricted even after the passage of the 19th amendment.

    Wiki Scholars also created previously non-existent Wikipedia biographies for lesser-known suffragists who made big impacts. Helen Hoy Greely, for example, was an accomplished activist whose career spanned over five decades. Etta Haynie Maddox fought for women to be able to take the bar exam and practice law in the state of Maryland; she was the first woman in Maryland to practice law. And Mary McHenry Keith was a lawyer and social justice advocate, whose life and career also weren’t represented on Wikipedia before the course.

    Another Wiki Scholar poured their energy into a brand new article about the Prison Special, a train tour around the United States which drew attention to the stories of women who had been arrested, detained, or incarcerated as a result of participating in protests that promoted suffrage. The article was featured on Wikipedia’s main page on January 20th, and received 4,000 page views that day alone.

    The collaborative experience of our course fosters personal and professional skills identified by the American Historical Association that historians with PhDs said they wished they had learned in grad school. Read all about how our NARA course will help you hone your abilities to collaborate, communicate your field to the public, improve your intellectual self-confidence, and more.


    Interested in taking our course? Visit our landing page and sign up to receive updates about start dates. We’ll have another course this spring! To read about the personal experience of one Wiki Scholar, click here.


    Header image: File:Mary Garrity – Ida B. Wells-Barnett – Google Art Project – restoration crop.jpg, public domain, via Wikimedia Commons.

    Today, the European Parliament voted 348–274 to pass a new copyright directive that includes problematic rules that will harm free knowledge. They did so after years of discussions, revisions, and more recently street protests. We believe that this is a disappointing outcome, the impacts of which will certainly be felt for years to come.

    As Articles 15 and 17 (formerly 11 and 13) of the directive will take effect across the European Union (EU), we expect to see direct repercussions on all online activities. Article 15 will require certain news websites to purchase licenses for the content they display. As a result, many websites that helped people find and make sense of the news may choose not to offer this type of service, making it harder to find high-quality news items from trusted sources online. Article 17 will introduce a new liability regime across the EU, under which websites can be sued for copyright violations by their users. This will incentivize websites to filter all uploads and keep only “safe” copyrighted content on their sites, eroding essential exceptions and limitations to copyright by making platforms the judges of what is and isn’t infringement.

    Still, there are elements to celebrate in the new directive. A new safeguard for the public domain will ensure that faithful reproductions of public domain works remain uncopyrighted, even as they are digitized. Museums, archives, and libraries will now be able to provide digital access to out-of-commerce works that have not yet fallen into the public domain. Research organizations and cultural heritage institutions will be able to engage in text and data mining on works they have lawful access to.

    While we are disappointed, the fight is not over. The impact of the copyright directive will be determined by how lawmakers in each country choose to implement it. As the copyright directive is implemented into national law over the next two years, it presents an opportunity for Europeans to proactively engage with policymakers and ensure national copyright protects internet freedom and empowers everyone to participate in knowledge. Many countries will be opening up their copyright law for amendments for the first time in years. Now is the time to advocate for the good and try to mitigate the harmful parts of the new EU Copyright Directive, and Wikimedia is committed to this task.

    Although Articles 15 and 17 remain in the directive, Wikimedians are already working to ensure that they are implemented safely and interpreted in the best possible light in national law, while also pushing for safeguards that benefit the public like freedom of panorama or user-generated content exceptions.

    It is disappointing that, in the end, the majority of members of the European Parliament chose not to listen to the millions of voices in Europe concerned about the direction this directive has taken. We look forward to making sure that national lawmakers in the EU member states will understand how their actions in future national legislation will affect internet freedom. Stay tuned to our blog and our public policy portal for future updates and ways you can help.

    Jan Gerlach, Senior Public Policy Manager, Legal
    Allison Davenport, Technology Law and Policy Fellow, Legal
    Wikimedia Foundation

    Wikimedia Commons, the repository for educational media content that hosts most of the images used on Wikipedia, has announced its photo of the year.*

    Nearly 3,500 people chose between 57 images in the final round of the competition. Jason Weingart’s Evolution—a composite timelapse showing the development and frightening expansion of a tornado—took the top prize.

    Coming in second was David Gubler’s photo of a lonely bucket train in Chile, and in third was Daniela Rapava’s remarkable shot of a soap bubble being overcome by ice.

    I got in touch with all three winners to learn more about the winning photographs.

    • • •

    Weingart created the picture of the year (above) from eight images he took while stormchasing in Dodge City, Kansas, United States, in May 2016. Of those, seven show the tornado itself, while the one on the far left demonstrates the storm’s structure.

    Weingart, who lives near Austin, Texas, then used Photoshop to blend them together for the final presentation. “I wanted to be the first person to use timelapse photography to document tornadogenesis,” he says.

    The high quality of the individual shots stemmed from Weingart’s choice to rig his camera to take one photograph for every second of video, which meant that he wasn’t stuck trying to cull individual frames from all his footage. (That video is available on YouTube.)

    Weingart originally designed and uploaded it to Wikimedia Commons for the Wiki Science Competition, a contest designed to encourage the upload of scientific educational media for use on Wikipedia and elsewhere. “My intention was to inspire people to learn more about the weather,” he says.

    Amusingly, however, far more people probably saw this shot after it was uploaded to Reddit and Facebook with the caption “[a] mass of tornadoes.” He has no idea who did this, and if we’re honest, it doesn’t seem likely that he ever will. However, it went so viral afterwards that it has its own dedicated page on Snopes, the famed fact-checking website. Their verdict was that the photo had been miscaptioned.

    So what is Weingart’s takeaway from all this? “A great lesson for people … is to fact-check things before you click share.”

    More of Weingart’s work can be found on his website.

    • • •

    Second place: An empty ore train passes in front of Chile's San Pedro volcano. Photo by David Gubler/Kabelleger, CC BY-SA 4.0.

    Second place went to David Gubler’s lonely ore train, a freight movement that winds its way through Chile’s remote Atacama Desert once or at most twice per day and terminates at the Bolivian border. Finding the train was the easy part, as it runs at the same time every day. Actually getting this photo was a bit harder: even though the train doesn’t move particularly quickly, he and a friend had to drive to a good photography location, grab the shot, jump back in the car, leapfrog ahead of the train, and repeat.

    Gubler thinks that his photo stood out to voters in the picture of the year competition because “the combination of the colorful locomotives, red desert, snow-capped peak and blue sky is very appealing,” and because “the exotic scenery itself may have sparked interest, as I guess many voters are not very familiar with the Atacama Desert.”

    You can find more of Gubler’s work over on Wikimedia Commons.

    • • •

    Third place: A soap bubble slowly frosts over in freezing cold weather. Photo by Daniela Rapava, CC BY-SA 4.0.

    Finally, Daniela Rapava took third place in the picture of the year competition. Rapava is a native of Slovakia, and frozen bubbles like this are among her favorite subjects to photograph. She’s been doing it since 2014, and they remind her of the planets people are able to glimpse from Observatory Rimavska Sobota, where she took this particular shot.

    The bubble is made from water, detergent, and sugar, making it just robust enough to remain intact even while outdoors. It took about seven minutes to frost over in the below-freezing weather that day.

    Rapava’s other work is available on her website.

    Ed Erhart, Senior Editorial Associate, Communications
    Wikimedia Foundation

    *Photos of the year are selected from the preceding calendar year’s worth of new “featured” pictures, a marker of high quality that is awarded after a community vetting process. You can see all of the previous pictures of the year, going back to 2006, on Wikimedia Commons.

    Tech News issue #13, 2019 (March 25, 2019)

    00:00, Monday, 25 2019 March UTC
    When we are to share in the "sum of all knowledge" we share what we know about subjects: articles, pictures, data. We may share what knowledge we have and what others have, and that is what it takes for us to share in the sum of all knowledge. The questions are why we should share all this, how to go about it, and finally how it will benefit our public and help us share the sum of all knowledge.

    At the moment we do not really know what people are looking for. One reason is that search engines like the ones by Google, Microsoft and DuckDuckGo recommend Wikipedia articles, and as a consequence the search process is hidden from us. However, some people prefer the "Wikipedia search engine" in their browser. We can do better and present more interesting search results. From a statistical point of view, we do not need big numbers to gain significant results.

    When we check what the "competition" does we find their results in many tabs; "the web" and "images" are the first two. The first is text based and offers whatever there is on the web. What we will bring is whatever we, and the organisations we partner with, have to offer. It will be centered on subjects and their associated factoids, presented in any language.

    One template to consider is how Scholia presents its subjects. The presentation differs depending on whether the subject is a publication, a university, a scholar or a paper. Large numbers make specific presentations feasible, and thanks to Wikidata we know what kind of presentation fits a particular subject. A similar approach is possible for sports or politics. It takes experimentation, and that is what makes it a Wiki approach.

    Thanks to this subject based approach, language plays a different role. It is vital that the potentially differing labels needed for finding the subjects are available or become available. One important difference with the Google, Microsoft or DuckDuckGo approach is that as a Wiki, we can ask people to add labels and missing statements. This will make our subject based data better understood in the languages people support. Yes, we can ask people to have a Wikimedia profile and yes, we may ask people to support us where we think people looking for information have to overcome hurdles.
    Thanks,
           GerardM

    weeklyOSM 452

    10:15, Saturday, 23 2019 March UTC

    12/03/2019-18/03/2019


    OpenStreetMap protest against Art 13 of the EU Copyright Directive 1 | © OpenStreetMap

    [Actual Category]

    • Michael Reichert suggests turning off OSM services or adding banners to protest against the EU plan to implement the new, controversial Copyright Directive. However, the question remains as to why the protest against EU politics is more important than existing or proposed restrictions of freedom elsewhere.

    About us

    • The transfer of weeklyOSM to the FOSSGIS server and the subsequent necessary maintenance work has been completed. Users might have noticed some minor interruptions in service last weekend whilst this was taking place. We would like to thank Peter Barth and Andreas Hubel for the transfer. We especially want to thank Anthony Bennett aka @internetgeog of www.internetgeography.net for many years of administration of the site, which he has maintained exemplarily since 2014.

    Mapping

    • Jan S proposes better differentiation between various kinds of police facilities. Jan has drafted a proposal and is seeking comments.
    • Members of the Polytechnic University of Milan (Politecnico di Milano) created a deforestation mapping wiki page to improve the understanding and mapping of land coverage to make analysis and visualisation of environmental issues such as deforestation easier.
    • Paul Allen asks whether superroutes are good, bad or ugly. In his post on the tagging mailing list he mentions some pros and cons of the controversial construct. During the discussion it was also asked whether relations of type=route containing child relations should be preferred over superrelations.
    • Antoine Riche pointed to undiscussed wiki changes about cycleway tagging which resulted in a long discussion about how to deal with oneway restrictions.
    • German mappers are voting on the deprecation of relations of type associatedStreet in Germany.

    Community

    Imports

    OpenStreetMap Foundation

    • The minutes of the OSMF Licence Working Group meeting of 14 February have been published.
    • The OSM Operations Team announced on Twitter that one of OSM’s tile caches received a Denial of Service attack by someone using Amazon servers and a Load Impact User-Agent.

    Events

    • The FOSS4G North America, an open geospatial technology and business conference, will take place from 15 to 18 April 2019 in San Diego, USA. The program is available on the conference’s homepage.
    • The next mapathon of MAMAPA will take place on 28 March at the Abendakademie in Mannheim. Beyond the humanitarian objective of the mapathon, MAMAPA promotes the integration of new immigrants in Germany by having locals and migrants map in tandem. Through the joint work, prejudices and stereotypical thinking are reduced and mutual trust is strengthened.
    • The 7th (seventh!) annual conference organised by OpenStreetMap France will take place from 14 to 16 June in Montpellier (34), on the campus of the Paul Valéry University. Proposals for presentations in 5′, 25′, 55′ or even 2-hour workshops will be accepted until 15 April.

    Humanitarian OSM

    • Russell Deffner calls for help with mapping flooded areas in Mozambique and Zimbabwe exacerbated by Cyclone Idai. More information about current tasks and how you can help can be found on the corresponding wiki page.
    • Nina Strochlic covers in National Geographic how the Bidibidi refugee camp in North Uganda is transforming into a permanent city. She points out how humanitarian mapping, HOT in particular, has helped in documenting the transformation and provided valuable data for planning and managing the quickly evolving area.
    • A story on MapGive, an initiative of the U.S. Department of State’s Humanitarian Information Unit, details how updated aerial imagery has helped crowdsourced humanitarian mapping. MapGive helped mapping efforts of Missing Maps in Uganda and Bangladesh by providing support with current imagery. The blog post shows differences in aerial imagery over the years and a before and after comparison of OSM data around refugee camps in Northern Uganda.

    Open Data

    • The regional conference on Latin American Open Data, Abrelatam19, will be held in Quito from 28 to 30 August 2019.

    Software

    • In 2009 TriMet launched the open source trip planner “OpenTripPlanner”, which combines public transport with walking and biking. The tool, which can use public transport data in General Transit Feed Specification (GTFS) format and is based on OSM data, gained some popularity within the OSM community. TriMet has announced a new version of the tool that can also incorporate data from Uber, SHARE NOW (formerly car2go) and BIKETOWN and uses real-time locations of vehicles and bikes to calculate routes. A beta version with data for Portland, OR, USA is available.
    • The Android app OSMTracker, which is popular with OSM mappers, no longer shows maps as a background to tracks. OSMTracker uses the generic user-agent from the osmdroid library which was recently blocked by the OSM Operations Team. The tile usage policy requires application-specific user agents.
    • Fabian Kowatsch, of HeiGIT, announced that the HeiGIT Big Spatial Data team published a public world-wide instance of the ohsome OSM History Analysis Platform including an API, an updated dashboard and the beta prototype map interface ohsomeHEX (OSM History Explorer). All this allows you to analyse the evolution of any OSM tag for arbitrary regions and time periods in a simple way. Several examples are given in earlier blogs which now can be adapted for arbitrary regions around the world. Ideas for further enhancements are most welcome.
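
The user-agent requirement behind the OSMTracker item above can be illustrated with a short sketch. It builds (but does not send) a tile request carrying an application-specific User-Agent, using only Python's standard library. The application name, version and contact address are hypothetical placeholders, and the tile URL pattern is the commonly used standard-layer address, assumed here for illustration; the tile usage policy itself remains the authority on what is required.

```python
from urllib.request import Request

# Assumed URL pattern of the standard OSM tile layer (illustration only).
TILE_URL = "https://tile.openstreetmap.org/{z}/{x}/{y}.png"


def build_tile_request(z: int, x: int, y: int, app: str, version: str, contact: str) -> Request:
    """Build a tile request that identifies the calling application.

    `app`, `version` and `contact` are hypothetical values chosen by the
    application author; a generic library default would not identify the app.
    """
    user_agent = f"{app}/{version} ({contact})"
    url = TILE_URL.format(z=z, x=x, y=y)
    return Request(url, headers={"User-Agent": user_agent})


# Example: the request is only constructed here, nothing is sent.
request = build_tile_request(12, 2200, 1343, "ExampleTracker", "1.0", "dev@example.org")
```

An app that relies on a shared library's default user agent cannot be told apart from every other app using that library, which is presumably why a generic agent ends up blocked for all of them at once.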

    Releases

    • Version 5 of Openrouteservice was published. The service, developed by The Heidelberg Institute for Geoinformation Technology, bears the name of the Jupiter moon Io.

    Did you know …

    • … the OSM Software Watchlist (release status) by Wambacher? It has been extended to cover another 15 OSM applications.
    • … the tagging of a centre turn lane? This is described using the tags lanes:both_ways=1 and turn:both_ways=left.
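
As a rough sketch of the centre turn lane tagging mentioned above, the two tags can be written as key/value pairs and checked in a few lines of Python. The helper name and the sample road are hypothetical; only the two tag keys and their values come from the item above.

```python
# Tags of a hypothetical road with a shared centre turn lane.
# Only lanes:both_ways and turn:both_ways are taken from the text above;
# the highway value is an illustrative assumption.
way_tags = {
    "highway": "secondary",
    "lanes:both_ways": "1",
    "turn:both_ways": "left",
}


def has_centre_turn_lane(tags: dict) -> bool:
    """Return True if the tags describe a centre turn lane as in the item above."""
    return tags.get("lanes:both_ways") == "1" and tags.get("turn:both_ways") == "left"


print(has_centre_turn_lane(way_tags))  # True
```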

    Other “geo” things

    • Dr. Michele M Tobias asked #gischat on Twitter “Can you tell your GIS/mapping career story in one tweet?” Read the nice stories and please tell your OSM story as well. 😉
    • As Marius Watz has tweeted, the Norwegian municipality Trysil decided not to name streets but numbered thousands of cabin buildings instead. The website nrk.no reported that the step caused chaos for tourists, desperate cottage owners and emergency services.
    • Bryan Housel asks for help to “link branded businesses in OpenStreetMap to wikidata”.
    • The Guardian reports on desire lines: “illicit trails that defy the urban planners”. (Note: there is a small amount of data in OSM identifying such paths, see 1 and 2).
    • In an article on Medium, Caitlin Dewey covers the story of the Fruit Belt neighbourhood in Buffalo, New York which had been replaced on Google Maps by something called “Medical Park”. The article follows the residents and their investigation into how Google may have acquired inaccurate neighbourhood data from Pitney Bowes and how that came to be. She also points out how little control the community had over the mislabelling. In comparison, the neighbourhood had only been added to OSM a few days ago with plenty of room for map improvements in the area.
    • Jorge Andres, a volunteer cartographer for MapAction, discusses, on their corporate blog, how to most effectively represent topographical relief when mapping volcanic hazards. He suggests that 3D visualisations are more effective, and in 2D contour lines tend to obscure significant details.
    • Tom Lee, Mapbox’s lead on policy, testified before the US Senate Judiciary Committee regarding privacy law reform.

    Upcoming Events

    Where | What | When | Country
    Greater Vancouver area | Metrotown mappy Hour | 2019-03-22 | canada
    Tokyo | ミャンマーに絵本と地図を届けよう~ミャンマーに届ける翻訳絵本作り&自由な世界地図作り~ | 2019-03-23 | japan
    Cork | Mapping Party @ UCC | 2019-03-23 | ireland
    Bremen | Bremer Mappertreffen | 2019-03-25 | germany
    Joué-lès-Tours | Rencontre Mensuelle | 2019-03-25 | france
    Graz | Stammtisch Graz | 2019-03-25 | austria
    Portmarnock | Erasmus+ EuYoutH_OSM Meeting | 2019-03-25-2019-03-29 | ireland
    Zurich | Missing Maps Mapathon Zurich | 2019-03-27 | switzerland
    Montpellier | Réunion mensuelle | 2019-03-27 | france
    Université libre de Bruxelles (ULB) | National Mapathon | 2019-03-27 | belgium
    UCL Louvain-la-Neuve | National Mapathon | 2019-03-27 | belgium
    KUL Leuven | National Mapathon | 2019-03-27 | belgium
    UMONS Mons | National Mapathon | 2019-03-27 | belgium
    Montrouge | rencontre locale des contributeurs de Montrouge et alentours | 2019-03-27 | france
    Lübeck | Lübecker Mappertreffen | 2019-03-28 | germany
    VUB Brussel | National Mapathon | 2019-03-28 | belgium
    Mannheim | Mannheimer Mapathons | 2019-03-28 | germany
    ULIEGE Liège | National Mapathon | 2019-03-28 | belgium
    UNAMUR Namur | National Mapathon | 2019-03-28 | belgium
    UGENT Gent | National Mapathon | 2019-03-28 | belgium
    Düsseldorf | Stammtisch | 2019-03-29 | germany
    UCL Louvain-la-Neuve | National Mapathon | 2019-03-30 | belgium
    ULIEGE Liège | National Mapathon | 2019-03-30 | belgium
    Bochum | Mappt die Innenstadt – Mappingtag für Einsteiger*innen und Fortgeschrittene | 2019-03-31 | germany
    Stuttgart | Stuttgarter Stammtisch | 2019-04-03 | germany
    Bochum | Mappertreffen | 2019-04-04 | germany
    Nantes | Réunion mensuelle | 2019-04-04 | france
    Dresden | Stammtisch Dresden | 2019-04-04 | germany
    La Riche | La Riche (37)#Ateliers d’initiation à OpenStreetMap | 2019-04-06 | france
    Kyoto | お花見!オープンデータソン in 京都 | 2019-04-07 | japan
    Rennes | Réunion mensuelle | 2019-04-08 | france
    Bordeaux | Réunion mensuelle | 2019-04-08 | france
    Essen | Mappertreffen | 2019-04-08 | germany
    Taipei | OSM x Wikidata #3 | 2019-04-08 | taiwan
    Lyon | Rencontre mensuelle pour tous | 2019-04-09 | france
    Munich | Münchner Stammtisch | 2019-04-09 | germany
    Salt Lake City | SLC Mappy Hour | 2019-04-09 | united states
    Viersen | OSM Stammtisch Viersen | 2019-04-09 | germany
    Cologne | Köln Stammtisch | 2019-04-10 | germany
    Montpellier | State of the Map France 2019 | 2019-06-14-2019-06-16 | france
    Angra do Heroísmo | Erasmus+ EuYoutH_OSM Meeting | 2019-06-24-2019-06-29 | portugal
    Minneapolis | State of the Map US 2019 | 2019-09-06-2019-09-08 | united states
    Edinburgh | FOSS4GUK 2019 | 2019-09-18-2019-09-21 | united kingdom
    Heidelberg | Erasmus+ EuYoutH_OSM Meeting | 2019-09-18-2019-09-23 | germany
    Heidelberg | HOT Summit 2019 | 2019-09-19-2019-09-20 | germany
    Heidelberg | State of the Map 2019 (international conference) | 2019-09-21-2019-09-23 | germany
    Grand-Bassam | State of the Map Africa 2019 | 2019-11-22-2019-11-24 | ivory coast

    Note: If you would like to see your event here, please put it into the calendar. Only data that is there will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.

    This weeklyOSM was produced by Nakaner, Polyglot, Rogehm, SK53, Softgrow, SunCobalt, TheSwavu, YoViajo, derFred, kartonage, muramototomoya.
