Today is Wikipedia’s 15th birthday! To commemorate the event, our friends in Argentina have shot a documentary about seven editors of Wikipedia and the various Wikimedia projects. It has English subtitles.
Wikimedia Argentina, a local Wikimedia chapter, told us that they often find that “there is a general ignorance about who edits Wikipedia and writes the encyclopedia’s articles.” Many, they say, believe that Wikipedia editors are hired or are selected for their expertise in specific areas.
These experiences heavily influenced Wikimedia Argentina’s decision to create a video documentary that showcased the editors of Wikipedia. They want to “break the different myths that we found in the collective imagination,” by “showing the first-hand personal stories of different Wikipedians.”
The featured volunteers, all of whom faced Wikipedia’s challenges in their first edits but kept trying and now help bring Wikipedia to others, are:
Beatrice, a photographer. Her main contributions are to Wikimedia Commons.
Alberto Robles, a union lawyer. Wikipedia editor since 2006.
Andrea, a psychologist and visual artist. She leads WikiProject Women.
Jorge, an astronomy enthusiast. Editor of the Spanish Wikipedia since 2006.
Mauricio, a librarianship student. He leads the digitization project.
Leandro, a tourism student. Wikipedia editor since 2004 and former librarian.
Lucas Reynoso, a secondary school student. He is the youngest member of Wikimedia Argentina.
Giselle Bordoy, Communications Manager, Wikimedia Argentina
Ed Erhart, Editorial Associate, Wikimedia Foundation
Article edits become musical bubbles. User languages become maps. Volunteer communities become circles of connections. World history becomes a sliding timeline.
Read on for 15 of our favorite data visualization projects from the last 15 years of Wikipedia.
In musical bubbles, recent Wikipedia edits pop across the screen of “Listen to Wikipedia.” The pitch of each note and the size of the bubble indicate the size of the edit. Small changes are high-pitched blips while big changes echo. Watch the top of the screen. You might see a new user announcement set to the sound of strings. Wikipedia is always evolving. And in “Listen to Wikipedia,” you’ll hear the live symphony of its change.
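The mapping described above—small edits as high-pitched blips, big edits as low, echoing tones—can be sketched in a few lines. This is a toy model only; the function name, frequency range, and log scale below are my own assumptions, not the project’s actual code.

```python
import math

def edit_to_note(size_bytes, f_min=110.0, f_max=880.0, max_size=10_000):
    """Map an edit's byte delta to a pitch: small edits high, large edits low."""
    # Normalize on a log scale so a 10x larger edit shifts the pitch evenly.
    t = min(math.log1p(abs(size_bytes)) / math.log1p(max_size), 1.0)
    # Invert: bigger edits produce lower frequencies, per the description above.
    return f_max - t * (f_max - f_min)

print(edit_to_note(10))      # a small edit sits near the top of the range
print(edit_to_note(10_000))  # a large edit sits at the bottom
```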
What is the terrain of a language on Wikipedia? What parts of the world are known in Hebrew but invisible in Japanese? In “Terra Incognita,” Wikipedia articles with geo-code data are mapped back to the parts of the world they represent and color-coded by language. Select a few languages and explore the planet as a constellation of known and unknown spaces. It’s a colorful Wikipedia world out there, but there is so much terrain to discover.
Which topics do men edit more often than women on Wikipedia? That’s the question behind the interactive “Wikipedia Gender” chart, designed with a data set from 2012. Along the center y-axis run articles that are revised increasingly more by female contributors; along the x-axis run articles crafted by more men. In the center of the graph are articles like “Mister Rogers’ Neighborhood” and “A Tale of Two Cities” – which sit near gender parity.
As the discourse around AIDS/HIV has changed, so has its Wikipedia article. In “Backstory,” 13 years of revisions on the English-language AIDS/HIV Wikipedia article are charted. The study reveals a constant, contested, and collaborative effort to understand this global pandemic.
Ralph Straumann, Mark Graham, Bernie Hogan, and Ahmed Medhat
What is the online visibility of African nations? In a series of distorted maps, “Africa on Wikipedia” shares a striking set of answers. Using 3.7 million Wikipedia articles from 2013, and searching for topics geotagged to African nations, the project renders a map of Africa created solely from free knowledge. It then offers a comparison map on the right. Cross-reference ‘population’, ‘internet connection’, ‘gross domestic product’ and more against the density of Wikipedia articles to consider how much more visibility is needed for Africa today.
World history is on a sliding scale in “Histography.” Across a resizable timeline, small black dots mark historical milestones. Click to select a moment, and explore the Wikipedia sources. Play with the categories to filter history to ‘music’, ‘inventions’, ‘riots’ and more. Time has rarely appeared so succinct. But within each dot there are thousands of further paths through Wikipedia’s articles on history and culture.
Since at least 2008, Wikipedia editors and readers have documented a phenomenon dubbed “getting to Philosophy.” It’s a simple set of rules that slowly but overwhelmingly link a topic back to the Wikipedia page for Philosophy. As of May 2011, 94.52% of articles could successfully be connected to Philosophy. In the “Wikipedia Radial Graph” the simple rules are followed automatically to connect any search term back to the Philosophy root. Can you find one of the few articles that doesn’t connect?
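The rule behind “getting to Philosophy” is essentially “repeatedly follow each article’s first link until you reach Philosophy, a dead end, or a loop.” A minimal sketch of that chain-following logic, using a hypothetical in-memory link graph in place of real article parsing:

```python
def first_link_chain(start, first_link, target="Philosophy", max_hops=100):
    """Follow each article's first link until reaching `target`.

    `first_link` is a toy stand-in (dict of article -> its first link) for
    the real rule of "first non-parenthesized link in the article body".
    Returns the path, or None on a dead end or loop.
    """
    path, seen = [start], {start}
    node = start
    for _ in range(max_hops):
        if node == target:
            return path
        node = first_link.get(node)
        if node is None or node in seen:
            return None  # dead end or loop: one of the few that never connect
        seen.add(node)
        path.append(node)
    return None

# A tiny hypothetical link graph:
links = {
    "Bubble tea": "Tea",
    "Tea": "Beverage",
    "Beverage": "Liquid",
    "Liquid": "Science",
    "Science": "Knowledge",
    "Knowledge": "Philosophy",
}
print(first_link_chain("Bubble tea", links))
```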
A year is a long time, demonstrated by pageview patterns on Wikipedia. This project takes pageview data from the twelve months of 2013 and graphs them to show the evolution of popular topics as told by the Italian-language edition of Wikipedia. From film and television to sports and current events, the infographic documents how Italian readers navigated through the year.
Dario Taraborelli, Giovanni Luca Ciampaglia, and Moritz Stefaner
Whether or not to delete a Wikipedia article is a tough decision, and one that can lead to long and drawn-out discussions. This project maps those discussions—an editor arguing to delete the article swings the branch to the left, while an editor arguing to keep pushes it to the right. The result is a tree of deletion discussions, visually documenting the hundred longest such conversations in Wikipedia’s history.
This project takes more than ten million items from Wikidata, the free knowledge base providing public-domain data for the Wikimedia projects, and sorts them into broad, colorful categories. Almost a quarter of these refer to individual human beings, while 25,000 refer to association football clubs.
Float through 100,000 of 2014’s most popular Wikipedia articles in the “WikiGalaxy.” Here, related topics are grouped as 500 ‘nebulae’ with colorful points of light referencing articles in dark 3D space. Try ‘fly mode’ to experience the joy of encountering random articles in any and all directions. This is Star Trek for browsing the world’s greatest public library.
Wikipedia has been a collaborative platform almost since its inception, which while allowing for exponential growth, has also caused some teething issues. In “Articles of War”, the most prolific and, ultimately, lamest edit conflicts are tallied and presented as squares of varying size on a virtual pin board. Are humans “owners” or “companions” of cats? Is sulphur sulfur? Which millennium is 2000 in? All debates wrought by Wikipedians in years past.
A project still in the works, “Omnipedia” aims to provide all possible information on any topic by collating information from up to 25 of Wikipedia’s hundreds of language editions. With eight million concepts to scour, the project provides a thorough overview of pretty much everything written about pretty much every topic. It highlights how different languages and cultures handle these data, as well as covering gaps and exploring different avenues of information.
Since 2012, the “Wiki Loves Monuments” project has identified international landmarks lacking freely-licensed photographs. Then photographers around the world can work on this checklist, identifying existing media or capturing new photos to illustrate the monuments. This handy “Map of Wiki Loves Monuments” from 2013’s data enables fast exploration of the itemized cultural heritage sites. Blue markers show monuments with photos. Red markers indicate a photo is still needed. How does your town look?
Wikipedia is built by people, and in the “Wikimedia Community Visualization” it becomes clear how deeply users around the world collaborate. Beautifully illustrated for 27 language groups plus Wikimedia Commons and Meta Wiki, these graphs show users as single dots, with lines connecting users who have corresponded via ‘Talk’ pages. Explore any language group and you’ll find bright blue centers of social interaction. These are some of Wikipedia’s most talkative users, with spokes of connection branching out. The visualization offers a detailed glimpse at the human network behind Wikipedia. The society of free knowledge is shaped like this.
Haoting Zhang, Communications Design Intern
Zachary McCune, Global Audiences Manager
Wikimedia Foundation
Adapted from Katherine Maher’s post on the WMF blog.
As Wikipedia marks its 15th anniversary, its community is celebrating with nearly 150 events on six continents.
This Friday marks the 15th anniversary of Wikipedia, the world’s free encyclopaedia that anyone can edit. This week, we celebrate not just Wikipedia, but the birth of an idea: that anyone can contribute to the world’s knowledge. Globally, readers and editors are coming together to celebrate, with nearly 150 events across six continents. From editing marathons in Bangladesh and lectures in Switzerland, to picnics in South Africa and a conference in Mexico, the world is celebrating the joy of knowledge.
Wikipedia launched on January 15, 2001 with a bold vision: a world in which every single human being can freely share in the sum of all knowledge. At the time, the idea that people around the world would collaborate to build an encyclopaedia—for free—seemed unbelievable. Since then, Wikipedia has grown to more than 36 million articles in hundreds of languages, used by hundreds of millions of people all over the world. Wikipedia and its sister projects are still built by volunteers around the world: each month, roughly 80,000 volunteer editors contribute to Wikimedia sites.
“Wikipedia challenged us to rethink how knowledge can be gathered and shared,” said Wikipedia founder Jimmy Wales. “Knowledge is no longer handed down from on high; instead, it is freely shared by everyone online. Wikipedia seemed like an impossible idea at the time—an online encyclopaedia that everyone can edit. However, it has surpassed everyone’s expectations over the past 15 years, thanks to the hundreds of thousands of volunteers around the world who have made Wikipedia possible.”
We’re celebrating Wikipedia’s global community with a commemorative website and week-long campaign, collecting and sharing the stories of individuals and organizations that have helped develop Wikipedia into the world’s largest collection of collaboratively created free knowledge. These stories show the truly global nature of the Wikimedia community: from Ziyad Alsufyani, a medical student at Taif University in Ta’if, Saudi Arabia who has been editing the Arabic Wikipedia since 2009, to Susanna Mkrtchyan, a professor and devoted grandmother working to give Armenian students better educational opportunities. We will continue to collect stories throughout the month of January.
Today, we celebrate all of the projects, partnerships, events, and joy the Wikimedia movement has inspired over the past 15 years, with many still to come. Wikipedia is much more than a website. Wikipedia and its sister Wikimedia projects represent a global, ever-expanding resource and community for free knowledge. Here are just a few examples:
Wikipedia started in January 2001 in English, but soon expanded to other languages—within the first year, it grew to 18 languages. Today, it is available in nearly 300.
Volunteers constantly edit and improve Wikipedia. Every hour, roughly 15,000 edits are made to Wikipedia. Every day, around 7,000 new articles are created.
Wikipedia became one of the top 10 websites in the world in 2007, and the only non-profit website anywhere near the top.
It’s not just Wikipedia. There are 11 other Wikimedia free knowledge projects, including Wikimedia Commons, with more than 30 million freely licensed images, as well as Wiktionary, Wikisource, Wikivoyage, and more.
The Wikimedia community supports global projects that spread the joy of knowledge. Wiki Loves Monuments, a global photo competition, launched in 2010 to document images of cultural heritage. In 2011, the contest was named the largest photo competition in the world. Companion projects like Wiki Loves Earth, Wiki Loves Africa, and even Wiki Loves Cheese document more knowledge from around the globe.
Volunteers around the world have built hundreds of partnerships with galleries, libraries, and museums to make institutional collections more broadly available. These partnerships have contributed more than 1.5 million images of cultural works to the Wikimedia projects.
If you’d like to help celebrate Wikipedia’s 15th anniversary, you can share on social media what Wikipedia means to you by tagging @Wikipedia and using the hashtag #wikipedia15. To learn more about Wikipedia and the joy it inspires, visit 15.wikipedia.org.
One of the big outstanding questions about Wikipedia for many years was image usage data. We had reasonably good data for article pageviews, but not for the usage of images – we had to come up with proxies like the number of times a page containing that image was loaded. This was good enough as far as it went, but didn’t (for example) count the usage of any files hotlinked elsewhere.
In 2015, we finally got the media-pageviews database up and running, which means we now have a year’s worth of data to look at. In December, someone produced an aggregated dataset of the year to date, covering video & audio files.
This lists some 540,000 files, viewed an aggregated total of 2,869 million times over about 340 days – equivalent to 3,080 million over a year. This covers use on Wikipedia, on other Wikimedia projects, and hotlinked by the web at large. (Note that while we’re historically mostly concerned with Wikipedia pageviews, almost all of these videos will be hosted on Commons.) The top thirty:
(Full data is here; note that it’s a 17 MB TSV file)
It’s an interesting mix – and every one of the top 30 is a video, not an audio file. I’m not sure there’s a definite theme there – though “public domain history” does well – but it’d reward further investigation…
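The annualized figure quoted above is a straightforward scaling of the ~340-day aggregate; it can be checked directly (the plain 365-day year below is my assumption—the original may have rounded differently):

```python
# Scale the ~340-day aggregate view count to a full-year equivalent.
views_observed = 2_869  # million views over the observed period
days_observed = 340
annualized = views_observed / days_observed * 365

print(f"{annualized:.0f} million views/year")  # ≈ 3,080 million, matching the post
```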
The Wikimedia Foundation (WMF) today took an important step toward running its finances in a more transparent and accountable way. Luis Villa, the WMF’s Senior Director of Community Engagement, explained (see video clip below, ~10 minutes) how the organization had incorporated a November 2015 statement in which the Funds Dissemination Committee (FDC; elected by the Wikimedia community, and largely independent of the organization):
…laments that the Wikimedia Foundation’s own planning process does not meet the minimum standards of transparency and planning detail that it requires of affiliates applying for its own Annual Plan Grant (APG) process.
That statement was a strong and pointed criticism — something that is not always well received in the Wikimedia world (or, for that matter, in the world at large). It’s criticism I echoed in a recent blog post, about the lack of transparency around major grants awarded to the WMF.
Nevertheless, the Board of Trustees approved the FDC’s overall recommendation without public comment on December 9, 2015; and today, Villa’s more detailed commentary reveals that the organization understands the problem; that the recommendation was timely, and therefore easy for the WMF to incorporate into its Annual Plan development process; and that the WMF has modified its internal processes to prioritize the kind of external review recommended by the FDC.
Kudos to all who brought this about — to the FDC, for taking the first step, and to the WMF, for incorporating legitimate criticism in a healthy way. Wikimedia’s stakeholders will be well served by the process, and I look forward to seeing the outcomes.
Shortest way from Dyserth, Wales to Kamienna-Stara
Mapping
Osmose was released in mid-December for Australia. Andrew Davidson notes that the most common mistake is “approximate waterway”.
Hakuch proposes to remove the tagging of name_1 and alt_name_1 from the wiki.
The Belgian-Dutch border might change this year. There is already a discussion on this change in this note.
Take part in the MapRoulette Challenge which is about checking if there’s a pedestrian street crossing (crosswalk) or not. This verification goes on until it’s all done.
This site evaluates the number of POIs for a given city.
Voting on proposals: amenity=swimming_pool versus leisure=swimming_pool
Miguel Garcia from the Spanish blog Culturatorium, which deals with art, technology and culture in general, has portrayed Manuel Torío and his activities in OSM.
OpenCageData interviews Kokou Elolo Amegayibo (and others) about OpenStreetMap in Togo (West Africa).
Paul Norman determined the number of active mappers from 2007 to the end of 2015 and displays them as a graphic on his blog. One can clearly see a constant increase with seasonal fluctuations.
Santiago La Rotte published an article in the Colombian magazine “El Espectador” entitled “Discover what digital maps can be used for.” He describes the situation after the landslide in Salgar on 18 May 2015. (automatic translation)
A nice video about how to host an OSM mapathon is also available on the HOTOSM channel.
Maps
The OpenRailwayMap Blog reports that Austrian main semaphore signals and speed light signals have been rendered for a few weeks.
The developer of Luna Render (announced in a recent weeklyOSM report) has now presented a map style based on old Soviet military maps.
Alexander Matheisen reports on the talk mailing list that OpenLinkMap will be shut down on January 27. A successor and maintainer would be very welcome. The code is on GitHub.
You can’t be everybody’s darling. Nevertheless, Google tries to be with Google Maps when it comes to China.
switch2OSM
Carnival is very important in Germany, especially in Cologne. The city of Cologne has published a special temporary map for the so-called fifth season.
Falk, a former producer of printed maps, launched Tiger Geo – an outdoor GPS navigation device that uses OSM-based maps.
Software
Thanks to OpenStreetMap and the Overpass API, you can now get all the information for a particular point with OSMInfo, a brand-new QGIS plugin that shows information about OpenStreetMap objects via the Overpass API.
Vectiler is an OSM vector tile OBJ builder and exporter with ambient occlusion baking.
Dgoodayle published screenshots of his Unity3D test code. He generated city models from OpenStreetMap data to render in real time, which he wants to use for game development.
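A point-info tool like the OSMInfo plugin mentioned above works by sending the Overpass API a query for everything near a coordinate. The plugin’s exact request isn’t shown here; this is a hedged sketch of the kind of Overpass QL such a tool might send (the function name and radius are my own):

```python
def overpass_point_query(lat, lon, radius_m=25):
    """Build an Overpass QL query for all nodes/ways/relations near a point."""
    around = f"around:{radius_m},{lat},{lon}"
    return (
        "[out:json][timeout:25];"
        f"(node({around});way({around});relation({around}););"
        "out tags center;"
    )

query = overpass_point_query(50.9375, 6.9603)  # a point in Cologne
print(query)

# Running it requires network access, e.g.:
# from urllib.request import urlopen
# from urllib.parse import urlencode
# with urlopen("https://overpass-api.de/api/interpreter",
#              data=urlencode({"data": query}).encode()) as resp:
#     print(resp.read()[:200])
```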
Plans for the geoscience revolution: Rob Rose, the new director of William & Mary’s Center for Geospatial Analysis, talks about his plans to increase the scope of GIS skills and concepts throughout the university.
Note: If you would like to see your event here, please put it into the calendar. Only data that is entered there will appear in weeklyOSM. Please don’t forget to mention the city and the country in the calendar.
This weekly was produced by Nakaner, Rogehm, bogzab, derFred, escada, jinalfoflia, mgehling, stephan75, wambacher.
The much-debated belief that a supreme being created the physical universe was the most-edited article in Wikipedia’s first year, 2001. But in the genesis of the open-sourced encyclopedia powered by the world, that didn’t amount to a lot of edits: just 179 edits made it the top article for all of Wikipedia’s first year.
As Wikipedia turns 15 on Friday, things have evolved a great deal. The English-language Wikipedia is just one of many Wikimedia projects today, and its 5,053,647 articles have now been edited 808,187,367 times (as of press time). Parsing those edits by year brings into focus a fascinating evolution of the current millennium online, revealing not just what people clicked on or read or even shared. The most-edited articles show what Wikipedians created and shaped and gave to the world as this millennium began to unfold.
The most-edited article this past year, Deaths in 2015, inspired 18,271 edits[1]—more than 100 times the total that put creationism on top 15 years ago. Deaths in [year] are staples in the English Wikipedia’s editing history; they are the most-edited articles in each year between 2007 and 2015, and the second-most in 2001, 2003, and 2006. As such, we’ve mostly omitted them from the list.
Obituaries of the year show up throughout the list of each year’s most-edited articles, but some surprises and zeitgeist zings also pop up among the most-edited articles in each of Wikipedia’s 15 years. The aggregate list from all 15 years follows at the bottom.
Our thanks go to Stephen LaPorte for obtaining the year-by-year data. You can see the full list on Hatnote.
2001: Most edited article: Creationism (149 edits) Also on the list at #3: Feminism, which today boasts 225 references and resources at Wikipedia’s sister projects, Wikimedia Commons, Wikiquote, and Wikiversity. Painting by Michelangelo, public domain
2002: Wikipedia’s Main Page got the most work in its second year, with 449 edits, after being moved from “homepage” (now an actual article). Also on this list in Year 2: List of Canadians, topped by the end of that year by author Margaret Atwood and songwriter Leonard Cohen. Screenshot by English Wikipedia editors, freely licensed under CC BY-SA 3.0.
2004: Wikipedia got political in 2004, with Republican George W. Bush getting the most scrutiny from editors (5,527 edits). After a community vetting process, the article is today rated as a “good” article. Coming in second, as in other arenas that year, was Democrat John Kerry. Photo by Matthew Trump, freely licensed under CC BY-SA 3.0.
2005: George W. Bush was again the most edited article, this year with 20,894 edits; the havoc-wreaking Hurricane Katrina was right behind the president. #3 got meta: the Wikipedia article on Wikipedia. Photo by Petty Officer 2nd Class NyxoLyno Cangemi, public domain
2006: That year’s Lebanon War drew 15,067 edits. #3 on the list was Wii (12,735 edits), the home video game console released by Nintendo on November 19 of that year. Photo by Evan-Amos, public domain
2007: The annual deaths of the year topped the list, as it did for the following eight years. It was followed by List of WWE personnel (9,825 edits)—one of the most-edited pages of all time, even though it only appears on this list once. #3 was the Virginia Tech shooting (9,116 edits) that killed 32 in April 2007. Photo by Megan Elice Meadows, freely licensed under CC BY-SA 2.0.
2008: #2 was a controversial personality much discussed in that US election year: vice presidential candidate Sarah Palin (10,429 edits). #3 got meta again: 2008 (9,155 edits). Photo by Bruce Tuten, freely licensed under CC BY-SA 2.0.
2011: The Arab Spring had an impact on Wikipedia as well as the Middle East: the #2 and #3 most-edited articles were Libyan Civil War (9,311 edits) and Egyptian Revolution of 2011 (6,470 edits), respectively. Photo by Mona, freely licensed under CC BY 2.0.
2012: Unrest popped up again with the Syrian Civil War (#2; 7,760 edits), but popular culture took #3: the absolutely inescapable viral pop music hit “Gangnam Style” (5,028 edits), still the most-viewed video on YouTube. Photo by Eva Rinaldi, freely licensed under CC BY-SA 2.0.
After getting the data for each year, we wondered what the most-edited articles on the English-language Wikipedia were across all 15 of its years. It turns out that it’s a controversial president who served for eight years, followed by wrestlers.
George W. Bush, 2004 and 2005’s most-edited article, topped this list—an unsurprising place, given that it was named the most controversial topic on the English Wikipedia in Global Wikipedia, published in 2014. Several other articles on this list are mirrored on the most-controversial list, such as the list of World Wrestling Entertainment, Inc. personnel, which came in second by only a few thousand edits. History comes up with Adolf Hitler and World War II, while religion includes the Catholic Church and the article for Jesus.
Of course, Wikipedia’s oddities also spring up. While Britney Spears is a world-famous music artist, one would probably not put her into world history alongside the Catholic Church, Adolf Hitler, or World War II. Also making appearances are the list of programs broadcast by ABS-CBN, one of the oldest television stations in Asia and the first in the Philippines, and Wikipedia’s meta article on itself.
January 15th marks Wikipedia’s 15th birthday. In those 15 years, the encyclopedia that anyone can edit has become one of the first resources people turn to when researching a topic, or settling a bar bet.
But among academics, Wikipedia has had a rockier path to acceptance. Today, the Wiki Education Foundation has worked with 478 instructors at 282 universities as they assigned their students to contribute content to Wikipedia for their coursework. We’ve supported more than seventeen thousand students across the United States and Canada. But it started with just a handful of pioneering instructors.
Origins
In 2000, MIT launched a major open-educational resource initiative, the OpenCourseWare project. That project became a siren call for universities to open up access to their own materials.
When Wikipedia came onto the scene in 2001, it quickly became a go-to source of information for students. For many instructors, though, it mostly inspired hand-wringing.
Certainly, Wikipedia’s standing as a “source” for papers hasn’t improved, nor should it. Nobody should rely on an encyclopedia in academic research. It can be a starting point and should never be an end point, and that’s nothing new.
But Wikipedia has gained traction in academia. As early as 2002, some instructors challenged students to understand Wikipedia as a writing platform. That assignment spread to universities such as Columbia, Cornell, Dartmouth, Harvard, MIT, and Yale. These courses typically asked students to improve Wikipedia’s content related to the course, in addition to, or instead of, writing a traditional term paper. A mantra emerged: “Don’t cite it, write it.”
At that time, Dr. Anne McNeil began teaching a PhD course, “Physical Organic Chemistry”, at the University of Michigan. She’d ask chemistry students from four PhD programs to expand Wikipedia articles on chemistry. Her colleagues immediately saw the value.
“Everyone thought it was a cool idea,” McNeil said. “We often ask students to write ‘reviews’ of the literature, but then the only readership is the faculty member. Here, everyone — both students and faculty — saw the relevance of digesting the literature and then adding this new content to Wikipedia. Everyone benefits from the effort.”
Some resistance
Support for the idea wasn’t universal. Taking on a Wikipedia assignment back then was a bold stance. Though Nature published research in 2005 suggesting that Wikipedia was only slightly less reliable for certain science topics than Britannica, instructors still tended to consider Wikipedia only from the perspective of the student reader, not the student contributor.
“Wikipedia was not perceived as a learning tool when I started teaching with it in 2004,” said Dr. Bob Cummings, at the University of Mississippi, who remembers many of his colleagues still hadn’t heard of it by then. “If they had heard about it, they linked it to plagiarism. Many students were citing it as a source, but taking more text than they should. So most educators only conceived of it as an internet source in student research papers.”
“I thought of it as more of a curiosity than anything else,” said Dr. Diana Strassmann, another pioneering Wikipedia educator. She first taught with the assignment at Rice University in 2007, asking students to write biographies of feminist economists.
She incorporated Wikipedia into a different course she began teaching in 2011, asking students to create substantive contributions on topics relating to poverty, justice, and gender.
“I was increasingly aware of how poorly Wikipedia covered the subject matter of my courses, and of its growing use as a resource,” Strassmann said. “It seemed far more impactful for my students to create material for the encyclopedia than to write papers just for me.”
She continues to teach with Wikipedia for unique assignments at both Rice University and the University of Chicago.
Other early adopters and advocates include Adrianne Wadewitz, whose work on behalf of education and open knowledge cannot be adequately described in the space of this essay. Though we lost Adrianne in 2014, the collaborative essay she drafted, alongside education pioneers Anne Ellen Geller and Jon Beasley-Murray, is a significant milestone for Wikipedia and education. The trio shared thoughts from a 2010 panel discussion at the Writing Across the Curriculum conference, and formed a thoughtful and influential “manifesto”, “Opening up the academy with Wikipedia.”
A turning point
The success of these pioneers, and a handful of others, helped inform the Wikipedia Education Program when it launched in 2010. That pilot project sought to expand and improve public policy topics on Wikipedia by connecting university students in the field to contribute content.
That marked a turning point for Wikipedia, connecting instructors to share ideas about what works, and what doesn’t. Soon after, in another milestone, a major academic association, the Association for Psychological Science, announced its own Wikipedia Initiative.
“The endorsement of the APS was a very big milestone,” Cummings said. That initiative called on its members not only to contribute to Wikipedia, but to encourage students to do so as well.
Since then, learnings from hundreds of classrooms have been poured into training materials and guidebooks to help instructors build assignments that help students learn about, and contribute to, Wikipedia.
In the last two years, other academic associations have endorsed Wikipedia assignments. The Midwest Political Science Association, the National Women’s Studies Association, the American Sociological Association, the American Society of Plant Biologists, and the Linguistic Society of America have all launched their own initiatives through partnerships with the Wiki Education Foundation.
By 2014, more than 98% of undergraduates said they used Wikipedia. Crucially, the action taken most often with Wikipedia’s content among those undergraduates was verification — checking to see if what Wikipedia said accurately reflected what they’d read in class. That tells us that as Wikipedia has permeated the culture, students are becoming more aware of how to read Wikipedia through a critical lens.
Similarly, as the process of building a Wikipedia article became better understood, more instructors are seeing its value.
“The first key shift was moving the professoriate from seeing Wikipedia as a flawed source to seeing it as a platform for collaborative knowledge creation,” said Cummings. “Teachers who resisted the use of Wikipedia often saw it as a part of the overall degradation of information literacy due to the arrival of the internet, and hated the lack of authority, accuracy, and relevance that Wikipedia represented.”
Today, it’s seen as an opportunity. The citation requirement for Wikipedia encourages students to dive into their libraries, and to make use of their access to academic resources and databases.
Communicating for Wikipedia’s enormous readership means thinking carefully about how to express the knowledge a student has learned. The warning that Wikipedia is not a reliable source is an invitation to develop media literacy skills. Instead of writing off Wikipedia, instructors have students write on Wikipedia. And they ask students to explore what goes into a quality reference.
As we enter 2016, US and Canadian students have edited about 31,000 articles, including 3,700 new articles, through the Wiki Education Foundation’s classroom program. (Diana Strassmann is the board chair of this organization, and Bob Cummings is a board member.)
After 15 years of Wikipedia’s use in education, the stage has been set in academia to make the next step possible: The Wiki Education Foundation has partnered with the Simons Foundation and Google to connect classroom instructors and Wikipedia volunteers on a common mission. The Wikipedia Year of Science 2016 is a site-wide and multidisciplinary campaign to improve science content on Wikipedia. Students learn science communications skills under the supervision not only of their instructor, but of the entire community of Wikipedia readers and editors.
January 15 marks the 15th birthday of Wikipedia. As the connections between Wikipedia and academia develop, we believe Wikipedia’s best is yet to come.
As Wikipedia marks its 15th anniversary, its community is celebrating with nearly 150 events on six continents. Meanwhile, the Wikimedia Foundation is announcing an endowment to sustain Wikipedia for the future. Photo by Tanya Habjouqa, freely licensed under CC BY-SA 3.0 IGO.
This Friday marks the 15th anniversary of Wikipedia, the world’s free encyclopedia that anyone can edit. This week, we celebrate not just Wikipedia, but the birth of an idea: that anyone can contribute to the world’s knowledge. Globally, readers and editors are coming together to celebrate, with nearly 150 events across six continents. From editing marathons in Bangladesh and lectures in Switzerland, to picnics in South Africa and a conference in Mexico, the world is celebrating the joy of knowledge.
As part of this milestone, the Wikimedia Foundation is pleased to announce the Wikimedia Endowment, a permanent source of funding to ensure Wikipedia thrives for generations to come. The Wikimedia Endowment will empower people around the world to create and contribute free knowledge, and share that knowledge with every single human being. Our goal is to raise $100 million over the next 10 years. The Endowment has been established, with an initial contribution by the Wikimedia Foundation, as a Collective Action Fund at the Tides Foundation.
Wikipedia launched on January 15, 2001 with a bold vision: a world in which every single human being can freely share in the sum of all knowledge. At the time, the idea that people around the world would collaborate to build an encyclopedia—for free—seemed unbelievable. Since then, Wikipedia has grown to more than 36 million articles in hundreds of languages, used by hundreds of millions of people all over the world. Wikipedia and its sister projects are still built by volunteers around the world: each month, roughly 80,000 volunteer editors contribute to Wikimedia sites.
“Wikipedia challenged us to rethink how knowledge can be gathered and shared,” said Wikipedia founder Jimmy Wales. “Knowledge is no longer handed down from on high; instead, it is freely shared by everyone online. Wikipedia seemed like an impossible idea at the time—an online encyclopaedia that everyone can edit. However, it has surpassed everyone’s expectations over the past 15 years, thanks to the hundreds of thousands of volunteers around the world who have made Wikipedia possible.”
We’re celebrating Wikipedia’s global community with a commemorative website and week-long campaign, collecting and sharing the stories of individuals and organizations that have helped develop Wikipedia into the world’s largest collection of collaboratively created free knowledge. These stories show the truly global nature of the Wikimedia community: from Ziyad Alsufyani, a medical student at Taif University in Ta’if, Saudi Arabia who has been editing the Arabic Wikipedia since 2009, to Susanna Mkrtchyan, a professor and devoted grandmother working to give Armenian students better educational opportunities. We will continue to collect stories throughout the month of January.
Today, we celebrate all of the projects, partnerships, events, and joy the Wikimedia movement has inspired over the past 15 years, with many still to come. Wikipedia is much more than a website. Wikipedia and its sister Wikimedia projects represent a global, ever-expanding resource and community for free knowledge. Here are just a few examples:
Wikipedia started in January 2001 in English, but soon expanded to other languages—within the first year, it grew to 18 languages. Today, it is available in nearly 300.
Volunteers constantly edit and improve Wikipedia. Every hour, roughly 15,000 edits are made to Wikipedia. Every day, around 7,000 new articles are created.
Wikipedia became one of the top 10 websites in the world in 2007, and the only non-profit website anywhere near the top.
It’s not just Wikipedia. There are 11 other Wikimedia free knowledge projects, including Wikimedia Commons, with more than 30 million freely licensed images, as well as Wiktionary, Wikisource, Wikivoyage, and more.
The Wikimedia community supports global projects that spread the joy of knowledge. Wiki Loves Monuments, a global photo competition, launched in 2010 to document images of cultural heritage. In 2011, the contest was named the largest photo competition in the world. Companion projects like Wiki Loves Earth, Wiki Loves Africa, and even Wiki Loves Cheese document more knowledge from around the globe.
Volunteers around the world have built hundreds of partnerships with galleries, libraries, and museums to make institutional collections more broadly available. These partnerships have contributed more than 1.5 million images of cultural works to the Wikimedia projects.
If you’d like to help celebrate Wikipedia’s 15th anniversary, you can share on social media what Wikipedia means to you by tagging @Wikipedia and using the hashtag #wikipedia15. To learn more about Wikipedia and the joy it inspires, visit 15.wikipedia.org.
Katherine Maher, Chief Communications Officer
Wikimedia Foundation
About the Wikimedia Endowment
The purpose of the Wikimedia Endowment is to serve as a perpetual source of support for the operation and activities of Wikipedia and its sister projects. It will empower people around the world to create and contribute free knowledge, and share that knowledge with every single human being. The Endowment has been established, with an initial contribution by the Wikimedia Foundation, as a Collective Action Fund at the Tides Foundation. Tides is a public charity with a 40-year track record of holding and managing charitable funds for nonprofit organizations. An Advisory Board, nominated by the Wikimedia Foundation and appointed by Tides, will make recommendations to Tides related to the Endowment. Tides or the Wikimedia Foundation may choose to transfer the Endowment from Tides to the Wikimedia Foundation, or to other charities identified by the Wikimedia Foundation. At that point, the Endowment would continue to be a permanent, income-generating fund to support the Wikimedia projects.
Only two weeks of 2016 have passed, and months remain until September, when Wiki Loves Monuments 2016 begins. We hope that our participants will stay tuned!
In the meantime, we continue to please your eyes with great images from the contest. Here are the winning photos from the national round of Wiki Loves Monuments 2015 in Albania and Kosovo.
Together with several others, Michael Milo authored a book about mental health. Adding humans to Wikidata is trivially easy; Mr Milo is relevant as an author of the book, which establishes his initial notability.
Given that mental health is not well represented on any Wikipedia, it helps when Wikidata has at least some information. Asking Mr Milo was easy; he can be found on social media. So the source of the information on his profile came in the form of personal communication.
A picture paints a thousand words, and Mr Milo promised to upload a photo to Commons. Once we have similar information about all the authors of the book, it becomes easier to write about them on Wikipedia. All the authors are notable in their own right; it just takes someone to write or improve their articles. Thanks, GerardM
Keeping your MediaWiki site up to date with the latest version is, like many sysadmin tasks,
a never-ending chore. In a previous article I covered how to
upgrade minor revisions of MediaWiki with patches. In this one, I'll cover
my solution to doing a "major" upgrade to MediaWiki. While the official upgrade instructions are good, they don't cover everything.
MediaWiki, like Postgres, uses a three-section version number in which the first two
numbers combined give the major version, and the number on the end the revision of
that branch. Thus, version 1.26.2 is the third revision (0, then 1, then 2) of the
1.26 version of MediaWiki. Moving from one major version to another (for example 1.25
to 1.26) is a larger undertaking than updating the revision, as it involves significant
software changes, whereas a minor update (in which only the revision changes) simply
provides bug fixes.
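The scheme is easy to split apart mechanically. A small shell sketch (the version string is just the article's example):

```shell
# Split a MediaWiki version string into its major version and revision,
# using plain shell parameter expansion.
version=1.26.2
major=${version%.*}       # everything before the last dot, e.g. 1.26
revision=${version##*.}   # everything after the last dot, e.g. 2
echo "major=$major revision=$revision"
```

Running this prints "major=1.26 revision=2", matching the major/revision split described above.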
The first step to a major MediaWiki upgrade is to try it on a cloned, test version of your wiki.
See this article on how to make such a clone. Then run through the steps below to find any problems
that may crop up. When done, run through again, but this time on the actual live site.
For this article, we will use MediaWiki installed in ~intranet/htdocs/mediawiki, and will upgrade from version 1.25.3 to 1.26.2.
Preparation
Before making any changes, make sure everything is up to date in
git. You do have your MediaWiki
site controlled by git, right? If not, go do so right now. Then check you are on the main branch
and have no outstanding changes. It should look like this:
$ cd ~/htdocs/mediawiki
$ git status
# On branch master
nothing to commit, working directory clean
Download
Time to grab the new major version. Always get the latest revision in the current
branch. For this example, we want the highest in the 1.26 branch, which as of this
writing is 1.26.2. You can always find a prominent link on mediawiki.org. Make sure you
grab both the tarball (tar.gz) and the signature (.tar.gz.sig) file, then use gnupg to verify it:
$ wget https://releases.wikimedia.org/mediawiki/1.26/mediawiki-1.26.2.tar.gz
$ wget https://releases.wikimedia.org/mediawiki/1.26/mediawiki-1.26.2.tar.gz.sig
$ gpg mediawiki-1.26.2.tar.gz.sig
gpg: assuming signed data in `mediawiki-1.26.2.tar.gz'
gpg: Signature made Sun 20 Dec 2015 08:13:14 PM EST using RSA key ID 23107F8A
gpg: please do a --check-trustdb
gpg: Good signature from "Chad Horohoe <[email protected]>"
gpg: aka "keybase.io/demon <[email protected]>"
gpg: aka "Chad Horohoe (Personal e-mail) <[email protected]>"
gpg: aka "Chad Horohoe (Alias for existing email) <[email protected]>"
gpg: WARNING: This key is not certified with a trusted signature!
gpg: There is no indication that the signature belongs to the owner.
Primary key fingerprint: 41B2 ABE8 17AD D3E5 2BDA 946F 72BC 1C5D 2310 7F8A
Copy the tarball to your server, and untar it in the same base directory as your
mediawiki installation:
$ cd ~/htdocs
$ tar xvfz ~/mediawiki-1.26.2.tar.gz
Copy files
Copy the LocalSettings.php file over, as well as any custom images (e.g. the logo, which I like
to keep nice and visible at the top level):
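The copy commands themselves are not shown above; here is a minimal sketch, demonstrated in a scratch directory rather than the real ~/htdocs tree, and with a hypothetical logo filename (wiki_logo.png):

```shell
# Hedged sketch of the copy step: bring LocalSettings.php and any custom
# top-level files (such as the site logo) into the new version's directory.
# Demonstrated here in a scratch directory; adjust paths to your install.
demo=$(mktemp -d)
mkdir -p "$demo/mediawiki" "$demo/mediawiki-1.26.2"
printf '<?php\n# site settings\n' > "$demo/mediawiki/LocalSettings.php"
: > "$demo/mediawiki/wiki_logo.png"   # stand-in for any custom files

# The actual copy:
cp "$demo/mediawiki/LocalSettings.php" "$demo/mediawiki-1.26.2/"
cp "$demo/mediawiki/wiki_logo.png"     "$demo/mediawiki-1.26.2/"
ls "$demo/mediawiki-1.26.2"
```

On the real site, the same two cp commands run against ~/htdocs/mediawiki and ~/htdocs/mediawiki-1.26.2.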
Set up the images directory. The tarball comes with a dummy directory containing a few unimportant files. We want to replace
that with our existing one. I keep the images directory a level up from the actual mediawiki
directory, and symlink it in. This allows for easy testing and upgrades:
$ cd ~/htdocs/mediawiki-1.26.2
$ rm -fr images/ ## Careful, make sure you are in the right directory! :)
$ ln -s ../images/ .
Copy extensions
Now it is time to copy over the extensions. MediaWiki bundles a number of extensions in
the tarball, as they are considered "core" extensions. We do not want to overwrite these
with our old versions. We do want to copy any extensions that exist in our old
mediawiki directory, yet not in our newly created one. To help keep things straight and
reduce typing, let's make some symlinks for the existing (old) MediaWiki and for the
current (new) MediaWiki, naming them "aa" and "bb" respectively. Then we use "diff" to help
us copy the right extensions over:
$ cd ~/htdocs
$ ln -s mediawiki aa
$ ln -s mediawiki-1.26.2 bb
## Visually check things over with:
$ diff aa/extensions bb/extensions | grep 'Only in aa' | awk '{print $4}' | more
## Do the copying:
$ diff aa/extensions bb/extensions | grep 'Only in aa' | awk '{print $4}' | xargs -iZ cp -r aa/extensions/Z bb/extensions/Z
Extensions may not be the only way you have modified your installation. There could
be skins, custom scripts, etc. Copy these over now, being sure to only copy what is
truly still needed. Here's one way to check on the differences:
$ cd ~/htdocs
$ diff -r aa bb | grep 'Only in aa' | more
Check into git
Now that everything is copied over, we can check the 1.26.2 changes into git. To do
so, we will move the git directory from the old directory to the new one. Remember to let anyone who might
be developing in that directory know what you are doing first!
$ mv aa/.git bb/
## Don't forget this important file:
$ mv aa/.gitignore bb/
$ cd mediawiki-1.26.2
$ git add .
$ git commit -a -m "Upgrade to version 1.26.2"
$ git status
# On branch master
nothing to commit, working directory clean
Extension modifications
This is a good time to make any extension changes that are needed for the new version.
These should have been revealed in the first round, using the cloned test wiki. In our case,
we needed an updated and locally hacked version of the Auth_remoteuser extension:
$ cd ~/htdocs/mediawiki-1.26.2/extensions
$ rm -fr Auth_remoteuser/
$ tar xvfz ~/Auth_remoteuser.tgz
$ git add Auth_remoteuser
$ git commit -a -m "New version of Auth_remoteuser extension, with custom fix for wpPassword problem"
Core modifications
One of the trickiest parts of major upgrades is the fact that all the files are simply replaced.
Normally not a problem, but what if you are in the habit of modifying the core files because sometimes
extensions cannot do what you want? My solution is to tag the changes prominently - using a PHP comment
that contains the string "END POINT". This makes it easy to generate a list of files that may
need the local changes applied again. After using "git log" to find the commit ID of the 1.26.2
changes (message was "Upgrade to version 1.26.2"), we can grep for the unique string and
figure out which files to examine:
$ git log 1a83a996b9d00444302683fb6de6e86c4f4006e7 -1 -p | grep -E 'diff|END POINT' | grep -B1 END
diff --git a/includes/mail/EmailNotification.php b/includes/mail/EmailNotification.php
- // END POINT CHANGE: ignore the watchlist timestamp when sending notifications
- // END POINT CHANGE: send diffs in the emails
diff --git a/includes/search/SearchEngine.php b/includes/search/SearchEngine.php
- // END POINT CHANGE: Remove common domain suffixes
At that point, manually edit both the new and old version of the files and make the
needed changes. After that, remember to commit all your changes into git.
Final changes
Time to make the final change, and move the live site over. The goal is to minimize the downtime,
so we will move the directories around and run the update.php script on one line. This is an excellent
time to notify anyone who may be using the wiki that there may be a few bumps.
## Inform people the upgrade is coming, then:
$ mv mediawiki old_mediawiki; mv mediawiki-1.26.2 mediawiki; cd mediawiki; php maintenance/update.php --quick
$ rm ~/htdocs/aa ~/htdocs/bb
Testing
Hopefully everything works! Time to do some testing. First, visit your wiki's Special:Version page and
make sure it says 1.26.2 (or whatever version you just installed). Next, test that most things are still
working by:
Logging in, and...
Editing a page, then...
Uploading an image, plus...
Testing all your extensions.
For that last bullet, having an extension testing page is very handy. This is simply an unused page on the
wiki that tries to utilize as many active extensions as possible, so that reloading the page should quickly
allow a tally of working and non-working extensions. I like to give each extension a header with its name,
a text description of what should be seen, and then the actual extension in action.
That's the end of the major upgrade for MediaWiki! Hopefully in the future the upgrade process will
be better designed (I have ideas on that - but that's the topic of another article). One final check you can do is to
open a screen and tail -f the httpd error log for your site. After the upgrade, this is a helpful
way to spot any issues as they come up.
Talk to your librarians about Wikipedia 15. Photo by Joi, freely licensed under CC BY 2.0.
It’s important to recognize the lasting impact of Wikipedia on the online research environment: Wikipedia has become the default location for every type of researcher, both casual and professional, to start their research. After 15 years, Wikipedia has over 35 million articles in hundreds of languages, many of which have references and external links that guide researchers to authoritative sources about the topics they are researching.
Without a doubt, this makes Wikipedia one of the most important research tools in the world. It’s the largest hand-curated annotated bibliography ever, and is one of the biggest referrers to scholarly publications and one of the biggest sources for readers of medical information. However, Wikipedia’s strengths sometimes hide its systematic gaps and failings—there are many pieces of information on Wikipedia that can’t be verified by a source, or are missing because of our community’s systemic biases.
The libraries community has a huge number of opportunities to help solve these gaps, from educating their patrons about how Wikipedia works to curating local cultural heritage knowledge.
Librarians are a diverse community, working with diverse patrons. However, there is one skill which every librarian has been trained in: helping answer research and reference questions. The skill of finding references for patrons and public researchers has great value to Wikipedia.
The Wikipedia Library, a program focused on improving Wikipedia’s research, is asking librarians all over the world to take 15 minutes to celebrate Wikipedia’s 15th birthday, by adding one more reference to Wikipedia. Join the campaign!
Every Wikipedia page needs more citations to authoritative sources to become a better starting point for researchers. We ask the library community to take responsibility for these gaps and join in the campaign.
If you help organize a library coalition, network, or consortium, we invite you to sponsor this campaign by participating in our social media—using the hashtags #Wikipedia15 and #1Lib1Ref—and encouraging your members to participate!
You can learn more about engaging your library network.
Recovery is not a method; according to Anthony, it is a vision. Anthony describes the origin of recovery, and it has little to do with the 12-step program of Alcoholics Anonymous, contrary to what the Wikipedia article says.
Even when you know all these things, starting to make changes is a daunting prospect. The article does not invite you to learn more about recovery; it lacks focus, and trivia like a possible etymology detract from the subject. It has become too long to read.
However, changing a Wikipedia article is frightening. Even when information is manifestly wrong, there is all the baggage required before you can edit an article. When I wanted to "fix" an incorrect link, I wanted to preserve the fact that the link was not a red link. I removed the #redirect and it was "wrong". I wrote a minimal stub, even included a source, and it was "wrong". The article is destined for deletion, so the problem is likely to remain. Apparently nobody cares; I raised the issue on talk pages, and it has already taken too much of my time.
I remember the days when being bold was welcomed.
The coverage of mental health is substandard. My question is how to improve it. The question of why it needs improvement is easy and obvious: the prevalence of mental health issues is such that high-quality information is extremely relevant. When people are to recover, they have to rely largely on their own resources, their own abilities, their own sense of self.
Working on Wikipedia articles is not my cup of tea. Working towards an editathon on the subject is. It will still involve the same amount of stress, but it will not be my stress. It will be real Wikipedians and people knowledgeable about the subject who will together make a difference. Thanks, GerardM
I’m excited to welcome the Wiki Education Foundation’s first research intern. Kevin Schiroo’s internship will focus on data science research that will help us understand the programmatic impact we have on Wikipedia content. Kevin’s project involves investigating what broad categories content development is happening in on Wikipedia, and investigating which categories our student editors are having the biggest impact in. This data will be particularly helpful as we kick off the Year of Science this year.
Kevin is pursuing his Ph.D. in Computer Science at the University of Minnesota. Since May, he’s been a graduate research assistant at the University of Minnesota’s GroupLens lab, investigating the relationship between Wikipedia editors’ workflow and time contributions, specifically focusing on how these contributions lead to varying improvements in article quality based on different characteristics of articles. Kevin has a B.A. from Coe College and has previously worked for IBM and interned for the Department of Defense.
Outside of work, Kevin enjoys coding on several side projects as well as woodworking and larger construction projects. He is currently in the process of building a wooden chest starting from a fallen tree as well as finishing up a garage that he built this summer. The furthest he has ever been in over his head in his projects is 15 feet below the street pavement trying to fix a sewer line.
Kevin will be working with Wiki Ed part time through the spring term, then full time during the summer. Welcome, Kevin!
Getting oodles of money to improve search is wonderful, and a lot of good work will likely be done. It is also a great moment to consider existing questions that have not been resolved. It is all about searches that return no results, and how that might be improved given the unique opportunities of the Wikimedia projects.
The first thing to consider is a favourite story of the first chairperson of the Dutch chapter. He volunteered at a library, and people complained that they could not find what they were seeking. He tracked what people could not find and realised that there was a people problem: they cannot spell. He aggregated the numbers and added redirects for the really problematic words. People were happy. Librarians were happy too, because they now had a way of divining what books to buy on which topics; they knew what people were looking for that the library could not provide.
This is also possible for Wikimedia search. When we know what people are looking for and cannot find, we can provide information anyway. The information on Mrs Boevink has been fleshed out, but it only exists on Wikidata. Where there is a Wikipedia article, it may not be in your language. It could already be provided as a search result, but that requires taking the "other" projects seriously.
When a subject is not found at all, it means that people have an interest, and that an article on that subject will actually be read. This is a metric that people would enjoy: how well read are the articles that I started? What impact do I have? It is only a thought, but it would motivate people to write Wikipedia articles. This is particularly important for the small Wikipedias: when they write what people want to read, it establishes a process that positively affects the community.
So oodles of money have come in. Now let's get serious about finding Waldo. Thanks, GerardM
The one thing that makes Wikipedia strong is its wiki links. When they work, they are great. The article on the Spearman Medal is a case in point. This medal is conferred by the British Psychological Society on psychologists. There were 19 links, and two were wrong: one link was to a soccer player and one to a football player. The award has been conferred since 1965, so there ought to be quite a number of red links. With two sportsmen credited with winning the Spearman Medal, the error rate among the links was substantial. With all the red links, it is easy to be more informative using Wikidata. With such statistics, it is obvious to argue that replacing plain links with links through Wikidata would enhance quality on the English Wikipedia. This is unlikely to happen; Wikipedians seem more concerned with finding fault elsewhere than with considering the quality of their own project, particularly when "outsiders" point out the error of their ways. It is psychology in action. Thanks, GerardM
I’ve pretty much finished moving a set of ‘template’ Nyunga-language Wiktionary entries into my userspace on Wiktionary, from where they can be copied into mainspace. There are a few dramas with differing character sets between definitions in some of the word lists I’ve got, so a couple of letters are missing. There are plenty there, though, and mainly I’m interested now to see whether this idea of copying, pasting, and then copy-editing these entries is going to be a sensible workflow.
I thought about bulk-importing these directly into place, but the problem with that is (quite apart from the fact that none of these wordlists have machine-readable part-of-speech data) that almost all of them are going to need cleaning up and improving. For example, “kabain nin nana kulert” is in there as an entry. It means “perhaps someone ate it and went away”, and (I’m guessing) isn’t an idiom, so it really oughtn’t to have its own entry. It can, however, be used as a citation in every single one of its constituent words. That’s something that I think is best left up to a human, rather than forcing a human to clean up a bot’s mistakes. Or take “tandaban”, which has a definition of “jump, to [9]” (the square-bracket references are throughout this dataset and are not explained anywhere that I’ve been able to find). This should just be translated as “jump” with a link to the English verb; again, a script could handle that, but the myriad of incoming formats would take too much time to code for.
Maybe I’m just not being clever enough about preparing the data, and an import script, in a rich enough way. But that could take ages before this data ever sees the light of day on Wiktionary; the approach I’ve used means that it’s there now for anyone who wants to work with it. There are also so very many improvements that a human editor can make along the way that it seems we’ll have better data for fewer words… and that seems to be the correct trade-off. Wiktionary is a ‘forever’ project, after all!
Of course, the plan is to be able to extract the data after it’s been put in its proper place, and I’ve started work on a PHP library for doing just that. I’d rather do the code-work on that end of it, and put in the time for a human-mediated import at the beginning.
All of this is a long-winded way of putting out there on the web, in this tiny way, an invitation for anyone to come and help see if this import is going to work at all! Will you help?
A lot happened at the Wikimedia Developer Summit over the past week, I had a fantastic time and enjoyed getting to meet up with everyone again. Here's a quick recap:
Learned about the status of the Differential migration; I feel more reassured about the workflow now, just not sure when it's going to happen.
Attended a very productive meeting with the MediaWiki Stakeholders' group; Mark (hexmode) has written up a good summary of the meeting. I'm optimistic about the future.
Held an impromptu session about shadow namespaces, which left me with lots of questions to answer. I haven't had a chance to summarize the notes yet, but will do so later this week.
Had an exciting main room discussion about supporting non-Wikimedia installs of MediaWiki and our other software, which continued out into the hallways. I think we need to continue with more research and talking with hosting providers about how MediaWiki is actually used. For a while now I've been concerned with whether we're able to get our users to actually upgrade. WikiApiary says that 1.16.x is nearly as widely used as 1.25.x (the current legacy release).
Had an early morning session about beta rollouts, usage of BetaFeatures, and communication channels.
Attended the "software engineering" session about dependency injection and then SOA. I mainly just listened in this one.
Went to a session by community liaisons about interacting with communities and stuff. Also mainly just listened.
Finally, had a really productive session led by bawolff about code review, and how we can improve the situation.
I hacked on quite a few different projects, more on that later :)
It’s been a while since I last presented a selection of my favorite new Wikipedia images here. But over the past 12 months, volunteers around the world have been out in the field shooting nature and wildlife. By uploading their photos under a free license, these volunteers made their work available to a broad public. The results […]
"Recovery is a deeply personal, unique process of changing one’s attitudes, values, feelings, goals, skills and/or roles. It is a way of living a satisfying, hopeful, and contributing life even with limitations caused by the illness. Recovery involves the development of new meaning and purpose in one’s life as one grows beyond the catastrophic effects of mental illness."
Recovery in psychiatry is proven effective. Mr Anthony does not have an article in any Wikipedia, and that can easily be understood as being part of the stigma that the subject carries.
In constructing a Wikidata item for Mr Anthony, there is the website of the Centre of Psychiatric Rehabilitation, which is part of Boston University. It has a page on Mr Anthony, and it is definitely a source. Many publications and awards are listed, and as you may expect, many of those awards have no articles either.
In order to honour Mr Anthony, the awards had to be added. Several of the organisations that conferred the awards had to be added as a consequence. That is where the page on Mr Anthony becomes a problem: a single reference for an award is not that great. But what to do when more is hard to find on the Internet? Thanks, GerardM
Wikidata provides free and open access to entities representing real-world concepts. Of course, Wikidata is not meant to contain every kind of data; for example, beer reviews or product reviews would probably never make it into Wikidata items. However, creating an app powered by Wikidata & Wikibase to contain beer reviews should be rather easy.
A base data set
I’m going to take the example of beer as mentioned above. I’m sure there are thousands if not millions of beers that Wikidata is currently missing, but at the time of writing this there are 958 contained in the database. These can be found using the simple query below:
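A minimal SPARQL sketch for the Wikidata Query Service that lists these items uses P31 ("instance of") and Q44 (the Wikidata item for beer); whether the post had exactly this query in mind is an assumption:

```sparql
# All items that are an instance of (P31) beer (Q44)
SELECT ?beer ?beerLabel WHERE {
  ?beer wdt:P31 wd:Q44 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```

Pasting this into query.wikidata.org returns each beer item together with its English label.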
Any application can use data stored within Wikidata; in the case of beers this includes labels and descriptions in multiple languages, mappings to Wikipedia articles and external databases for even more information, potential images of said beer, the type of beer, and much more. Remember, the Wikidata dataset is ever-evolving and the IDs are persistent.
Application specific data
Let's say that you want to review the beers! You could set up another Wikibase installation and SPARQL endpoint to store and query review and rating information. Wikibase provides an amazingly flexible structure, meaning this is easily possible. Reviews and ratings could be stored as a new entity type linking to an item on Wikibase, or an item could be created mapping to a Wikidata item, containing statements of review or rating data. Right now the documentation is probably lacking, but this is all possible.
Of course I am shouting about Wikibase first, as Wikidata is powered by it and thus integration should be easier; however, there is no reason you couldn't use any other database, mapping your application-specific information to Wikidata item IDs. MusicBrainz is already doing something like this, and I am sure there are other applications out there too!
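As a rough sketch of that pattern (the data structures and function names here are illustrative, not a real API), application-specific data can simply be keyed by Wikidata item IDs:

```python
# Sketch: an application stores its own review data keyed by Wikidata
# item IDs, so shared facts (labels, images, mappings) stay in Wikidata
# while the reviews stay application-specific.
reviews = {}

def add_review(item_id, user, rating):
    """Record a rating (e.g. 1-5) for a Wikidata item like 'Q44'."""
    reviews.setdefault(item_id, []).append({'user': user, 'rating': rating})

def average_rating(item_id):
    """Average rating for an item, or None if it has no reviews yet."""
    item_reviews = reviews.get(item_id, [])
    if not item_reviews:
        return None
    return sum(r['rating'] for r in item_reviews) / len(item_reviews)
```

Adding an image or a translated label to the item in Wikidata then benefits every application that keys its data this way.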
Sharing of knowledge
Knowledge is power. Wikipedia has proven that free and open knowledge is an amazing resource in unstructured text form; Wikidata is a step up, providing structured data. Imagine a world in which applications share basic world information, building a dataset for a common worldwide goal. In the example above, add an image of a beer in one application and have it instantly available in another; translate a description for one user and have it benefit millions.
The Wikimedia Developer Summit is an event with an emphasis on the evolution of the MediaWiki architecture and the Wikimedia Engineering goals for 2016. Last year the event was called the MediaWiki Developer Summit.
As with last year the event took place in the Mission Bay Center, San Francisco, California. The event was slightly earlier this year, positioned at the beginning of January instead of the end. The event format changed slightly compared with the previous year and also included a 3rd day of general discussion and hacking in the WMF offices. Many thanks to everyone that helped to organise the event!
I have an extremely long list of things to do that spawned from discussions at the summit, but as a summary of what happened, below are some of the more notable scheduled discussions:
T119032 & T114320 – Code-review migration to Differential
Apparently this may mostly be complete in the next 6 months? Or at least the migration will be well on the way. The Differential workflow is rather different to the one we have been forced into using with Gerrit. Personally I think the change will be a good one, and I also cannot wait to be rid of git-review!
T119403 – Open meeting between the MediaWiki Stakeholders Group and the Wikimedia Foundation
There was lots of discussion during this session, although many things were repeated that had previously been said at other events. Toward the end of the session it was again proposed that a MediaWiki Foundation of some description might be the right way to go, and it looks as if this might start moving forward in the coming months or year (see the notes).
Over the past years MediaWiki core development has been rather disjointed, due to the WMF assigning a core team and then dissolving it, leaving responsibilities scattered and generally unknown. Having a single organization concentrate on the software, covering use cases the WMF doesn't care about, could be a great step forward for MediaWiki.
The notes for this session can be found here and covered many RFCs, such as multi-content revisions, balanced templates and the general evolution of the content format. Lots of super interesting things were discussed here, all pushing MediaWiki in the right direction (in my opinion).
T113210 – How should Wikimedia software support non-Wikimedia deployments of its software?
Notes can be found here. Interesting points include:
“Does MediaWiki need a governance structure outside of Wikimedia?” This ties in with the stakeholders discussion above and a potential MediaWiki foundation.
“How can we make extension compatibility work between versions?” Over the past year or so some work has gone into this, and progress is slowly being made with extension registration in MediaWiki and advances in the ExtensionDistribution extension. Still a long way to go.
“Should Wikimedia fork MediaWiki?”. Sounds like this could get ugly :/
“Do we need to stick with a LAMP stack? Could we decide that some version in the not distant future will be the last ‘pure PHP’ implementation?” I can see much of the user base being lost if this were to happen.
Were you one of the people contributing a pageview to these articles? Photo by Runner1616, freely licensed under CC BY-SA 3.0.
2015 was the year of the movie on the English-language Wikipedia, with nine film-related articles in the top 25—and one climbing all the way up to #3 in the last two months of the year.
In contrast, only seven film articles appeared in the top 25 in 2013 and 2014 combined.
Early on January 6, pageview data for the English-language Wikipedia’s most popular articles of 2015 came in. Film led all other categories by large margins: Star Wars: The Force Awakens came in at #3 with 23.5 million views, followed by Avengers: Age of Ultron at #9 with 17.4 million hits.
Force Awakens was a popular article throughout the year, but interest in the topic skyrocketed around the movie’s December release. As a topic, Star Wars actually took 11 of the top 25 slots in the week before its release and 10 of 25 in the week after; the article on the overall franchise was #13.
Other 2015 movies slotting in the top 25 were Furious 7 (#15), Jurassic World (#16), and Fifty Shades of Grey (#18)—something that likely comes as little surprise to avid film fans, as Jurassic, Star Wars, Avengers, and Furious are, respectively and as of publishing time, numbers three through seven on Wikipedia’s list of highest-grossing films of all time. While Fifty Shades of Grey (film) actually received 7,073,570 views, it appears that an even greater number of people either accidentally navigated to or wanted to read about the book.
The annual list of Bollywood films, a perennial favorite, hit #7, and two individuals important in film made the list as well. American Sniper, the biographical film about Chris Kyle, was released in the United States at the tail end of 2014, and the resulting controversy clearly carried over into 2015; Kyle’s article was viewed nearly 27.8 million times. Paul Walker (#24) died in a car crash during the filming of Furious 7.
Television factored in only with Game of Thrones (#12) and drug cartel kingpin Pablo Escobar (#19), who was depicted in Netflix’s Narcos and was the most popular article on the English Wikipedia for five of six weeks in September.
These stats are very different from the two other full years of pageview data we have. In 2013, the top 25 had seven TV series: Breaking Bad, Walking Dead, Game of Thrones, Big Bang Theory, Arrow, How I Met Your Mother, and Doctor Who. By 2014, this dropped to four shows: two different Game of Thrones articles, Walking Dead, True Detective, and Agents of SHIELD.
In neither year did any film, aside from the annual lists of film and Bollywood films, feature in the top 25.
Are we seeing a shift back to film? Is the so-called “golden age” of 2000s TV coming to an end? Has Hollywood perfected the art of a hit franchise film? Is this an anomaly year with an unusual number of attention-grabbing blockbusters? Is this an artifact of the data that we’re overinterpreting?[1] Only time will tell.
Oddities of interest include the first-place Deaths in 2015, an article that was also the most-edited article of the year by a large margin, with nearly 27.9 million views[2]—but no individual deaths feature in the list, Chris Kyle notwithstanding. Stephen Hawking, the famous theoretical physicist, landed at #6 for 2015, thanks in large part to an extraordinary 19-week run in the weekly top 25 after the release of The Theory of Everything in 2014.
Perhaps most notable is the decidedly low number of current news events. The two terror attacks in Paris, for instance, were two of the most-edited articles on the English Wikipedia in 2015, but they were only #533 and #680 on the raw list. Aside from ISIS, the terror group fighting for control of a swath of territory in the Middle East, the only examples come from celebrities. Donald Trump slotted in at #17: the outspoken real estate mogul and Republican candidate for President of the United States has been in the US news on an almost constant basis since he declared his intention to run for president in June. Kanye West came in at #11, although over one-third of his 16.5 million hits came in a two-week span after “loser.com” began redirecting to his Wikipedia article.
This data was collected and collated by Andrew West, a Senior Research Scientist at Verisign Labs. You can read through weekly lists with insightful and occasionally witty commentary from Milowent and Serendipodous in the Signpost or Wikipedia’s Top 25 Report. For films specifically, Wikipedia statistics are a prime component of Variety‘s Digital Audience Ratings. For interested coders, a new pageview API is available.
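For example, daily per-article counts can be fetched from the pageview API’s REST endpoint by building a URL along these lines (the article title and dates below are placeholders, not from the article above):

```python
def pageview_url(project, article, start, end,
                 access='all-access', agent='user', granularity='daily'):
    """Build a Wikimedia Pageview API URL for per-article counts.
    Dates are YYYYMMDD strings; the article title must already be
    URL-encoded (spaces as underscores, special characters escaped)."""
    base = 'https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article'
    return '/'.join([base, project, access, agent, article,
                     granularity, start, end])

url = pageview_url('en.wikipedia', 'Star_Wars', '20150101', '20151231')
```

Fetching that URL returns a JSON document with one view count per day.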
On a methodological note, these entries have been screened for spam and botnets. The raw data has been published on Wikipedia, where you will see that there are many articles with below 2% or above 95% mobile views, an almost certain indicator of false popularity. Without this check, articles like the one on the small German town of Angelsberg—with 78.1 million views, all (100.0%) from mobile—would be featured even though readers aren’t actually looking for them.
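A sketch of that screening rule, using the thresholds described above (the function name is mine, not from the published methodology):

```python
def looks_organic(total_views, mobile_views, low=0.02, high=0.95):
    """Treat an article's traffic as plausibly human only if its
    mobile share falls inside [low, high]; shares outside that band
    are an almost certain indicator of spam or botnet inflation."""
    if total_views == 0:
        return False
    share = mobile_views / total_views
    return low <= share <= high
```

Under this rule, Angelsberg’s 100.0% mobile share would be filtered out, while a typical mixed desktop/mobile readership passes.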
Ed Erhart, Editorial Associate
Wikimedia Foundation
[1] One point in favor of this interpretation is that pageview numbers across years are not directly comparable, as pageview data prior to October 2014 did not include mobile readers. We assume here that even though the 2013–14 counts are not complete snapshots, the overall order would not significantly change.
Recently I have been spending lots of time looking at the Wikimedia graphite set-up due to working on Grafana dashboards. In exchange for what some people had been doing for me I decided to take a quick look down the list of open Graphite tickets and found T116031. Sometimes it is great when such a small fix can have such a big impact!
After digging through all of the code, I eventually discovered that the method which sends MediaWiki metrics to Statsd is SamplingStatsdClient::send. This method is an overridden version of StatsdClient::send, which is provided by liuggio/statsd-php-client. However, a bug had existed in the sampling client ever since its creation!
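For context, the general idea of statsd sampling (sketched here in Python; this is not the actual SamplingStatsdClient code) is to drop a share of metric lines client-side and tag the survivors with the sample rate so the server can scale counts back up:

```python
import random

def sample(metric_lines, rate):
    """Client-side statsd sampling: keep each metric line with
    probability `rate` and append the '|@rate' marker so the server
    rescales the counts. A rate >= 1 means send everything as-is."""
    if rate >= 1:
        return list(metric_lines)
    kept = []
    for line in metric_lines:
        if random.random() < rate:
            kept.append('%s|@%g' % (line, rate))
    return kept
```

Sampling correctly is exactly what cuts the packet volume: most metric lines are never sent at all.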
The fix for the bug can be found on Gerrit and is only a +10/−4 line change (only 2 of those lines were actually code).
The result of deploying this fix on the Wikimedia cluster can be seen below.
Decrease in packets after deploying the fixed MediaWiki Statsd client
You can see a reduction from roughly 85kpps to 25kpps at the point of deployment. This is over a 50% decrease!
Decrease in bytes received after the MediaWiki Statsd client fix deployment
A decrease in bytes received can also be seen, even though the same number of metrics is being sent. This is due to the reduction in packet overhead, a drop of roughly 1 MBps at deployment.
The little things really are great. Now to see if we can reduce that packet count even more!
I’m pleased to announce three new opportunities for experienced Wikipedians to apply as Visiting Scholars: Rollins College will sponsor one remote position, and the University of San Francisco will sponsor two.
The Wikipedia Visiting Scholars program helps university libraries provide high-quality resources, such as academic databases and ebooks, to experienced Wikipedians. These Wikipedians can write even better content, and increase the impact of the sponsor library’s special holdings in a topic area of mutual interest.
Rollins College’s Department of Archives and Special Collections is looking for a Visiting Scholar to improve articles about 19th- and 20th-century American literature, and/or the history and culture of the American South during the same period.
The University of San Francisco’s Department of Rhetoric and Language would like a Visiting Scholar to work on articles relevant to rhetoric, literacy, and/or California social movements.
The Gleeson Library, also at the University of San Francisco, is seeking a Visiting Scholar to develop articles about social justice reformers, reform movements, and/or Ignatian/Jesuit Catholic traditions.
For more information about the Visiting Scholars program, or to apply for a position, visit our Visiting Scholars page. For additional information about Rollins College or the University of San Francisco, including an overview of library resources and additional details about these positions, see the Rollins College and The University of San Francisco Visiting Scholars pages on Wikipedia.
Support an OSM based printed map for Nicaragua [1]
About us
We wish our readers a happy new year! Hopefully this year comes with a lot of edits and awesome new data to be added. Additionally, we welcome Rolf (Rogehm from Germany) and Moises (@moiarscan from Spain), who helped a lot at the end of last year. We are, however, always looking for additional helping hands, especially from Latin America. A happy new year in the OSM world of 2016!
Do you like OSM? Do you like working in an international team? Do you speak English and Spanish? Do you like research? Do you like the semanarioOSM? Do you hate typos? Then weeklyOSM needs YOU! We have different tasks for you: collecting links, writing articles, translating articles, reviewing the written blog, and more.
The weekly team from the Czech Republic – Tomas, Vop and Marián – don’t just translate the German Wochennotiz or weeklyOSM; each week they add an interesting editorial and news from the Czech Republic and Slovakia. Read, as an example, the automatic translation of weeklyOSM 283: Deutsch — English — Español.
Mapping
The deletion of many ski drag lifts in Austria led to a forum discussion (translation) about the best way to map them.
User mikelmaron learned that several houses will be constructed in his neighbourhood. As the space which will host the new houses is limited, mikelmaron has some thoughts on how the district will look in the future.
An IPv6 problem experienced by a JOSM user led to a long discussion about the misbehaviour of JOSM (or Java) with regard to IPv6.
On January 2nd, 2016, Grant Slater activated the VisualEditor on the OSM wiki. This kind of WYSIWYG editor allows editing the wiki without fiddling with wiki syntax.
[1] The Nicaraguan OSM community wants to print the first public transportation map of the 42 bus routes of the capital, Managua, and is asking for your support. The data has been crowd-sourced by more than 150 volunteers using OpenStreetMap.
Imports
Michael Gratton wonders how he can map residential and commercial property boundaries found in the NSW LPI data (Australia).
Humanitarian OSM
Frederik Ramm (user woodpeck) gave a short interview to Deutschlandradio Kultur (a German cultural radio station) on the subject of mapping refugee camps, for example the one in Dadaab (Kenya).
The MapKibera project, started in 2009, made it its mission to map the infrastructure of the biggest slum in Nairobi and to communicate it. They pursue this mission by giving the population training and orientation on different topics (GPS, internet, communication services). This work has resulted in maps, info boards, community achievements, safety, cleanliness and, above all, respect. Contact with the local authorities was also established, and different working groups came together. The empty space on the map disappeared, and thanks to OSM the people in the region now have a better life!
Missing Maps has been hosting multiple mapathons, some of which have focused on mapping specifically for resilience projects in Ecuador and Colombia. There will also be a local mapathon in Colombia, hosted by Humberto Yances in Bogotá on 20th January. The GIS team from the American Red Cross will be traveling to Ecuador in the first two weeks of February to map in Huaquillas, in the province of El Oro, together with the Ecuadorian Red Cross. This will be followed by two weeks of community mapping in La Guajira, especially Riohacha and Nazareth, in February.
The Salzburg geoinformatics community supports international aid organizations in their efforts to supply and care for refugees. This is done by automatically counting tents and by analyzing satellite images to help find suitable drilling spots for water wells.
Maps
Tristram Gräbener found out that although all roads may lead to Rome, all roads start from Notre Dame in Paris (at least all roads to destinations in France). Tristram has some experience with routing, so he describes how to visualize that all roads in France start at the “point zéro des routes de France”. This fits nicely with Steve Coast’s recently launched campaign on Kickstarter.
SMOG or fresh air? The first worldwide air map shows the quality of air in 206 cities in 40 countries around the globe, with somewhat terrifying results.
switch2OSM
The trip planner of the Dutch railway company (Nederlandse Spoorwegen, or NS) now uses an OpenStreetMap-based map.
Developer Tim Teulings (aka Framstag) has written a review of the Libosmscout framework covering the past year 2015, together with an outlook on new features to come in 2016. There is also news about Libosmscout’s automatic build process.
The freshly released LunaRender aims at ad-hoc rendering of OSM XML data to SVG; see the OSM wiki for details. Examples are presented in the author’s blog. It is based on the Lua programming language, and binaries for Windows are ready to download.
Krakanos, core developer of the OSM editor Merkaartor, is looking for support to publish the program on Apple OS X as well. He also reports on its recent status and the aims for the planned version 0.19.
Nathanael Long is working on his bachelor thesis on new approaches to pedestrian routing. For example, sidewalks should be derived from conventional OSM data without having to create hundreds of new footways.
Note: If you would like to see your event here, please put it into the calendar; only data that is there will appear in weeklyOSM. Please don’t forget to mention the city and the country in the calendar entry, and please use [edit blog detail] to enter the category.
This weekly was produced by Alexandre Magno, Hedaja, Peda, Rogehm, Rub21, SomeoneElse, bogzab, Manfred, escada, jinalfoflia, stephan75, wambacher, widedangel.
A paper of mine published in 2014 started with this simple (but interesting, I think) question ;)
As you might know, Wikipedia is not available only in English but there are almost 300 Wikipedias written in other languages.
So what did we do? We computed the percentage of females and males among registered users on 289 language editions of Wikipedia.
The pdf of “Gender Gap In Wikipedia Editing: A Cross-Language Comparison” is available for you to read.
But I suggest you try to answer the following questions before reading the answers (which are in the paper), so that you can play a bit with your stereotypes and prejudices about culture and women around the world ;)
1) Which language edition of Wikipedia has the largest percentage of registered users setting their gender as female? What is this percentage? Is it more or less than 50%?
And 2) what is the language of the Wikipedia with the smallest percentage of women? How close to 0% might this be …?
3) Try to order the following language editions of Wikipedia from the largest percentage of female registered users to the smallest: Arabic, Bulgarian, Catalan, Chinese, French, German, Hindi, Japanese, Korean, Persian, Swedish, Thai. Where does the largest Wikipedia (the English one) place?
4) Moreover, consider that setting one’s gender on Wikipedia is optional and actually few users do it (see details in the paper). What percentage of users set their gender on the English Wikipedia? Which Wikipedia has the largest share of users setting their gender, and what is this percentage?
Note that, as written in the paper, languages of course do not map directly to countries. For example, the Spanish Wikipedia is heavily edited from Spain but also from Latin America, and a similar point can be made for the Arabic Wikipedia. India has many official languages (Hindi, Bengali, Malayalam, Tamil, Marathi) but also English. On the other hand, the Italian Wikipedia or the Catalan Wikipedia are much more “localized”.
Note also that in the paper we arbitrarily decided to consider only editions with at least 20,000 registered users, since we computed percentages on registered users (a Wikipedia with 2 users setting their gender would have percentages of 0%, 50% or 100%, which is clearly not informative); this filtering step reduced our sample to 76 Wikipedias.
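The computation itself is simple. Here is a sketch (the data structure and function name are mine for illustration, not the paper’s actual script; it assumes percentages are taken over the users who declared a gender):

```python
def female_percentages(editions, min_users=20000):
    """For each edition, the percentage of gender-declaring registered
    users who set 'female', keeping only editions with at least
    `min_users` registered users. `editions` maps a language code to a
    tuple (total_registered_users, declared_female, declared_male)."""
    result = {}
    for lang, (total, females, males) in editions.items():
        declared = females + males
        if total >= min_users and declared > 0:
            result[lang] = 100.0 * females / declared
    return result
```

Editions below the threshold, or with no users declaring a gender, simply drop out of the result, mirroring the filtering step described above.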
Note also that the data refers to March 16, 2013, but we released the Python script as open source, so you can re-run it if you are curious about the current situation. You can get the script on GitHub.
Former Wikimedia ED Sue Gardner (right) championed strong views about restricted grants and transparency. Have those values survived into the era of Lila Tretikov (left)? Photo by Victor Grigas, licensed CC BY-SA
I wrote and edited a number of grant proposals and reports on behalf of the Wikimedia Foundation (WMF) from 2009 to 2011. In that role, I participated in a number of staff discussions around restricted grants, and transparency in the grant process. I was continually impressed by the dedication to transparency and alignment to mission and strategy.
As of 2015, however, the four people most strongly associated with those efforts at WMF have all left the organization; and I am concerned that the diligence and dedication I experienced may have left the organization along with them. Yesterday’s announcement of a $250,000 grant from the Knight Foundation increases my concern. That grant is apparently restricted to activities that are not explicitly established in any strategy document I’ve seen. It is also not specifically identified as a restricted grant.
In the WMF’s 2015-16 Annual Plan (which was open for public comment for five days in May), this phrase stood out:
Restricted amounts do not appear in this plan. As per the Gift Policy, restricted gifts above $100K are approved on a case-by-case basis by the WMF Board.
There does not appear to be any companion document (or blog posts, press releases, etc.) covering restricted grants.
When I worked for WMF, four people senior to me maintained strong positions about the ethics and mission-alignment relating to restricted grants:
Sue Gardner, Executive Director
Erik Möller, Deputy Director
Frank Schulenburg, Head of Public Outreach
Sara Crouse, Head of Partnerships and Foundation Relations
They strongly advocated against accepting restricted grants (primarily Gardner), and for publishing substantial portions of grant applications and reports (primarily Möller). At the time, although we worked to abide by those principles, we did not operate under any formal or even semi-formalized policy or process. I am proud of the work we did around restricted grants, and I benefited greatly in my understanding of how organizational needs intersect with community values. These principles influenced many activities over many years; in public meeting minutes from 2009, for instance, Gardner articulated a spending area (data centers) that would be appropriate for restricted grants.
Today, however, none of us still works for Wikimedia (though Gardner retains an unpaid position as Special Advisor to the Board Chair).
In the time since I left, there has been very little information published about restricted grants. The English Wikipedia article about the Wikimedia Foundation reflects this: it mentions a few grants, but if I’m not mistaken, the most recent restricted grants mentioned are from 2009.
Restricted grants can play a significant role in how an organization adheres to its mission. Last year, Gardner blogged about this, advocating against their use. While her observations are valuable and well worth consideration, I would not suggest her view settles the issue — restricted grants can be beneficial in many cases. But irrespective of her ultimate conclusion, her post does a good job of identifying important considerations related to restricted grants.
The principles of Open Philanthropy, an idea pioneered by Mozilla Foundation executive director Mark Surman, and long championed by Wikimedia Advisory Board member Wayne Mackintosh, align strongly with Wikimedia’s values. The Open Philanthropy doctrine emphasizes (among other things) publishing grant applications and reports and inviting scrutiny and debate.
In its grant-giving capacity, the Wikimedia Foundation appears to practice Open Philanthropy (though it doesn’t explicitly use the term). It has published principles for funds dissemination:
Protect the core
Assess impact
Promote transparency and stability
Support decentralized activity
Promote responsibility and accountability
Be collaborative and open
Those principles are not mere words, but are incorporated into the organization’s grant-giving activities. For example, the WMF’s Annual Plan program, which funds chapters and affiliates, requires applicants to submit proposals for public review for 30 days, and to make public reports on past grants. The Project and Event Grants program also requires open proposals and open reports.
But the Wikimedia Foundation appears to still lack any clear standard for transparency of the restricted grants it receives. (There is less urgency for openness in the case of unrestricted grants, which by definition do not obligate the recipient to shift its operational priorities. But conditions are sometimes attached to unrestricted or restricted grants, such as the appointment of a Trustee; these should be clearly disclosed as well.) The WMF Gift Policy merely asserts that “Restricted gifts [of $100k+] may be accepted for particular purposes or projects, as specified by the Foundation [and with Board approval].”
Addendum: I have been reminded that in November 2015, the Wikimedia Foundation’s Funds Dissemination Committee — which advises the Board of Trustees on the Annual Plan Grants mentioned above, but has no formal authority over the WMF itself — voiced strong criticism of the Wikimedia Foundation’s lack of adherence to the standards it requires of affiliates. The critique is well worth reading in full, but this sentence captures its spirit:
The FDC is appalled by the closed way that the WMF has undertaken both strategic and annual planning, and the WMF’s approach to budget transparency (or lack thereof).
In December 2015, the Wikimedia Board of Trustees removed one of its own members, Dr. James Heilman — one of the three Trustees selected by community vote. Though the full story behind this action has not emerged, Dr. Heilman has maintained that his efforts to increase the organization’s transparency were met with resistance.
What can the WMF’s current practices around restricted grants, and grants with conditions attached, tell us about its commitment to transparency? Can, and should, its transparency around grants be improved? I believe there is much room for improvement. The easiest and most sensible standard, I believe, would be for the WMF to adopt the same transparency standards in the grants it pursues, as it requires of the people and organizations it funds.
I'd like to challenge the ok and not* assertions. I think they're a bad practice.
ok
Using ok() indicates one of three problems:
The software is unreliable. (Unsure what value to expect.)
The test uses an unreliable environment. (E.g. variable input data, insufficient isolation or mocking.)
The author is lazy and uses it as a shortcut for a proper comparison.
The first two necessitate improvement in the code or the testing environment. The latter comes with two additional caveats:
Less debug information. (No actual/expected diff.) Without an expected value provided, one can't determine what's wrong with the value.
Masking regressions. Even if the API being tested returns a proper boolean and ok is just a shortcut, the day the API breaks (e.g. returns a number, string, array, function, Promise or other object) the test will not catch it.
Common example:
var index = list.indexOf( item );
// Meh
assert.notEqual( index, -1 );
// Better?
assert.equal( index, 2 );
I've yet to see a use of ok or not* that wouldn't be improved by writing it a different way. Though I appreciate there are scenarios where notEqual can't be avoided in the short term (e.g. when the intent is to detect a change between two return values).
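To make the regression-masking caveat concrete, here is a small sketch; the hasItem API is a made-up example, not from any real library:

```javascript
// A hypothetical API that is supposed to return a boolean.
function hasItem( list, item ) {
    // Regression: the raw indexOf result leaks out instead of a boolean.
    return list.indexOf( item );
}

// ok-style check: any truthy value passes, so the index 2 slips
// through and the broken return type goes unnoticed.
var okPasses = Boolean( hasItem( [ 'a', 'b', 'c' ], 'c' ) );

// Strict comparison: the changed return type is caught immediately.
var strictPasses = hasItem( [ 'a', 'b', 'c' ], 'c' ) === true;
```

Here okPasses is true even though the API is broken, while the strict comparison fails and exposes the regression.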
In January 2003 Apple announced Safari, their new web browser for Mac. [1] The Safari team had just spent 2002 building Safari atop KHTML and KJS, [2][3] the KDE layout and JavaScript engines developed for Konqueror. The Safari team kept the codebase quite modular. This allowed Apple branding and other proprietary features to stay separate whilst also having a sustainable open-source project (WebKit) that is standalone and compilable into a fully functional GUI application. The Mac OS version of WebKit is composed of WebCore and JavaScriptCore – the frameworks that encapsulate the OS X ports of KHTML and KJS respectively. (Apple was already using KJS in Sherlock. [3])
Chromium
In 2008 Google introduced Chrome and started the open-source project Chromium. Chromium was composed of WebKit's WebCore and the V8 JavaScript engine (instead of JavaScriptCore). Google later forked WebCore into Blink in 2013, thus abandoning any upstream connection with WebKit.
WebKit
While Chromium is a single code-base with bindings for multiple platforms, WebKit is not. Instead, WebKit is based around the concept of ports.
These ports are manually kept in sync, some maintained by third parties (i.e. not by "webkit.org"). Some ports are better than others. "WebKit", as such, is closer to an abstract API than an actual framework.
A few popular ports:
Safari for Mac
Mobile Safari for iOS
Safari for Windows (abandoned)
QtWebKit (by Nokia; due to it being implemented atop Qt, it works on Mac/Linux/Windows)
Android browser (abandoned, uses Chromium now)
Chromium (abandoned, uses Blink now)
WebKitGTK+
WebKit itself doesn't do much when it comes to networking, GPU, JavaScript or text rendering; those are not "WebKit". Each port binds those to something present in the OS – or another application layer. E.g. QtWebKit defers to Qt, which in turn binds to the platform.
PhantomJS
PhantomJS is a headless browser using the QtWebKit engine at its core.
The current release cycle of PhantomJS (1.9.x) is based on Qt 4.8.5, which bundles QtWebKit 2.2.4, which was branched off of upstream WebKit in May 2011. Due to the many layers in between, it will take a long time for PhantomJS to get anywhere near the feature set of the current Safari 8. PhantomJS by design is nothing like Safari but, if anything, it is probably like an alpha version (branched from svn trunk) of Safari 4, which is why, unlike Safari 5, PhantomJS has only partial support for ES5.
Chromium has its abstraction layer at a higher level (platform independent). When run headless, it is exactly like an actual instance of Chrome on the same platform. When used in a virtual machine on a remote server, one doesn't even need to be "headless": we can use regular Chromium (under Xvfb). In theory the visual rendering through Xvfb and a VM hypervisor could differ, however.
A working paper[1] in economics provides several novel results shedding light on Wikipedia’s much discussed gender gap, focusing on three aspects: The causes of the gender gap in contributors, its impact on Wikipedia’s content, and how outreach measures that highlight the gender gap influence participation on Wikipedia.
It uses several sources of data, including the edit histories of all registered English Wikipedia users who have stated their gender in the user preferences, a survey and experiment with 1,000 Amazon Mechanical Turk users (from the US only, who were paid $1.50 for a 20-minute task), and a dataset of biographical articles with the subject’s gender obtained from Wikidata (excluding “celebrities like actors, athletes, and pop stars”, focusing on “professionals”, e.g. politicians and scientists, and cultural figures like writers and composers), together with pageview data.
Regarding causes of the gender gap, the author provides an overview of existing research, for example dismissing the so-called second shift as an explanation (“There are no gender differences in the amount of free time”, p.3) and pointing out that “women contribute no less than men to another example of online public good provision, writing user reviews for products and services”.
From the survey, the author concludes that “almost half of the gender gap in Wikipedia writing is explained by gender differences in two characteristics: frequency of Wikipedia use and belief about one’s competence … The gender difference in the belief about competence could be due to women being less competent or due to women underestimating their competence. The survey data does not allow to distinguish these.” (While the paper is otherwise well-informed about pre-existing research, it would have benefited from connecting this result to the work of Shaw and Hargittai; see our review of their paper “Mind the skills gap: the role of Internet know-how and gender in differentiated contributions to Wikipedia”).
Moving on to the effect of the editor gender gap on Wikipedia’s content, the paper finds “that women are about twice as likely as men to contribute to Wikipedia articles about women”, based both on the edit histories dataset and the Mechanical Turk survey. Intriguingly, “the number of readers per editor is higher for articles about women, and the share of articles that no one reads is larger in the case of articles about men”. In other words, readers prefer articles about women, editors prefer articles about men. The author indicates that the readership discrepancy mostly comes from the tail end of low-traffic biography articles:
“On a typical (median) day in September 2014, no one read 26 percent of the biographies of men versus only 16 percent of the biographies of women.”
The third part consisted of an experiment designed to “test whether providing information about gender inequality in Wikipedia changes editing behavior”. Mechanical Turk respondents were divided into two groups that were provided with different introductory information about Wikipedia:
“Wikipedia has been criticized by some academics and journalists for having only 9% to 13% female contributors and for having fewer and less extensive articles about women or topics important to women.” (a quote from the article Gender Bias on Wikipedia)
vs.
“Wikipedia started in 2001. English-language Wikipedia has over 4.5 million articles.”
They were then “asked to imagine a hypothetical situation in which they edit a person’s Wikipedia page. Respondents were asked to look at Wikipedia articles and find some relevant information from the web that is missing from a Wikipedia article. … In the end, they were also asked how likely they are to edit Wikipedia in the future.”
The first version, highlighting the criticism of Wikipedia’s gender gap, is “associated with a 35 percent decrease in the likelihood of editing Wikipedia in the future”, i.e. discouraged rather than encouraged respondents from contributing, which the author calls “somewhat unexpected”. This negative effect is concentrated among men: “The information that the majority of Wikipedia editors are men, leads men to reduce their editing effort, but it does not change the behavior of women.” As summarized by the author:
“The result provides an example where encouraging gender equality can partially backfire. Wikipedia has set a goal to increase the share of female editors. One way to achieve this is by discouraging male editors. However, this might not be desirable … The implication for Wikipedia and other forms of media is that it is important to balance the efforts of attracting new contributors and keeping the current ones.”
She also points out that “there are other examples in the literature where informational treatment has backfired”.
The paper is highly innovative and adds several novel results (with direct relevance for Wikipedians’ work to combat this kind of systemic bias), some of which are not mentioned in this summary. The author seems justified in calling it “the first comprehensive study of gender inequality in a new media environment such as Wikipedia”. A weakness of the part of the paper that studies the effect of editors’ gender on their contributions might be its partial reliance on the gender stated in their accounts’ user preferences. The author stresses that her methodology is robust against potential under-reporting by one gender (for example, female editors being less willing to publish their gender in this way because of concerns about harassment). However, she adds that the validity of the results rests on the assumption “that editors don’t systematically report wrong gender. Since the default option is not specifying one’s gender, I would not expect that they are massively reporting wrong gender.” In contrast, a 2011 paper by other authors (“WP:CLUBHOUSE”, see Signpost summary) that used the same methodology (and concluded that e.g. women vandalize Wikipedia more often than men) explicitly pointed to the possibility that their results might be affected by deliberately wrong reporting (although this might mostly concern vandals with few edits overall, i.e. of less relevance to the questions studied here). The paper also falls victim to a survivorship bias fallacy when interpreting an otherwise interesting result as showing that “female editors [have] increased from 3.7 percent in 2002 to a peak of 11.5 percent in 2011. In 2013, 10.4 percent of the active editors were female.” The option to state a gender in one’s user preferences was only introduced in 2009, so it is possible that, for example, a much higher percentage of women edited Wikipedia in 2002 but left before they had the opportunity to state their gender seven years later.
Teaching Wikipedia: The Pedagogy and Politics of an Open Access Writing Community
This dissertation[2] looks at the opportunities for writing pedagogy offered by the Wikipedia:Education program. It provides an interesting, though not comprehensive, overview of the literature in the field, and then proceeds to describe and analyze a number of educational assignments that the author carried out on Wikipedia through their 2011 course. The author concludes that the “teaching with Wikipedia” approach is generally beneficial to students in a number of ways, from improving their writing and research skills, to an increase in students’ rhetorical skills and understanding of topics relating to knowledge creation. The main limitation of the study, acknowledged by the author, is that it is based on a small sample of students (the course seems to have had only about seventeen participants). Nonetheless, it is a useful addition to our still limited understanding of the practice and benefits of the use of Wikipedia in an educational setting.
“Wikipedia, sociology, and the promise and pitfalls of Big Data”
This paper,[3] or perhaps an essay or an opinion piece (2,500 words, with little original research), entitled “Wikipedia, sociology, and the promise and pitfalls of Big Data”, is a strange beast. Published in the journal Big Data & Society, it doesn’t really address the topic of big data; instead, it presents a sociologically informed and critical discussion of a number of aspects of Wikipedia that, while interesting, seems out of place in an academic journal, and reads more like an academic blog entry. The authors display a reasonable familiarity with Wikipedia, though they make a few factual mistakes (such as suggesting that Wikipedia:WikiProject Sociology was formed with the assistance of the American Sociological Association in 2004; in fact, the ASA was not aware of WP:SOCIO until the late 2000s, and its support for it has been limited to linking to the WikiProject from its Wikipedia Initiative page).
Based on their literature review, the authors don’t hesitate to make some strong claims about Wikipedia, primarily in the vein of Wikipedia becoming less friendly to new editors, though most of those claims are more or less supported by the sources cited. The authors’ research question is how the discipline of sociology is framed on Wikipedia, with special attention to the notability criteria for academics (WP:PROF) and the gender imbalance of Wikipedia’s biographies of sociologists. Unfortunately, as this is not a proper research piece, the authors’ findings are rather sparse, and primarily concern the fact that topics covered by WikiProject Sociology and its related portal are poorly structured, that Wikipedia’s biographies of sociologists are mostly about male subjects (the article omits, however, the question of gender bias in academia – aren’t most sociologists male anyway…?), and that the WP:PROF guideline may not be enforced very strictly for sociological biographies. It was an enjoyable read, but overall, as the article’s only sections are entitled Abstract, Declaration of conflicting interests, Funding and Notes, something important is missing – the article proper. As the authors make a point of stressing (twice) the chaotic and unorganized nature of Wikipedia’s coverage of sociological topics, I can’t help but feel that the article, which also fails to drive home any particular, well-organized point, could well fit that description too.
Wikipedia may affect the stock market in a “governing” way, says Crowd Governance: The Monitoring Role of Wikipedia in the Financial Market[4]. The paper looks at how the stock market and insider trading react to the creation of a Wikipedia article about a traded firm. Using a sample of 413 articles on S&P 500 firms, the authors find that stock prices significantly drop on the days their Wikipedia article is created. Furthermore, prices drop more for companies that have more insider traders, or which are more institutionally owned. This goes to show, the authors say, that Wikipedia governs the stock market by “reducing information asymmetry”. Firm information on Wikipedia would seem to benefit the public more than information in newspapers; that, they suggest, is bad news for Wall Street.
Other recent publications
A list of other recent publications that could not be covered in time for this issue – contributions are always welcome for reviewing or summarizing newly published research.
“Understanding the ‘Quality Motion’ of Wikipedia Articles Through Semantic Convergence Analysis”[5] From the abstract: “This study aims to check if Wikipedia’s [quality] ratings really reflect its stated criteria. According to Wikipedia criteria, having abundant and stable content is the key to article’s quality promotion; we therefore examine the content change in terms of quantity change and content stability by showing the semantic convergence. We found out that the quantity of content change is significant in the promoted articles, which complies with Wikipedia’s stated criteria.”
“Wikipedia’s Politics of Exclusion: Gender, Epistemology, and Feminist Rhetorical (In)action”[6] From the abstract: “In this article, I explore how Wikipedia functions as a rhetorical discourse community whose conventions exclude and silence feminist ways of knowing and writing. Drawing on textual analysis of Wikipedia’s editorial policies, as well as interviews with female users, I argue that Wikipedia’s insistence on separating embodied subjectivity from the production of knowledge limits the site’s ability to facilitate any substantial, subversive feminist rhetorical action.”
“Knowledge Quality of Collaborative Editing in Wikipedia: an Integrative Perspective of Social Capital and Team Conflict”[7] From the abstract: “Despite the abundant researches on Wikipedia, to the best of our knowledge, no one has considered the integration of social capital and conflict. Besides, extant literatures on knowledge quality just pay attention to task conflict, while relational conflict is rarely mentioned. Meanwhile, our study proposes the nonlinear relationship between task conflict and knowledge quality instead of linear relationships in prior studies. We also postulate the moderating effect of task complexity.”
“Collective remembering of organizations: Co-construction of organizational pasts in Wikipedia”[8] From the abstract: “The authors analyze 1,459 edits of Wikipedia pages of ten organizations from various industries. Quantitative content analysis detects Wikipedia edits for their reputational relevance and reference to formal sources, such as corporate communication or newspapers. Furthermore, the authors investigate to which degree current corporate communication in form of 177 press releases has an influence on the remembering process in Wikipedia. … The analysis of press releases shows that current frames provided by corporate communication finds only little resonance in the ongoing remembering processes in Wikipedia.”
References
↑Marit Hinnosaar (2015): Gender Inequality in New Media: Evidence from Wikipedia. No 411, Carlo Alberto Notebooks from Collegio Carlo Alberto. PDF
↑Vetter, Matthew A. Teaching Wikipedia: The Pedagogy and Politics of an Open Access Writing Community. Thesis, 2015, Doctor of Philosophy (PhD), Ohio University, English (Arts and Sciences). PDF
↑Julia Adams, Hannah Brückner: Wikipedia, sociology, and the promise and pitfalls of Big Data. DOI:10.1177/2053951715614332, Dec 2015
↑Huijing Deng, Bernadetta Tarigan, Mihai Grigore, Juliana Sutanto: Understanding the ‘Quality Motion’ of Wikipedia Articles Through Semantic Convergence Analysis. Proceedings of HCI in Business, Second International Conference, HCIB 2015, held as Part of HCI International 2015, Los Angeles, CA, USA, August 2-7, 2015, DOI:10.1007/978-3-319-20895-4_7
↑Zhan, Liuhan; Wang, Nan; Shen, Xiao-Liang; and Sun, Yongqiang: “Knowledge Quality of Collaborative Editing in Wikipedia: an Integrative Perspective of Social Capital and Team Conflict” (2015). PACIS 2015 Proceedings. Paper 171. http://aisel.aisnet.org/pacis2015/171
↑Michael Andreas Etter, Finn Årup Nielsen (2015): “Collective remembering of organizations: Co-construction of organizational pasts in Wikipedia”, Corporate Communications: An International Journal, Vol. 20, Iss. 4, pp. 431–447. DOI:10.1108/CCIJ-09-2014-0059
This weekend (January 8-10), I’ll be attending the Linguistic Society of America’s annual conference in Washington, D.C. In November, the Wiki Education Foundation announced our educational partnership with the Linguistic Society of America (LSA) to target and improve linguistics topics on Wikipedia.
Next week Wikipedia turns 15 and the first draft of the Wikimedia Foundation’s long-awaited strategic plan will be published for comment; yesterday its annual “all staff” meeting began. Meanwhile… there is a battle going on at the top for its soul.
staff morale at the WMF is at an all time low (only 7% feel informed, only 10% feel confident in senior leadership, as reported in The Signpost);
there has been a “transparency gap”, including the major, yet secretly planned, mid-2015 “re-org” of the engineering department (see the list on the talk page of WMF.WTF);
WMF’s Executive Director Lila Tretikov with Board member Jimmy Wales
It is my supposition that this is not a list of unrelated incidents, but that this is part of a wider theme: That a portion of the Board of Trustees and the Executive Director of the Wikimedia Foundation believe that it should be treated as a technology organisation in the style of a dot-com company, out of step with the staff and without the awareness of the community. By contrast, it’s always been my belief that the Wikimedia Foundation is an education charity that happens to exist primarily in a technology field. Of course software engineering is crucial to the work we do and should represent the major proportion of staff and budget, but that is the means, not the end.
All this background makes next week’s WMF draft Strategic Plan a very important document. For the 2010-15 plan there was a massive community consultation project but this time around there was only a 2-question survey. As Philippe Beaudette, the Community Facilitator on that original strategy process and latterly the WMF Director of Community Advocacy (who also recently left the organisation), said to me [with permission to publish here]:
The Wikimedia Foundation has one unique strategic asset: the editing community. Other orgs have great tech resources, tons of money, good software, and smart staff… but none of them have the editing community. I am, frankly, saddened by the fact that this one unique strategic asset is not more central to the developing strategy.
The November staff presentation gives a strategy preview that speaks of three priorities (slides 28-30): “1. Engage more people globally (reach) 2. Facilitate communities at-scale (community) 3. Include broader content (knowledge)”; and describes a need to “prioritise core work” (slides 32-33). All laudable goals, but they only include “example objectives” such as “build capacity”, “improve trust”, and “improve tools”.
Nevertheless, I suspect that the major strategic direction has already been privately determined. In short, it appears there will be an attempt to create the internet’s Next Big Thing™ at the expense of improving the great thing that we already have.
In May, as noted by Risker, “Search and Discovery, a new team, seems to be extraordinarily well-staffed with a disproportionate number of engineers at the same time as other areas seem to be wanting for them.”
The June staff presentation “strategy preview” talks about creating a “knowledge engine where users, institutions and computers around the world contribute and discover knowledge”. The FAQ page for the “Discovery department”, describes this project as “…improving the existing CirrusSearch infrastructure with better relevance, multi language, multi projects search and incorporating new data sources for our projects.”
As mentioned above, we now have two new Silicon Valley executives appointed to the Board of Trustees. They join the previously appointed board member, Silicon Valley venture capitalist Guy Kawasaki, as well as internet entrepreneur Jimmy Wales himself. No one has been appointed for their professional experience in education, charities, communities or developing countries.
While I agree with the general premise that the search system on the Wikimedia projects can be improved, I don’t know anyone who thinks “an indexed & structured cache [of] Federated Open Data Sources” should be THE strategic priority. Starting something entirely new like Federated Search is HARD, and trying to include external sources (that link suggests also integrating the US Census and the DPLA) is even harder, especially when there are so many existing technical needs. Quoting Philippe again: “for instance, fixing the inter-relationships between languages and projects, or creating a new admin toolset for mobile, or paying down our technical debt, or establishing a care/command/control operation for critical tools to ensure their sustainability, etc….”.
The Funds Dissemination Committee (on which I sit as a community-elected member) declared in November that it is “…appalled by the closed way that the WMF has undertaken both strategic and annual planning, and the WMF’s approach to budget transparency (or lack thereof).” In response the WMF is considering submitting its 2016-17 annual plan, based on the aforementioned strategic plan, to a “process on-par with the standards of transparency and planning detail required of affiliates going through the Annual Plan Grant (APG) process”.
We will see over the next weeks to what degree the apparent shift towards a Silicon Valley mindset – whether the staff and community like it or not – is borne out. As the then-Chair of the Board Jan-Bart de Vreede said in describing Lila Tretikov’s appointment as Executive Director,
We are unique in many ways, but not unique enough to ignore basic trends and global developments in how people use the internet and seek knowledge…I hope that all of you will be a part of this next step in our evolution. But I understand that if you decide to take a wiki-break, that might be the way things have to be.
When I was young, I visited my uncle for a few days during my holidays. He was a milkman in Haarlem, a gregarious person, and he had his regular addresses where he would stop for a cup of coffee.
At one address they had a Spirograph, a wonderful toy; I played with it for a few days and I loved it. When I went home, he "paid" me and instructed me to buy a Spirograph for myself. I told him that I could not do this.
He called my parents and told them about it. They assured him that I could if I wanted to. I may have been nine or ten, and to me it was the kind of thing that is wonderful to play with when you come across it, but not the kind of thing that I would always play with.
Over the past few months I've been working on an updated Debian package for MediaWiki. After working with Luke, Faidon, Moritz, and a few others, we now have a mostly working package! It's being developed in Gerrit (the mediawiki/debian repository), so it hopefully will get more visibility, and we can keep it in sync with current development.
If you're interested in testing, I've uploaded it to people.wikimedia.org for now.
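For anyone who wants to try it, installing a locally downloaded .deb would go roughly like this (the URL and filename below are placeholders, not the actual path on people.wikimedia.org; substitute whatever version is actually published):

```shell
# Download the package (placeholder path and version).
wget https://people.wikimedia.org/.../mediawiki_VERSION_all.deb

# Install it; dpkg will flag any missing dependencies...
sudo dpkg -i mediawiki_VERSION_all.deb

# ...which apt can then pull in to complete the installation.
sudo apt-get install -f
```

The `apt-get install -f` step is the standard way to resolve dependencies that a direct `dpkg -i` cannot fetch on its own.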
Sports fans vandalize Wikipedia articles, and sports bloggers write about it—often. Photo by Master Sgt. Lance Cheung of the U.S. Air Force, public domain.
On November 22, football (soccer) team Manchester City lost 4–1 at home to Liverpool. Some Liverpool fans commemorated the drubbing by vandalizing the Wikipedia page for the City of Manchester Stadium, and the press took notice.
“Manchester City stadium now ‘owned by Jürgen Klopp’, claims Wikipedia page,” read a story on The Guardian’s online sports section. Klopp is Liverpool’s manager. The Guardian story reported that the Wikipedia article had been edited to show Liverpool’s coach owned the stadium, which had supposedly been renamed, and that a Liverpool player now operated it.
The Guardian‘s story was published at 11:28 a.m. GMT, and shared on social media 4,498 times.
Yet before the story ever appeared online, the vandalism reported by The Guardian was reverted. Within an hour of publication, the Wikipedia article was “protected,” or closed to anonymous edits.
The speed of the reversion could not outpace online reporting of the vandalism. Few actually saw the bad edits—yet according to Google News, 1,541 articles reported the vandalism, including The Mirror and BBC News Online.
Anyone can edit any Wikipedia page, and for exuberant fans, that temptation can prove too much to resist.
“I was just trying something new,” a Facebook user who posted about vandalizing the Manchester City page later told Wikipedia on Facebook. “I was like, hey is it possible if I add something by myself, even if it was wrong? So I decided to do it.” He said he is a football fan who otherwise greatly appreciates Wikipedia.
On the other end of the vandalism was Smartse, a Wikipedia editor since 2006 with more than 32,000 edits. “I was reading The Guardian, and noticed an article about the vandalism. So I checked the history and protected it for a few days,” Smartse said via email. “I would say that it was being taken out of proportion since it was hardly any different to all the other vandalism high traffic articles get. Publicising vandalism like this isn’t helpful in general as it will only encourage others to do the same. Even if it’s funny, someone still has to revert it.”
Wikipedia vandalism can reach some very high places. On October 20, Barack Obama noted vandalism to the article on US women’s soccer team star Carli Lloyd at a White House press appearance. In July, Lloyd’s Wikipedia page was vandalized to state that she was the president of the United States after she scored three goals in the final game of the World Cup. The president joked that Lloyd knew more about being president than some of the current candidates.
The vandalism to Lloyd’s page was reverted in five minutes; still, a Google News search shows more than 6,000 news articles cited the prank.
In a similar incident, goalkeeper Tim Howard’s Wikipedia page was changed to list him as Secretary of Defense after his 16 saves in a 2014 World Cup game. US Secretary of Defense Chuck Hagel called Howard to congratulate him—a fact reported by The Wall Street Journal, USA Today, and Yahoo! Sports, among many others.
For every inspired vandalism edit, there are many that are not. As millions of fans curious about Howard’s life story were simply trying to learn facts about him, his Wikipedia article was repeatedly vandalized.
TV networks have long declined to broadcast fans interrupting games. A spokeswoman for CBS Sports told The New York Times a decade ago that “It’s our policy to turn our cameras away from any exhibitionist behavior. We’re not going to provide the vehicle for these people.” But the Web has changed that, elevating streakers on the field and other pranks on websites and social media. “The exhibitionists who interrupt sporting events no longer have to rely on the reluctant gaze of a television camera to advance their notoriety,” the Times wrote in 2005.
Some say online reporting of Wikipedia vandalism should have ended then.
“Media reports of Wikipedia vandalism are more than a decade obsolete,” said Howard Rheingold, who helped to develop early crowdsourced projects and social networks, and taught about them at the University of California at Berkeley and Stanford University. “Wikipedia has always worked,” Rheingold said, because the number of people “who have the power to revert to a previous version with one click has vastly outnumbered the number of vandals.”
The reports of vandalism continue, some even noting that the edits are gone in an instant, but immortalized in blogs.
The New England Sports Network reported on Dec. 28 on vandalism to New England Patriots player Matt Slater’s Wikipedia page the day before. The edit cited in the blog post disappeared in five minutes, and NESN noted “the sarcastic jokes were quickly removed.” The post then showed the vandalized articles, noting:
“But nothing ever really disappears on the Internet, as the following screen shots prove.”
Jeff Elder, Digital Communications Manager
Wikimedia Foundation
Here at Wiki Strategies, we are excited to celebrate Wikipedia’s 15th birthday on January 15, 2016. January also marks my 10-year anniversary as a Wikipedian.
Wikipedia is the world’s most extensive platform for collaboration, as well as being the most widely-read publication in history. It touches the lives and work of nearly everyone, in many different ways. So, to mark the occasion, Wiki Strategies will present 15 video interviews with leading thinkers in different industries and disciplines. How does Wikipedia affect practices like journalism, social justice, education, or politics? We’ve been asking around, and we think you’ll find these perspectives fascinating.
We will kick off this video series at Wikipedia’s 15th Birthday Party in San Francisco, one of many events around the world. The San Francisco event will be held on Saturday, January 16, 2016; it’s a half-day event, including presentations, panel discussions, video links with Wikipedians in other cities, and of course…CAKE! Read more and register here.
For this event, I will interview four people, and lead a panel discussion. Our panelists will include (subject to updates):
Cathy Casserly, Vice President of Learning Networks for EdCast, and a widely respected leader in the Open Educational Resources movement
Eugene Eric Kim, founder of Faster Than 20, and designer in 2009 of Wikipedia’s broadly inclusive strategic planning process, which engaged 1,000 people.
Below, you can get to know each of our panelists through their work — though I don’t think you’ll hear them mention Wikipedia in any of these videos! Those conversations are yet to come; I hope you’ll join us, whether online or in person.
It’s 2016 and it seems like a good time to attempt some new type of explanation of things. Things in general, I mean, and things internety. Or, maybe not ‘explanation’ so much as formless rambling. That’s easier on the brain, given the amount of sleep I’ve been getting (i.e. sod all).
I’m four days into the new working year, and some good bits of code are already shaping up (file attachment fields and schema-editing in Tabulate, hopefully both ready to roll before too much longer). Some odd bits of enterprise bureaucracy have nearly fallen on my head but for the most part missed me (whereon I’ve attempted the old I-didn’t-see-anything trick, and carried on regardless).
I had a couple of weeks off, and explored some great bits of the south west. So nice to be back at Wilyabrup (not climbing, just looking, and some mapping). And I didn’t even take my GPS to Walpole; good to be not attempting to Record Everything for a while.
Things for this year, perhaps: Wikisource proofreading; importing Nyunga words into Wiktionary; carry on with Tabulate; print CFB at long last; go to Wikimania; try to write every day; get MoonMoon working again properly for Planet Freo. But mostly: stop re-evaluating everything and just get on with what’s (reasonably and probably not perfectly) good enough and worthwhile. Code less! Work on content and data more; code only what’s required.
Thursday marks the beginning of the American Historical Association’s 2016 annual meeting. I’m inviting all attendees to join my workshop on using Wikipedia in higher ed history courses, and a roundtable on digital history.
As a website with 500 million monthly visitors, Wikipedia plays a significant role in 21st century history documentation. Readers reference Wikipedia to learn about everything, from ancient history, to local history, to current events.
But who writes the content, and how does bias play a role in what ends up on Wikipedia? Does it align with accounts written by professional historians? How can historians improve Wikipedia?
A proven way for experts to improve Wikipedia is bringing it into their classrooms. Evaluating Wikipedia’s articles — and how they change — is an effective way to teach students about historiography. One instructor reflected on her experience using Wikipedia in a history course and observed:
“Perhaps most importantly, my students learned a crucial lesson: theater history — indeed, all history — is a living, changing thing. I always push my students to read textbooks and scholarship with a critical eye, because what we tend to call “history” is really historiography — stories about the past written by people. Peer-review processes at publishing houses ensure that the books we read are accurate, well researched, and authoritative; but even the most respected and skilled scholars can get the story wrong. Other scholars must come along, armed with newly discovered insights or evidence, to revise these histories. On Wikipedia, the challenges involved in writing history are fully visible. My students learned that resources like Wikipedia are only as good as the careful, thoughtful contributions that people choose to make.”
I’ll be at the conference to discuss Wiki Ed’s programs, and how you can share Wikipedia with the next generation of experts. If you’re in the Atlanta area and would like to set up a meeting, please email me: [email protected].
Where to find Wiki Ed
Thursday, January 7th
Workshop: Wikipedia and Digital Literacy
Hilton Atlanta, 2nd floor, Room 210
Session 1: 9:00–10:00am
Session 2: 10:00–11:00am
Roundtable: Debates in Digital History
Hilton Atlanta, 2nd floor, Salon B, 11:00 a.m.–12:00 p.m.
Friday, January 8th
Lightning round: Digital Pedagogy in and out of the Classroom
On Tuesday, January 12, I’ll be presenting Wikipedia assignments to librarians and instructors at Hunter College in New York.
Wikipedia writing assignments are a great way for students to engage with course literature. From the humanities to the social sciences to STEM fields, when students research a topic to write a Wikipedia article, they’re writing for an active, public audience. That develops core communication skills. Students actively engage with their topics by selecting reliable sources, weighing available knowledge, and developing a deeper understanding of their readings.
In a 90-minute session, I’ll share what Wiki Ed has learned from supporting hundreds of higher education instructors across the United States and Canada. The workshop will explore why assignments like these make such a huge impact on student learning and public knowledge.
Our session will take place at Hunter College, Tuesday, January 12, at 1:30 p.m. Join us in Room 404 on the 4th floor of the Leon & Toby Cooperman Library (the Main Library) at 68th & Lexington (map).
Any higher education instructors in the greater New York area are invited to attend. Please register in advance as seating is limited! Open our online registration form to RSVP.
For some ideas on how Wikipedia assignments can help teach research and critical writing skills, read this short and brilliant list from Hunter College’s own Web and Digital Initiatives Librarian, Chanitra Bishop.
If you know someone in the New York area interested in Wikipedia assignments, feel free to share the registration link, or email me: [email protected].
In a recent Wikipedia Signpost Op-Ed, Andreas Kolbe wrote about Wikidata and references. He comes to the conclusion that Wikidata needs more (non-Wikipedia) references, a statement I wholeheartedly agree with. He also divines that this will never happen, that Wikidata is doomed, while at the same time somehow being controlled by Google and Microsoft; I will not comment on these “conclusions”, as others have already done so elsewhere.
Andreas also uses my own Wikidata statistics to make his point about missing references on Wikidata. The numbers I show are useful, IMHO, to show the remarkable progress of Wikidata, but they are much too crude to draw conclusions about the state of references there. Also, the impression I get from Andreas’ text is that, while Wikipedia has some issues, references are basically OK, whereas they are essentially non-existent in Wikidata.
So I thought I’d have a look at some actual numbers, especially comparing Wikipedia and Wikidata in terms of references.
One key issue is that there is no built-in way to get metrics about statements and references from Wikipedia. I therefore developed my own approach. Given a Wikipedia article, I use the REST API to get HTML for the article. I then count the number of reference uses (essentially, <ref> tags) in the article; note that this number is larger than (or at least equal to) the number of references at the bottom of the page. Then, I strip the HTML tags and count the number of sentences (starts with an upper-case character, has at least 50 characters, ends with a “.”); the numbers were confirmed manually for a few example articles through other sentence-counting tools on the web, and yielded similar results. I then assume that each sentence in the article contains one statement (or fact); in reality, there are likely many such statements (such as the first sentence of a biographical article), but I am aiming for a lower bound here. (Any sentence not containing a statement/fact should be deleted from Wikipedia anyway.) A useful metric derived from the number of reference uses and the number of statements (=sentences) is the references-per-statement (RPS) ratio.
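The counting described above can be sketched in a few lines of Python. This is a minimal reading of the heuristic, not the author's actual code: it counts <ref> openings in raw wikitext (rather than the rendered HTML) and applies the sentence rule to tag-stripped plain text.

```python
import re

def count_ref_uses(wikitext):
    # Each <ref ...> opening tag counts as one reference use, including
    # self-closing re-uses of a named reference like <ref name="x"/>.
    return len(re.findall(r'<ref[\s>/]', wikitext))

def count_statements(plain_text):
    # Heuristic from the post: a "sentence" (= one statement) starts with
    # an upper-case character, has at least 50 characters, and ends in ".".
    candidates = re.split(r'(?<=\.)\s+', plain_text)
    return sum(1 for s in candidates
               if len(s) >= 50 and s[:1].isupper() and s.endswith('.'))

def rps(ref_uses, statements):
    # References-per-statement ratio; guard against empty articles.
    return ref_uses / statements if statements else 0.0
```

Because short and lower-case fragments are discarded, this undercounts sentences, which is consistent with treating the result as a lower bound on statements.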
For Wikidata, a similar metric can be calculated. For practical purposes, I skip statements of the “string” type, as they are mostly external references in themselves (e.g. VIAF identifiers); I also skip “media”-type statements, as they should have “references” in their file description page on Commons. For references, I do not count “imported from Wikipedia”, as these are not “real” references, but rather placeholders for future improvement. Again, a RPS ratio can be computed.
I then calculated these ratios for 4,683 Featured Articles from English Wikipedia and their associated Wikidata items (data). As these articles have been significantly worked over and approved by the English Wikipedia community, they should represent the “best case scenario” for Wikipedia.
Indeed, the RPS ratio is higher for Wikipedia in 87% of cases, which would mean that Wikipedia is better referenced than Wikidata. But keep in mind that this represents the best of the best of the best of English Wikipedia articles, fifteen years in the making, compared to a three-and-a-half-year-old Wikidata (and references were not supported for the first year or so). This is as good as it gets for Wikipedia, and still, Wikidata has a better RPS in about 13% of cases.
Even more interesting, IMHO: taking the mean of both the number of statements and the number of references for Wikipedia and Wikidata, respectively, and calculating the RPS ratios for those means, yields 0.32 for Wikipedia and 0.15 for Wikidata. This seems counter-intuitive, given the previous 87/13 “ratio of ratios”. However, further investigation shows that only 1305 (~28%) of Wikidata items have any references at all, but where there are references, they usually outshine Wikipedia; about half of the items with at least one reference have a better RPS ratio than the respective Wikipedia article. This seems to indicate a “care factor” at work; where someone cared about adding references to the item, it was done quite well. Wikidata RPS ratios range up to 1.5, meaning two statements are, on average, supported by three references, whereas Wikipedia reaches “peak RPS ratio” at 0.93, or slightly less than one reference per statement.
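Why the per-item comparison and the ratio-of-means can tell different stories is easy to show with a toy calculation; the numbers below are invented purely to illustrate the effect of skewed referencing, and are not taken from the dataset:

```python
# Hypothetical (statements, references) counts per item, invented only to
# show how the two aggregations can disagree when referencing is skewed.
wikipedia = [(20, 6)] * 10                  # every article modestly referenced
wikidata  = [(20, 0)] * 7 + [(20, 30)] * 3  # a well-referenced minority

def per_item_win_rate(a, b):
    # Fraction of item pairs where a's RPS strictly beats b's.
    return sum(ra / sa > rb / sb
               for (sa, ra), (sb, rb) in zip(a, b)) / len(a)

def rps_of_totals(pairs):
    # RPS computed from the totals (equivalently, from the means):
    # sum of references divided by sum of statements.
    return sum(r for _, r in pairs) / sum(s for s, _ in pairs)
```

In this toy data, wikipedia wins 70% of the per-item comparisons, yet wikidata's aggregate RPS (0.45) exceeds wikipedia's (0.30): the well-referenced minority dominates the mean, which is exactly the "care factor" pattern described above.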
I believe these numbers show that Wikidata can equal and surpass Wikipedia in terms of “referencedness”, but it is a function of attention to the items. Which in turn is a matter of man- and bot-hours spent. Indeed, for the Wikidata showcase items (the equivalent of Featured Articles on Wikipedia), the Wikidata RPS ratio is better than that of the associated English Wikipedia article in 19 out of 24 cases (~80%).
So will Wikidata ever catch up to Wikipedia in terms of RPS ratio? I think so. The ability of Wikidata to be reliably edited by a machine allows for improvement by automated and semi-automated bots, tools, games, on-wiki gadgets, etc., which allow for a much steeper editing rate, as I demonstrated previously for images, where Wikidata went from nothing to second place in about two years, and is now angling for the pole position (~1.1M images at the moment). I see no reason to doubt this will happen to references as well.
A new grant from the Knight Foundation will improve search and discovery on Wikipedia. Photo by Julo, public domain.
The Wikimedia Foundation will launch a new project to explore ways to make the search and discovery of high quality, trustworthy information on Wikipedia more accessible and open with $250,000 from the John S. and James L. Knight Foundation. Funding will support an investigation of search and browsing on Wikipedia and other Wikimedia projects, with the goal of improving how people explore and acquire information.
Wikipedia includes more than 35 million articles across hundreds of languages. Its standards for neutral, fact-based and relevant information have made it a reliable resource for nearly half a billion people every month. With more than 7,000 articles created every day and 250 edits made per minute, Wikipedia is constantly growing and improving. Its open, nonprofit model allows anyone to participate and contribute. This project will help improve discoverability of this vast resource of community-created content.
Over the last decade, the world has seen a surge in digital information. People today can access vast amounts of information online, mostly through a small number of closed technologies. Through this project, the Wikimedia Foundation will test ways to make relevant information more accessible and investigate transparent methods for collecting, connecting, and retrieving this information consistent with the values of Wikipedia and the open web.
With Knight support, the Wikimedia Foundation has begun six months of deep research, testing, and prototyping on user search habits and practices on Wikipedia and other Wikimedia projects. Using these platforms as testing grounds, the organization will examine questions around content preferences, queries, the quality and relevance of results, and what information people consume and why. It will conduct open discussions with the Wikimedia community to help inform the project. A public-facing dashboard will display results and metrics from this discovery and lessons will be shared widely.
“Finding an article on Wikipedia is like opening the first page in the book of knowledge. We have an obligation to our communities to make this first experience captivating for every user. We share Knight Foundation’s belief in the power of open information in building engaged, strong communities. We are excited for the potential of this project to bring free, relevant, trustworthy knowledge to every person,” said Wikimedia Foundation Executive Director Lila Tretikov.
“As the amount of digital content continues to grow, helping people search for and discover relevant information so they can make decisions important to their lives is becoming increasingly essential,” said John Bracken, Knight Foundation vice president for media innovation. “This project will help uncover more effective, transparent ways to do just that, drawing on the Wikimedia Foundation’s commitment to an open and free Internet.”
For more information, please see our FAQ.
Wikimedia Foundation Communications
Since writing Reading the Comments, I often think about how best to explain why people can act so rotten online.
I recently put together a graphic that uses the "bad apple" idiom.
The three sources of rottenness also, roughly, follow the development of theories about online behavior.
However, I believe the temperature (media effects), state of the barrel (group culture), and presence of worms (disordered personalities) are all still relevant.
When researchers first started talking about flaming back in the 90s, they tended to focus on the effects of "Computer-Mediated Communication" (CMC).
Researchers spoke of reduced social cues, media richness, and social information processing; they offered theories of hyperpersonal media and of deindividuation effects.
When considering why people can be so rotten, I think these media effects are related to the effect of temperature on apples.
The hotter it is, the faster the apples in a barrel will spoil.
We can see this when Lindy West's cruelest troll apologized: "it finally hit me. There is a living, breathing human being who is reading this shit. I am attacking someone who never harmed me in any way."
Interacting online had distanced him from the consequences of his actions.
Using digital communication can increase the temperature and the likelihood of something rotten happening.
In the new millennium, these media effects were supplemented by a focus on environment and culture.
Watt, Lea, and Spears wrote that "theoretical revisions have moved away from the central importance of communication bandwidth."
They also argued that people still have inhibitions and norms when online, people just look to more salient norms.
Trolling had become its own culture, with its own norms.
As Coleman wrote in 2011: "trolls have transformed what were more occasional and sporadic acts, often focused on virtual arguments called flaming or flame wars, into a full-blown set of cultural norms and set of linguistic practices."
In this light, someone like Violentacrez didn't become wholly uninhibited by norms; he was doing what had become the norm in his corner of Reddit.
Just as an apple in a rotting barrel is likely to go bad, someone hanging out in a rotten subreddit is more likely to do the same.
Finally, although folks have long been armchair-diagnosing others, researchers are beginning to consider personality.
Buckels, Trapnell, and Paulhus found measures of sadism, psychopathy, and Machiavellianism are positively correlated with trolling and that there was a strong relationship between "online commenting frequency, trolling enjoyment, and trolling behavior and identity."
Although I object to using the term "troll" to label any undesirable behavior, I relate this to the worm that spoils an apple.
Maybe folks like Violentacrez and weev would test highly on this "dark tetrad" of personality variables.
At the extreme, I sometimes reference Luka Magnotta, who was diagnosed with paranoid schizophrenia in his teens.
As I wrote in Reading the Comments, in his twenties he became notorious for his online exploits, including suffocating cats; he eventually killed and dismembered a man---all posted on YouTube.
In 2012 he fled to Europe where he continued posting videos taunting police and thanking "his fans" for their attention and support.
Magnotta was eventually arrested in an Internet cafe in Germany reading stories (and likely commenting) about himself.
Although disordered folks are a minority, they can have a disproportionate effect, especially online.
All of these things, temperature, environment, and worms contribute to a rotten barrel of apples.
Similarly, media effects, culture, and the disordered do the same online.
Meet the two new members of the Wikimedia Foundation’s Board of Trustees: Kelly Battles (left), and Arnnon Geshuri (right). Photos by Myleen Hollero, freely licensed under CC BY-SA 3.0.
Today the Wikimedia Foundation announced two new members to its Board of Trustees: Kelly Battles and Arnnon Geshuri. The new Trustees bring deep expertise in strategy and financial oversight, and diversity and organizational development, as well as a commitment to advancing Wikimedia’s vision of free knowledge for the world.
“We considered dozens of candidates from all over the world, with not-for-profit and technology experience, and the highest professional standards.” said Dariusz Jemielniak, Chair of the Wikimedia Foundation Board Governance Committee and Board Trustee. “Kelly’s finance and auditing skills will be essential to the Board’s oversight and budgeting responsibilities. Arnnon’s expertise in talent development and cultural diversity will be indispensable for the development of the Wikimedia Foundation, and communications and transparency within the Wikimedia movement. We look forward to partnering with them.”
A veteran financial executive, Kelly brings more than 25 years of experience in financial management and administrative oversight for leading technology companies and non-profit organizations. She currently serves as Chief Financial Officer of Bracket Computing, a cloud virtualization company in Mountain View, California. Her earlier roles included Chief Financial Officer at Host Analytics, Vice President of Finance at IronPort, and Director of Strategy and Corporate Development at Hewlett Packard.
“As a non-profit supporting one of the most popular websites in the world, the Wikimedia Foundation has a unique responsibility to practice transparent, effective stewardship of donor resources,” said Kelly. “I am excited to lend my financial and strategic experience to an organization dedicated to making knowledge more freely available to the world.”
Arnnon brings more than 20 years of experience in developing strong organizational cultures with diverse, passionate employees. He is currently the VP of Human Resources at Tesla Motors, where he shepherds Tesla’s unique culture and oversees all global people operations, analytics, and staffing. Before joining Tesla, Arnnon served as Senior Director of HR and Staffing at Google, where he built the company’s talent acquisition and diversity strategy, growing the organization to more than 20,000 people in five years. Earlier in his career, Arnnon served as Vice President of People Operations and Director of Global Staffing at E*TRADE Financial.
“I have always believed in the power of open, transparent knowledge. Wikipedia represents some of the best aspects of our changing world: deeper knowledge, collaboration, and, ultimately, understanding,” said Arnnon. “This opportunity is a true privilege for me and I am thrilled to help support this powerful mission.”
“Kelly and Arnnon bring a special combination of expertise, integrity, and love for our mission. From Arnnon’s people and culture expertise to Kelly’s strong financial management background, both members bring valuable skills to strengthen our Board and help grow the Wikimedia movement for future generations. Most importantly, they bring a deep commitment to making knowledge more freely available for people around the world,” said executive director Lila Tretikov.
Kelly and Arnnon were approved unanimously by the Wikimedia Foundation Board of Trustees. Both terms are effective Jan 1, 2016 and will last for two years.
Mr Stroebe is an emeritus professor, a psychologist who worked for German, American and Dutch universities. From 1992 he worked at Utrecht University, but that does not make him Dutch.
People could be forgiven for thinking him Dutch, because the description on Wikidata says so: "Dutch social psychologist". Disambiguating Mr Stroebe with Reasonator clearly shows the qualitative difference between automated and manual descriptions.
The difference between the two is an old argument. People may like them but factually manual descriptions are inferior. It takes little effort to improve automated descriptions on a big scale while nobody really looks at descriptions of individual people like Mr Stroebe. Manual descriptions are translated and who dares to suggest that with translations the quality of the descriptions improves? The opposite is easy to argue. With translations, issues become even harder to remedy.
Wiki Ed is welcoming the new year with a new Dashboard system!
The new Dashboard has two major new features. There’s an online training system built in for instructors and students, and a new onboarding system.
The New Instructor Orientation is available to anyone who hasn’t recorded their name and email address on the Dashboard. New instructors, and previous instructors who haven’t registered, will see the training when they log in.
The course Timeline interface is also new and improved. We noticed a big problem last year as we prepared to roll out the Dashboard course page system. It wasn’t easy to rearrange or edit the content on an assignment plan. We heard from several program participants that major changes, or creating a timeline from scratch, could be a frustrating experience.
The new version includes a basic “what-you-see-is-what-you-get” editor, and a more accessible interface. There’s still room for improvement, but I think it’s a major step forward. Let us know if you have suggestions or run into any problems! You can email me: [email protected].
Sage Ross, Product Manager, Digital Services
Photo: Notes from our online training planning sprint at WINTR.
January 4, 2016. Semantic MediaWiki 2.3.1 has now been released. This new version is a minor release providing a bugfix for the current 2.3 branch of Semantic MediaWiki. See the page Installation for details on how to install, upgrade or update.
Contrary to what people think, at this time Wikidata could provide the most value particularly to the big Wikipedias by replacing the current red links. Red links are really simple: as soon as an article with that title exists, the link turns blue, and that is exactly where things go wrong; when the new article is about a homonym, you need disambiguation. This is where Wikidata makes a difference.
Wikidata is ready because it includes all the articles of all Wikipedias, and its capacity to disambiguate is therefore superior. Moving existing red links to Wikidata items may be automated, because articles in other languages may have that link already. New red links in articles are checked for a need for disambiguation, and when there is no need, a new Wikidata item is created.
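The decision logic just described can be sketched as a tiny resolver. The data shape and names here are hypothetical, chosen only to make the three outcomes (link, create, disambiguate) concrete:

```python
def resolve_red_link(label, items):
    # items: hypothetical mapping of Wikidata item id -> set of known
    # labels (in any language) for that item.
    candidates = [qid for qid, labels in items.items() if label in labels]
    if not candidates:
        return ("create", None)          # no item yet: create a new one
    if len(candidates) == 1:
        return ("link", candidates[0])   # unambiguous: point the link here
    return ("disambiguate", candidates)  # homonyms: needs human attention
```

Only the third outcome requires human attention, which is the point: most red links could be resolved or created automatically, and the homonym cases would be surfaced for review instead of silently turning blue.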
In the Reasonator the concept cloud shows links as used in Wikipedia articles. Typically these links could exist as statements on the item. For Mr Ormel, a Dutch psychiatrist, there is only one article so it is not a complicated example. In the text there is (now) a red link to Mr Frank Verhulst. There is now an item for Mr Verhulst and as people convert red links, the statements on the item may be reflected in a tool like the concept cloud.
Another benefit is that links can be verified by tools like Kian. When a subject is about psychiatry, for instance, a link to a person who was a football professional all his life is likely wrong and in need of disambiguation.
Arguably, red links in Wikipedia can easily be enhanced using information from Wikidata. It will improve quality in a meaningful and measurable way. Many of the functional parts already exist, so it is mainly a task of cobbling things together. Thanks, GerardM
The Sydney Opera House lit in French tricolor after the November 2015 Paris attacks. In only a month and a half, this article received one of the highest edit totals of any article on the English Wikipedia in 2015. Photo by Ludopedia, freely licensed under CC BY-SA 3.0.
Each day, volunteer Wikipedia editors make thousands of edits to maintain and expand Wikipedia, the world’s free encyclopedia.
These unpaid contributors volunteer an enormous amount of time to bring the sum of all knowledge to the world, free of charge. While it would be extremely difficult to come up with a reasonable estimate of the time invested, we can tell you these people created more than 2.5 million articles and, along with several programmed “bots”, made over 115 million edits in the last year alone.[1]
In fact, the English-language Wikipedia reached its five-millionth article this year. Even on its own, without counting any other language edition, it has been authored by millions of volunteers from all over the world—from more than eight million logged-in accounts and an untold number of anonymous editors.
With all of this in mind, we wondered what articles received the most attention from Wikipedia editors in 2015. The Wikimedia Foundation’s Aaron Halfaker helpfully put together a list of the most-edited articles on the English-language Wikipedia in 2015, similar to a list he created last year.
Most of the list breaks down into three general categories. Topics like “Deaths in 2015” are perennial top-edited pages and will surprise few, if any.[2] Others reflect popular culture and major events, such as the two tragic terror attacks in Paris or Jurassic World; this becomes much more prevalent as you move down the list. Last, articles like “Geospatial Summary of the High Peaks/Summits of the Juneau Icefield” are principally the work of one author making many thousands of edits.
In each case, they reflect the intense effort Wikipedians—the demonym for Wikipedia editors—put into unlocking the world’s knowledge.
[1] These totals do not include December 2015, for which data was not available as of publishing time. Averaged over eleven months, this is over four edits per second.
[2] Wikipedians chronicle the deaths by month, so the page now redirects to a so-called “list of lists of deaths.”
Ed Erhart, Editorial Associate Wikimedia Foundation
German summary: ORES ist eine künstliche Intelligenz, die Vorschläge zur Bekämpfung von Vandalismus machen kann. Nachdem sie auf einigen Wikis bereits erfolgreich eingesetzt wurde, hilft sie jetzt auch bei der Qualitätsverbesserung bei Wikidata. Amir Sarabadani und Aaron Halfaker beschreiben die Entwicklung und den Einsatz von ORES in einem Gastbeitrag auf Englisch.
Today we want to talk about a new web service for supporting quality control work in Wikidata. The Objective Revision Evaluation Service (ORES) is an artificial intelligence web service that will help Wikidata editors perform basic quality control work more efficiently. ORES predicts which edits will need to be reverted. This service is used on other wikis to support quality control work. Now, Wikidata editors will get to reap the benefits as well.
Making quality control work efficient
In the last month, Wikidata averaged ~200K edits per day from (apparently) human editors. That is about 2 edits per second. Assuming that each edit takes 5 seconds to review (which is quite fast for MediaWiki to even load a diff page), that means we would need to spend ~277 hours every day just reviewing incoming edits. Needless to say, this does not really happen and that is a problem since vandalism sticks until someone notices it.
An automated scoring system like ORES can dramatically reduce that workload. For example, ORES gives you a score for a revision indicating the probability of it being vandalism. This edit, for example, scored 51%, which means it is probably okay, but this edit scored 100%, which means it is almost certainly vandalism (and it is vandalism). Through this simple web interface, ORES makes it easy to tap into a high-powered artificial intelligence that is trained to detect damaging edits.
We can use these predictions to sort edits by the probability that they are damaging. This allows recent changes patrollers to focus their attention on the most problematic edits. By focusing on the 1% highest scored edits, we can catch nearly 100% of vandalism — reducing the amount of time and attention that patrollers need to spend reviewing the recent changes feed (even with bots excluded) by 99%. You can read more in this Phabricator task. This kind of efficiency improvement is exactly what Wikidata needs in order to operate at scale.
A note of caution: These machine learning models will help find damaging edits, but they are inherently imperfect. They will make mistakes. Please use caution and keep this in mind. We want to learn from these mistakes to make the system better and to reduce the biases that have been learned by the models. When we first deployed ORES, we asked editors to help us flag mistakes. Past reports have helped us improve the accuracy of the model substantially.
How to use ORES
ORES is a web API, which means its results are more suitable for tools than for humans. You give ORES a revision ID and ORES responds with a prediction. Right now, ORES knows how to predict whether a Wikidata edit will need to be “reverted”. You ask ORES to make a prediction by placing the revision ID to score in the address bar of your browser. For example:
There are already some tools available for Wikidata that integrate ORES with the user interface. We recommend trying the ScoredRevisions gadget. We hope that, by posting this blog, more tool devs will learn of ORES and start experimenting with the service.
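For tool developers, a round-trip might look like the sketch below. The endpoint layout and the response nesting are my assumptions based on the public ORES service, so check the service documentation before relying on them:

```python
ORES_BASE = "https://ores.wikimedia.org/v3/scores"  # assumed host and path

def score_url(wiki, rev_id, model="reverted"):
    # ORES is addressed by wiki (e.g. "wikidatawiki"), revision ID, and
    # model name; fetching this URL with any HTTP client returns JSON.
    return f"{ORES_BASE}/{wiki}/{rev_id}/{model}"

def reverted_probability(response, wiki, rev_id, model="reverted"):
    # Dig the "true" probability (the edit will need reverting) out of
    # an assumed v3-style response document.
    score = response[wiki]["scores"][str(rev_id)][model]["score"]
    return score["probability"]["true"]
```

A patrolling tool would fetch the score for each incoming revision and sort the recent-changes feed by the returned probability, which is exactly the triage described earlier.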
Not just quality control
We encourage you to think creatively about how you use the signal that ORES provides. As we have said, damage detection scores can be used to highlight the worst, but the scores can also be used to highlight the best. Recent research shows how we can flip the damaging predictions upside down to detect good-faith newcomers and direct them towards support & training (Halfaker, Aaron, et al. “Snuggle: Designing for efficient socialization and ideological critique” (PDF). ACM CHI 2014). While newcomer retention on Wikidata is still high compared to the large Wikipedias, we’re following a similar trend. Since Wikidata is still a young Wikimedia project, we have an opportunity to do better during this critical stage by making sure that we both efficiently revert vandalism and route good-faith newcomers to support and training.
I forgot last year to re-affirm (2014; includes links to previous years’ public domain day posts):
Unless stated otherwise, everything by me, Mike Linksvayer, published anywhere, is hereby placed in the public domain.
With that out of the way, I want to question the public domain of works that were subject to copyright upon publication but no longer are due to expiration of the term of copyright. Public Domain Day celebrates such works no longer subject to the private censorship regime as of January 1 each year, and mourns the lack of such work in some jurisdictions such as the United States (none 1999-2019, unless another retroactive extension pushes the date back further).
Copyright is unjust. Works created under that regime are tainted. Extreme position: the disappearing of works subject to copyright is a good, for those works are toxic for having been created under the unjust regime. Compare with born free works, initially released under a free/open license (i.e., creators substantially opted out of regime). Even born free works were created in the context of an unjust regime but we have to start somewhere.
Born free works are a start at re-shaping the knowledge economy away from dependence on the unjust regime, a re-shaping which is necessary to transfer prestige and power away from industries and works dependent upon the unjust regime and towards commons-based production. Works falling out of copyright due to expiration do not tilt the knowledge economy toward commons-based production. Worse, copyright-expired works distract from the urgent need to produce cultural relevance for born free works.
Celebrating works falling out of copyright celebrates the terrible “bargain” of subjecting knowledge to property regimes (harming freedom, equality, and security) in order to incent the over-production of spectacle. Compare with born free works, which provide evidence of the non-necessity of subjecting knowledge to freedom infringing regimes.
Note the title of this post starts with “question” rather than “against” — my aim is not really to claim that copyright mitigation through measures such as limited terms of restriction is bad (as noted above, such a claim really would be extreme, in the sense of being very difficult to justify) but to encourage prioritization of systematic repair through commons-based production. There are many (but not nearly enough) people with commitments to copyright mitigations, limited terms in particular, and use of term expired works even more particularly. Further, there presumably will be some attempt at further retroactive extension in the U.S. before 2019, and though I will probably complain about non-visionary rear guard actions, I don’t doubt that stopping bad developments such as further retroactive extension is in the short term relatively easy and should be done.
Thus this “questioning” leads me to merely want:
Copyright mitigations to be useful for commons-based production (limited terms are such; contrast with many mitigations which make using works possibly subject to copyright somewhat less costly but not in a way which is useful for commons-based production).
Commons-based production efforts to actually take advantage of newly unrestricted works to a greater extent than freedom infringing industries do. Wikimedia projects (especially Wikisource and Wikimedia Commons, with cultural relevance via Wikipedia) do an excellent job, but meeting this very tall order probably requires many additional hugely successful initiatives that are able to create cultural relevance for free works, including works falling into the public domain and works building on such.
Making repair part of knowledge policy discourse, at least on the part of liberalizing reformers: a debate about mitigation or opposition to expansion is always an opportunity to position and advocate for repair; that is favoring commons-based production. This could lead to contemplation of what I’d consider a genuine political bargain: allow works subject to copyright to remain so but favor commons-based production for new works.
A subject like psychiatry is all too often ignored, neglected and not a topic people spend equal effort on. When people rely on Wikipedia as a primary source for information it is vital that they find concepts like recovery. It is what gives them hope. It is what tells them that even when they suffer from psychiatric ailments there is hope. They can learn to manage their situation, they can become more integrated, achieve the goals they strive for.
Mrs Boevink is a leading light in the Netherlands. She has been pivotal in a movement that empowers people to pick up their lives and make the most of them. She is a published author, published often and with many others. A person this notable deserves an article on Wikipedia, but for now, Wikidata will do.
When people talk about quality, it easily stays abstract. Mrs Boevink was instrumental in the development of the HEE method, which has been proven an effective intervention for aiding people in their process of recovery. This "intervention" has no link yet in Wikidata, and I have no clue how to indicate that it has been certified as such.
By including information like this, it becomes easy to learn about recovery, about HEE, about Mrs Boevink, and it enables people to inform themselves. Insurance companies make it really hard to fund the people best positioned to provide HEE; they manipulate information by informing the public just so. Such manipulations become hard when Wikipedia provides its NPOV coverage of the topic.
Both Wikipedia and Wikidata are works in progress. They do improve their quality, and it does not take a genius to understand how the Wikipedias in a given language can be manipulated. It takes some sober reflection to understand that an existing lack of information enables manipulation just as easily. Subjects associated with discrimination and stigmatisation are exactly where people are vulnerable and where basic, encyclopaedic information is really needed. Thanks, GerardM
Dr. James Heilman is a founder of the WikiProject Med Foundation, the first and only formal, topic-based organization to grow out of a Wikipedia WikiProject (corrected). In June 2015, he was elected to the Board of Trustees. The election had a substantially higher turnout than any prior Wikimedia election; he earned the second-most votes of any candidate in the organization’s history. That election was regarded by many (including myself, as well as Wikipedia historian Andrew Lih in a New York Times op-ed piece) as a referendum on recent actions of the Wikimedia Foundation, including releasing an unprecedented software feature in 2014 that undercuts elected volunteers’ ability to edit pages, and ignoring an open letter, authored by myself and signed by more than 1,000 people, requesting actions related to that software feature.
Wikipedia is arguably the most widely read publication in history, but its Board of Trustees has very little accountability. Only three of the ten Trustees are popularly elected; all three of the incumbents were voted out in the 2015 election.
Now you can locate old, parked or museum-piece locomotives in the historic places map if they have been tagged and mapped appropriately in OSM.
User baditaflorin created a cheat sheet with the most important JOSM shortcuts, which should make editing with JOSM more efficient.
Felix Delattre wants to map the 42 bus routes in Managua and Ciudad Sandino (Nicaragua) in OSM and asks for help.
Marc Zoutendijk points out that there are numerous examples of meaningless values in the “source” tag on OSM. In his series on seldom-used tag values he provides some amusing examples, but concludes that the errors are mostly harmless, as they are a minority and seldom used.
User JinalFoflia from Bangalore proposes a re-tagging of 0.12% of picnic_sites. Many established mappers are responding.
She has re-tagged picnic sites carrying the leisure=picnic_site and amenity=picnic_site tags to the tourism=picnic_site tag; the related changesets can be found in this diary entry.
Richard Cantwell states that OSM is doing great in Ireland. His PDF article in Geoconnexion suggests that OSM “has become a shining example of how crowdsourcing can work”.
At the request of the Data Working Group, the board of the OSMF has indefinitely banned a user who threatened another mapper. Appeals against this decision will only be considered after police and legal investigations are completed.
Tom Taylor (active HOTtie) passed away on December 24th.
BushmanK asks in his diary what comes first, the map or the database? His post also asks whether we should tell newbies the truth, but the principles he discusses extend beyond “newbies” to how OSM in general is explained.
Muindialis has described how they created a multi-scale topographic map of the world with an overlay of the main OSM features. A demonstration of the client application can be viewed here.
Markus Mayr answers the question: Why tree imports? To produce beautiful maps.
Mapzen has a Christmas present for all the users who have asked for a batch geocoder.
Calling all couch-mappers at the end of the year: please click once, sit back, and enjoy, over and over again and in the finest detail, the world that you have mapped.
Humitos and Ella Quimica mapped a restaurant on the Peruvian coast, shared their knowledge of OpenStreetMap and programming with the owner, and were rewarded with a complimentary ceviche dish. weeklyOSM wonders whether this activity falls into the category of “paid mapping”.
Tools for getting data out of OpenStreetMap and into Desktop GIS.
Note: If you would like to see your event here, please add it to the calendar. Only events entered there will appear in weeklyOSM. Please don’t forget to mention the city and the country in the calendar.
This weekly was produced by Alexandre Magno, Rogehm, SB79, Bogus, Manfred, escada, jinalfoflia, mgehling, stephan75, wambacher.
Can one apply the life hacking ethos of analyzing and optimizing to the
romantic sphere? Some do! If you read Amy Webb, Chris McKinlay, and Val
Aurora you will see a focus on two things in particular: matching and
selection.
Webb and McKinlay hacked dating systems to figure out how to improve
their profiles. Neither broke into dating services to purloin data, but
they did use fake accounts. Webb realized that pasting her resume into
JDate was not working.1 So, she created fake profiles of ten men
she’d like to date and used these puppets to learn the successful
strategies of her competitors. She found that successful women’s
profiles were short, nonspecific, used optimistic language, and that the
photos were well done and showed a bit of skin (e.g., shoulders). When
she applied this to her own profile, she became the most popular person
on the site.
On OkCupid, a Q&A-based matching service, McKinlay also used fake
profiles, this time to collect answers to common questions from
thousands of women.2 He then grouped 20,000 women into seven clusters
based on
their responses to the most popular questions. Example clusters included
women who enjoyed their pets, those who had tattoos, and those who were
religious. He targeted two of the clusters of greatest interest with
custom profiles. McKinlay selected the 500 questions most popular with
both groups, answered honestly, and used an algorithm to optimize the
ranking of candidates. He then wrote a script to visit the top-ranked
women’s profiles; OkCupid automatically notifies users of such visits,
and many women reciprocated and sent messages.
Of course, hacking a dating system to find hundreds of possibly well
matched candidates introduces another problem: how to choose? Depending
on how you look at it, this is where Poulsen’s selection strategy of
“brute force” failed. He went on eighty-eight dates before finding
someone he would begin a relationship with! Conversely, Webb and Aurora
showed that selection, too, is amenable to analysis and optimization.
Webb devised a two-tier system of traits, weighted by importance, with a
high threshold for whom she would date. In fact, she had this scoring
system before her data mining and analysis, but she could not find
anyone who met her 700-point minimum. But with a successful profile in
hand, and many potential candidates, she was eventually contacted by an
850, who would become her husband. Like Webb, Val Aurora
recommended matching hacks, like getting professional pictures
taken.3 But she seems to have exceeded Webb in the sophistication of
her selection. Aurora also developed a spreadsheet, which she publicly
shared for other people to use.
In my blog post about how to have more fun online dating, I mentioned
the spreadsheet I made to help with dating. Yes, a spreadsheet. For
dating. Because when you’re feeling romantic, you just want to fire up
Excel and input some data! Nothing like an evening of writing formulas
to get you in the mood for love!4
Both McKinlay and Webb went on to write books about their experiences
and strategies, both of which end happily.5 Aurora, too, is now in a
relationship.
However, I am struck that these approaches of optimizing matching and
selection presume a good fit. “The one” is out there, they just need to
be found. An alternative theory is that people should grow towards one
another in a relationship. This is seen in stories of arranged
marriages, as exemplified by Aziz Ansari’s own parents in his book
Modern Romance.6 Aurora alludes to this in her reflection on why
she created the spreadsheet in the first place.
My original intention for making this tool was to make me more aware
of and responsive to my “dealbreakers” – things that meant a
relationship wasn’t possible. But while making and using this tool, I
discovered that my own ideas about what was a “dealbreaker” were
frequently wrong. I am now in a happy relationship with someone who
had six of what I labeled “dealbreakers” when we met. And if he hadn’t
been interested in working those issues out with me, we would not be
dating today. But he was, and working together we managed to resolve
all six of them to our mutual satisfaction. Talking to my friends, I
found that this was a pretty common experience.7
This is true for me. I met my partner of fifteen years during a rather
liberal period. At the time, I decided to enjoy meeting people instead
of trying to find a relationship. Hence, her (occasional) smoking wasn’t
the deal breaker it had been in the past—a habit she, fortunately, gave
up.
The importance of working on issues, the practice of building and
maintaining a relationship, is this also amenable to hacking? The only
example of this I’ve found is David Finch’s Journal of Best Practices:
A Memoir of Marriage, Asperger’s Syndrome, and One Man’s Quest to be a
Better Husband.8 This isn’t so much about the mutual optimization of
a relationship, but hacking his approach to it. And while his best
practices aren’t “big data” quantitative, they are analytical, including
the “final best practice”: “Don’t make everything a best practice.”9
For many life hackers, the hacking mindset seems to give way at some
point. Is this the line where relationships become more an art than a
science? Is hacking love less appropriate with an N of 1?
David Finch, Journal of Best Practices: A Memoir of Marriage,
Asperger's Syndrome, and One Man's Quest to Be a Better Husband
(New York: Scribner, 2012). ↩
Our story on John Oliver’s ‘fowl’ jokes was one of the blog’s most popular posts of the year. Photo by TechCrunch, freely licensed under CC BY 2.0.
On December 15, the Wikimedia Foundation’s Victor Grigas, in collaboration with several Wikimedia community members, released #Edit2015, a look back at the wonder, pain, and triumph that happened in the world over the last year.
With that in mind, we here at the Wikimedia Blog decided to take on a decidedly narrower scope: what stories were the most popular on the blog this year? What did our nearly 700 thousand unique visitors, generating over 1.1 million views, flock to?
The top five is decidedly heavy on Foundation announcements—chief among them being actions we’ve taken to protect our users.
We sued the US’s National Security Agency to challenge its mass surveillance practices, an announcement that had over 70,000 views, by far the most views for the blog this year. Jimmy Wales said at the time that “We’re filing suit today on behalf of our readers and editors everywhere … Surveillance erodes the original promise of the internet: an open space for collaboration and experimentation, and a place free from fear.”
You can follow our Legal department’s frequent updates with the Wikimedia v. NSA category; we are currently appealing Judge T.S. Ellis, III’s ruling that we lack standing to bring the challenge.
“We believe encryption makes the web stronger for everyone. In a world where mass surveillance has become a serious threat to intellectual freedom, secure connections are essential for protecting users around the world. Without encryption, governments can more easily surveil sensitive information, creating a chilling effect and deterring participation; in extreme cases, they can isolate or discipline citizens. Accounts may also be hijacked, pages may be censored, and other security flaws could expose sensitive user information and communications.”
–Yana Welinder, Victoria Baranetsky, and Brandon Black
Other announcements included the disclosure that 381 accounts on the English Wikipedia had been blocked for so-called ‘black hat’ editing (#3), and the release of a new artificial intelligence service (“ORES”) that “highlight[s] potentially damaging edits, [allowing] editors to triage them from the torrent of new edits and review them with increased scrutiny” (#5). This development is quite exciting; Aaron Halfaker, the lead researcher, hopes that the new automated service will conversely increase the number of human editors by reducing their workload.
Pop culture made several appearances as well. Our post on John Oliver’s chicken-related (“fowl”) jokes and their impact on several Wikipedia articles was amusing enough to land at #9 on the list. In a lengthy diatribe about the plight of independent chicken farmers vis-à-vis the major chicken producers they contract with, he asked his audience to vandalize the Wikipedia pages of congressional representatives who had supported the producers: “unless they want [a “chicken f****r”] label to follow them for the rest of their lives, they might want to think [about their votes], because “chicken f****r” accusations do not come off a Wikipedia page easily. Or if they do, they tend to go right back up.”
The photographs that researchers believe feature the first-ever smile and photobomb—two of many images that the National Library of Wales donated to Wikimedia Commons—were intriguing enough to close out our list.
As an honorable mention, we would be remiss in leaving out “My life as an autistic Wikipedian” (July 31; #12)—a remarkably poignant account from editor and Wikimedia Foundation staff member Guillaume Paumier.
Thank you for reading the Wikimedia Blog; we hope for an even more successful 2016.
Ed Erhart, Editorial Associate, Wikimedia Foundation
Roughly a year and a half ago I started writing a collection of PHP libraries to make interaction with the MediaWiki API and extension APIs super easy. The base library has just made it to 2.0.0!
This library is the first of the addwiki collection to actually reach 1.0.0, let alone 2.0.0! All of the other libraries, including wikibase-api and mediawiki-api, are still works in progress with lots to be added. The next release will likely be the wikibase-api library, once I add async functionality there too!
A snippet of the async functionality added in mediawiki-api-base can be seen below:
// Get an API object and log in
$api = MediawikiApi::newFromPage( 'https://en.wikipedia.org/wiki/Berlin' );
$api->login( new ApiUser( 'username', 'password' ) );
// Initiate each request without blocking; the *Async methods return promises
$requestPromises = array(
'Page1' => $api->postRequestAsync( FluentRequest::factory()->setAction( 'purge' )->setParam( 'titles', 'Page1' ) ),
'Page2' => $api->postRequestAsync( FluentRequest::factory()->setAction( 'purge' )->setParam( 'titles', 'Page2' ) ),
'Page3' => $api->postRequestAsync( FluentRequest::factory()->setAction( 'purge' )->setParam( 'titles', 'Page3' ) ),
);
// Wait on all of the requests to complete (GuzzleHttp\Promise\unwrap)
$results = Promise\unwrap( $requestPromises );
// Each result is available under the key used in the $requestPromises array.
// print_r takes a single value, so print each result separately:
print_r( $results['Page1'] );
print_r( $results['Page2'] );
print_r( $results['Page3'] );
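One caveat with this pattern: `Promise\unwrap()` throws as soon as any promise is rejected, so a single failed request loses the results of the others. When some requests may fail, a sketch of the alternative is to settle the promises and inspect each outcome. This uses the Guzzle promises layer that mediawiki-api-base builds on, not an addwiki-specific API, and assumes the `$requestPromises` array from the snippet above:

```php
use GuzzleHttp\Promise;

// settle() never throws; it waits for every promise and reports
// each one as either 'fulfilled' (with a 'value') or 'rejected'
// (with a 'reason' holding the exception).
$outcomes = Promise\settle( $requestPromises )->wait();

foreach ( $outcomes as $title => $outcome ) {
    if ( $outcome['state'] === 'fulfilled' ) {
        echo "$title purged successfully\n";
    } else {
        echo "$title failed: " . $outcome['reason']->getMessage() . "\n";
    }
}
```

This way one missing page (for example, a title that was deleted mid-run) doesn’t discard the successful purges.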