Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

Platform engineering

Help Wikimedia squash software bugs

Help us squash bugs!

Reporting a new software bug is only the first step towards fixing the issue; the sorting and prioritization steps come next, usually referred to as “triage”.

Wikimedia’s Bug Squad has started hosting bi-monthly Bug Days as part of our weekly QA goals. During a Bug Day, bug triagers and developers sort bug reports, usually from a specific component or fitting specific criteria. Triaging includes testing reports to confirm they are valid, prioritizing them so developers can efficiently address pressing issues, and identifying and marking duplicates to avoid redundant work.

Our next Bug Day is today (March 19th), and we’ll work on bug reports in MediaWiki extensions > LiquidThreads. For more information on the event and how to participate, check out the event page.

We have already held three Bug Days. The first, on January 29, 2013, focused on reports that had not been changed in over a year. We retested many reports to see if they were still valid for newer versions of MediaWiki and MediaWiki extensions, and requested more information from reporters whose reports needed clarification; we addressed 30 reports out of about 170. Because of the recent Gerrit upgrade, our Bug Day on February 19, 2013 addressed bug reports in the Git/Gerrit component. Our focus was on upstream issues in Gerrit that may have been fixed by the update; for upstream bugs that were not fixed, status updates were left on the corresponding Bugzilla reports. We addressed about 24 of the 70 open reports in Git/Gerrit. Our latest Bug Day focused on General/Unknown reports in the MediaWiki product, which is known to be a catch-all for bug reports. 38 reports were triaged; many were retested and confirmed, prioritized, and moved out of General/Unknown into their proper components.

You can contribute to the Wikimedia Foundation by triaging bug reports. Follow the Calendar of QA events to keep up with upcoming Bug Days and other testing events. You can also find announcements of upcoming Bug Days on Bug Management’s Triage page. Bug Days are not just for bug triagers; developers are welcome to join and help by ‘taking’ reports and submitting bug fixes. Fixing bugs is a great way for new volunteer developers to get started, and joining a Bug Day is a great way to find a few bugs to fix.

Bug Days support the Wikimedia Foundation by improving the quality of bug reports and bringing attention to reports that may not have been looked at in a while. It is difficult for developers to keep up with the number of bug reports that arrive in Bugzilla every day; Bug Days and bug triaging help developers address these issues efficiently.

Valerie Juarez, Bug Management Intern

How to create a good first Bug Report

A software bug is an error or flaw in a program that produces incorrect or unintended results. Developers work hard to produce software that looks and works as intended, but bugs are as inevitable as death and taxes. The Wikimedia Foundation uses the bug tracker Bugzilla as the system for users to report bugs they encounter while using MediaWiki and Wikimedia sites.

Can you make it happen again?

So, you think you’ve run into a bug and want to report it to Wikimedia’s bug tracker. The first thing to do is to try to reproduce the bug with concrete steps. These steps help developers reproduce the bug, which allows them to investigate the source of the issue. If the bug does not appear consistently, you can still file it, and developers will likely ask you questions to gather more information about the bug.

Has it already been reported?

Life cycle of a software bug

Once you have attempted to reproduce the bug, you can log in or register with Bugzilla. Since your registered email address will be visible to other logged-in users, consider creating a free email account to use with Bugzilla. Bugzilla notifies you of changes to your bug report by email. Before you file a report, check whether one has already been filed about the bug you found; this reduces the chances of people duplicating work on the same issue. Bugzilla checks for duplicates when you file a bug, but it is also worth searching independently.

If you find a similar report, see if you can add more information than what was originally reported. For example, the original report may be from an older version of MediaWiki, so it would be helpful to add a comment that the bug still appears, listing the newer version if you can. Maybe the original report does not have steps to reproduce; in that case, add a comment with steps that reproduce the bug. If you do not find a duplicate, you can go on to filing a new bug. You may end up unintentionally filing a duplicate report, and that’s OK: it’s better to report a bug a second time than not at all.

Where does it belong?

When filing a bug report, the first thing you’re asked to do is choose a product to file the bug in. These products represent software projects, and it can be tricky to choose the right one. Consider what sort of error you have: does it seem to be in the MediaWiki software itself, or perhaps in a MediaWiki extension?

Once you select a product, you’re brought to the page where you enter information about the bug. Here you can go through the components associated with the product you chose and read their descriptions. If the bug doesn’t seem to fit into any of those components, go back and select another product and look through those components. If you’re still not sure or you’re in a hurry, file the bug in the MediaWiki product and the General/Unknown component (MediaWiki > General/Unknown).

If the problem doesn’t seem to be with the MediaWiki software but with the configuration of a Wikimedia site, you should file the bug in Wikimedia > General/Unknown. Filing a bug in the right product and component helps developers address the bug sooner, because developers working on that specific component usually monitor incoming bug reports. However, bug triagers will move misplaced reports to the right product and component, so do not worry.

What does it say?

Now you should write a summary of the bug you found. Be specific. Vague, generic summaries like “Does not work” or “Feature request” do not give a quick idea of what your report is about. Your summary is what developers, bug triagers, and other reporters will see when they look through bug lists in a component or scan search results.

As stated above, when you fill in a summary, Bugzilla lists possible duplicates. If you see a similar report, follow the steps above and comment if you have new information to provide. If you don’t see a duplicate at this point, continue and fill in a description. The description is where you elaborate on the problem stated in the summary: list your steps to reproduce, what you expect to happen, and what actually happens. You can also list other details, like which browser you’re using, if they seem relevant to the report. Clicking the Add an Attachment button below the description allows you to attach a file, e.g., a screenshot, to improve the quality of the report. Once you feel you have described the problem sufficiently, click Submit Bug.
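For example, a well-formed report (this one is made up purely for illustration) might read:

    Summary: Show preview displays raw wikitext instead of the rendered page
    Steps to reproduce:
    1. Open any article and click "Edit".
    2. Add a line of text.
    3. Click "Show preview".
    Expected result: The page is rendered with my change included.
    Actual result: The raw wikitext is displayed instead.
    Notes: Seen on MediaWiki 1.21wmf11 with Firefox 19 on Windows 7.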

You’re done!

Alright! You filed your bug! It may not be perfect, but that’s no problem; there is always somebody to help improve it. You can look forward to receiving bug mail to keep you up to date on changes to your report, including status changes and comments. Check your user preferences in Bugzilla to view and update which changes trigger an email. You may get comments requesting more information to help diagnose the issue. If you want to see an example of a developer fixing a bug, check out this video of a bug getting fixed.

Valerie Juarez, Bug Management Intern

What Lua scripting means for Wikimedia and open source

Yesterday we flipped a switch: editors can now use Lua, an innovative programming language, to generate sections of wiki pages on all our sites. We’d like to talk about what this means for the open source community at large, for Wikimedians, and for our future.

Why we did this

In the old wikitext templating system, this is part of Template:Citation/core. Any Wikipedia article citing a source causes our CPUs to run through this instruction set. With Lua, we’ll be able to replace it.

When we started digging into the causes of slow pageload times a few years ago, we saw that our CPUs ate a lot of time interpreting templates — useful bits of markup that programmatically told MediaWiki to reuse little bits of text. Templates are everywhere on our sites. Good Wikipedia articles heavily use the citation templates, for instance, and you’ve seen the ubiquitous infoboxes on every biography. In fact, editors can write code to generate substantial portions of wiki pages. Hit “View source” sometime to see how.

But, because we’d never planned for wikitext to become a programming language, these templates were terribly inefficient and hacky — they didn’t even have recursion or loops — and were terrible for performance. When you edit a complex article like Tulsi Gabbard, with scores of citations, it can take up to 30 seconds to parse and display the page. Even as we worked to improve performance via caching, query profiling, new hardware, and other common means, we sometimes had to advise our community to remove functionality from a particular template so pages would render faster.

This wouldn’t do. It was a terrible experience for our users and especially hard for our editors, who had to wait for a multi-second roundtrip after every “how would this page look?” preview.

So our staffers and volunteers worked on Scribunto (from the Latin for “they shall write”), a MediaWiki extension to allow editors to embed Lua scripts instead of wikitext for templating. And volunteers and Foundation staffers have already started identifying pages that are slow to render and converting the most inefficient templates. We have 488,731 templates on English Wikipedia alone right now. The process of turning many of those into Lua scripts is going to affect everyone who reads our sites — and the Scribunto project has already started giving back to the Lua community.

Us and Lua

For instance, our engineer Brad Jorsch wrote mw.ustring.lua, a Unicode module reusable by other Lua developers. This library is good news for people who write templates in non-Latin characters, and for anyone who wants a version of Lua’s standard String library where the methods operate on characters in UTF-8 encoded strings rather than bytes.
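To see the difference in practice: Lua’s standard string functions count bytes, while their mw.ustring counterparts count characters. A minimal Scribunto sketch (the module and function names are ours, for illustration):

    -- Contrast Lua's byte-oriented string library with the
    -- Unicode-aware mw.ustring library (illustrative module).
    local p = {}

    function p.compare( frame )
        local s = "Déjà vu"               -- two multi-byte UTF-8 characters
        local bytes = string.len( s )     -- counts bytes: 9
        local chars = mw.ustring.len( s ) -- counts characters: 7
        return string.format( "bytes: %d, characters: %d", bytes, chars )
    end

    return p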

And with Scribunto, we empower those frustrated Wikimedians who have been spending years breaking their knuckles making amazing things in wikitext; as they learn how much easier it is to script in Lua, we hope they’ll be able to use those skills in their hobbies, schools, and workplaces. They’ll join forces with the graduates of Codecademy, World of Warcraft, and the other communities that teach anyone to program. New programmers with basic knowledge of computer science who want to do something real with their new skills will find that Lua scripting on Wikimedia sites is a logical next step for them. Our implementation only differs slightly from standard Lua.

And since Scribunto is an extension that any MediaWiki administrator can install, we hope the MediaWiki administrators out there will enjoy using Lua to more easily customize their wikis for their users.

Structured data and new ways to display it

Scribunto lays the foundations for exciting work to come when the Wikidata structured data project comes further online (the Wikidata interface is still in development and being deployed in phases). We know that Lua will be an attractive way to integrate Wikidata information into pages, and we hope a lot of (currently) unstructured data will get structured, helping new applications emerge.

Now that Lua and Wikidata are more mature, we can look forward to enabling more functionality and plugging in more libraries. And as we continue deploying Wikidata, people will make interesting improvements that we currently can’t predict. For instance, right now, each citation is hard to programmatically dissect: the Cite template takes many unstructured parameters (“author1”, “author2”, etc.). We structure these arguments by convention, but the data’s not structured as CS folks would have it, and can’t be queried via APIs, remixed, and so on.
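To make the problem concrete: about the best a Lua module can do today is walk those by-convention parameters and rebuild a list. A hedged sketch (the module and parameter handling are illustrative, not the actual Cite code):

    -- Gather numbered, by-convention parameters (author1, author2, ...)
    -- into a real Lua list. Illustrative only; not the Cite template.
    local p = {}

    function p.authors( frame )
        local args = frame:getParent().args  -- parameters the template received
        local list = {}
        local i = 1
        while args[ "author" .. i ] do
            table.insert( list, args[ "author" .. i ] )
            i = i + 1
        end
        return table.concat( list, "; " )
    end

    return p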

Excerpt of Coordinates module

A screenshot of part of the new Coordinates module, written in Lua by User:Dragons flight. Note that, with Lua, we can actually use proper conditionals.
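Where a wikitext template would need nested {{#if:}} parser-function calls, a Lua module can use an ordinary if/elseif chain. A small sketch (hypothetical; not the actual Coordinates code):

    -- An ordinary Lua conditional, which wikitext could only approximate
    -- with nested {{#if:}} parser functions (hypothetical example).
    local p = {}

    function p.hemisphere( frame )
        local lat = tonumber( frame.args.lat ) or 0
        if lat > 0 then
            return "N"
        elseif lat < 0 then
            return "S"
        else
            return "equator"
        end
    end

    return p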

But in the future, we could have citations stored in Wikidata and then put together onto article pages using Lua, or even assembled into various other reasonable forms (automatically generated bibliographies?), making them easier for Zotero users to discover. That’s just one example; on all our sites over the next few years, things will change from the status quo in user-visible ways. The old math and geography templates were inefficient and hard to hack; once rewritten, they’ll run faster, and perhaps editors will use them more. We might see galleries, automatic data analyses, better annotated maps, and various other interesting processes and queries embedded in Wikimedia pages.

Open for change

Wikimedians have been writing wikitext templates for years, and doing hard, astounding, unexpected things with them for readers to enjoy. But the steep learning curve drove contributors away. With Lua, a genuine programming language, people now have a deeper and more useful foundation to build upon. And for years, power users on our sites have customized their experiences with JavaScript/CSS Gadgets and user scripts, but those are basically one level above skin preferences; other people won’t stumble upon your hacks in the process of reading an article.

So, now is the first time that the Wikimedia site maintainers have enabled real coding that affects all readers. We’re letting people program Wikipedia unsupervised. Anyone can write a chunk of code to be included in an article that will be seen by millions of people, often without much review. We are taking our “anyone can edit” maxim one big step forward.

If someone doesn’t like the load time of a webpage, they can now actually improve it themselves. Just as we crowdsourced building Wikipedia, now we’re crowdsourcing bits of infrastructure improvement. And this kind of massively multiplayer, crowdsourced performance improvement is uniquely us.

Wikitext templates could do a lot of things, but Lua does them better and faster, and now mere mortals can do them. We’re aiming to help our users learn to program, to empower themselves, and to help each other and our readers.

We hope you’ll join us.

Sumana Harihareswara, Engineering Community Manager

New Lua templates bring faster, more flexible pages to your wiki

Starting Wednesday, March 13th, you’ll be able to make wiki pages even more useful, no matter what language you speak: we’re adding Lua as a templating language. This will make it easier for you to create and change infoboxes, tables, and other useful MediaWiki templates. We’ve already started to deploy Scribunto (the MediaWiki extension that enables this); it’s on several of the sites, including English Wikipedia, right now.

You’ll find this useful for performing more complex tasks for which wikitext templates are too complicated or slow; common examples include numeric computations, string manipulation and parsing, and decision trees. Even if you don’t write templates, you’ll enjoy seeing pages load faster and present information in more interesting ways.

Background

The text of English Wikipedia’s string length measurement template, simplified.

MediaWiki developers introduced templates and parser functions years ago to allow end-users of MediaWiki to replicate content easily and build tools using basic logic. Along the way, we found that we were turning wikitext into a limited programming language. Complex templates have caused performance issues and bottlenecks, and it’s difficult for users to write and understand templates. Therefore, the Lua scripting project aims to make it possible for MediaWiki end-users to use a proper scripting language that will be more powerful and efficient than ad-hoc, parser functions-based logic. The example of Lua’s use in World of Warcraft is promising; even novices with no programming experience have been able to make large changes to their graphical experiences by quickly learning some Lua.

Lua on your wiki

As of March 13th, you’ll be able to use Lua on your home wiki (if it’s not already enabled). Lua code can be embedded into wiki templates through the {{#invoke:}} parser function provided by the Scribunto MediaWiki extension. The Lua source code is stored in pages called modules (e.g., Module:Bananas), and these modules are then invoked on template pages. For example, Template:Lua hello world uses the code {{#invoke:Bananas|hello}} to print the text “Hello, world!”. So, if you start seeing edits in the Module namespace, that’s what’s going on.
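The module behind that call is tiny: a Scribunto module defines and returns a table of functions, and whatever the invoked function returns is what appears on the page. A sketch of what Module:Bananas looks like:

    -- Module:Bananas (minimal sketch): {{#invoke:Bananas|hello}}
    -- calls p.hello and displays its return value on the page.
    local p = {}

    function p.hello( frame )
        return "Hello, world!"
    end

    return p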

Getting started

The strlen template as converted to Lua.

Check out the basic “hello, world!” instructions, then look at Brad Jorsch’s short presentation for a basic example of how to convert a wikitext template into a Lua module. After that, try Tim Starling’s tutorial.
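As a taste of what such a conversion buys you: measuring string length in wikitext required elaborate padding tricks, while a Lua module can simply count characters. A simplified sketch in the spirit of the converted strlen template (not the exact production code):

    -- String length as a Lua module (simplified illustration).
    local p = {}

    function p.len( frame )
        local s = frame.args[1] or ""
        return mw.ustring.len( s )  -- Unicode-aware character count
    end

    return p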

To help you preview and test a converted template, try Special:TemplateSandbox on your wiki. With it, you can preview a page using sandboxed versions of templates and modules, allowing for easy testing before you make the sandbox code live.

Where to start? If you use pywikipedia, try parsercountfunction.py by Bináris, which helps you find wikitext templates that currently parse slowly and would thus be worth converting to Lua. Try fulfilling open requests for conversion on English Wikipedia, possibly using Anomie’s Greasemonkey script to see the performance gains. On English Wikipedia, some templates have already been converted; feel free to reuse them on your wiki.

The Lua hub on mediawiki.org has more information; please add to it. And enjoy your faster, more flexible templates!

Sumana Harihareswara, Engineering Community Manager

Introducing MediaWiki community metrics

MediaWiki contributors at work in Bangalore, India.

Did you know?

  • The MediaWiki community includes about 506 code contributors active in the past 12 months. A total of 884 developers have contributed code to Wikimedia projects since the first MediaWiki commit in April 2003. In total, about 5.6 million lines of code have been written for the MediaWiki core engine, extensions, mobile applications, server tools, etc.
  • 42,016 reports have been filed in MediaWiki’s bug reporting tool since its opening in August 2004. Of those, 20,057 have been fixed, 8,619 are still open and 2,315 never got an answer from anyone.
  • Brion Vibber is the top reporter with 881 bugs filed. Roan Kattouw is the top identified bug fixer with 667 reports resolved, although no fewer than 12,149 have been fixed by “Nobody”.

Where are these numbers coming from? This data (and more) are now published in the monthly MediaWiki community metrics reports. The latest issue covers November.

Community metrics are helpful to understand the size and scope of an open source project. Since most activities are public, it is possible to retrieve plenty of raw data. The problem is to decide what data to look for and why, and how to process and interpret it.

In our case, a short-term motivation is to describe all the activities going on in the areas of software development, testing and documentation. What projects are doing well? What projects need a push? Where are the spots where new contributors can make their first steps?

In the past months, we have gained three new sources of MediaWiki community data.

Our metrics reports are still a work in progress. Do you find these numbers helpful? What story do you think they tell? What other metrics would you like to see included? Your feedback and help is welcome!

Quim Gil
Technical Contributor Coordinator (IT Communications Manager)

Lead our development process as a product adviser or manager

Would you like to decide how Wikimedia sites work? You can be a product adviser or a product manager, as a volunteer, and guide the work of Wikimedia Foundation developers.

What is a product manager? As Howie Fung, the head of WMF’s product team, recently explained, when we create things on our websites or mobile applications that readers or editors would use,

there are a basic set of things that need to happen when building a product….
  1. Decide what to build
  2. Design it
  3. Build it
  4. Measure how it’s used (if you want to improve the product)
Roughly speaking, that’s how we organize our teams when it comes to building features. Product Managers decide what features to build, Designers design the feature, Developers build the feature, and Analysts measure how the features perform.

So, a product manager works with the designers, developers, and analysts to identify and solve user problems, while representing the users’ point of view. As Fung put it,

there should be someone responsible for ensuring that the various ideas come together into a coherent whole, one that addresses the problem at hand. That responsibility lies with the Product Manager.

Why do you need volunteers? While the Wikimedia Foundation has hired full-time product managers for the most pressing features our engineers are developing, that leaves us with several ongoing projects that don’t get enough product management. The WMF needs your help to: track the progress of these improvements; comment on tasks or proposals; reach out to the Wikimedia reader and contributor communities to ask for feedback via wikis, mailing lists, and IRC; help developers see what users’ needs are; and set priorities on bugs and features, thus deciding what developers ought to work on next. Here are a few of those activities:

  • File storage, especially regarding Wikimedia Commons. Engineers have been trying to improve our storage system using the Swift distributed filestore but need your help to make sure we do it right.
  • Prioritizing shell requests. When Wikimedians request configuration changes to the wikis, systems administrators can use help understanding which of them are urgent and which of them don’t actually have the necessary consensus.
  • Operations requests from the community. It’s not just shell requests. Right now we have 93 open bugs requesting attention from our systems administrators, and those requests could use prioritization and organization.
  • Data dumps. Wikimedia offers many ways to download Wikimedia data at dumps.wikimedia.org. Your help would improve tools related to import, or conversion to SQL for import, to make it easier for others to use these datasets.
  • Wikimedia Labs. The sandboxes in Wikimedia Labs will host bots, tools, and test and development environments; can you organize the roadmap and the advice on what those communities will need?
  • Admin tools development: WMF engineer Chris Steipp works on tools to help fight vandalism and spam, including major bugfixes and minor feature development to make lives of stewards and local sysops a little easier. What’s most urgent on his TODO list?

Volunteer product manager Jack Phoenix put together a detailed roadmap that was incredibly useful to guide the work of Wikimedia engineers on features like anti-spam tools.

Has anyone tried this? The first Wikimedia volunteer product manager was User:Jack Phoenix, who created the admin tools roadmap this summer, detailing a rationale for what should be done when. Jack originally signed up because:

this is just something that I know pretty well and hence why I want to be a part of this project and the team….
I want editors to be able to focus on editing — content creation, tweaking, fine-tuning… — instead of having to play whack-a-mole against spambots and vandals all the time. I have plenty of experience in playing whack-a-spambot, and I’m hoping to use that experience to improve WMF sites and also third-party sites…

It’s perfectly fine for the role of volunteer product manager to be a time-limited engagement. For example, Jack did amazing work for three months creating the roadmap. In retrospect, Jack Phoenix has estimated that to manage a product as broad as the admin tools suite, and to do it well, would take at least an hour per day if not two or three; due to time constraints, Jack has now stepped down from the role and is seeking a successor. Thanks for laying the groundwork, Jack! While we’re sad to see Jack go, we’re thankful for the roadmap and we continue to benefit from it.

If that kind of commitment sounds too burdensome, consider becoming a volunteer product adviser first. You’d do some of the same tasks as a product manager, to help check that the feature we’re building actually meets Wikimedians’ needs, and give your own opinion as well. But there wouldn’t be ownership or leadership attached, and the time commitment wouldn’t be as strong.

What next? The goal of the Engineering Community Team is to have at least two Wikimedia volunteers engaged in product management work by the end of December. Talk with us and check out whether this is something you’d like to try!

To get involved, contact Sumana Harihareswara or Guillaume Paumier.

Sumana Harihareswara
Engineering Community Manager

Recovery of broken Gerrit repositories

As some of you may have noticed, yesterday our engineering team discovered that 16 of our Gerrit repositories were very badly broken: their branches and tags all seemed to have vanished, along with their configuration (which is stored in a special branch on the repository itself). All of the repositories except one have been restored to their state as of about midnight UTC on Thursday, September 6. What follows is an in-depth analysis of what happened and how I fixed it, along with some commentary about what I learned along the way.

Meet the Analytics Team

Over the past few months, the Wikimedia Foundation has been gearing up a variety of new initiatives, and measuring success has been on our minds. It should come as no surprise that we’ve been building an Analytics Team at the same time. We are excited to finally introduce ourselves and talk about our plans.

The team is currently a pair of awesome engineers, David Schoonover and Andrew Otto; a veteran data analyst, Erik Zachte; and one humble product manager, Diederik van Liere. (We happen to be looking for a JavaScript engineer — if beautiful, data-driven client apps are your thing, or you know someone who fits, drop us a line!)

We’ve got quite a few projects under way (and many more ideas), and we’d like to briefly go over them — expect more posts in the future with deeper details on each.

First up: a revamp of the Wikimedia Report Card. This dashboard gives an overview of key metrics representing the health and success of the movement: pageviews, unique visitors, number of active editors, and the like.

Illustration of the revamped Reportcard

The new report card is powered by Limn, a pure JavaScript GUI visualization toolkit we wrote. We wanted non-technical community members to be able to interact with the data directly, visualizing and exploring it themselves, rather than relying on us or analysts to give them a porthole into the deep. As a drop-in component, we hope it will contribute to democratizing data analysis (though we plan to use it extensively across projects ourselves). So play around with the report card data, or fork the project on GitHub!

Kraken: A Data Services Platform

But we have bigger plans. Epic plans. Mythical plans. A generic computational cluster for data analytics, which we affectionately call Kraken: a unified platform to aggregate, store, analyze, and query all incoming data of interest to the community, built so as to keep pace with our movement’s ample motivation and energy.

How many Android users are there in India who visit more than ten times per month? Is there a significant difference in the popularity of mobile OSes between large cities and rural areas of India? Do Portuguese and Brazilian readers favour different content categories? How often are GLAM pictures displayed off-site, outside of Wikipedia (and where)?

As it stands, answering any of these questions is, at best, tedious and hard; usually, it’s impossible. The sheer scale of the Wikimedia projects’ success is a double-edged sword: it makes even modest data analysis a significant task. This is something we aim to fix with Kraken.

More urgently, however, we don’t presently have infrastructure to do A/B testing, measure the impact of outreach projects, or give editors insight into the readers they reach with their contributions. From this view, the platform is a robust, unified toolkit for exploring these data streams, as well as a means of providing everyone with better information for evaluating the success of features large and small.

This points toward our overarching vision. Long-term, we aim to give the Wikimedia movement a true data services platform: a cluster capable of providing realtime insight into community activity and a new view of humanity’s knowledge to power applications, mash up into websites, and stream to devices.

Dream big!

Privacy: Counting not Tracking

The Kraken is a mythical Nordic monster with many tentacles, much like any analytics system: analytics touches everything — from instrumenting mobile apps to new user conversion analysis to counting parser cache lookups — and it needs a big gaping maw to keep up with all the data coming in. Unfortunately, history teaches us that mythical cephalopods aren’t terribly good at privacy. We aim to change that.

We’ve always had a strong commitment to privacy. Everything we store is covered by the Foundation’s privacy policy, and nothing we’re talking about here changes those promises. Kraken will be used to count stuff, not to track user behaviour. But in order to count, we need to store, and we want you all to have a good idea of what we’re collecting and why; we will be specific and transparent about that. We aim to be able to answer a multitude of questions using different data sources. Counts of visitors, page and image views, search queries, edits and new user registrations are just a few of the data streams currently planned; each will be annotated with metadata to make it easier to query. To take a few examples: page views will be tagged to indicate which come from bots, and traffic from mobile phones will be tagged as mobile. By counting these different types of events and adding these kinds of meta tags, we will be able to better measure our progress towards the Strategic Plan.

We’ll be talking a lot more about the technical details of the system we’re building, so check back in case you’re interested or reach out to us if you want to provide feedback about how to best use the data to answer lots of interesting questions while still preserving users’ privacy. This post only scratches the surface, but we’ve got lots more to discuss.

Talk to Us!

Sound exciting? Have questions, ideas, or suggestions? Well then! Consider joining the Analytics mailing list or #wikimedia-analytics on Freenode (IRC). And of course you’re also very welcome to send me email directly.

Excited, and have engineering chops? Well then! We’re looking for a stellar engineer to help build a fast, intuitive, and beautiful toolkit for visualizing and understanding all this data. Check out the Javascript/UI Engineer job posting to learn more.

We’re definitely excited about where things are going, and we are looking forward to keeping you all up to speed on all our new developments.

Finally, we are hosting our first Analytics IRC office hours! Join us on July 30th, at 12pm PDT (3pm EDT / 9pm CEST) in #wikimedia-analytics to ask all your analytics and statistics related questions about Wikipedia and the other Wikimedia projects.

Best regards,

David Schoonover, Analytics Engineer
Andrew Otto, Analytics Engineer
Erik Zachte, Data Analyst
Diederik van Liere, Product Manager

How do you establish a QA & Testing practice for an open community?

To keep up with the growth of Wikipedia and its community, one goal of the engineering team at the Wikimedia Foundation for this year is to establish a Quality Assurance (QA) practice for software development, including MediaWiki itself, extensions, and also projects like Article Feedback and Editor Engagement. But how do you establish a QA & Testing practice for a development process that involves so many contributors, with code coming in from so many sources and projects?

In software development, QA is often conflated with software testing, but testing is only a small part of QA in general. The goal of modern software testing is not only to discover defects, but also to investigate software in order to provide valuable information about that software from every point of view, from the user experience to how the software is designed. On the other hand, Quality Assurance is process work, examining the process by which the software is created, from design to code to test to release and beyond.

Dozens of (volunteer and paid) developers contribute code to MediaWiki every month, in areas as varied as MediaWiki’s core, MediaWiki extensions, and localization. Thousands of power users on Wikimedia’s wikis can also contribute code directly on the sites, in the form of JavaScript “gadgets”. With so many entry points for fast-paced development, starting a QA/testing practice is challenging. Our strategy is to focus on two areas: test automation and building a testing community. We’re hiring people to coordinate these two areas.

As QA Lead, I have created an example of what I believe to be the best available test automation “stack”, intended to pave the way toward a reference implementation, an industry standard for high-quality browser test automation. We’re now hiring a QA Engineer whose primary responsibility will be to create and maintain browser-level test automation. In the course of creating those automated tests, we will be improving our use of the source code repository recently migrated from Subversion to Git, improving the beta labs test environment, and expanding our use of continuous integration in Jenkins.

But test automation isn’t everything, and we also have an opportunity to apply the Wikimedia community’s expertise in online volunteer collaboration to software quality. We’ve already started to explore this path with success: in May, we collaborated with Weekend Testing to validate the new frequent release schedule of MediaWiki to Wikimedia sites. Weekend Testing is an established global group of professional software testers who gather online every month for a different testing project, and testing MediaWiki versions on Wikipedia was a complex effort, executed well. In June, we collaborated with OpenHatch.org to test a near-final version of the new Article Feedback system that will be released to all of Wikipedia in the coming weeks. OpenHatch is an organization dedicated to matching interested participants in open source software to projects that need participants. This was the first testing project for OpenHatch, and it went well; Article Feedback is much improved because of it.

We are now hiring a Volunteer QA Coordinator, who will be working to create a culture of quality testing and investigation of software related to Wikimedia, both within the Wikimedia community itself, and in collaboration with the greater software testing culture. And we are already planning future activities with both Weekend Testing and OpenHatch.

My first few months as QA Lead at the Wikimedia Foundation have been devoted to creating an environment where the QA Engineer and the Volunteer QA Coordinator will thrive. I am really looking forward to collaborating with the talented people we will hire for these roles. My own role will be shifting as these new practices start to take hold. I will be looking to the future, to bring in innovative and creative approaches to software QA and testing of the highest possible quality.

Chris McMahon
QA Lead Engineer

Techies learn, make, win at Foundation’s first San Francisco hackathon

Participants at the San Francisco hackathon in January 2012

In January, 92 participants gathered in San Francisco to learn about Wikimedia technology and to build things in our first Bay Area hackathon.

After a kickoff speech by Foundation VP of Engineering Erik Möller (video), we led tutorials on the MediaWiki web API, customizing wikis with JavaScript user scripts and Gadgets, and building the Wikipedia Android app. (We recorded each training; click those links for how-to guides and videos.) We asked the participants to self-organize into teams and work on projects. After their demonstration showcase, judges awarded a few prizes to the best demos.
