Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

Platform engineering

On our way to Phabricator

Later this year, we’ll say good-bye to Bugzilla, our bug tracking platform, and migrate its content to another software called Phabricator. This will be an opportunity to centralize our various project and product management tools into a single platform, making it easier to follow technical discussions you’re interested in. This is the result of a six-month community review and discussion that identified requirements, evaluated options and decided on Phabricator.

What does this mean for me?

Bug reports and feature requests are listed as “tasks”, in an attractive interface not too different from Bugzilla’s.

If you’re a casual reader of Wikipedia and its sister sites, nothing will change (except maybe the rate at which you see improvements to the site, if our productivity increases).

If you’re an editor on a Wikimedia wiki, we expect this change to make your life easier, if you sometimes report bugs, request new features or participate in other technical discussions:

  • You’ll be able to use your SUL username to log into Phabricator.
  • No more having to go through half a dozen different tools to follow what’s happening on a specific bug or feature you’re watching: eventually, everything will be in one place.
  • Existing bug reports will be migrated to the new tool, and most links will continue to work; they will redirect to the bugs’ new location.
  • You’ll need a little time to adjust to this new tool, but hopefully Phabricator’s modern interface will make it easier for you to report bugs and participate in technical discussions.

If you’re more involved in the Wikimedia technical community, you’ve probably already participated in the discussions that have led to this decision. If you have other questions, you can ask them on the help page.

Why are we moving?

Since we started to use Bugzilla, the size of our technical community has dramatically increased. There are now dozens of developers, engineers, designers, tools maintainers, bot owners, project and product managers, etc. (not yet counting the hundreds of users regularly reporting bugs and participating in technical discussions).

It’s easy to see how a single tool with a limited scope (bug tracking) may not be able to meet the needs of all members of our technical community. Therefore, over the years, we’ve started to use other tools to complement Bugzilla in the areas of code review, project and product management, and quality assurance.

However, this led to a proliferation of scattered tools that barely talked to each other; engineers wrote scripts to keep some of them synchronized, but this wasn’t an ideal solution. Discussions about a single technical issue could be split across Bugzilla, Gerrit, Trello, Mingle and/or Scrumbugz. It was difficult for developers to follow, and even more so for casual users.

Phabricator solves this problem by offering all those features under a single unified umbrella; eventually, everything will be in one place, tightly integrated and neatly organized. Initially, we’ll focus on bug tracking and project management, but we’re planning to also use it for code review once the features we need have been added.

How was this decided?

Phabricator notably includes a project management feature, allowing users to organize tasks in “boards” familiar to developers using the Agile methodology.

In late 2013, the Wikimedia Foundation started to facilitate a community review of all the project management tools then in use in the Wikimedia technical community. Developers, engineers and anyone who identified as a stakeholder in this discussion were invited to provide input and share their use cases, needs and usual processes. After this consultation period, this input was summarized into consolidated requirements.

A list of options was proposed and discussed by the community, keeping only those that were true contenders based on our requirements.

Phabricator emerged as the only real challenger to the status quo. After a three-week request for comment, the technical community had weighed the costs and benefits and expressed an interest in moving to Phabricator. There were still a few issues and missing features to iron out, as well as a carefully-prepared migration plan to put in place, but overall the feeling was that once those had been resolved, there wouldn’t be any social blockers.

The Wikimedia Foundation is now preparing for the migration, and your help is very welcome. You can get involved directly in our test instance of Phabricator, which was originally set up just for testing but later became home to the migration project itself, giving participants a chance to become familiar with the software.

When will this happen?

The migration plan gives an overview of the current timeline. There’s still work to be done, and Wikimedia engineers are working closely with the Phabricator development team, which has been very responsive and open to collaboration. Together, they’re making sure that the features we need are present, and that we can adapt the software to our various workflows.

The current plan is to deploy a bare-bones Phabricator instance with only Wikimedia SUL enabled, and make a first community call to test only the login process. The next step will be to deploy the Trusted User Tool required by the Legal and Community Advocacy team to keep track of agreements signed by community members. These steps will help guarantee a successful Day 1, when Phabricator becomes the new driver of our development infrastructure.

On the Wikimedia side, Andre Klapper is leading the migration project, Mukunda Modell is lending his Phabricator expertise and Chase Pettet is handling the Operations side. You can read Andre’s retrospective on the review process and the road ahead. You’re also encouraged to follow the progress of the migration (dubbed “Wikimedia Phabricator Day 1”) on the dedicated page, the tracking item and its associated board in our test instance.

Guillaume Paumier, Technical communications manager

Wikimedia’s Road to Bugzilla 4.4 (How we puppetized, upgraded and moved Bugzilla to another server)

The original publication of this blog post can be found here.

The software behind Wikimedia’s website for tracking software issues and feature requests was recently updated to a newer version and moved onto a new machine in a different datacenter. Furthermore, proper configuration management for this software was set up. This post explains the technical details and challenges.

Though we are currently also evaluating Wikimedia’s project management tools, we will have to stick with our current infrastructure for a while. Among many other tasks, I spent the last few months preparing the upgrade of Wikimedia’s Bugzilla instance from 4.2 to 4.4. Some reasons for upgrading can be found in this Bugzilla comment.

In late November of 2013 I started cleaning up Wikimedia Bugzilla’s custom CSS, which had been copied about five years earlier and not kept in sync. It turned out that 16 out of 22 files could be removed, since there was no significant difference from upstream’s default CSS code (Bugzilla falls back to loading the default CSS file from /skins/default if no custom CSS file is found in /skins/custom). Less noise and less diffing required for future upgrades. In theory.
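As a rough sketch of this kind of cleanup, the comparison step can be automated: find every custom skin file that no longer differs meaningfully from upstream’s default and can therefore be deleted. The directory layout and the whitespace-insensitive comparison below are illustrative assumptions, not the exact script that was used.

```python
# Sketch: list custom Bugzilla skin files that are effectively identical to
# upstream's defaults and can be removed -- Bugzilla falls back to loading
# from /skins/default when no file exists in /skins/custom.
# Paths and the whitespace-only comparison are assumptions for illustration.
from pathlib import Path

def normalize(css: str) -> str:
    # Treat stylesheets that differ only in whitespace as identical.
    return " ".join(css.split())

def removable_custom_files(custom_dir: str, default_dir: str) -> list[str]:
    removable = []
    for custom in sorted(Path(custom_dir).glob("*.css")):
        default = Path(default_dir) / custom.name
        if default.is_file() and normalize(custom.read_text()) == normalize(default.read_text()):
            removable.append(custom.name)
    return removable
```

Files reported by such a check can simply be deleted from the custom skin, shrinking the diff that has to be reviewed on every future upgrade.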

After testing these CSS changes on a Wikimedia Labs instance and merging them into our 4.2 production instance, I created numerous patches and put them into Gerrit (Wikimedia’s code review tool) by diffing upstream 4.2 code, upstream 4.4 code and our custom code.

At the same time, Wikimedia’s Technical Operations team wanted to move the Bugzilla server from the kaulen server in our old Tampa datacenter to the zirconium server in our new Ashburn (Eqiad) datacenter. While you’d normally prefer to do only one thing at a time, Daniel Zahn (of Technical Operations) and I decided to create a fresh Bugzilla 4.4 instance from scratch on the new server to see which problems we would run into. During this process, Daniel Zahn turned the old setup on kaulen, which was largely manual and had grown organically over the years, into a proper Puppet module. For every “missing module” error we ran into, we avoided installing anything from Perl’s CPAN in Bugzilla’s /lib folder and instead relied on distribution packages, for a much cleaner install; Daniel Zahn installed the needed packages by adding them to the Puppet code. While doing this, we also removed Bugzilla’s Sitemap extension, as it created sporadic Search::Sitemap errors when running Bugzilla’s checksetup.pl (plus it’s unmaintained anyway). Furthermore, I ran into another runtime error to fix.

Help Test Media Viewer

Media Viewer lets you browse larger images on Wikimedia sites.

We invite you to try out Media Viewer, a new tool for browsing multimedia content, which is now in beta on Wikipedia and other Wikimedia sites.

Today, viewing images on our sites can be a frustrating experience for casual users: when you click on a thumbnail in an article, you are taken to a separate page where the image is shown in medium size and surrounded with a lot of text information that can be confusing to readers.

Media Viewer aims to improve this viewing experience by showing images in larger size, as an overlay on the current page. To reduce visual clutter, all information is shown below the image, and can be expanded at a click of a button.

This new tool is being developed by the Wikimedia Foundation’s multimedia team, and we now invite you to try it out in its beta version. We plan to gradually release this tool in coming months, starting with a few pilot tests, followed by wider deployments in the next quarter.

How it works

With Media Viewer, you can click on any image thumbnail to see it in large size, without visual clutter. You can see the file name and author credits at the bottom of the screen, and view more information in an expandable panel below the image.

You can also expand the image to full screen, for a more immersive experience, or browse through all images in an article or gallery by clicking on the next and previous arrows. The ‘Use this file’ tool will make it easier to share images with your community, add them to articles or download them for your own purposes, with full attribution to contributors.

User response so far suggests that Media Viewer provides a richer multimedia experience, right where users expect it. They tell us they can see the images more clearly, without having to jump to separate pages, and that the interface is more intuitive, offering easy access to images and metadata.

How you can help

Can you help us test Media Viewer in coming weeks? It’s already included in our “Beta Features” program, so you can try it out right away. Now that we’re planning to enable it more widely, your help is even more crucial to uncover issues and bugs we haven’t caught before.

You can test this tool on any Wikimedia site; for example, you can try it out on this test page. To enable Media Viewer, you first need to log in and click on the small ‘Beta’ link next to ‘Preferences’ in your personal menu; then check the box next to ‘Media Viewer’ and click ‘Save’; you will now be able to click on any thumbnail image to see it in the Media Viewer on that site. Before you start, be sure to read these testing tips.

Try out Media Viewer and let us know what you think on this discussion page. If you find any technical bugs, please report them on Bugzilla.

Over 12,000 beta testers have now enabled Media Viewer across wikis around the world. Here is an overview of the feedback they have kindly given us to help improve this tool. Many of their suggestions are now being implemented, as part of our current release plan.

Next steps

The next version of Media Viewer will support video, audio and other file formats.

We are now working on beta version v0.2 of Media Viewer, with a focus on a better user interface, faster image loads, more file info and attribution (still images only for now), as well as improved ‘Use this file’ tools (e.g. share, embed, download). We aim to gradually release this version out of beta, starting with limited tests on a few pilot sites in coming weeks. Based on test results, we plan a wider release of Media Viewer v0.2 next quarter.

The next version v0.3 of Media Viewer will focus on supporting more file formats (e.g. slides, video, audio), as well as zooming on large images and adding plug-ins for developers. For a preview of what we’re considering, check our Media Viewer v0.3 goals and mockups.

In future releases, we also hope to provide a few tools to help users take action on the media they are viewing: for example, a user might want to thank the person who uploaded a file, or report issues about that file. To see how we propose to expand Media Viewer in coming years, check out this multimedia vision for 2016.

Technology

If you are a developer, you can learn more about the technology behind Media Viewer on these two extension pages: MultimediaViewer (the front-end code that delivers the main user experience) and CommonsMetadata (the back-end code that delivers the file info to the viewer). In coming weeks, we hope to add a variety of hooks, accessible via the usual mw.hook interface, to allow more customization of behavior in the MultimediaViewer extension through gadgets and other extensions.

For more information about this tool, visit its project overview page; you can also learn more about other multimedia projects we’re working on at the Multimedia project hub.

Thanks

Multimedia Team members: Gilles, Mark, Fabrice, Gergo, and Aaron (left to right)

We’d like to take this opportunity to thank all the folks who made this project possible, including Gilles Dubuc, Pau Giner, Aaron Arcos, Keegan Peterzell, Brian Wolff, Jared Zimmerman, May Galloway, Bryan Davis, Brion Vibber, Rob Lanphier, Erik Moeller, Howie Fung and Tomasz Finc, to name but a few.

We’re also grateful to all the community members who helped create this feature, through a series of roundtable discussions held by video conference, in person and over IRC. If you would like to participate in future discussions, we invite you to join our multimedia mailing list.

We look forward to more collaborations with you in coming weeks. Your feedback is invaluable for improving Media Viewer, and providing a better experience to our users!

Best regards,

Fabrice Florin, Product Manager
Mark Holmquist, Software Engineer
Gergő Tisza, Software Engineer
on behalf of the Wikimedia Foundation’s Multimedia Team

RfC: Should we support MP4 video on our sites?

A video of a cheetah, captured in slow-motion at 1200 fps. The video was released on Vimeo in MP4 format and converted to OGV format before uploading to Commons. It cannot be viewed in this format on most mobile phones and many web browsers.

The Wikimedia Foundation’s multimedia team seeks your guidance on a proposal to support the MP4 video format. This digital video standard is used widely around the world to record, edit and watch videos on mobile phones, desktop computers and home video devices. Its underlying codec is also known as H.264/MPEG-4 AVC.

Supporting the MP4 format would make it much easier for our users to view and contribute video on Wikimedia projects. Video files could be offered in dual formats on our sites, so we could continue to support current open formats (WebM and Ogg Theora).

Currently, open video files cannot be viewed on many mobile devices or web browsers without extra software, making it difficult or impossible for several hundred million monthly visitors to watch videos on our sites. Video contributions are also limited by the fact that most mobile phones and camcorders record video only in MP4 format, and that transcoding software is scarce and hard to use by casual users.

However, MP4 is a patent-encumbered format, and using a proprietary format would be a departure from our current practice of only supporting open formats on our sites—even though the licenses appear to have acceptable legal terms, with only a small fee required.

We would appreciate your guidance on whether or not to support MP4 on our sites. This Request for Comments presents views both in favor of and against MP4 support, and hundreds of community members have already posted their recommendations.

What do you think? Please post your comments on this page.

All users are welcome to participate, whether you are active on Commons, Wikipedia, other Wikimedia projects—or any site that uses content from our free media repository. We also invite you to spread the word in your community about this issue.

We look forward to a constructive discussion with you and your community, so we can make a more informed decision together about this important question.

All the best,

Fabrice Florin, Product Manager, Multimedia
On behalf of the Multimedia team

Wikimedia Foundation’s Engineering and Product Group

A Multimedia Vision for 2016

How will we use multimedia on our sites in three years?

The Wikimedia Foundation’s Multimedia team was formed to provide a richer experience and support more media contributions on Wikipedia, Commons, and MediaWiki sites. We believe that audio-visual media offer a unique opportunity to engage a wide range of users to participate productively in our collective work.

To inform our plans, we’ve created a simple vision of how we might collaborate through multimedia by 2016. This hypothetical scenario was prepared with guidance from community members and is intended for discussion purposes, to help us visualize possible improvements to our user experience over the next three years.

Vision

The best way to view this vision is to watch this video:

Multimedia Vision 2016, presented by Fabrice Florin at a Wikimedia Meetup in San Francisco on Dec. 9, 2013.


Wikimedia moving to Elasticsearch

We’re in the process of rolling out new search infrastructure to all of the wikis, so it’s a good time to explain what’s coming to all Wikimedia wikis in the very immediate future, why we’re changing it, and how you can get involved.

Screenshot of the new search box

The new search engine is coming soon to all Wikimedia wikis, and may already be on your favorite wiki

First, a bit of background. All Wikimedia sites have been using a home-grown search system based on Apache Lucene since 2005 or 2006. It was written primarily by volunteer Robert Stojnić and is called lucene-search-2. It is a fantastic search engine, which has powered the sites for years and has managed to scale very well for the past 8 years or so. Early in 2013, however, it became a source of significant operational problems; in the short term we were able to patch some of the most glaring issues in lucene-search-2, but it became increasingly apparent that a replacement was needed. Robert is no longer around and the system is showing its age.

We’re very happy with Lucene but we wanted to get out of the business of maintaining a special-purpose open-source search system when there are two very good general-purpose open-source search systems available: Solr and Elasticsearch. Both are based on Lucene and horizontally scalable for data and query volume. After experimenting with both and implementing basic MediaWiki integration we chose to settle on Elasticsearch for the following reasons:

  • Elasticsearch’s reference manual and contribution documentation promised an easy start and a pleasant time getting changes upstream when we needed to.
  • Elasticsearch’s super expressive search API lets us search any way we need to search and gives us confidence that we can expand on it. Not to mention we can easily write very expressive ad-hoc queries when we need to.
  • Elasticsearch’s index maintenance API lets us maintain the index right from our MediaWiki extension, so it’s easier for us to deploy and test, and should be easier for MediaWiki users outside Wikimedia to use. At the time of the choice, Solr’s schema API was read-only.
  • Rack awareness, automatic shard rebalancing, statistics exposed over HTTP, preference for JSON and YML over XML, and first-party Debian packages were also nice.
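To give a flavor of what “expressive” means here: Elasticsearch queries are plain JSON documents sent over HTTP, so ad-hoc queries are easy to build with any language or tool. The index and field names below are made up for illustration and are not CirrusSearch’s actual schema.

```python
# A hypothetical Elasticsearch-style query body: a full-text match combined
# with a filter and an exclusion, using the standard "bool" query shape.
# Field names ("text", "namespace", "is_redirect") are illustrative only.
import json

query = {
    "query": {
        "bool": {
            "must": [
                {"match": {"text": "search engine"}},      # full-text relevance
            ],
            "filter": [
                {"term": {"namespace": 0}},                # articles only
            ],
            "must_not": [
                {"term": {"is_redirect": True}},           # skip redirects
            ],
        }
    },
    "size": 10,
}

# Because the body is ordinary JSON, it can be POSTed to an index's _search
# endpoint with any HTTP client.
body = json.dumps(query)
```

This JSON-over-HTTP design is exactly what makes it practical both to drive the index from a MediaWiki extension and to hand-write one-off queries during debugging.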

To provide the integration to MediaWiki, we’ve written a new extension called CirrusSearch that we’ve designed to be mostly backwards-compatible with the current search with the following exceptions:

  • Templates are expanded before indexing, so text transcluded from templates will be searchable; the raw template markup in the page source no longer will be.
  • Page updates are reflected in search results pretty quickly after they are made, usually within seconds for single page edits.
  • Wiki communities can mark some pages as higher or lower quality and it will be reflected in the search results.
  • A few new “expert” options have been added (intitle: can now be negated, prefer-recent:, etc.).

We’ve documented all of these features and more on mediawiki.org, and the page is licensed in the public domain so people can feel free to copy it to their wikis as a basis of documentation.

We plan for this replacement search to be a Beta Feature for all wikis by the end of February and the primary search in March or April. See our ever-evolving timeline for ever-evolving specifics.

We’ve got a lot of exciting things on the horizon now that we’ve got a modern and stable search for Wikimedia. We’re talking Wikidata, Commons metadata, faceting, real cross-wiki searching, etc. Please get involved by filing bugs, talking to us on the project page, or by finding us on IRC and pinging us there. On IRC, you can find us as ^d and manybubbles.

Chad Horohoe and Nik Everett, Wikimedia Foundation

OAuth now available on Wikimedia wikis


Over the past few months, engineers in the MediaWiki Core team at the Wikimedia Foundation have been developing Extension:OAuth. A preliminary version of the extension is now live on all Wikimedia wikis.

OAuth allows users to authorise third-party applications to take actions on their behalf without having to provide their Wikimedia account password to the application. OAuth also allows a user to revoke an application’s access at any time, adding an extra layer of security for users. By using OAuth, third-party applications can streamline their workflows by no longer needing to direct people to the wiki to perform actions such as creating accounts or unblocking users. For example, on the English Wikipedia, Snuggle, the Account Creation Tool, and the Unblock Request System have begun work on implementing OAuth so that users can use their tools more seamlessly.
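Under the hood, MediaWiki’s OAuth extension follows the OAuth 1.0a protocol, in which each API request is cryptographically signed with secrets the application holds, rather than carrying the user’s password. A minimal sketch of the HMAC-SHA1 signing step is below; the URL, keys and parameters are placeholders, not a real Wikimedia credential flow, and a real client must also handle nonces, timestamps and the token handshake.

```python
# Sketch of OAuth 1.0a HMAC-SHA1 request signing, the mechanism that lets a
# third-party tool call the API on a user's behalf without ever seeing their
# password. All keys and values here are placeholders for illustration.
import base64
import hashlib
import hmac
from urllib.parse import quote

def sign_request(method, url, params, consumer_secret, token_secret):
    enc = lambda s: quote(str(s), safe="")
    # 1. Normalize the request parameters: sort, percent-encode, join.
    norm = "&".join(f"{enc(k)}={enc(v)}" for k, v in sorted(params.items()))
    # 2. Build the signature base string: METHOD&url&params.
    base = "&".join([method.upper(), enc(url), enc(norm)])
    # 3. The signing key concatenates the two secrets the application holds.
    key = f"{enc(consumer_secret)}&{enc(token_secret)}".encode()
    digest = hmac.new(key, base.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()

signature = sign_request(
    "POST", "https://example.org/w/api.php",
    {"action": "edit", "oauth_nonce": "abc123", "oauth_timestamp": "1390000000"},
    "consumer-secret-placeholder", "token-secret-placeholder",
)
```

The wiki can recompute the same signature from its copy of the secrets, so a valid signature proves the request came from an application the user authorised, and revoking the token invalidates all of that application’s future requests.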

This dialogue is presented to you when you are asked to authorise an application to access your account.

The list of actions that third parties can be authorised to perform is extensive, and extra actions can be added if there is demand for them. We hope that OAuth will empower third-party application developers to make even better applications to help Wikimedians do their work, and we look forward to seeing what applications are created.

If you need help or have any questions, feel free to visit Help:OAuth on MediaWiki.org. If your question is not answered on that page, please ask on the talk page and a member of the OAuth team will answer it for you. For technical details on the OAuth protocol, visit http://oauth.net.

Dan Garry
Associate Product Manager for Platform, Wikimedia Foundation

Scientific multimedia files get a second life on Wikipedia

On Wikimedia projects, audio and video content has traditionally taken a backseat relative to text and static images (however, changes are underway). Conversely, more and more scholarly publications come with audio and video files, though these are — a legacy from the print era — typically relegated to the “supplementary material” rather than embedded next to the relevant text passages. And a rising number of these publications are Open Access, i.e. freely available under Creative Commons licenses that allow for the materials to be reused in other contexts.

Why not enrich thematically related Wikimedia pages with such multimedia files? That’s where the Open Access Media Importer (OAMI) comes in. It makes scientific video and audio clips accessible to the Wikimedia community and a broader public audience. The OAMI is an open-source program (or ‘bot’) that crawls PubMed Central — a full-text database of over 3 million biomedical research articles — and extracts multimedia files from those publications in the database that are available under Wikimedia-compatible licenses.

Over 700 OAMI-contributed media files are currently used in Wikipedia and other Wikimedia projects. This X-ray video of a breathing American alligator — originally published by Claessens et al. (2009) in PLOS ONE — is currently being used for illustrating the “Respiratory system” entries in the Bulgarian, Chinese, English, German, Russian, and Serbocroatian Wikipedias.

Such reuse-friendly terms are the key ingredient to making scholarly materials useful beyond the article in which they have originally been published. However, OAMI aims to make this material even more useful by making it accessible:

  • in places where people actually look for them (Wikimedia platforms are a prime example),
  • in one coherent format (in our case Ogg Vorbis/Theora, which isn’t encumbered by patent restrictions), and
  • in a way that allows for collaborative annotation with relevant metadata. This makes it a lot easier to browse and search the media files.
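The bot’s central decision is a license check: only files published under Wikimedia-compatible terms may be imported. A toy sketch of that filtering step follows; the accepted-license set and record format are illustrative assumptions, not OAMI’s actual code.

```python
# Toy sketch of OAMI-style license filtering: keep only supplementary files
# whose license allows free reuse and modification, as Commons requires.
# The license identifiers and record format are assumptions for illustration.
WIKIMEDIA_COMPATIBLE = {
    "CC-BY",     # attribution only
    "CC-BY-SA",  # attribution + share-alike
    "CC0",       # public-domain dedication
}

def importable(record: dict) -> bool:
    # Non-commercial (NC) and no-derivatives (ND) clauses are not acceptable
    # on Commons, so anything outside the allow-list is rejected.
    return record.get("license") in WIKIMEDIA_COMPATIBLE

supplementary_files = [
    {"file": "alligator-breathing.ogv", "license": "CC-BY"},
    {"file": "heart-ultrasound.mp4", "license": "CC-BY-NC"},
]
to_import = [r["file"] for r in supplementary_files if importable(r)]
```

Everything that passes this gate is then converted to Ogg and uploaded with its metadata, while NC- and ND-licensed material is left untouched.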


VipsScaler implementation by volunteer developer improves image handling on Wikimedia sites

A depiction of the Battle of Belmont, Second Boer War. This image is an example of image reduction in VipsScaler. The original file has a resolution of 52 megapixels

Loading and resizing large images within Wikimedia projects has become faster and more reliable with the rollout of VipsScaler, a wrapper around the free VIPS image processing software. VIPS is a tool designed to use a small amount of memory when resizing images. This allows the wikis to create thumbnails of very large PNG files, something that previously was not possible because of the large amounts of memory required. And while Wikimedia Foundation technical staff rolled it out, a volunteer wrote the code.

The most common type of image file on the internet is the JPEG, but since its compression degrades image quality with repeated editing, most non-photographic image files on Wikipedia and Wikimedia Commons are stored in the PNG format, which uses lossless compression. Until VipsScaler, thumbnails of PNG files larger than 50 megapixels could not be created.
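The memory problem is easy to see with back-of-the-envelope arithmetic: a naive scaler decodes the whole image before resizing, and a decompressed image needs roughly width × height × bytes-per-pixel of RAM. The 4-bytes-per-pixel (RGBA) figure below is an assumption for illustration.

```python
# Back-of-the-envelope memory estimate for decoding a whole image at once,
# as a naive scaler does. Assumes 4 bytes per pixel (RGBA), for illustration.
def decode_memory_mb(width: int, height: int, bytes_per_pixel: int = 4) -> float:
    return width * height * bytes_per_pixel / (1024 ** 2)

# A 52-megapixel scan like the Battle of Belmont image (~8000 x 6500 pixels)
# would need roughly 200 MB of RAM just to hold the decoded pixels:
mb = decode_memory_mb(8000, 6500)
```

VIPS avoids this by streaming the image through in small regions, so its memory use stays roughly constant regardless of input size, which is what makes thumbnailing such files practical on shared image-scaler hosts.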

Volunteer Bryan Tong Minh was a student at Delft University of Technology in 2008 when he initially wrote a utility capable of downscaling PNG images without using huge amounts of memory. Active on Wikimedia Commons at the time, Tong Minh (User:Bryan) said he “was annoyed by the fact that large PNGs did not have thumbnails because the image scaler that we used, ImageMagick, could not efficiently scale large non-JPEG images.”

During the review period of the utility, he became aware of VIPS, which allows for memory-efficient scaling of image files — more than just PNGs. He then set out to implement an extension that would allow usage of VIPS with MediaWiki, which became the VipsScaler extension.

“For many years, PNG files over a certain size could not be displayed on Wikipedia. They could be downloaded, but gave an error when thumbnailed and, on their file page, made it appear that the file was corrupt,” said Adam Cuerden, whose restoration work on Wikipedia since 2007 accounts for four percent of all featured pictures on English Wikipedia. Cuerden said it wasn’t uncommon for PNG files to be marked for deletion because they could not be displayed.

Currently, VIPS scales PNG images from 35 to 140 megapixels, according to Commons contributor Brian Wolff. He points to this 72-megapixel image of Abraham Lincoln, restored by Adam Cuerden, as an example of an image that previously could not be rendered.

Buggin’ out! How the Wikimedia Foundation triages and fixes software bugs

Finding software problems

As part of the Wikimedia Foundation’s mandate to operate the Wikimedia sites, a small technical team is responsible for maintaining the MediaWiki core platform, along with its features and extensions. The developers who maintain key software in the stack, improving and enhancing the functionality of MediaWiki, include people outside and inside the Wikimedia Foundation, working side by side — and similarly, volunteers and staff work together to detect and diagnose problems to fix. Unusually among its tech peers, the engineering team at the Wikimedia Foundation is helped in this process by a contingent of dedicated volunteers who aid in finding, reporting and fixing unintended software problems, or bugs.

Bugzilla is the website where any user or developer can report a software bug or request a feature enhancement to MediaWiki. Users of Bugzilla can also search existing reports to add new comments or information, and check whether a similar bug has already been reported.

Andre Klapper, Bug Wrangler

The first step in resolving bugs is getting each Bugzilla report to the developer who handles that area of software code. Wikimedia’s Bug Wrangler, Andre Klapper, takes a look at each bug report to figure out the correct product, such as MediaWiki or MediaWiki extensions, and the related subcomponent that each bug falls into. As Bug Wrangler, it’s also his job to make sure each report has enough information for the developers to fix the bug.

“I’m responsible for triaging, and there are some great community members who help me with that,” said Klapper. “When I see problems that are urgent or really critical, I escalate by making developers explicitly aware of such problems. I also try to keep an eye on the many forums (such as Village Pumps) and places where users report problems, and make sure that software bugs end up as a report in Bugzilla, so developers can find them.”

In describing the open platform used by the Wikimedia Foundation, Klapper added, “Basically anybody can define or correct the priority of bug reports and feature requests. Often it’s members of development teams or other volunteers who triage, or I set priorities in order to help developers see what’s the most important stuff to work on.”

Klapper, who joined the Wikimedia Foundation in 2012, estimates that 70 percent of bugs are reported by Foundation staff and developers, and 30 percent by users. The role of Bug Wrangler was a natural fit for him because the “position described pretty well what I’ve worked on before” at GNOME and the now-defunct Maemo/MeeGo.

Bugzilla has taken its share of knocks from critics, said Klapper (who counts himself among them), because it requires a separate registration, the user interface is complex and it lacks dashboards. He said the developers at bugzilla.org are working on these issues for the next Bugzilla version, and in the meantime, he has started a weekly blog entry called “Bugzilla Tips,” where he gives advice for using the tool.

“I try to make Bugzilla work better for everybody and, hence, ask teams what they are missing, or try to help establish good workflows,” he said. “For example, I introduced a new Bugzilla front page on bugzilla.wikimedia.org recently that provides quicker access to the main tasks.”
