Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

Technology

News and information from the Wikimedia Foundation’s Technology department.

Digging for Data: How to Research Beyond Wikimetrics

The next virtual meet-up will showcase research tools. Join us!

For Learning & Evaluation, Wikimetrics is a powerful tool for pulling data for wiki project user cohorts, such as edit counts, pages created and bytes added or removed. However, you may still have a variety of other questions, for instance:

How many members of WikiProject Medicine have edited a medicine-related article in the past three months?
How many new editors have played The Wikipedia Adventure?
What are the most-viewed and most-edited articles about Women Scientists?

Questions like these and many others regarding the content of Wikimedia projects and the activities of editors and readers can be answered using tools developed by Wikimedians all over the world. These tools draw on publicly available data through databases and Application Programming Interfaces (APIs), and they are maintained by volunteers and staff within our movement.

On July 16, Jonathan Morgan, research strategist for the Learning and Evaluation team and wiki-research veteran, will begin a three-part series exploring some of the different routes to accessing Wikimedia data. In Beyond Wikimetrics, building on several recent workshops, including the Wiki Research Hackathon and a series of Community Data Science Workshops developed at the University of Washington, Jonathan will show participants how to expand their wiki-research capabilities by accessing data directly through these tools.
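
As a taste of what such direct access looks like, here is a minimal sketch (not taken from the series materials) that answers a simple editor-activity question by querying the public MediaWiki web API from Python. The username and cutoff timestamp are hypothetical placeholders.

```python
# Minimal sketch: counting one editor's recent contributions through the public
# MediaWiki API. The username and cutoff timestamp below are hypothetical.
import requests

API = "https://en.wikipedia.org/w/api.php"

def recent_edit_count(username, since="2014-04-01T00:00:00Z"):
    """Count edits made by `username` since the given ISO 8601 timestamp."""
    params = {
        "action": "query",
        "list": "usercontribs",
        "ucuser": username,
        "ucend": since,      # contributions are listed newest-first; ucend is the older bound
        "uclimit": "max",
        "format": "json",
    }
    total = 0
    while True:
        data = requests.get(API, params=params).json()
        total += len(data["query"]["usercontribs"])
        if "continue" not in data:
            return total
        params.update(data["continue"])   # follow the API's continuation token

# print(recent_edit_count("ExampleEditor"))   # hypothetical username
```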

(more…)

Making Wikimedia Sites faster

Running the fifth largest website in the world brings its own set of challenges. One particularly important issue is the time it takes to render a page in your browser. Nobody likes slow websites, and we know from research that even small delays lead visitors to leave the site. An ongoing goal for both the Operations and Platform teams is to improve the reader experience by making Wikipedia and its sister projects as fast as possible. We ask ourselves questions like: can we make Wikipedia 20% faster for half the planet?

As you can imagine, the end-user experience varies greatly across our uniquely diverse and global readership. Hence, we need to conduct real-user monitoring to truly understand how fast our projects are in real-life situations.

But how do we measure how fast a webpage loads? Last year, we started building instrumentation to collect anonymous timing data from real users, through a MediaWiki extension called NavigationTiming.[1]

There are many factors that determine how fast a page loads, but here we will focus on the effects of network latency on page speed. Latency is the time it takes for a packet to travel from the originating server to the client who made the request.

ULSFO

Earlier this year, our new data center (ULSFO) went fully operational, serving content to Oceania, South-East Asia, and the west coast of North America.[2] The main benefit of this work is shaving up to 70–80 ms of round-trip time for some regions of Oceania, East Asia, the United States and Canada, an area with 360 million Internet users and a total population of approximately one billion people.

We recently explained how we chose which areas to serve from the new data center. But knowing the sites became faster for those users was not enough for us: we wanted to know how much faster.

Results

Before we talk about specific results, it is important to understand that faster network round-trip times do not necessarily translate directly into a faster user experience. When network times are faster, resources are retrieved faster, but there are many other factors that influence page latency. This is perhaps better explained with an example: if composing a page takes four network round trips, and round trips 2, 3 and 4 happen while the browser is still parsing a huge main document fetched in round trip 1, only the first request benefits visibly from the faster network. The subsequent ones are done in parallel and are entirely hidden under the fetching and parsing of the first one. In this scenario, the performance bottleneck is the parsing of the first resource, not the network time.

With that in mind, when we analyzed the data from the NavigationTiming extension we wanted to answer two questions: How much did our network times improve? And can users feel the effect of faster network times? In other words, are pages perceived to be faster, and if so, by how much?

The data we harvest from the NavigationTiming extension is segregated by country. Thus we concentrated our analysis on countries in Asia for which we had sufficient data points; we also included the United States and Canada, but we were not able to extract data just for the western states. Data for the United States and Canada was analyzed at the country level, so the improvements in latency appear “muffled”.

How much did our network times improve?

The short summary is: network times improved quite a bit. For half of all requests, the time to retrieve the main document decreased by up to 70 ms.

ULSFO Improvement of Network times on Wikimedia Sites

In the adjacent graph, the data center rollout is marked with a dashed line. The rollout was gradual, so the gains are not visible immediately, but they become very significant after a few days. The graph includes data for Japan, Korea and the whole South-East Asia region.[3]

We graphed the responseStart − connectStart time, which represents the time spent in the network until the first byte arrives, minus the time spent in DNS lookups. For a more visual explanation, take a look at the Navigation Timing diagram. If a TCP connection is dropped, the time will include the setup of the new connection. All the data we use to measure network improvements is provided by the Navigation Timing API, and is thus not available on IE8 and below.
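
To make the metric concrete, here is a rough sketch of how such a measurement could be aggregated offline, assuming the timing events had been exported to a CSV file. The file layout and field names are hypothetical; the production pipeline stores NavigationTiming events differently.

```python
# Rough sketch: median of (responseStart - connectStart) from exported
# NavigationTiming events. The CSV layout used here is a hypothetical example.
import csv
import statistics

def median_network_time(path):
    """Median network time (ms) until the first byte, excluding DNS lookups."""
    deltas = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            connect = float(row["connectStart"])
            response = float(row["responseStart"])
            if response >= connect:        # discard obviously broken samples
                deltas.append(response - connect)
    return statistics.median(deltas)

# print(median_network_time("navtiming_japan.csv"))   # hypothetical file name
```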

User perceived latency

Did the improvement of network times have an impact that our users could see? Well, yes it did. More so for some users than others.

The gains in Japan and Indonesia were remarkable: page load times dropped by up to 300 ms at the 50th percentile (weekly). We saw smaller (but measurable) improvements of 40 ms in the US too. However, we were not able to measure the impact in Canada.

The dataset we used to measure these improvements is larger than the one we had for network times. As we mentioned before, the Navigation Timing API is not present in old browsers, so we cannot measure, say, network improvement in IE7. In this case, however, we used a measure of our own creation, called mediaWikiLoadComplete, that tells us when a page is done loading. This measure is taken in all browsers when the page is ready to interact with the user; faster times do mean that the user experience was also faster. Now, how users perceive the improvement has a lot to do with how fast pages were to start with. If a page now takes 700 ms to render instead of one second, any user will be able to see the difference. However, a difference of 300 ms in a 4-second page rendering will go unnoticed by most.
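
As an illustration of how a number like the weekly 50th percentile above might be computed, here is a small pandas sketch. The column names and data layout are hypothetical, not the schema of our actual event tables.

```python
# Illustrative only: weekly median of mediaWikiLoadComplete per country, assuming
# a hypothetical DataFrame with 'timestamp', 'country' and
# 'mediaWikiLoadComplete' (milliseconds) columns.
import pandas as pd

def weekly_median_load(df):
    """Weekly 50th percentile of page-load time for each country, in ms."""
    df = df.assign(timestamp=pd.to_datetime(df["timestamp"]))
    return df.groupby(
        ["country", pd.Grouper(key="timestamp", freq="W")]
    )["mediaWikiLoadComplete"].median()
```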

Reduction in latency

Want to know more?

Want to know all the details? A (very) detailed report of the performance impact of the ULSFO rollout is available.

Next steps

Improving speed is an ongoing concern, particularly as we roll out new features, and we want to make sure that page rendering remains fast. We are keeping our eyes open for new ways of reducing latency, for example by evaluating TCP Fast Open. TCP Fast Open skips an entire round trip and starts sending data from the server to the client before the final acknowledgment of the three-way TCP handshake has finished.
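
For readers curious what this looks like in practice, here is a minimal, hypothetical sketch of enabling TCP Fast Open on a listening socket. It is not Wikimedia's production configuration, and it assumes a Linux kernel with TFO support and a Python build that exposes socket.TCP_FASTOPEN.

```python
# Minimal sketch (not Wikimedia's production setup): enabling TCP Fast Open on a
# listening socket. Assumes Linux with TFO enabled (net.ipv4.tcp_fastopen) and a
# Python build that exposes socket.TCP_FASTOPEN.
import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("0.0.0.0", 8080))
# Allow up to 16 pending Fast Open requests; data carried in the client's SYN can
# then be handed to the application before the handshake completes.
srv.setsockopt(socket.IPPROTO_TCP, socket.TCP_FASTOPEN, 16)
srv.listen(128)
```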

We are also getting closer to deploying HipHop. HipHop is a virtual machine that compiles PHP bytecode to native instructions at runtime, the same strategy used by Java and C# to achieve their speed advantages. We’re quite confident that this will result in big performance improvements on our sites as well.

We wish you speedy times!

Faidon Liambotis
Ori Livneh
Nuria Ruiz
Diederik van Liere

Notes

  1. The NavigationTiming extension is built on top of the HTML5 component of the same name, which exposes fine-grained measurements from the moment a user submits a request to load a page until the page has been fully loaded.
  2. Countries and regions served by ULSFO include: Bangladesh, Bhutan, Hong Kong, Indonesia, Japan, Cambodia, Democratic People’s Republic of Korea, Republic of Korea, Myanmar, Mongolia, Macao, Malaysia, Philippines, Singapore, Thailand, Taiwan, Vietnam, US Pacific/West Coast states (Alaska, Arizona, California, Colorado, Hawaii, Idaho, Montana, New Mexico, Nevada, Oregon, Utah, Washington, Wyoming) and Canada’s western provinces and territories (Alberta, British Columbia, Northwest Territories, Yukon Territory).
  3. Countries include: Bangladesh, Bhutan, Hong Kong, Indonesia, Japan, Cambodia, Democratic People’s Republic of Korea, Republic of Korea, Myanmar, Mongolia, Macao, Malaysia, Philippines, Singapore, Thailand, Taiwan, Vietnam.

Pywikibot will have its next bug triage on July 24–27

For most Wikimedia projects, Pywikibot (formerly pywikipedia) has proved to be a trusted and powerful tool. Literally millions of edits have been made by “bots” through this (semi-)automated software suite, written in Python.
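
For those who have never run it, a minimal Pywikibot script looks something like the sketch below. It assumes a working installation with a configured user-config.py, and the page title is just an example.

```python
# Minimal sketch of a Pywikibot script, assuming a configured user-config.py.
# The page title is an arbitrary example.
import pywikibot

site = pywikibot.Site("en", "wikipedia")
page = pywikibot.Page(site, "Wikipedia:Sandbox")

print(page.title(), "currently has", len(page.text), "characters")

# Editing works the same way; uncomment to append a line to the sandbox:
# page.text += "\n* Test edit made with Pywikibot."
# page.save(summary="Pywikibot test edit")
```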

Bug triage is like a check-up for bots: we go through the list of things that need to be done and clean it up. During a bug triage, we check our open bugs for reproducibility (can we make them happen on our own computers so we can investigate them?), severity and priority, and we categorize them when necessary. A bug in this context can be a problem in one of the scripts, or a feature request that would improve Pywikibot.

From July 24 to July 27, we’ll be holding a big online event to learn what more needs to be done for Pywikibot: which bugs need an urgent fix, which features are missing or incomplete, and so on. Obviously, it is also a good opportunity to look at the code and check for “bit rot”.

Fixing bugs can sometimes be hard and time-consuming, but bug triaging doesn’t require deep technical knowledge: anyone with a little experience running bots can be of great help. Triage can be a tedious task due to the number of bugs involved, so we need your help to go through them all.

If you know your Python and are interested in putting your skills to good use to support Wikimedia sites, join us for the bug-a-thon starting July 24. Until then, you can start familiarizing yourself with Pywikibot and bug triaging!

Amir Sarabadani (User:Ladsgroup), editor on the Persian Wikipedia and Pywikibot developer

How RIPE Atlas Helped Wikipedia Users

This post by Emile Aben is cross-posted from RIPE Labs, a blog maintained by the Réseaux IP Européens Network Coordination Centre (RIPE NCC). In addition to being the Regional Internet Registry for Europe, the Middle East and parts of Central Asia, the RIPE NCC also operates RIPE Atlas, a global measurement network that collects data on Internet connectivity and reachability to assess the state of the Internet in real time. Wikimedia engineer Faidon Liambotis recently collaborated with the RIPE NCC on a project to measure the delivery of Wikimedia sites to users in Asia and elsewhere using our current infrastructure. Together, they identified ways to decrease latency and improve performance for users around the world. 

During RIPE 67, Faidon Liambotis (Principal Operations Engineer at the Wikimedia Foundation) and I got into a hallway conversation. Long story short: We figured we could do something with RIPE Atlas to decrease latency for users visiting Wikipedia and other Wikimedia sites.

At that time, Wikimedia had two locations active (Ashburn and Amsterdam) and was preparing a third (San Francisco) to better serve users in Oceania, South Asia, and the US/Canada west coast regions. We wondered what effect this third location would have on network latency for users worldwide, and Wikimedia wanted to quantify the effect of turning it up.

Wikimedia runs its own Content Delivery Network (CDN), mostly for privacy and cost reasons. Like most CDNs, to geographically balance the traffic to its various points of presence (PoPs), it employs a technique called GeoDNS: based on the DNS request made on a user’s behalf by their DNS resolver, the user is directed to a specific data center according to their (or their resolver’s) IP address. This requires the authoritative DNS servers for Wikimedia sites to know where best to direct each user. Wikimedia uses gdnsd for authoritative DNS, answering those queries dynamically based on a region-to-datacenter map.
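
To illustrate the idea only, here is a small Python sketch of what such a region-to-datacenter lookup conceptually does. gdnsd’s real configuration is a declarative map, not Python, and the region codes and assignments below are invented.

```python
# Conceptual sketch of GeoDNS routing; not gdnsd's configuration format.
# Region codes and assignments below are invented for illustration.
REGION_TO_DATACENTER = {
    "NA-WEST": "ulsfo",   # San Francisco
    "NA-EAST": "eqiad",   # Ashburn
    "EU": "esams",        # Amsterdam
    "OCEANIA": "ulsfo",
}

def pick_datacenter(resolver_region, default="eqiad"):
    """Return the data center whose addresses the authoritative DNS should answer with."""
    return REGION_TO_DATACENTER.get(resolver_region, default)

# print(pick_datacenter("OCEANIA"))   # -> "ulsfo"
```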

Some call this “stupid DNS tricks”; others find it useful for decreasing latency to websites. Wikimedia is in the latter group, and we used RIPE Atlas to see how well this method performs.

One specific question we wanted answered was where to “split Asia” between the San Francisco and Amsterdam Wikimedia locations. Latency is obviously a function of physical distance, but also of the choice of upstream networks. As an example, these choices determine whether packets to “other side of the world” destinations tend to be routed clockwise or counter-clockwise.

We scheduled latency measurements from all RIPE Atlas probes towards the three Wikimedia locations we wanted to look at, and visualised which datacenter showed the lowest latency for each probe. You can see the results in Figure 1 below.

Figure 1: Screenshot of latency map. Probes are colored based on the datacenter that shows the lowest measured latency for this particular probe.

This latency map shows the locations of RIPE Atlas probes, coloured by which Wikimedia data center has the lowest latency measured from that probe:

  • Orange: the Amsterdam PoP has the lowest latency
  • Green: the Ashburn PoP has the lowest latency
  • Blue: the San Francisco PoP has the lowest latency.

Probes where the lowest latency is over 150ms have a red outline. An interactive version of this map is available here. Note that this is a prototype to show the potential of this approach, so it is a little rough around the edges.
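
As a simplified illustration of the colouring logic, the sketch below picks the lowest-latency data center per probe and flags probes whose best latency is still above 150 ms. The per-probe RTT values are hypothetical; fetching real results would go through the RIPE Atlas measurement API.

```python
# Simplified sketch of the latency map's colouring logic. The input (median RTT
# in ms per datacenter for one probe) is hypothetical example data.
COLOURS = {"esams": "orange", "eqiad": "green", "ulsfo": "blue"}  # Amsterdam, Ashburn, San Francisco

def colour_probe(rtts, threshold_ms=150):
    """Return (colour, outlined) for one probe given its median RTT per datacenter."""
    best_dc = min(rtts, key=rtts.get)
    outlined = rtts[best_dc] > threshold_ms   # red outline: even the best latency is high
    return COLOURS[best_dc], outlined

# Example with invented values for a single probe:
# print(colour_probe({"esams": 180.0, "eqiad": 210.0, "ulsfo": 185.0}))
```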

Probes located in India clearly have lower latency towards Amsterdam. Probes in China, South Korea, the Philippines, Malaysia and Singapore showed lower latency towards San Francisco. For other locations in South-East Asia the situation was less clear, but that is also useful information to have, because it shows that directing users to either the Amsterdam or the San Francisco data center seems equally good (or bad). It is also interesting to note that all of Russia, including the two most eastern probes in Vladivostok, has the lowest latency towards Amsterdam. For the Vladivostok probes, Amsterdam and San Francisco are almost the same distance away, give or take 100 km. Nearby probes in China, South Korea and Japan have the lowest latency towards San Francisco.

There is always the question of drawing conclusions from a low number of samples, and of how representative RIPE Atlas probe hosts are of a larger population. Still, having some data is better than no data in these cases, and if a region has a low number of probes, that can always be fixed by deploying more probes there. If you live in an underrepresented region, you can apply for a probe and help improve this.

With this measurement data to back the decision, Wikimedia has gradually switched the Oceania and South Asian countries and the US/Canada states for which RIPE Atlas measurements showed minimal latency over to its San Francisco caching PoP. The geo-config that Wikimedia is running is publicly available here.

As for the code that created the measurements and the latency map: this was all prototype-quality code at best, so I originally planned to find a second site where we could do this, to see if we could generalise the scripts and visualisation and then share them.

At RIPE 68 there was interest in even this raw prototype code for doing things with data centers, latency and RIPE Atlas, so we ended up sharing the code privately, and we have already heard of progress being made with it. In the meantime, we have put up the code that created the latency map on GitHub. Again: it’s a prototype, but if you can’t wait for a better version, please feel free to use and improve it.

Conclusion

If you have an interesting idea, and you have no time, or other things are stopping you from implementing it, please let us know! You can always chat with us at a RIPE meeting, a regional meeting or through any other channel. We don’t have infinite time, but we can definitely try things out, especially ideas that will improve the Internet and/or improve the life of network operators.

Emile Aben

Translatewiki.net in the Swedish spotlight

This post is also available in Swedish (Svenska).

Translatewiki.net’s logo.

Most Swedes have a basic understanding of English, but many of them are far from fluent. Hence, it is important that computer programs are localized so that they also work in Swedish and other languages. This helps people avoid mistakes and lets users work faster and more efficiently. But how is this done?

First and foremost, the different messages in the software need to be translated separately. Getting the translations just right and making sure that the language is consistent requires a lot of thought. In open source software, this work is often done by volunteers who double-check each other’s work. This allows a program to be translated into hundreds of different languages, including minority languages that commercial operators usually do not focus on. As an example, the MediaWiki software that is used in all Wikimedia projects (such as Wikipedia) is translated in this way. As MediaWiki is developed at a rapid pace, with a large number of new messages each month, it is important for us to have a large and active community of translators. This way we make sure that everything works in all languages as fast as possible. But what could the Wikimedia movement do to help build this translator community?

We are happy to announce that Wikimedia Sverige is about to start a new project with support from Internetfonden (.Se) (the Internet Fund). The Internet Fund supports projects that improve the Internet’s infrastructure. The idea of translating open software to help build the translator community is in line with their goals. We gave the project a zingy name: “Expanding the translatewiki.net – ‘Improved Swedish localization of open source, for easier online participation’.” This is the first time that Wikimedia Sverige has had a project that focuses on this important element of the user experience. Here we will learn many new things that we will try to share with the wider community while aiming to improve the basic infrastructure on translatewiki.net. The translation platform translatewiki.net currently has 27 programs ready to be translated into 213 languages by more than 6,400 volunteers from around the world.

(more…)

Revamped Wikipedia app now available on Android

The Main Page of the English Wikipedia on the new Android app.

If you love Wikipedia and have an Android phone, you’re in for a treat! Today we’ve released a revamped Wikipedia for Android app, now available on Google Play.

Our new app is native from the ground up, making it the fastest way to experience Wikipedia on a phone. For the first release, we’ve focussed on creating a great browsing and reading experience. Whether you’re looking up a specific fact or looking to spend a day learning a new topic, our search and table of contents features get you to the information you need, quickly and intuitively. We’re also offering the ability to edit in the app, so you can help make Wikipedia better for billions of readers around the world.

What features are included?

  • Speed – Our new, native app allows you to browse and edit Wikipedia faster than ever before.
  • Editing – You can edit Wikipedia on the app. Logged in or logged out, we thank you for all your contributions.
  • Recent pages – We provide you with your reading history, so you can tap as many links as you like without ever getting lost.
  • Saved pages – You can save select pages for offline reading and browse them even when you don’t have a data connection.
  • Share – Use your existing social networking apps to share in the sum of all human knowledge.
  • Language support – The app allows you to seamlessly switch to reading Wikipedia written in any language.
  • Wikipedia Zero – We’ve partnered with cellular carriers around the world to provide Wikipedia free of data charges to users in many developing areas.

Coming soon

  • Night mode – We’ve gotten lots of great beta user feedback; one feature people love is reading Wikipedia in darker environments. The inverted colour scheme offered by night mode will make that much easier.
  • Discussions – Talk pages are an important part of Wikipedia for both new users and experienced editors alike. We’re bringing them to the app.

This release is just the beginning! We’re still working hard on creating new features to make the app the best Wikipedia reading and editing experience out there. Whether you’re a long-time user of Wikipedia on Android or are brand new to the app, give it a spin and let us know what you think. This is just the first step; we hope this app will grow with us, and we’re excited to have our community help us evolve it.

Please help us improve this app by sending a note to our mailing list, [email protected], or writing a comment here.

Thank you!

Dan Garry, 
Associate Product Manager, Mobile Apps

Ram Prasad Joshi: Writing Wikipedia from the western hills of Nepal

Ram Prasad Joshi

Ram Prasad Joshi doesn’t have a computer. His village may be beautiful but there is no electricity. It’s a three-hour walk to the nearest road. In spite of all this, Joshi has accumulated more than 6,000 edits to the Nepali Wikipedia using nothing more than a feature phone.

An image shot by Ram Prasad Joshi on his feature phone: Devotees paying homage to the Thama Mai Temple (replica of Badimalika, Bajura) in Dailekh

“On Wikipedia I write about geography, history and culture of my surroundings,” he said. “I am a Hindu so I write about the Hindu religion and Hindu culture. I edit and write new articles on the Sanskrit, Hindi, Fijian, Bhojpuri and Gujrati Wikipedias, as well as in Nepali. I can introduce my village, my locality and my culture to the world.”

An image shot by Ram Prasad Joshi on his feature phone: Stone script of Damupal near Kartikhamba in Dailekh established by King Prithivi Malla B.S. 1038 (981 A.D.). It is claimed to be the first stone script in the Nepali Language.

In addition to his writing, Joshi has contributed almost a hundred photographs to Wikimedia Commons. He took part in Wiki Loves Monuments 2013 and his images of archaeological monuments in his area won him the prize for best mobile contributor.

Because of the village’s remote geography, his contributions may be the only representation it will get online. “No newspapers, no magazines, nothing arrives here,” he explains. “In my village there are many people who have never seen a television. Now the mobile phone emerged, villagers watch videos on mobile, but no-one owns a television.”

For Joshi, his initial introduction to editing began on a somber note four years ago. While he was living and working in Haridwar, a small city in northern India, his mother became seriously ill and passed away. “According to Hindu culture, all children should perform the rituals; they have to sit isolated for thirteen days in mourning,” he explained. “I was grieved greatly by her loss. My eyes still become wet when I remember her death. Parents are regarded as the almighty and holy in my culture.”

“I had to find ways to divert my thoughts from the memories of mom. As a way to vent my grief, I began to surf mobile internet more which helped me a lot. I explored the Nepali Wikipedia. I also saw the edit button in each article and the sub heading too. I then learned that I could edit these encyclopedia entries. When I remember my mom, I open Wikipedia and read or edit,” he added.

Fortunately, Joshi might no longer be alone in his editing endeavors; soon others will be able to benefit just as he did. Wikipedia Zero’s partnership with Nepali GSM mobile operator Ncell has given more people the opportunity to learn what Wikipedia is and how they can contribute to Wikimedia projects. “I have conveyed to my family and my villagers about Wikipedia,” said Joshi. “But for most people the Internet is out of reach, so it is a vague topic for them. After Ncell announced [their partnership with] Wikipedia Zero, some have given concern to it. Earlier when I started talking about Wikipedia they treated me as if I had gone mad.”

“Ncell broadcast advertisements for Wikipedia Zero through local radio. Many people now understand that Wikipedia is an encyclopedia of knowledge.”

Ncell’s partnership is ideal for those looking to access and contribute to Wikipedia from a mobile phone, in the same way Joshi has for so long.
(more…)

Odia language gets a new Unicode font converter

Screenshot mock-up of Akruti Sarala – Unicode Odia converter

It’s been over a decade since the Unicode standard was made available for the Odia script. Odia is a language spoken by roughly 33 million people in eastern India, and is one of the many official languages of India. Since its release, it has been challenging to get more content into Unicode, because many who are used to other non-Unicode standards are not willing to make the move. This created the need for a simple converter that could turn text typed in various non-Unicode fonts into Unicode. Such a converter could enrich Wikipedia and other Wikimedia projects by converting previously typed content and making it more widely available on the Internet. The Odia language recently got such a converter, making it possible to convert two of the fonts most popular among media professionals (AkrutiOriSarala99 and AkrutiOriSarala) into Unicode.

All of the non-Latin scripts came under one umbrella with the rollout of Unicode. Since then, many Unicode-compliant fonts have been designed, and the open source community has put effort into producing good quality fonts. Though contributions to Unicode-compliant sites like Wikipedia increased, the publication and printing industries in India were still stuck with the pre-existing ASCII and ISCII standards (ISCII is an Indian font-encoding standard based on ASCII). Modified ASCII fonts used as typesets for newspapers, books, magazines and other printed documents still exist in these industries. This has created a massive amount of content that is not searchable or reproducible because it is not Unicode compliant.

The difference with a Unicode font is that the Indic script characters have their own glyphs alongside the Latin ones, whereas legacy ASCII fonts simply replace Latin glyphs with Indic characters. So when someone does not have a particular ASCII font installed, the typed text looks absurd (see mojibake), whereas text typed using one Unicode font can be read using another Unicode font on a different operating system. Most of the ASCII fonts used for typing Indic languages are proprietary, and many individuals and organizations even use pirated software and fonts. Having massive amounts of content available in multiple legacy standards and little content in Unicode has created a large gap for many languages, including Odia. Until all of this content is converted to Unicode to make it searchable, shareable and reusable, the knowledge base it represents will remain inaccessible. Fortunately, some of the Indic languages have more and more contributors creating Unicode content. There is a need for technological development to convert non-Unicode content to Unicode and open it up for people to use.
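
To make the idea of such a converter concrete, here is a toy sketch of the substitution approach. The mapping table below is an invented placeholder, not the real Akruti-to-Unicode tables, and a real converter also has to reorder vowel signs and handle conjunct consonants.

```python
# Toy sketch of a legacy-font-to-Unicode converter. The mapping entries are
# invented placeholders, not the actual AkrutiOriSarala tables; a real converter
# must also reorder vowel signs and handle conjuncts.
LEGACY_TO_UNICODE = {
    "k": "\u0b15",   # ORIYA LETTER KA
    "f": "\u0b3f",   # ORIYA VOWEL SIGN I
}

def to_unicode(legacy_text):
    """Replace legacy byte sequences with the Unicode code points they depict."""
    # Replace longer sequences first once multi-character entries are in the table.
    for seq in sorted(LEGACY_TO_UNICODE, key=len, reverse=True):
        legacy_text = legacy_text.replace(seq, LEGACY_TO_UNICODE[seq])
    return legacy_text
```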

(more…)

Wikimedia sites get a new look on tablets

 

Tablet users, rejoice! The Wikimedia Mobile Web team has been working to optimize the mobile view of all our projects, so that reading, browsing, and editing content are all easier on mobile touch screens of any size. Now our changes are finally live on tablets, too!

Why a new tablet view?

Wikipedia and its sister sites were designed long before the rapid growth of smartphones and tablets. For the past two years, we’ve worked to improve the reading and editing experience for smartphone users, and now we’ve turned our attention to tablets. If you’ve used Wikipedia on your phone, you may recognize similarities in the new tablet view. But we’ve also departed from the smartphone experience in some ways, in order to create a tablet-specific experience.

Just the features you need, designed the way you need them

  • Typography and layout. We’ve increased the font size and narrowed the width of the content area to improve readability. These changes are responsive, too, so it looks great whether you’re on a tablet, a phablet – or even the mobile site on your desktop computer.
  • Table of contents and sections. Get to the section you need quicker, but don’t be afraid to lose yourself in the content once you’re there. We’ve taken advantage of the larger screen space that tablets provide and kept article sections open to encourage long-form reading.
  • Last modified byline. Wikipedia is never finished. Getting more readers to see that our content is constantly growing and evolving is a big priority for us. Now you can see at a glance which articles have been edited recently, and which could use some love from contributors like you…
  • Editing. See a typo? Fix it! Simple formatting options and mobile-friendly linking to pages or references are coming soon for all tablet users; starting this Thursday, you can preview this functionality by opting into our experimental beta site (look for Settings in the site menu and tap to turn on Beta).
  • Other features. The contribution features you know and love, optimized for tablets: uploads, watchlist, page history, notifications, and more.

Your tablet, your choice

If you don’t want to leave the old desktop experience, fear not. You can switch between the desktop view and mobile view from any page by scrolling to the bottom and tapping the “Desktop” or “Mobile” links.

How can I give feedback?

We’re excited to hear from you about these changes! Leave us a comment here and let us know what you think.

Maryana Pinchuk, Product Manager, Mobile

Wikimedia engineering report, May 2014

Major news in May include:

Note: We’re also providing a shorter, simpler and translatable version of this report that does not assume specialized technical knowledge.
(more…)