Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

Wikidata

Digging for Data: How to Research Beyond Wikimetrics

The next virtual meet-up will introduce research tools. Join us!

For Learning & Evaluation, Wikimetrics is a powerful tool for pulling data for wiki project user cohorts, such as edit counts, pages created and bytes added or removed. However, you may still have a variety of other questions, for instance:

How many members of WikiProject Medicine have edited a medicine-related article in the past three months?
How many new editors have played The Wikipedia Adventure?
What are the most-viewed and most-edited articles about Women Scientists?

Questions like these and many others regarding the content of Wikimedia projects and the activities of editors and readers can be answered using tools developed by Wikimedians all over the world. These gadgets, based on publicly available data, rely on databases and Application Programming Interfaces (APIs). They are maintained by volunteers and staff within our movement.
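As a rough illustration of what accessing data through an API can look like, here is a minimal Python sketch. It builds a request URL for the MediaWiki Action API's `list=usercontribs` query and counts edits per user in a response. The helper names, the user name "ExampleUser" and the sample response are hypothetical and hand-written for illustration; a real script would fetch the URL, follow continuation parameters and handle errors.

```python
from collections import Counter
from urllib.parse import urlencode

# Endpoint of the MediaWiki Action API on English Wikipedia.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_contribs_url(user, limit=50):
    """Build a URL asking the API for a user's most recent contributions."""
    params = {
        "action": "query",
        "list": "usercontribs",
        "ucuser": user,
        "uclimit": limit,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def count_edits_by_user(response):
    """Count contributions per user in an already-parsed JSON response."""
    contribs = response.get("query", {}).get("usercontribs", [])
    return Counter(entry["user"] for entry in contribs)


# Hand-written sample in the general shape of a usercontribs response
# (hypothetical data, not real API output).
sample_response = {
    "query": {
        "usercontribs": [
            {"user": "ExampleUser", "title": "Aspirin"},
            {"user": "ExampleUser", "title": "Insulin"},
        ]
    }
}

url = build_contribs_url("ExampleUser", limit=10)
edit_counts = count_edits_by_user(sample_response)
```

Keeping URL construction separate from response parsing means the counting logic can be tried on saved responses without any network access.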

On July 16, Jonathan Morgan, research strategist for the Learning and Evaluation team and wiki-research veteran, will begin a three-part series to explore some of the different routes to accessing Wikimedia data. Building off several recent workshops including the Wiki Research Hackathon and a series of Community Data Science Workshops developed at the University of Washington, in Beyond Wikimetrics, Jonathan will guide participants on how to expand their wiki-research capabilities by accessing data directly through these tools.


Samskrita Bharati and Sanskrit Wikipedia: The journey ahead

“Aksharam,” Samskrita Bharati Office in Bangalore.

In 1981, a movement called the “Speak Samskrit Movement” started in Bangalore. The effort quickly spread across India and evolved into the organization “Samskrita Bharati” in 1995. The movement has a number of dedicated volunteers who aim to popularize the Sanskrit language, Sanskrit culture and the Knowledge Tradition of India.[1]

In line with these objectives, Samskrita Bharati embarked upon a mission to enrich Sanskrit Wikipedia in 2011. This project involved approximately 50 volunteers, some of them working full-time. Most of the contributors are based in Bangalore, Karnataka or Karnavati, Gujarat. As a result of tremendous effort and dedication, the team was able to substantially grow the number of articles on Sanskrit Wikipedia. From only 2,000 articles in 2011, mostly written in Hindi, it has grown to well over 10,000 articles, on topics ranging from geography and history to health and society.

Samskrita Bharati editors, like other Sanskrit Wikipedians, encountered difficulties with modern terminology and the paucity of referenceable literature. Most of the contributors to Sanskrit Wikipedia are from the southern region, which leads to confusion because some Sanskrit words are pronounced differently in the northern and southern regions.

As part of the outreach efforts, Samskrita Bharati conducted introductory workshops in many educational institutions like Karnataka Samskrit University, Delhi University and Christ University, Bangalore.


Insight into Wikimedia Germany – Impressions of a FDC member

A session from the FDC site visit to WMDE.

I first had the opportunity to learn about Wikimedia chapters when I participated in the Wikimedia conference in 2011, representing the newly formed Wikimedia India chapter. While the presentations and interactions were useful, the visit to the Wikimedia Deutschland (WMDE) chapter office was very helpful, as I had the chance to meet the staff and learn about the chapter’s various activities. After that, as a member of the Funds Dissemination Committee (FDC) from 2012, I had the opportunity to learn even more about chapters. Though the initial FDC framework and process were the result of extensive deliberations by the FDC Advisory group, supported by consultants from Bridgespan, some of the gaps became apparent after the first round of FDC deliberations. My major concern was that the process and framework did not account for the diverse attributes of chapters and relied on impersonal wiki pages for discussions, even when substantial funds were at stake.

When the opportunity to visit WMDE came up, I signed up immediately, as it gave me the chance to meet the chapter’s leadership and stakeholders in person, allowing me to better understand their plans and the challenges they face. Mike Peel (another FDC member) and I, along with the WMF staff team of Anasuya, Garfield and Frank, visited WMDE in Berlin on February 5-7, 2014.

In this report, I would like to highlight my impressions of the chapter programs and their evolution, as well as share thoughts on WMDE chapter performance and areas for improvement.

Chapter Structure and Infrastructure

Pavel explaining the WMDE events for the year, marked on a wall calendar

Pavel informed us that the chapter had recently moved into its new facility. He took us on a tour of the facility, highlighting the thought process behind the design, which allows flexible use of space and also meets the requirements of various stakeholders. The chapter has about 64 full-time and part-time staff (equivalent to 45 FTE). They are organized into four program teams and one operational/admin team. Each team has appropriate workspaces and computing infrastructure. There are a good number of small meeting rooms and a large conference room. The facility also includes an event room which can accommodate up to 99 participants. The room is used at least two days a week for programs run by the chapter and its partners. A large calendar on the event room wall shows the details of programs that have been planned till the end of the year, leaving no doubt about the effective use of infrastructure. The event room is designed in such a way that part of it can be used as workspace if required by putting up partitions.


Introducing the Wikidata “Concept Cloud”


On Wikidata, a free knowledge base about the world that can be read and edited by humans and machines alike, all the Wikipedia articles on the same subject are bundled in a Wikidata item. All these articles are written in different languages and they all refer to other Wikipedia articles through wiki links. The articles these wiki links point to are represented by Wikidata items as well. When you aggregate all these items, you get all the concepts that are related to the original subject and together make up a “concept cloud.”
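The aggregation described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code behind the actual tool: the function name, the language codes and the item identifiers (`Q_A`, `Q_B`, ...) are hypothetical placeholders, and the sketch assumes the wiki links of each language edition have already been resolved to item identifiers.

```python
from collections import Counter


def concept_cloud(links_by_language):
    """Count, for each linked item, how many language editions link to it."""
    cloud = Counter()
    for linked_items in links_by_language.values():
        # Use a set so an item linked twice in one article counts once.
        cloud.update(set(linked_items))
    return cloud


# Hypothetical input: items linked from the articles about one subject,
# keyed by the language of the article.
links_by_language = {
    "en": ["Q_A", "Q_B", "Q_C"],
    "de": ["Q_B", "Q_C"],
    "nl": ["Q_C", "Q_D", "Q_C"],
}

cloud = concept_cloud(links_by_language)
most_linked = cloud.most_common(1)
```

Counting language editions rather than raw links is one possible design choice; it keeps a single long article from dominating the cloud.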

The beauty of a “concept cloud” is that, as all these subjects are related, they are likely to be more relevant when the subject of a “concept cloud” is what is in the news. It is assumed that what is considered “World News” will be of relevance to all the languages Wikidata supports. It is therefore likely that all the items in a “concept cloud” are more sought after in all these languages.

When you search for information, Wikidata provides more labels than Wikipedia provides articles. Even more powerful is the fact that, for Wikidata, every language is equal. For any language, Wikidata provides the same statements, the same links to Commons and the same visualization in the “Reasonator.”

Wikidata works best when labels exist for the concepts, the properties and the qualifiers that are used. As more labels for related concepts are available, the information becomes more complete. For best results, our challenge is to stimulate people to add more labels.

The new “Concept Cloud” tool presents you with the “concept cloud” for Wikidata items. You can select a language you know and find if there is a label, and for existing labels you can check if they are properly written. They should not be capitalized unless they are always capitalized and spelling has to be the standard spelling.

Every day we are going to present another item that is in the news. Things will return in the news, but we are confident that there is always something that can be done. As this is the first iteration of the “Concept cloud” tool created by Magnus Manske, we do seek to get feedback. Do you like it? What can be improved?

Yes, there are other applications for a “concept cloud”… be creative and either implement them or let us hear about them.

Gerard Meijssen


Any language allowed in Wikidata

Language Committee Logo

The Language Committee of the Wikimedia Foundation, which is in charge of developing and processing new language projects, has decided that any language should be admissible for use on Wikidata. As always this comes with several considerations.

    • The language needs to have an ISO 639-3 code, a three-letter identifier used to represent languages, particularly in computer systems. Languages used with multiple scripts need to be configured in this way.
    • Historic languages are permitted; newly minted words are not.
    • Constructed languages are permitted.
    • Language Localisation on MediaWiki is not required for the use of Wikidata.

When users add content that does not comply with the prescribed conditions, the labels they added will be removed.

As Wikidata moves towards a repository of useful statements, it is likely that this information will be presented in an increasing number of Wikipedias. As items in Wikidata are enriched, all infoboxes in various Wikipedias that rely on data from Wikidata will be enriched as well.

Gerard Meijssen, Language Committee

October report for the Wikisource vision development and online survey

Aubrey, Micru, MarkTraceur, and Amire80 during Wikimania 2013.

With the grant period for the Wikisource strategic vision process reaching its conclusion, we would like to make two major announcements. We’re happy to introduce the open survey for Wikisource contributors and supporters (translated by volunteer community members into 11 languages). The survey is a great way for the community to voice its opinion regarding Wikisource and its future. We hope you will spend 10 minutes filling it out. It’s worth it.

The second announcement is that we, Aubrey and Micru, will continue our volunteering efforts to kick-start the Wikisource User Group for one more month. Except for the final name (pending approval from the Wikimedia Foundation legal team), all necessary steps have been taken to ensure that all Wikisourcerors can use this association to find and shape development priorities and to coordinate internationally.

We attended Wikimania and Google Summer of Code projects. The Wikimania days were as hectic as they were productive. Aubrey participated in the Open Access panel, and presented Wikisource as a positive tool for the scientific community, provided that Wikisource users possess a desire to support digital-born documents. Micru participated in several discussions about how Wikidata can support bibliographic information and how it can all come together with external organizations.

With regard to the Google Summer of Code projects, you can read the final reports from the grantees who worked hard during the summer months on projects that can be used for Wikisource:

David Cuenca (User:Micru)
Andrea Zanni (User:Aubrey) 

The Wikidata revolution is here: enabling structured data on Wikipedia

The logo of Wikidata

A year after its announcement as the first new Wikimedia project since 2006, Wikidata has now begun to serve the over 280 language versions of Wikipedia as a common source of structured data that can be used in more than 25 million articles of the free encyclopedia.

By providing Wikipedia editors with a central venue for their efforts to collect and vet such data, Wikidata leads to a higher level of consistency and quality in Wikipedia articles across the many language editions of the encyclopedia. Beyond Wikipedia, Wikidata’s universal, machine-readable knowledge database will be freely reusable by anyone, enabling numerous external applications.

“Wikidata is a powerful tool for keeping information in Wikipedia current across all language versions,” said Wikimedia Foundation Executive Director Sue Gardner. “Before Wikidata, Wikipedians needed to manually update hundreds of Wikipedia language versions every time a famous person died or a country’s leader changed. With Wikidata, such new information, entered once, can automatically appear across all Wikipedia language versions. That makes life easier for editors and makes it easier for Wikipedia to stay current.”

The Wikidata entry on Johann Sebastian Bach (as displayed in the “Reasonator” tool), containing among other data the composer’s places of birth and death, family relations, entries in various bibliographic authority control databases, a list of compositions, and public monuments depicting him

The dream of a wiki-based, collaboratively edited repository of structured data that could be reused in Wikipedia infoboxes goes back to at least 2004, when Wikimedian Erik Möller (now the deputy director of the Wikimedia Foundation) posted a detailed proposal for such a project. The following years saw work on related efforts like the Semantic MediaWiki extension, and discussions of how to implement a central data repository for Wikimedia intensified in 2010 and 2011.

The development of Wikidata began in March 2012, led by Wikimedia Deutschland, the German chapter of the Wikimedia movement. Since Wikidata.org went live on 30 October 2012, a growing community of around 3,000 active contributors has been building its database of ‘items’ (e.g. things, people or concepts), first by collecting topics that are already the subject of Wikipedia articles in several languages. An item’s central page on Wikidata replaces the complex web of language links that previously connected these articles about the same topic in different Wikipedia versions.

Wikidata’s collection of these items now numbers over 10 million. The community also began to enrich Wikidata’s database with factual statements about these topics (data like the mayor of a city, the ISBN of a book, the languages spoken in a country, etc.). This information has now become available for use on Wikipedia itself, and Wikipedians on many language Wikipedias have already started to add it to articles, or discuss how to make best use of it.

“It is the goal of Wikidata to collect the world’s complex knowledge in a structured manner so that anybody can benefit from it,” said Wikidata project director Denny Vrandečić. “Whether that’s readers of Wikipedia who are able to be up to date about certain facts or engineers who can use this data to create new products that improve the way we access knowledge.”

The next phase of Wikidata will allow for the automatic creation of lists and charts based on the data in Wikidata. Wikimedia Deutschland will continue to support the project with an engineering team that is dedicated to Wikidata’s second year of development and maintenance.

Wikidata is operated by the Wikimedia Foundation and its fact database is published under a Creative Commons 0 public domain dedication. Funding of Wikidata’s initial development was provided by the Allen Institute for Artificial Intelligence [AI]², the Gordon and Betty Moore Foundation and Google, Inc.

Tilman Bayer, Senior Operations Analyst, Wikimedia Foundation

More information available here:

Some of the first applications demonstrating the potential of Wikidata:

  • http://simia.net/treeoflife/ – a (still very incomplete) “tree of life” drawn from relations among biological species in Wikidata’s database
  • “GeneaWiki” generates a graph showing a person’s family relations as recorded in Wikidata, example: Bach family

Translate Wikidata’s user interface and open it to the world

Wikidata is one of the most important and exciting innovations in the world around Wikipedia. To make it accessible to a wide range of users, it needs its user interface to be translated to as many languages as possible, and you can help.

At the first stage, already partly enabled, Wikidata stores “interwiki links”, i.e. page metadata that connect articles about the same topic on different language versions of Wikipedia. Historically, these interwiki links have been duplicated and stored in each of the pages they linked together. With Wikidata, the list of pages about the same topic is centralized.
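To make the difference concrete, here is a small Python sketch comparing the two approaches. It is a simplified model, not Wikidata's actual data format: the item identifier, the plain language codes used as keys and the function names are assumptions made for illustration.

```python
def duplicated_link_count(n_pages):
    """Links stored when each of n pages lists every other page itself."""
    return n_pages * (n_pages - 1)


# Centralized model: one record per topic, one entry per language edition.
# "Q_EXAMPLE" and the plain language codes are hypothetical placeholders.
item = {
    "id": "Q_EXAMPLE",
    "sitelinks": {"en": "Berlin", "de": "Berlin", "fr": "Berlin", "es": "Berlín"},
}


def language_links(item, current_language):
    """Derive the language links to show on one page from the central record."""
    return {
        lang: title
        for lang, title in item["sitelinks"].items()
        if lang != current_language
    }


links_on_english_page = language_links(item, "en")
stored_before = duplicated_link_count(len(item["sitelinks"]))
stored_after = len(item["sitelinks"])
```

With four language editions, the duplicated model stores twelve links while the central record stores four entries, and adding a fifth edition means one edit instead of touching every page.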

The next goal of Wikidata is to store not only page metadata like interwiki links, but also common data that is repeated in all languages, such as census data for cities and dates of birth and death of famous authors.

Practically all the projects that are related to Wikipedia are massively multilingual, but Wikidata is especially so: it stores common data with the goal of displaying it efficiently in all languages.

The very useful and famous CIA World Factbook site has tables of data about all countries in the world, but the labels are only written in English. Now imagine a site with such tables, but with the ability to display the labels in any language and not just English: that’s what Wikidata aims to become.
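A minimal Python sketch of this idea follows: the data is stored once, the labels are stored per language, and the table is rendered in whichever language the reader chooses. The property names, the label table and the fallback to English are assumptions made for illustration, not a description of how Wikidata is implemented.

```python
# Labels for each property, keyed by language code (illustrative subset).
LABELS = {
    "capital": {"en": "Capital", "de": "Hauptstadt", "fr": "Capitale"},
    "calling_code": {"en": "Calling code", "de": "Telefonvorwahl"},
}

# Language-independent data for one country (Germany), stored only once.
ROW = {"capital": "Berlin", "calling_code": "+49"}


def render_table(row, language, fallback="en"):
    """Render 'label: value' lines, falling back when a label is missing."""
    lines = []
    for prop, value in row.items():
        names = LABELS[prop]
        label = names.get(language, names[fallback])
        lines.append(f"{label}: {value}")
    return lines


german_table = render_table(ROW, "de")
french_table = render_table(ROW, "fr")
```

In the French rendering the missing label falls back to English, which shows why adding labels in more languages directly improves what readers see.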

In the near future, the translation of such table labels will be done on the Wikidata website itself. In the meantime, you can help by translating the user interface displayed by the software running Wikidata.

Translation of the Wikidata software is done on translatewiki.net, the same translation platform used to translate Wikipedia’s interface. Wikidata relies on three main components that need translating: Wikibase – Repo, Wikibase – Client and Wikibase – Lib.

Wikipedia made encyclopedic articles open and accessible; Wikidata is about to do the same to statistics and other structured information. To ensure that people speaking your language can benefit from the immense potential of Wikidata, and contribute to its success,  please join us today and help us translate it.

Thank you!

Amir Aharoni
Software Engineer (Internationalization)

Wikidata Summit kicks off in Berlin

The 2-day event is focusing on Wikidata and RENDER, technologies to integrate structured data with Wikipedia and its sister sites.

The Wikidata & RENDER summit, a 2-day technical event focusing on the integration of structured data with Wikipedia, started today in Berlin, Germany, as a prologue to the Wikimedia Hackathon.

The event, organized by Wikimedia Deutschland, consists of workshops, presentations and coding, split into two tracks: one on Wikidata, and the second on RENDER.

The Wikidata project was announced earlier this year; its goal is to build the software infrastructure to support a common source of structured data that can be used in all Wikipedia articles, regardless of their language.

It would work in the same way that images and other multimedia content from Wikimedia Commons can be embedded into any page on a Wikimedia site.

Wikidata is expected to lead to a higher consistency and quality within Wikipedia articles, increased availability of information in the smaller language editions, and decreased maintenance effort for Wikipedia volunteers.

RENDER, the other focus of this summit, is an EU-funded project aimed at developing methods, techniques, software and data sets for scholars and readers (such as Wikipedia users) to understand, describe, process and make use of the diversity of knowledge and information.

About fifty people were invited to attend: they are Wikimedia Deutschland engineers, Wikimedia Foundation engineers, and volunteer MediaWiki developers, with expertise in structured data, MediaWiki and Wikimedia projects.

About 50 engineers and volunteer developers have gathered in Berlin for this prelude to the Wikimedia hackathon.

Sessions will be held today and tomorrow at Station-berlin – Hall 6, the same venue where the Berlin Hackathon 2012 (a.k.a. “Wikimedia Dev days”) will take place, starting tomorrow evening.

Follow and participate

We don’t have live video streaming of the event, but you can follow what’s happening on site through a variety of channels:

  • participants are taking live collaborative notes that will be posted on wiki when sessions are over;
  • they’re also posting information snippets on Twitter and Identi.ca; join the discussion with the #wikidata and #RENDER hashtags;
  • last, you can join us on IRC in the #wikimedia-wikidata and #mediawiki channels on Freenode.

Let us know on IRC or in the comments below if we can do anything else to let you participate remotely.

Guillaume Paumier
Technical communications manager

The Wikipedia data revolution

The second phase of Wikidata will aim to augment the infoboxes which are currently widely used on Wikipedia to display structured data

Wikimedia Deutschland, the German chapter of the Wikimedia movement, and the Wikimedia Foundation are proud to announce Wikidata, a collaboratively edited database of the world’s knowledge and the first new Wikimedia project since 2006.

Wikidata will support the more than 280 language editions of Wikipedia with one common source of structured data that can be used in all articles of the free encyclopedia. Wikidata is expected to lead to a higher consistency and quality within Wikipedia articles, as well as increased availability of information in the smaller language editions. At the same time, Wikidata will decrease the maintenance effort for the 90,000 volunteers editing Wikipedia.

“Wikidata is ground-breaking. It is the largest technical project ever undertaken by one of the 40 international Wikimedia chapters,” said Pavel Richter, CEO of Wikimedia Deutschland. “Wikimedia Deutschland is thrilled and dedicated to significantly improving the data management of the world’s largest encyclopedia with this project.”

In addition to the Wikimedia projects, the data is expected to be beneficial for numerous external applications, especially for annotating and connecting data in the sciences, in government, and for applications using data in very different ways. The data will be published under a free Creative Commons license.

The initial development of Wikidata is being funded with a donation of 1.3 million Euros, half of which comes from the Allen Institute for Artificial Intelligence [ai]². The Institute supports long-range research activities that have the potential to accelerate progress in artificial intelligence. It was established in 2010 by Microsoft co-founder Paul G. Allen, whose contributions to philanthropy and the advancement of science and technology span more than 25 years.

“Wikidata is a simple and smart idea, and an ingenious next step in the evolution of Wikipedia,” said Dr. Mark Greaves, Vice President of the Allen Institute for Artificial Intelligence. “It will transform the way that encyclopedia data is published, made available, and used by a global audience. Wikidata will build on semantic technology that we have long supported, will accelerate the pace of scientific discovery, and will create an extraordinary new data resource for the world.”

One quarter of Wikidata’s initial funding has been donated by the Gordon and Betty Moore Foundation through its Science Program. “It is important for science,” said Chris Mentzel, Gordon and Betty Moore Foundation science program officer. “Wikidata will both provide an important data service on top of Wikipedia, and also be an easy-to-use, downloadable software tool for researchers, to help them manage and gain value from the increasing volume and complexity of scientific data.”

Google, Inc. has provided another quarter of Wikidata’s funding. “Google’s mission is to make the world’s information universally accessible and useful,” said Chris DiBona, Director, Open Source at Google. “We’re therefore pleased to participate in the Wikidata project which we hope will make significant amounts of structured data available to all.”

Wikidata will be developed in three phases. The first phase is expected to be finished by August 2012. It will centralize links between the different language versions of Wikipedia. In the second phase, editors will be able to add and use data in Wikidata. The results of the second phase are scheduled to be released in December 2012. The third and final phase will allow for the automatic creation of lists and charts based on the data in Wikidata. This will close the initial development process for Wikidata.

The team of eight developers is being led by Dr. Denny Vrandečić. Formerly of the Karlsruhe Institute of Technology, he works with Wikimedia Deutschland and is, together with Dr. Markus Krötzsch, of the University of Oxford, co-founder of the Semantic MediaWiki project, which has pursued the goals of Wikidata for the last few years. The proposal for Wikidata was developed with financial support by the EU project RENDER, which also involves Wikimedia Deutschland as a use-case partner.

Wikimedia Deutschland will perform the initial development, and plans to hand over operation and maintenance of the project to the Wikimedia Foundation by March 2013.

Matthew Roth
Global Communications Manager