Submissions/The next million articles in Wikipedia

This is an accepted submission for Wikimania 2015.

Submission no.
Title of the submission
The next million articles in Wikipedia
Type of submission (discussion, hot seat, panel, presentation, tutorial, workshop)
Author of the submission
E-mail address
Country of origin
Affiliation, if any (organisation, company etc.)
  • Stanford University
  • Wikimedia Foundation
Personal homepage or blog
Abstract (at least 300 words to describe your proposal)

Wikipedia is the largest encyclopedia in human history and a main source of knowledge for many people, with more than 26 million page views per hour. Although access to Wikipedia content is impressive, the online encyclopedia is missing a great deal of content in every language it supports: the English Wikipedia has about 5 million articles, but even this largest of all Wikipedias is far from complete, and the gaps are even more substantial in smaller language versions.

The goal of this project is to identify missing content in Wikipedia and to recommend the best-suited editors to write those articles, through a three-step process. First, we use interlanguage links from Wikidata to identify the articles missing from each language version. This step yields a very large number of missing-article candidates for each language, in fact far more than all current editors could reasonably tackle, so the list needs to be prioritized. In the second step, we therefore use access-volume statistics of existing articles to predict which missing articles would be accessed most if they existed in a given language version. Since not all editors are equally suited to writing about a given topic, the third and final step is editor selection: given a missing article and Wikipedia's complete edit history, we predict which editors are best suited to create and write it.
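The three steps can be illustrated with a toy sketch in Python. It is only an illustration under stated assumptions, not the models proposed here: `SITELINK_VIEWS` and `EDITS` are hypothetical in-memory stand-ins for Wikidata sitelinks, page view counts, and edit histories, and the prioritization (mean page views over the editions that already have the article) and editor selection (counting edits to the same item in other editions by editors who also edit the target edition) are deliberately naive placeholder heuristics.

```python
from collections import Counter
from statistics import mean

# HYPOTHETICAL toy data standing in for Wikidata sitelinks and page view counts:
# Wikidata item ID -> {language code: page views of that article in that edition}.
SITELINK_VIEWS: dict[str, dict[str, int]] = {
    "Q1": {"en": 9000, "de": 4000, "fr": 2000},
    "Q2": {"en": 500, "fr": 300},
    "Q3": {"de": 1200, "es": 800},
    "Q4": {"en": 100, "de": 50, "fr": 40},
}

# HYPOTHETICAL edit log: one (editor, language code, Wikidata item ID) tuple per edit.
EDITS: list[tuple[str, str, str]] = [
    ("Alice", "en", "Q1"),
    ("Alice", "en", "Q1"),
    ("Bob", "de", "Q1"),
    ("Bob", "es", "Q3"),
    ("Carol", "es", "Q3"),
    ("Carol", "en", "Q1"),
    ("Carol", "fr", "Q1"),
    ("Dave", "en", "Q2"),
]


def missing_items(target: str) -> list[str]:
    """Step 1: items that exist in some edition but have no article in `target`."""
    return [item for item, views in SITELINK_VIEWS.items() if target not in views]


def predicted_views(item: str) -> float:
    """Step 2, naive stand-in: mean page views over the editions that have the article."""
    return mean(SITELINK_VIEWS[item].values())


def rank_missing(target: str) -> list[str]:
    """Prioritize missing items by predicted page views, highest first."""
    return sorted(missing_items(target), key=predicted_views, reverse=True)


def recommend_editors(item: str, target: str, k: int = 3) -> list[str]:
    """Step 3, naive stand-in: rank editors active in `target` by their
    edits to the same item in other editions."""
    # Treat any edit in the target edition as evidence the editor writes in it.
    active_in_target = {editor for editor, lang, _ in EDITS if lang == target}
    # Count each such editor's edits to this item in the other editions.
    counts = Counter(
        editor
        for editor, lang, edited in EDITS
        if edited == item and lang != target and editor in active_in_target
    )
    return [editor for editor, _ in counts.most_common(k)]


if __name__ == "__main__":
    for item in rank_missing("es"):
        print(item, predicted_views(item), recommend_editors(item, "es"))
```

With this toy data, the items lacking a Spanish article are ordered Q1, Q2, Q4 by predicted views, and Carol and Bob are suggested for Q1 because they already edit the Spanish edition and have edited Q1 elsewhere. A real system would replace both heuristics with learned predictive models.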

The results of this study will empower editors by recommending articles for them to write, based on Wikipedia's edit history and the availability of content in other languages. The output of this research can also provide valuable input to the ContentTranslation tool currently being developed by the Language Engineering team.


Technology, Interface & Infrastructure

Length of session (if other than 30 minutes, specify how long)
30 minutes
Will you attend Wikimania if your submission is not accepted?
Slides or further information (optional)
Special requests

Interested attendees

If you are interested in attending this session, please sign with your username below. This will help reviewers to decide which sessions are of high interest. Sign with a hash and four tildes (# ~~~~).

  1. Daniel Mietchen (talk) 02:47, 15 February 2015 (UTC)
  2. Mrjohncummings (talk) 05:24, 16 February 2015 (UTC)
  3. Pginer-WMF (talk) 06:51, 16 February 2015 (UTC)
  4. Jaredzimmerman (WMF) (talk) 08:56, 23 February 2015 (UTC)
  5. Sage (Wiki Ed) (talk) 01:05, 25 February 2015 (UTC)
  6. Harej (talk) 23:03, 25 June 2015 (UTC)
  7. --Ziko (talk) 12:58, 28 June 2015 (UTC)
  8. Santhosh.thottingal (talk) 01:33, 17 July 2015 (UTC)
  9. बिप्लब आनन्द (talk) 16:06, 17 July 2015 (UTC)
  10. Nabin K. Sapkota (talk) 16:10, 17 July 2015 (UTC)
  11. Darafsh Kaviyani (Talk) 20:29, 18 July 2015 (UTC)