Submissions/The Pleasures and Pains of Analyzing All the Wikis in Realtime

After careful consideration, the Programme Committee has decided not to accept the submission below at this time. Thank you to the author(s) for participating in the Wikimania 2015 programme submission process; we hope to still see you at Wikimania this July.

Submission no.
3046
Title of the submission

The Pleasures and Pains of Analyzing All the Wikis in Realtime

Type of submission (discussion, hot seat, panel, presentation, tutorial, workshop)

Tutorial & Discussion

Author of the submission
  • Max Klein
  • Anthony Di Franco
E-mail address

isalix@gmail.com di.franco@gmail.com

Username
Country of origin

U.K. and U.S.A.

Affiliation, if any (organisation, company etc.)
Personal homepage or blog
Abstract (at least 300 words to describe your proposal)
Cocytus, river of lamentation.

What started as the problem of tracking citations eventually led us to develop a much more general solution: a tool to track all the edits on all wikis in realtime. This opens up a new world of possibilities, such as tracking trends in what people are writing about and letting users receive alerts on edits based on custom queries over article and edit content. These ideas are still far away, but we can bring them closer by joining together to build the platform. This session is a tutorial on what exists so far, and will present an agenda for collaborating on the next steps.

The name "Cocytus" shares many syllables with the words describing our original goal: tracking citations as they appear in the recent changes stream. Cocytus is the river of lamentation that flows around Hades.

Screenshot of the Cocytus prototype, a realtime citation watcher for Wikipedia.

In this presentation we will cover the state-of-the-art technologies, the existing efforts, and the pitfalls in monitoring Wikipedia in realtime.

Technologies we will cover are:

  • RCStream and WebSockets.
  • Wikimedia Labs.
  • MediaWiki diff API.
  • Wikitext parsing.
  • Stream rebroadcasting.
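
To give a flavour of how these pieces fit together for our original citation-tracking goal, here is a minimal sketch, not the actual Cocytus code. It assumes a change event has already arrived from the stream as a dictionary with `type`, `namespace` and `title` fields, and that the old and new wikitext of the revision have already been fetched. The function names, the sample event and the regular expression are illustrative assumptions; a real wikitext parser would handle many cases this pattern misses.

```python
import re
from collections import Counter

# Assumption: citations appear as {{cite <kind> ...}} or {{citation ...}} templates.
# A regular expression is only a rough stand-in for real wikitext parsing.
CITE_RE = re.compile(r"\{\{\s*(cite\s+\w+|citation)\b", re.IGNORECASE)


def citation_kinds(wikitext):
    """Count citation templates in wikitext, keyed by normalized template name."""
    return Counter(
        " ".join(match.group(1).lower().split())
        for match in CITE_RE.finditer(wikitext)
    )


def added_citations(old_text, new_text):
    """Return the citation templates present in new_text but not in old_text."""
    # Counter subtraction drops zero and negative counts, so only additions remain.
    return citation_kinds(new_text) - citation_kinds(old_text)


def is_article_edit(event):
    """True for edits in the main (article) namespace; field names are assumed."""
    return event.get("type") == "edit" and event.get("namespace") == 0


# Hypothetical event and revision texts, for illustration only.
event = {"type": "edit", "namespace": 0, "title": "Cocytus"}
old = "Cocytus is a river. {{cite web|url=x}}"
new = "Cocytus is a river. {{cite web|url=x}} It flows. {{Cite journal |title=y}}"

if is_article_edit(event):
    for kind, count in added_citations(old, new).items():
        print(f"{event['title']}: +{count} {kind}")
```

Comparing template counts rather than raw text keeps the sketch short, but it cannot tell a replaced citation from an unchanged one of the same kind; working from the diff itself would be more precise.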

We also hope to brainstorm and organize future uses and development of a community platform.

Our Future Uses Brainstorm:

  • Using the changes queue directly.
  • Trend tracking with dynamic topic modeling (More on this here).
  • Realtime Wikimedia analytics in the style of social media analytics and search.
  • Alerts based on stream queries.
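
As a starting point for discussion, alerts based on stream queries could be as simple as a predicate applied to each change event. The sketch below is one possible shape, not an existing feature: `make_query` and `alerts` are hypothetical names, and the `wiki`, `title` and `comment` fields are assumptions about what each event carries.

```python
def make_query(wiki=None, title_contains=None, comment_contains=None):
    """Build a predicate over change events; a criterion left as None matches anything."""

    def matches(event):
        if wiki is not None and event.get("wiki") != wiki:
            return False
        if (title_contains is not None
                and title_contains.lower() not in event.get("title", "").lower()):
            return False
        if (comment_contains is not None
                and comment_contains.lower() not in event.get("comment", "").lower()):
            return False
        return True

    return matches


def alerts(events, query):
    """Yield (wiki, title) for every event in the stream that satisfies the query."""
    for event in events:
        if query(event):
            yield event.get("wiki"), event.get("title")


# Hypothetical events, for illustration only.
sample_events = [
    {"wiki": "enwiki", "title": "Cocytus", "comment": "added citation"},
    {"wiki": "dewiki", "title": "Kokytos", "comment": "Tippfehler"},
    {"wiki": "enwiki", "title": "Hades", "comment": "copyedit"},
]

watch = make_query(wiki="enwiki", comment_contains="citation")
for wiki_name, title in alerts(sample_events, watch):
    print(f"ALERT {wiki_name}: {title}")
```

Because `alerts` is a generator, it can consume an unbounded stream one event at a time; queries over edit content rather than metadata would need the diff for each event as well.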

Lastly, we hope to record the experiences of all contributing developers and submit them as feedback to the Wikimedia Foundation and Wikimedia Labs.


Track
  • Technology, Interface & Infrastructure
Length of session (if other than 30 minutes, specify how long)
30 minutes
Will you attend Wikimania if your submission is not accepted?
Yes
Slides or further information (optional)


Special requests


Interested attendees

If you are interested in attending this session, please sign with your username below. This will help reviewers to decide which sessions are of high interest. Sign with a hash and four tildes. (# ~~~~).

  1. Daniel Mietchen (talk) 01:05, 15 February 2015 (UTC)
  2. EpochFail (talk) 13:43, 27 February 2015 (UTC)
  3. DarTar (talk) 00:41, 6 March 2015 (UTC)
  4. What a pain in the toe it is that there are so many languages and projects :) Amir E. Aharoni (talk) 16:16, 6 March 2015 (UTC)