Submissions/Would you like some artificial intelligence with that?
This is an accepted submission for Wikimania 2015.
- Submission no.
- Title of the submission
- Would you like some artificial intelligence with that?
- Type of submission (discussion, hot seat, panel, presentation, tutorial, workshop)
- Author of the submission
- Aaron Halfaker, とある白い猫
- E-mail address
- Special:EmailUser/とある白い猫 - email@example.com
- Country of origin
- Residing in Brussels, Belgium
- Affiliation, if any (organisation, company etc.)
- Volunteer on Wikimedia websites (Wikipedia, Commons, etc.)
- Personal homepage or blog
- Abstract (at least 300 words to describe your proposal)
- See #Abstract section
Artificial Intelligence (AI) is a branch of computer science that uses computers to perform tasks we commonly associate with intelligent humans. While we don't have AIs that are smart enough to replace a human Wikipedia editor, many AI strategies are very good at automating some of the more monotonous and voluminous tasks. Given the scale at which Wikimedia projects operate, there is potential for a massive impact on the tractability of managing open knowledge products. Yet building and maintaining useful AI is hard, so we don't have very much of it. Instead, most AIs are built purely for academic study and never make it back to the wiki to help us improve our content. In this presentation, we'll argue why AIs are so important for massive open knowledge projects like ours, propose strategies for collaboratively developing AIs to support our work, and call attention to some projects that are in the works.
Not smart, but smart enough
Regretfully, we're not yet at the point where an AI can copyedit articles or fact-check statements in Wikidata without supervision, but we can still do a few things that are very powerful. We'll give a brief overview of a set of AI strategies that can be employed to support wiki work: classification, clustering, similarity/relatedness & network centrality.
AI algorithms can be thought of as black boxes: you do not need to know how they work in order to use them. There are many types of these "black boxes"; we will by no means cover them all in this presentation, but we will mainly focus on three:
- classification, a family of supervised machine learning algorithms
- clustering and semantic relatedness, two families of unsupervised machine learning algorithms
Supervised learning algorithms take labelled input to train on features in the data; any subsequent data is then classified based on this background knowledge. Classification algorithms from this family are well suited to determining whether edits are harmful (say, vandalism) or whether they were made in good or bad faith, as we already have a large amount of labelled data in these areas.
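As an illustration of the supervised setting, here is a toy nearest-centroid classifier. The edit features (characters added, profanity ratio) and training values are invented for the example; they are not drawn from any real Wikipedia dataset or from the projects mentioned below.

```python
# Toy nearest-centroid classifier for edit quality (illustration only;
# the features and training data here are invented).

def centroid(rows):
    """Element-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# Labelled training data: [chars_added, profanity_ratio]
good_edits = [[120, 0.00], [45, 0.01], [300, 0.00]]
bad_edits  = [[15, 0.30], [8, 0.50], [40, 0.25]]

centroids = {"good": centroid(good_edits), "bad": centroid(bad_edits)}

def classify(edit_features):
    """Assign the label of the nearest class centroid."""
    return min(centroids, key=lambda label: distance(edit_features, centroids[label]))

print(classify([10, 0.40]))   # a short, profanity-heavy edit -> "bad"
print(classify([200, 0.00]))  # a long, clean edit -> "good"
```

Real systems use richer features and stronger models, but the shape is the same: labelled examples in, a decision rule out.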
Unsupervised learning algorithms, on the other hand, try to distinguish information by seeking hidden patterns in unlabelled data. Clustering algorithms from this family are well suited to, for example, determining the types of edits where we have no pre-labelled data, by grouping/clustering similar revisions together. Mind that clustering algorithms typically do not know what each cluster pertains to; they can only determine that elements within a cluster share similar (meta-)features and hence are likely related.
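The unsupervised setting can be sketched with a minimal k-means routine; the revision feature vectors below are again invented. Note how two groups emerge even though no labels are supplied, and the algorithm never says what the groups mean.

```python
# Toy k-means clustering of unlabelled edit feature vectors (illustration
# only; the values are invented). [chars_added, profanity_ratio] as before.

def mean(rows):
    """Element-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k, iterations=10):
    """Plain k-means with deterministic initialisation (first k points)."""
    centres = points[:k]
    groups = [[] for _ in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centres[i]))
            groups[nearest].append(p)
        centres = [mean(g) if g else centres[i] for i, g in enumerate(groups)]
    return centres, groups

points = [[5, 0.4], [8, 0.5], [150, 0.0], [200, 0.01], [7, 0.45], [180, 0.0]]
centres, groups = k_means(points, k=2)
print(groups)  # short/profane revisions end up in one cluster, long/clean in the other
```

A human still has to look at each cluster and decide that one looks like "vandalism" and the other like "productive editing".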
To take it up a notch, there are also semantic relatedness metrics, which group items based on their semantics/meaning. Algorithms for semantic relatedness can be used to help readers find interesting new content. This may appear to fulfil a role much like the one categories perform today, but it can generate more relevant suggestions based on the context of the articles and even the recent activity of the reader. Furthermore, we can use network centrality measures to determine important gaps in content coverage and direct editors who would be interested in creating or improving the missing content.
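Both ideas can be sketched in a few lines; the article snippets and the link graph below are invented for illustration. Cosine similarity over word counts is one crude stand-in for semantic relatedness, and in-degree on a link graph is one simple centrality signal that can also expose linked-but-missing pages.

```python
# Two toy sketches (all article text and links here are invented):
# 1. semantic relatedness via cosine similarity of word-count vectors,
# 2. in-degree on a link graph as a crude centrality / gap-detection signal.
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words vectors of two texts."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (sum(v * v for v in a.values()) ** 0.5) * \
           (sum(v * v for v in b.values()) ** 0.5)
    return dot / norm if norm else 0.0

beer = "beer is a fermented drink brewed from grain"
ale = "ale is a beer brewed from malted grain"
physics = "physics studies matter energy and motion"

# "Ale" reads as more related to "Beer" than "Physics" does.
print(cosine_similarity(beer, ale) > cosine_similarity(beer, physics))  # True

links = {
    "Belgium": ["Brussels", "Beer"],
    "Brussels": ["Belgium"],
    "Beer": ["Belgium", "Brewing"],  # "Brewing" is linked but has no entry
}
in_degree = Counter(t for targets in links.values() for t in targets)
missing = [page for page in in_degree if page not in links]
print(missing)  # ['Brewing'] -- a candidate coverage gap to suggest to editors
```

Production systems use far richer relatedness models and centrality measures, but the underlying signals are of this kind.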
But how do we do it?
AI is hard, but so is writing an encyclopedia. Both require a substantial time investment and a healthy dose of subject matter expertise. Fortunately, the open production model of Wikipedia and open source software makes time and expertise cheap by lowering the barrier to entry and bringing many people together to solve a common problem. We'd like to call attention to a set of active and open AI projects that are designed to make the development of intelligent wiki tools easier.
- WikiBrainAPI -- https://meta.wikimedia.org/wiki/Grants:IEG/WikiBrainTools
- Automated Notability Detection -- https://meta.wikimedia.org/wiki/Grants:IEG/Automated_Notability_Detection
- Revision scoring as a service -- https://meta.wikimedia.org/wiki/Grants:IEG/Revision_scoring_as_a_service
- Technology, Interface & Infrastructure
- Length of session (if other than 30 minutes, specify how long)
- 30 minutes
- Will you attend Wikimania if your submission is not accepted?
- Slides or further information (optional)
- Special requests
- If possible, please schedule in a session together with Submissions/The Revision Scoring Service -- Exposing Quality to Wiki Tools with this presentation first.
If you are interested in attending this session, please sign with your username below. This will help reviewers to decide which sessions are of high interest. Sign with a hash and four tildes. (# ~~~~).