Submissions/Mapping the Maps -- Help geolocate 50,000 maps and load them fully categorised onto Commons

After careful consideration, the Programme Committee has decided not to accept the below submission at this time. Thank you to the author(s) for participating in the Wikimania 2015 programme submission; we hope to still see you at Wikimania this July.

Submission no.
2093
Title of the submission
Mapping the Maps -- Help geolocate 50,000 maps and load them fully categorised onto Commons
Type of submission (discussion, hot seat, panel, presentation, tutorial, workshop)
Presentation
Author of the submission
James Heald
E-mail address
j.heald@ucl.ac.uk
Username
Jheald
Country of origin
UK
Affiliation, if any (organisation, company etc.)
Personal homepage or blog
Abstract (at least 300 words to describe your proposal)
A joint British Library/Wikimedia initiative is trying to geolocate 50,000 maps of places from all across the world, in order that they can then be uploaded, fully categorised, to Commons.
The 50,000 maps were identified in a community effort over eight weeks in November and December 2014, from a set of 1,000,000 images that the BL posted to Flickr in November 2013. The images come from digitised scans of 19th-century books, and they currently have no metadata beyond the author, title and date of the book they were taken from.
This session will present the latest results and progress of the project (currently on track to go live in mid-March, with full wiki integration by early April), covering the geolocation, identification, and upload of the maps, and will discuss some of the machinery behind it.
----
In November 2013 the British Library released a collection of 1,000,000 images to Flickr Commons, extracted from digitised 19th-century books (see c:Commons:British_Library/Mechanical_Curator_collection).
The collection is a potentially rich resource for a wide variety of historical content, but search and discovery for appropriate reuse is hard: the metadata is limited to author, title, publisher and date at the book level, with nothing at the level of the individual image. As such, Wikimedia Commons felt it could not accept the images, other than through exhaustive hand-uploading, because without good metadata describing the content at image level, the content could not be made categorisable and so would simply not be discoverable.
As a first step, an index based on the subjects of the books (from which the images were drawn) was created on-wiki using shelfmarks with progressive hand refinement. This has acted as a guide for over 20,000 images to be uploaded to Wikimedia Commons on a book-by-book basis. However, the manual upload and description involved, and especially the categorisation into appropriate Wikimedia Commons categories, is very time-consuming.
One class of images that it should be possible to upload and categorise reasonably straightforwardly, though, is maps and plans. There are a large number of these in the 1,000,000 images, showing locations from all over the world, because a particular focus area in the choice of books scanned was books on discovery, ethnography, travel, and local history worldwide. Using the Wikimedia Commons index of books to drive the process and track progress, in November and December 2014 a wiki-based group of volunteers reviewed all the images, starting with a day-long event in London and continuing online, identifying almost 30,000 maps and plans and tagging them on Flickr. In addition, a further 20,000 maps had been identified independently by the computational artist Mario Klingemann (@Quasimondo) using machine-supported pattern recognition methods.


The key second stage in making these 50,000 maps and plans discoverable and useful, which is on track to begin in mid-March 2015, is to geolocate them on a modern map of the world. For this project we shall be using the Klokan/BL Georeferencer platform, again using the on-wiki index and progress pages to drive the process; future similar projects on Commons are likely to use Commons' own MapWarper, into which the BL data can be imported. The geolocation is done by identifying geographical points in common that can be found both on the digitised image and on a current map ("georeferencing"). With enough points, this gives the satisfaction of being able to view the old map laid perfectly over the new map, so that one can fade up and down between the two and see exactly how they compare. The precise location and scale information gained also enables accurate machine identification and characterisation of the map -- by continent, country, state, county, city, and/or individual building. This makes it possible to use machine methods to organise the maps discovered into human-meaningful, human-browsable groups, and to create a pipeline to automatically upload a map to Commons with a full provisional categorisation as soon as a volunteer has georeferenced it.
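As a rough illustration of the control-point idea described above, the following Python sketch fits a simple affine transform from image pixels to longitude/latitude by least squares. It is not the Georeferencer's or MapWarper's actual algorithm: the function names (`fit_affine`, `pixel_to_lonlat`) are hypothetical, the control points are made up, and treating longitude/latitude as a flat plane is an assumption that only holds approximately for maps of small areas.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("control points are collinear or degenerate")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x


def fit_affine(control_points):
    """Fit lon = a*px + b*py + c and lat = d*px + e*py + f by least squares.

    control_points: list of ((px, py), (lon, lat)) pairs, at least three,
    each matching a pixel on the scanned map to a point on a modern map.
    Returns ((a, b, c), (d, e, f)).
    """
    if len(control_points) < 3:
        raise ValueError("an affine fit needs at least three control points")
    rows = [(px, py, 1.0) for (px, py), _ in control_points]
    # Normal equations: (A^T A) x = A^T y, solved once per output coordinate.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    coeffs = []
    for k in range(2):  # k = 0 for longitude, k = 1 for latitude
        aty = [
            sum(r[i] * target[k] for r, (_, target) in zip(rows, control_points))
            for i in range(3)
        ]
        coeffs.append(tuple(solve_linear(ata, aty)))
    return coeffs[0], coeffs[1]


def pixel_to_lonlat(model, px, py):
    """Apply a fitted affine model to one pixel position."""
    (a, b, c), (d, e, f) = model
    return a * px + b * py + c, d * px + e * py + f


# Made-up control points for a 1000 x 1000 pixel scan (illustrative only).
points = [
    ((0, 0), (-3.5, 56.0)),
    ((1000, 0), (-2.5, 56.0)),
    ((0, 1000), (-3.5, 55.0)),
    ((1000, 1000), (-2.5, 55.0)),
]
model = fit_affine(points)
centre = pixel_to_lonlat(model, 500, 500)
```

With more than three control points the fit averages out small placement errors by volunteers. Once the corners of the scan can be projected this way, the map's extent and scale follow, which is the information the automatic grouping step relies on.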
This talk will explain the process, present the latest results and groups identified, and explore in more detail the machinery being used. It therefore gives a useful study of the challenge of good categorisation of images in or after a big upload -- what can be done by machine, and how it was done in this case; how far Wikidata can help (and what was done to strengthen the data on Wikidata for this project); and what on-wiki structures are then most useful to help people improve most efficiently on even the best initial categorisation by machine.
Both the tagging phase of 2014 and the current geolocation phase represent a valuable case-study for GLAM-wiki co-operation. In particular, they highlight how the strength of the wiki platform, in the form of a series of community-created editable indexes to the collection on-wiki, has been used to efficiently track and drive an off-wiki process (first Flickr tagging, now georeferencing). That process has scaled well to the challenge of an initial tagging phase of 1 million uncharacterised images, and now a georeferencing phase of 50,000 maps -- ten times more than the largest set the BL has ever attempted to georeference in one go.
Most importantly, this talk hopes to encourage people to adopt some maps and georeference them; to spread the word wider at home and online; and to help complete these 50,000 maps and get them uploaded to Commons, ready to move on to the similar number of Commons' existing old-map images, which will follow.


Track
GLAM Outreach
WikiCulture & Community
Length of session (if other than 30 minutes, specify how long)
30 minutes
Will you attend Wikimania if your submission is not accepted?
Unsure
Slides or further information (optional)
http://www.slideshare.net/JamesHeald/mapping-the-maps -- Ignite talk (5 minutes) given at Europeana Tech in Paris, 12 February.
These slides introduce the dataset; show how volunteers assisted by the wiki indexes achieved the Flickr-tagging of the maps over 8 weeks in Nov-Dec 2014; and present the results of a first run of automated identification and grouping of the maps, based on a pilot set of 3,000 georeferenced maps.
The resulting sets are currently retrievable on Flickr with searches like
https://www.flickr.com/search/?w=12403504@N02&q=geo:*=Scotland+geo:osm_scale=6&m=tags
https://www.flickr.com/search/?w=12403504@N02&q=geo:*=Paris+geo:osm_scale=12&m=tags
https://www.flickr.com/search/?w=12403504@N02&q=geo:*=Edinburgh&m=tags
https://www.flickr.com/search/?w=12403504@N02&q=geo:*=MX+geo:osm_scale=4&m=tags
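The example searches above share one pattern: a place name under a `geo:*=` tag, optionally narrowed by a `geo:osm_scale=` value. A small Python sketch that assembles such URLs is given here for convenience. The function name `build_map_search_url` is hypothetical, the parameter names simply mirror the URLs listed above, and whether Flickr continues to honour this query form is an assumption that has not been checked.

```python
from urllib.parse import urlencode

# Flickr account ID copied from the example URLs above.
BL_FLICKR_USER_ID = "12403504@N02"


def build_map_search_url(place, osm_scale=None):
    """Build a Flickr tag-search URL in the same shape as the examples above.

    place: a place name or country code as used in the geo:* tags,
           e.g. "Scotland" or "MX".
    osm_scale: optional integer scale value for the geo:osm_scale tag.
    """
    terms = [f"geo:*={place}"]
    if osm_scale is not None:
        terms.append(f"geo:osm_scale={osm_scale}")
    query = {"w": BL_FLICKR_USER_ID, "q": " ".join(terms), "m": "tags"}
    # Keep ':', '*', '=' and '@' unescaped so the result matches the
    # hand-written URLs; spaces between terms are encoded as '+'.
    return "https://www.flickr.com/search/?" + urlencode(query, safe=":*=@")


scotland_url = build_map_search_url("Scotland", 6)
edinburgh_url = build_map_search_url("Edinburgh")
```

Calling the function with `("Scotland", 6)` or `("Edinburgh")` reproduces the first and third example URLs; other place names can be substituted in the same way.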
Special requests


Interested attendees

If you are interested in attending this session, please sign with your username below. This will help reviewers to decide which sessions are of high interest. Sign with a hash and four tildes. (# ~~~~).

  1. Daniel Mietchen (talk) 21:30, 28 February 2015 (UTC)
  2. Yarl (talk) 22:23, 28 February 2015 (UTC)
  3. Ocaasi (talk) 23:37, 4 March 2015 (UTC)
  4. Add your username here.