David Rumsey Map Collection Text on Maps Data [Version 2]

Dataset

Description

This data package contains datasets of text on maps recognized from a georeferenced subset of approximately 57,000 historical maps from the David Rumsey Map Collection. It includes files for the different versions of the dataset, accompanied by Data Sheets and this Cover Sheet. Each dataset was created with the mapKurator software library and contains every instance of text that was automatically identified on a single physical map sheet or on a composite of sheets.

Each text instance identified by mapKurator is saved as a polygon JSON `Feature` with geospatial coordinates for 16 vertices and the following properties:

  • `text`: the mapKurator predicted transcription of the text within the polygon
  • `score`: the confidence score for the text detection
  • `postocr_label`: a predicted normalized version of `text`, produced by a post-processing step
  • `img_coordinates`: the pixel coordinates of the polygon

This data has not been verified or evaluated in any way and will contain significant errors of several kinds:

  • false positives: map features recognized as text that are not in fact text
  • false negatives: text that mapKurator failed to recognize
  • incorrect detection: bounding polygons that do not correctly enclose a word
  • incorrect recognition: transcriptions of the text that contain errors

Please see the Data Sheet for each version for more details.
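The Feature layout described above can be read with a short script. The sketch below is a minimal illustration only, assuming each file is a GeoJSON-style `FeatureCollection` whose features carry a `Polygon` geometry and the `text`, `score` and `postocr_label` properties named in the description. The function names (`load_features`, `vertex_count`, `confident_labels`), the 0.9 score threshold and all sample values are hypothetical and are not taken from the dataset or from mapKurator. The format of `img_coordinates` is not described here, so the sample leaves it as a placeholder.

```python
import json
from typing import Any


def _ring(lon0: float, lat0: float) -> list[list[float]]:
    """Build a hypothetical closed ring with 16 distinct vertices around a text box."""
    top = [[lon0 + 0.001 * i, lat0 + 0.0005] for i in range(8)]         # left to right
    bottom = [[lon0 + 0.001 * (7 - i), lat0] for i in range(8)]         # right to left
    ring = top + bottom
    return ring + [ring[0]]  # GeoJSON rings repeat the first vertex to close the polygon


# Invented sample for illustration; NOT real records from the dataset.
SAMPLE_COLLECTION = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Polygon", "coordinates": [_ring(-122.45, 37.75)]},
            "properties": {
                "text": "SAN FRANCSCO",            # raw transcription with an OCR error
                "score": 0.95,
                "postocr_label": "SAN FRANCISCO",  # normalized by post-processing
                "img_coordinates": None,           # placeholder: format not described
            },
        },
        {
            "type": "Feature",
            "geometry": {"type": "Polygon", "coordinates": [_ring(-122.30, 37.80)]},
            "properties": {
                "text": "ll1",                     # e.g. a false positive on map linework
                "score": 0.41,
                "postocr_label": "ll1",
                "img_coordinates": None,
            },
        },
    ],
}


def load_features(geojson_text: str) -> list[dict[str, Any]]:
    """Parse a FeatureCollection string and return its list of features."""
    return json.loads(geojson_text).get("features", [])


def vertex_count(feature: dict[str, Any]) -> int:
    """Count distinct vertices in the outer ring, ignoring the closing repeat."""
    ring = feature["geometry"]["coordinates"][0]
    if len(ring) > 1 and ring[0] == ring[-1]:
        return len(ring) - 1
    return len(ring)


def confident_labels(features: list[dict[str, Any]], min_score: float = 0.9) -> list[tuple]:
    """Return (text, postocr_label) pairs for features whose score is >= min_score."""
    pairs = []
    for feature in features:
        props = feature.get("properties", {})
        score = props.get("score")
        if score is not None and float(score) >= min_score:
            pairs.append((props.get("text"), props.get("postocr_label")))
    return pairs


features = load_features(json.dumps(SAMPLE_COLLECTION))
print([vertex_count(f) for f in features])  # [16, 16]
print(confident_labels(features))           # [('SAN FRANCSCO', 'SAN FRANCISCO')]
```

Filtering on `score` in this way may reduce false positives at the cost of dropping some correct detections, but because the data is unverified, no threshold can be assumed to remove all of the error types listed above.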
Date made available: 2023
Publisher: Stanford Libraries
Date of data production: 2023
Geographical coverage: Global

Relations

Impacts

  • David Rumsey x Machines Reading Maps Collaboration

    Impact: Cultural Impacts
