This data package contains datasets of map text recognized from a georeferenced subset of ~57k historical maps from the David Rumsey Map Collection. It includes files for different versions of the dataset, accompanied by Data Sheets and this Cover Sheet. Each dataset was created with the *mapKurator* software library and contains all instances of text automatically identified on a single physical map sheet or a composite of sheets. Each text instance identified by *mapKurator* is saved as a polygon JSON `Feature` with geospatial coordinates for 16 vertices and the following properties: `text` (the transcription predicted by *mapKurator* for the text within the polygon), `score` (the confidence score for the text detection), `postocr_label` (a predicted normalized version of `text`, produced by a post-processing step), and `img_coordinates` (the pixel coordinates of the polygon). This data has not been verified or evaluated in any way and will contain significant errors, including false positives (map features recognized as text that are not in fact text), false negatives (text that *mapKurator* failed to recognize), incorrect detection (bounding polygons that do not correctly enclose a word), and incorrect recognition (transcriptions that contain errors). Please see the Data Sheet for each version for more details.
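
The sketch below illustrates one way to read the properties described above and to drop low-confidence detections. It is a minimal example under stated assumptions, not the package's documented workflow: it assumes each file is a GeoJSON-style `FeatureCollection` whose entries carry the four properties under a `properties` key, and that `score` is numeric with higher values meaning more confidence. The sample data, the 0.5 threshold, and the helper names `load_collection` and `filter_features` are hypothetical; check the Data Sheet for the actual file layout.

```python
import json

# Hypothetical sample mimicking the described structure. Real features have
# 16 polygon vertices; fewer are used here to keep the example short.
SAMPLE = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]],
            },
            "properties": {
                "text": "LONDN",
                "score": 0.91,
                "postocr_label": "LONDON",
                "img_coordinates": [[10, 20], [30, 20], [30, 40], [10, 20]],
            },
        },
        {
            "type": "Feature",
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[2.0, 2.0], [3.0, 2.0], [3.0, 3.0], [2.0, 2.0]]],
            },
            "properties": {
                "text": "~~~",
                "score": 0.2,
                "postocr_label": "~~~",
                "img_coordinates": [[50, 60], [70, 60], [70, 80], [50, 60]],
            },
        },
    ],
}


def load_collection(path):
    """Load one dataset file (assumed to be UTF-8 encoded JSON)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def filter_features(collection, min_score=0.5):
    """Return (text, postocr_label, score) tuples for features whose
    detection score is at or above min_score (an arbitrary threshold)."""
    kept = []
    for feature in collection.get("features", []):
        props = feature.get("properties", {})
        score = props.get("score")
        if score is None or score < min_score:
            continue  # skip features with a missing or low score
        kept.append((props.get("text"), props.get("postocr_label"), score))
    return kept


labels = filter_features(SAMPLE, min_score=0.5)
```

Because the data is unverified, a score threshold only reduces some false positives; it does not recover missed text or correct transcription errors.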

| Field | Value |
|---|---|
| Date made available | 2023 |
| Publisher | Stanford Libraries |
| Date of data production | 2023 |
| Geographical coverage | Global |