Available under license: CC BY: Creative Commons Attribution 4.0 International License
Research output: Contribution to Journal/Magazine › Journal article › peer-review
}
TY - JOUR
T1 - R-YOLO: A Real-Time Text Detector for Natural Scenes with Arbitrary Rotation
AU - Wang, Xiqi
AU - Zheng, Shunyi
AU - Zhang, Ce
AU - Li, Rui
AU - Gui, Li
PY - 2021/1/28
Y1 - 2021/1/28
N2 - Accurate and efficient text detection in natural scenes is a fundamental yet challenging task in computer vision, especially when the text is arbitrarily oriented. Most current text detection methods are designed to identify horizontal or approximately horizontal text, which cannot satisfy the practical requirements of real-time detection on image streams or videos. To address this gap, we propose Rotational You Only Look Once (R-YOLO), a robust real-time convolutional neural network (CNN) model that detects arbitrarily oriented text in natural scene images. First, a rotated anchor box with angle information is used to represent the text bounding box over different orientations. Second, features at different scales are extracted from the input image to predict the probability, confidence, and inclined bounding boxes of the text. Finally, Rotational Distance Intersection over Union Non-Maximum Suppression (RDIoU-NMS) is proposed to eliminate redundant detections and retain the most accurate results. Benchmark experiments were conducted on four popular datasets: ICDAR2015, ICDAR2013, MSRA-TD500, and HRSC2016. For example, R-YOLO obtains an F-measure of 82.3% at 62.5 fps with 720p resolution on the ICDAR2015 dataset. The results demonstrate that R-YOLO can significantly outperform state-of-the-art methods in terms of detection efficiency and accuracy. The code will be released at: https://github.com/wxq-888/R-YOLO.
KW - scene text detection
KW - arbitrary-oriented text
KW - rotation anchor
KW - convolutional neural network
KW - YOLOv4
U2 - 10.3390/s21030888
DO - 10.3390/s21030888
M3 - Journal article
VL - 21
SP - 1
EP - 20
JO - Sensors
JF - Sensors
SN - 1424-8220
IS - 3
M1 - 888
ER -