Image recognition (object detection)

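What is YOLO? YOLO is an abbreviation of “You Only Look Once,” taken from the paper “You Only Look Once: Unified, Real-Time Object Detection.”*1 It is a deep learning method for object detection and a so-called end-to-end method: the entire process, from the input image to the detected objects, is handled by a single deep neural network.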

Conventional object detection has relied on methods such as Regions with Convolutional Neural Networks (R-CNN). These methods first apply selective search, a technique for finding regions that are likely to contain objects, to extract candidate regions from the image, and then feed each region to a deep neural network for identification. The drawback is a long processing time, because every extracted region must be passed through the deep neural network. YOLO, by contrast, processes the whole image in a single pass of one network, which makes it very fast. The original paper also reports that YOLO outperforms other detection methods, including R-CNN, when generalizing from natural images to other domains such as artwork.*1
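The difference in processing cost can be illustrated with a toy sketch. It performs no real detection: CountingNetwork, propose_regions, detect_region_based, and detect_single_pass are hypothetical names introduced only for this illustration, and the figure of 2000 regions per image is an assumption based on the number of proposals typically quoted for R-CNN. The sketch only counts how many times each pipeline calls the network.

```python
# Toy comparison of the two pipelines; every name here is hypothetical
# and no real inference is performed.


class CountingNetwork:
    """Stand-in for a deep neural network that only counts forward passes."""

    def __init__(self):
        self.forward_passes = 0

    def forward(self, pixels):
        self.forward_passes += 1
        return "object"  # placeholder label


def propose_regions(image, num_regions):
    """Stand-in for selective search: returns dummy (x, y, w, h) boxes."""
    height, width = image
    return [(0, 0, width, height)] * num_regions


def detect_region_based(image, network, num_regions):
    """R-CNN-style flow: identify every proposed region separately."""
    detections = []
    for box in propose_regions(image, num_regions):
        label = network.forward(box)  # one forward pass per region
        detections.append((box, label))
    return detections


def detect_single_pass(image, network):
    """YOLO-style flow: the whole image goes through the network once."""
    height, width = image
    label = network.forward(image)
    return [((0, 0, width, height), label)]


image = (480, 640)  # (height, width) of a hypothetical image

region_net = CountingNetwork()
detect_region_based(image, region_net, num_regions=2000)  # assumed count

single_net = CountingNetwork()
detect_single_pass(image, single_net)

print(region_net.forward_passes, single_net.forward_passes)  # prints "2000 1"
```

In this sketch the region-based flow calls the network once per candidate region, while the single-pass flow calls it once per image, which is the source of the speed difference described above.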

*1 Redmon et al., You Only Look Once: Unified, Real-Time Object Detection, 2015

Fig.1 Object detection by YOLO

Fig.2 Identification of real humans and forklift trucks