
Hello, I am Satoru Koda from Fujitsu's AI Laboratory. Fujitsu has been developing technologies to realize a world where people can use AI safely. In June 2024, we presented our work at CVPR 2024, a flagship conference in the computer vision field.
This work was done in collaboration with Ben-Gurion University of the Negev.
The developed technology keeps object recognition models safe and performant throughout the entire AI lifecycle.
Paper Information
- Title: YolOOD: Utilizing Object Detection Concepts for Multi-Label Out-of-Distribution Detection
- Venue: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
- Authors: Alon Zolfi¹, Guy Amit¹, Amit Baras¹, Satoru Koda², Ikuya Morikawa², Yuval Elovici¹, Asaf Shabtai¹ (1: Ben-Gurion University of the Negev, 2: Fujitsu)
- Link to Paper: cvpr.thecvf.com
Background
Out-of-Distribution Detection
Recent object recognition models can classify objects highly accurately. Such models have learned heuristics for recognizing the objects that appear in their training set. However, even well-trained models cannot produce correct recognition results for objects that were not present in the training set. To make matters worse, it is widely known that such models tend to recognize unknown objects as belonging to known categories with high confidence *1. Just as some people pretend to know everything, AI models make confident wild guesses about unknown objects (i.e., they act like know-it-alls).
Let us illustrate this know-it-all phenomenon. Suppose a multi-label image classifier is trained to output the categories of all the animals appearing in a given image; note that this model is not trained on any objects other than animals. Now an image containing an object unknown to the model (e.g., a "bicycle") is given. What happens if this image is fed into the classifier? The classifier may classify the unknown object into one of its known categories (e.g., "deer"). When we humans face something unknown, we can cope with it (e.g., by searching for the image online). AI models, however, often confidently recognize unknown objects as known ones.
The above example may not sound serious. However, in safety-critical applications such as autonomous driving systems, the know-it-all phenomenon could cause fatal accidents. In the real world, we continuously encounter never-before-seen objects, and so do AI models.
Therefore, a "sustainable" AI model must meet two requirements: (1) recognizing objects of known categories and (2) discriminating between objects of known categories and objects of unknown categories.
Such unknown objects are called out-of-distribution (OOD) samples, and the task of discriminating OOD objects from in-distribution (IND) objects is called OOD detection. Accurate OOD detection prevents models from making unexpected inferences (the know-it-all behavior) on unknown objects. It also allows model developers to make reasonable decisions about model retraining on the basis of the identified unknown objects. In this way, like applying security updates, models can be kept secure and performant; OOD detection thus plays an important role in the AI lifecycle.
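In practice, OOD detection often reduces to score-based thresholding. The following is a minimal sketch of that general idea (not the method of our paper), using a common score for multi-label models, the maximum per-category sigmoid probability; all names and values here are hypothetical:

```python
import numpy as np

# Score each image by the model's highest per-category confidence
# (sigmoid probability); in-distribution images tend to score high.
def ood_scores(logits):
    probs = 1.0 / (1.0 + np.exp(-logits))  # per-category sigmoid
    return probs.max(axis=1)               # max over known categories

# Images scoring below the threshold are flagged as OOD.
def flag_ood(scores, threshold):
    return scores < threshold

logits = np.array([[4.0, -2.0, 0.5],     # confident on one category -> IND
                   [-3.0, -2.5, -4.0]])  # low on every category     -> OOD
print(flag_ood(ood_scores(logits), threshold=0.5))  # [False  True]
```

The choice of scoring function is where OOD detection methods differ; the thresholding step itself is common to most of them.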
Research Goal
This research focuses on multi-label classification, where the task is, given an input image, to assign the category labels of all the objects appearing in the image. We developed an OOD detection technology for this multi-label classification scenario. Ultimately, the developed technology can make various object recognition applications sustainable, such as image-based inventory monitoring and human tracking in videos.
Technology
YOLO -- Object Detection Model
Object "detection" models solve the task of localizing each of the objects in an image and assigning category labels to each localized object.
Among existing object detection techniques, YOLO*2 is arguably the most representative approach.
The figure below shows an example of object detection results by YOLO.

In the object detection task, for each object in an image, models must output its bounding box, which is the rectangle surrounding the object, and its object category. In contrast, in the object "recognition" task, which is the topic of our paper, models solve multi-label classification. Since multi-label classification models are required to output only categories, the output for the above image becomes a tuple composed of the object categories, e.g., ("dog", "bicycle", "truck").
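To make the difference concrete, here is a toy sketch with hypothetical values showing how a detector's richer output can be reduced to a multi-label output:

```python
# Hypothetical YOLO-style detections for the image above:
# (center_x, center_y, width, height, confidence, category)
detections = [
    (0.32, 0.61, 0.20, 0.25, 0.92, "dog"),
    (0.50, 0.55, 0.45, 0.40, 0.88, "bicycle"),
    (0.78, 0.30, 0.30, 0.22, 0.81, "truck"),
]

# A multi-label classifier outputs only the set of categories present,
# with no localization information.
def to_multilabel(detections, conf_threshold=0.5):
    return sorted({cat for *_box, conf, cat in detections
                   if conf >= conf_threshold})

print(to_multilabel(detections))  # ['bicycle', 'dog', 'truck']
```

The detection output strictly contains the multi-label output, which is exactly the asymmetry our method exploits.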
Apply YOLO to Multi-label Classification
As previously mentioned, object "detection" models learn heuristics for localizing objects. In other words, such models implicitly learn that there are no objects outside the bounding boxes. Therefore, object detectors have an inherent ability to distinguish objects of interest from irrelevant objects and background. This ability is naturally compatible with the OOD detection task.
Many existing OOD detection approaches employ negative learning, in which the negative samples are samples of irrelevant objects. In negative learning, models are trained not to recognize negative objects as any of the known categories. However, this requires preparing extra negative-sample data in addition to the original training set.
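As a hedged sketch of the general idea (not any specific paper's loss), negative learning with a binary cross-entropy objective simply assigns negative samples an all-zero target vector, pushing the model to answer "none of the known categories" for them:

```python
import numpy as np

# Multi-label binary cross-entropy over per-category probabilities.
def bce(probs, targets, eps=1e-7):
    p = np.clip(probs, eps, 1 - eps)  # avoid log(0)
    return float(np.mean(-(targets * np.log(p)
                           + (1 - targets) * np.log(1 - p))))

pos_probs  = np.array([0.9, 0.1, 0.2])  # model output for a known-object image
pos_target = np.array([1.0, 0.0, 0.0])  # it contains category 0
neg_probs  = np.array([0.7, 0.6, 0.1])  # model output for a negative sample
neg_target = np.zeros(3)                # target: no known category at all
loss = bce(pos_probs, pos_target) + bce(neg_probs, neg_target)
```

The negative term penalizes any confident prediction on irrelevant objects; the practical burden is collecting those negative samples in the first place.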
To eliminate this extra step, we applied the object detection concept to negative learning in multi-label classification. By leveraging the inherent capability of object detection models to implicitly distinguish objects of interest from irrelevant objects and background, we extended the capability of discriminating known objects from unknown objects to multi-label classification models. More specifically, our approach realizes negative learning within a single image by discriminating objects of interest from irrelevant objects and background, just as object detection models do, which allows us to perform negative learning without additional data. We named this OOD detection approach YolOOD. (Our approach instead requires bounding-box annotations, which are not needed for training ordinary multi-label classifiers; however, we also proposed a strategy to automatically annotate bounding boxes for multi-label classification datasets.)
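The aggregation idea can be sketched roughly as follows. This is a simplified illustration under our own assumptions, not the exact YolOOD formulation: a YOLO-style head produces an objectness score and per-category probabilities for every grid cell, cells covering no labelled object are trained toward zero (the built-in negative learning), and an image-level score per category is taken as the maximum cell score:

```python
import numpy as np

# objectness: (num_cells,), class_probs: (num_cells, num_categories)
def image_level_scores(objectness, class_probs):
    cell_scores = objectness[:, None] * class_probs  # per-cell, per-category
    return cell_scores.max(axis=0)                   # max over grid cells

objectness  = np.array([1.0, 0.0, 0.5])  # cell 1 covers no object
class_probs = np.array([[0.75, 0.0],
                        [0.25, 0.25],
                        [0.0,  0.5]])
scores = image_level_scores(objectness, class_probs)
print(scores)  # [0.75 0.25]
# If max(scores) falls below a threshold, the image likely contains
# only unknown (OOD) objects.
```

Because the background cells are explicitly trained toward zero, they contribute low scores on unknown objects, which is what separates IND from OOD images at the image level.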
Evaluation
To verify the effectiveness of YolOOD, we conducted experiments comparing it with JointEnergy *4, a state-of-the-art approach. YolOOD outperformed JointEnergy in most experimental settings. In one configuration, for example, YolOOD reduced the false positive rate, i.e., the percentage of unknown objects recognized as known, from 24.46% to 12.19%.
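For reference, the false positive rate in OOD evaluation is commonly measured at a fixed true positive rate (e.g., FPR at 95% TPR). Below is a minimal sketch of that metric with toy scores; it is our own illustration, not the paper's evaluation code:

```python
import numpy as np

# FPR@TPR: set the threshold so that a fraction `tpr` of in-distribution
# samples is (correctly) accepted, then measure the fraction of OOD
# samples also accepted, i.e., unknowns mistaken for known. Lower is better.
def fpr_at_tpr(ind_scores, ood_scores, tpr=0.95):
    threshold = np.quantile(ind_scores, 1.0 - tpr)  # accept score >= threshold
    return float(np.mean(ood_scores >= threshold))

rng = np.random.default_rng(0)
ind = rng.normal(2.0, 1.0, 1000)  # toy scores for known-object images
ood = rng.normal(0.0, 1.0, 1000)  # toy scores for unknown-object images
print(f"FPR@95%TPR: {fpr_at_tpr(ind, ood):.4f}")
```

Fixing the TPR makes methods comparable: every detector is forced to keep 95% of in-distribution inputs, and only then is its leakage of OOD inputs measured.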
Thus, YolOOD offers the following merits: (1) models do not behave unexpectedly on unknown objects, and (2) model owners can make reasonable judgments about when to retrain their models. As a result, models remain secure throughout the entire AI lifecycle.
Presentation at CVPR
CVPR is one of the most competitive conferences in the machine learning and computer vision community. CVPR 2024 received 11,532 submissions, of which 23.6% were accepted. This year's CVPR was held in Seattle, USA, on June 19th-21st. Over 12,000 people participated, showing the significance of the conference.
From our project, Alon Zolfi (Ph.D. student at BGU), the first author of our paper, gave a poster presentation at the on-site conference.
(I could not attend it because of my wife's childbirth :).)
According to his report, many attendees were interested in our work, and he had fruitful discussions with them.

Note
Besides the presentation at CVPR, we have been continuously developing and releasing technologies that enhance the trustworthiness of AI, including:
- Satoru Koda, Ikuya Morikawa: OOD-robust boosting tree for intrusion detection systems (IJCNN 2023)
- Satoru Koda, Alon Zolfi, Edita Grolman, Asaf Shabtai, Ikuya Morikawa, Yuval Elovici: Pros and Cons of Weight Pruning for Out-of-Distribution Detection: An Empirical Survey (IJCNN 2023)
Fujitsu has been releasing many AI solutions, e.g., through Fujitsu Kozuchi. We also take seriously the need to address AI-related risks (security, trust, privacy). Going forward, we will advance R&D toward realizing a world where people can use AI with peace of mind.
*1:M. Hein et al.: Why ReLU networks yield high-confidence predictions far away from the training data and how to mitigate the problem
*2:J. Redmon et al.: You Only Look Once: Unified, Real-Time Object Detection
*3:https://pjreddie.com/darknet/yolo/
*4:H. Wang et al.: Can multi-label classification networks know what they don't know?