This dataset was presented as part of the ICLR 2023 paper *A framework for benchmarking Class-out-of-distribution detection and its application to ImageNet*.
The accompanying framework uses this dataset (a subset of ImageNet-21k) to generate a class-out-of-distribution (C-OOD, also known as open-set recognition) benchmark covering a variety of difficulty levels. These benchmarks are tailored to the evaluated model, which provides a more accurate representation of the model's own performance.
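As a rough illustration of this tailoring (a minimal sketch, not the official API of the paper's code release; the function names and the exact grading rule below are illustrative assumptions), one can grade each candidate OOD class by how confidently the evaluated model mistakes its images for in-distribution classes, then split the classes into 11 severity levels (0 = easiest to detect, 10 = hardest):

```python
# Hypothetical sketch of model-tailored severity levels; names and the grading
# rule are illustrative assumptions, not the paper's released code.
import numpy as np

def class_difficulty_scores(max_softmax, class_labels):
    """Average the evaluated model's max-softmax confidence over each OOD class.

    max_softmax:  (N,) max-softmax score the model assigns to each OOD image.
    class_labels: (N,) integer OOD class id for each image.
    A higher mean confidence means the class fools this particular model more,
    i.e., it is harder for this model to detect as OOD.
    """
    return {c: max_softmax[class_labels == c].mean()
            for c in np.unique(class_labels)}

def split_into_severity_levels(scores, n_levels=11):
    """Sort OOD classes from easiest to hardest for this model and cut them
    into n_levels bins (severity 0 = easiest, severity 10 = hardest)."""
    ordered = sorted(scores, key=scores.get)  # ascending difficulty
    return {severity: list(chunk)
            for severity, chunk in enumerate(np.array_split(ordered, n_levels))}
```

Because the grading depends on the evaluated model's own confidences, two different models end up with different class-to-severity assignments, which is what makes the resulting benchmark model-specific.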
The resulting difficulty levels allow benchmarking at the severity levels most relevant to the task at hand. For example, for a task with a high tolerance for risk (e.g., an entertainment application), performance at a median difficulty level might matter more than performance at the hardest level (severity 10). The opposite may hold for applications with a low tolerance for risk (e.g., medical applications), which require the best performance even when the OOD classes are very hard to detect (severity 10). The paper introducing the framework showed that detection algorithms do not improve performance on all inputs equally, and can even hurt performance at specific difficulty levels for specific models. Choosing a (model, detection algorithm) combination based only on detection performance over all data pooled together may therefore yield sub-optimal results at the difficulty level that actually matters.
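To make the per-severity comparison concrete, here is a minimal sketch (assuming NumPy and scikit-learn; the function and argument names are hypothetical, not from the paper's code) that scores a detector's AUROC separately at each severity level instead of on all data pooled together:

```python
# Hedged sketch: evaluate a (model, detector) combination per severity level.
# `id_scores` and `ood_scores_by_severity` are hypothetical inputs holding the
# detector's OOD scores (higher = "more OOD") on in-distribution images and on
# each severity level's OOD images, respectively.
import numpy as np
from sklearn.metrics import roc_auc_score

def auroc_per_severity(id_scores, ood_scores_by_severity):
    """Return {severity: AUROC} for separating in-distribution images
    from that severity level's OOD images."""
    results = {}
    for severity, ood_scores in ood_scores_by_severity.items():
        labels = np.concatenate([np.zeros(len(id_scores)),   # 0 = in-distribution
                                 np.ones(len(ood_scores))])  # 1 = OOD
        scores = np.concatenate([id_scores, ood_scores])
        results[severity] = roc_auc_score(labels, scores)
    return results
```

With per-severity results in hand, a low-risk-tolerance application would pick the combination with the best AUROC at severity 10, while a high-risk-tolerance one might optimize for a median severity instead, rather than relying on a single pooled number.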