RetinaNet

Last updated on Feb 19, 2021

RetinaNet (R101, 3x)

Parameters 57 Million
FLOPs 273 Billion
File Size 217.77 MB
Training Data MS COCO
Training Resources 8 NVIDIA V100 GPUs
Training Time 22 hours

Architecture Focal Loss, FPN, ResNet
ID 190397697
Max Iter 270000
LR Schedule 3x
FLOPs Input No 100
Backbone Layers 101
Train Time (s/iter) 0.291
Training Memory (GB) 5.2
Inference Time (s/im) 0.054
RetinaNet (R50, 1x)

Parameters 38 Million
FLOPs 206 Billion
File Size 145.08 MB
Training Data MS COCO
Training Resources 8 NVIDIA V100 GPUs
Training Time 5 hours

Architecture Focal Loss, FPN, ResNet
ID 190397773
Max Iter 90000
LR Schedule 1x
FLOPs Input No 100
Backbone Layers 50
Train Time (s/iter) 0.205
Training Memory (GB) 4.1
Inference Time (s/im) 0.041
RetinaNet (R50, 3x)

Parameters 38 Million
FLOPs 206 Billion
File Size 145.08 MB
Training Data MS COCO
Training Resources 8 NVIDIA V100 GPUs
Training Time 15 hours

Architecture Focal Loss, FPN, ResNet
ID 190397829
Max Iter 270000
LR Schedule 3x
FLOPs Input No 100
Backbone Layers 50
Train Time (s/iter) 0.205
Training Memory (GB) 4.1
Inference Time (s/im) 0.041

Summary

RetinaNet is a one-stage object detection model that uses a focal loss function to address class imbalance during training. Focal loss applies a modulating term to the cross-entropy loss in order to focus learning on hard examples and down-weight the numerous easy negatives. RetinaNet is a single, unified network composed of a backbone network and two task-specific subnetworks. The backbone, an off-the-shelf convolutional network, computes a convolutional feature map over the entire input image. The first subnetwork performs convolutional object classification on the backbone's output; the second performs convolutional bounding box regression. Both subnetworks feature a simple design that the authors propose specifically for one-stage, dense detection.
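
To make the modulating term concrete: with gamma = 0 the focal loss reduces to (alpha-weighted) cross entropy, and the paper's defaults are gamma = 2, alpha = 0.25. The snippet below is a minimal PyTorch sketch of a per-anchor sigmoid focal loss, intended only to illustrate the idea; it is not Detectron2's own implementation.

import torch
import torch.nn.functional as F

def sigmoid_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    # logits and targets have the same shape; targets are 0/1 labels as floats
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)      # probability of the true class
    loss = ce * (1 - p_t) ** gamma                   # modulating term down-weights easy examples
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * loss).sum()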

How do I load this model?

Several RetinaNet models are available in Detectron2, with different backbones and learning rate schedules.

To load from the Detectron2 model zoo:

from detectron2 import model_zoo
model = model_zoo.get("COCO-Detection/retinanet_R_101_FPN_3x.yaml", trained=True)

Replace the configuration path with the variant you want to use. You can find the paths in the model summaries at the top of this page.
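
For end-to-end inference on a single image, it is often more convenient to build a config and use DefaultPredictor, which handles pre- and post-processing. A minimal sketch, assuming an image at input.jpg (a placeholder path) and an illustrative score threshold:

import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/retinanet_R_101_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/retinanet_R_101_FPN_3x.yaml")
cfg.MODEL.RETINANET.SCORE_THRESH_TEST = 0.5  # keep only confident detections; illustrative value

predictor = DefaultPredictor(cfg)
outputs = predictor(cv2.imread("input.jpg"))  # BGR image, as DefaultPredictor expects by default
print(outputs["instances"].pred_boxes, outputs["instances"].scores)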

How do I train this model?

You can follow the Getting Started guide on Colab to see how to train a model.

You can also read the official Detectron2 documentation.
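
As a rough sketch of what fine-tuning looks like in Python (mirroring the Colab tutorial, but for RetinaNet), assuming a dataset already registered under the hypothetical name my_dataset_train and purely illustrative hyperparameters:

from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/retinanet_R_50_FPN_1x.yaml"))
cfg.DATASETS.TRAIN = ("my_dataset_train",)   # hypothetical registered dataset
cfg.DATASETS.TEST = ()
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/retinanet_R_50_FPN_1x.yaml")  # start from COCO weights
cfg.MODEL.RETINANET.NUM_CLASSES = 3          # set to the number of classes in your dataset
cfg.SOLVER.IMS_PER_BATCH = 2                 # illustrative; scale with available GPUs
cfg.SOLVER.BASE_LR = 0.00025
cfg.SOLVER.MAX_ITER = 1000

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()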

Citation

@misc{wu2019detectron2,
  author =       {Yuxin Wu and Alexander Kirillov and Francisco Massa and
                  Wan-Yen Lo and Ross Girshick},
  title =        {Detectron2},
  howpublished = {\url{https://github.com/facebookresearch/detectron2}},
  year =         {2019}
}

Results

Object Detection on COCO minival

BENCHMARK       MODEL                   METRIC   VALUE   GLOBAL RANK
COCO minival    RetinaNet (R101, 3x)    box AP   40.4    #70
COCO minival    RetinaNet (R50, 3x)     box AP   38.7    #87
COCO minival    RetinaNet (R50, 1x)     box AP   37.4    #100