MMDetection Model Support in VisDet

This document lists the models available in MMDetection and gives the support status of each in VisDet.

Summary

VisDet currently focuses on two-stage detectors with a clean, typed, and well-tested codebase. We prioritize quality over quantity, ensuring each supported model works reliably.

Detector Architectures

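| Model | MMDetection | VisDet | Notes |
| --- | --- | --- | --- |
| **Two-Stage Detectors** | | | |
| Faster R-CNN | | | Core detector, fully supported |
| Mask R-CNN | | | Instance segmentation supported |
| Cascade R-CNN | | | Multi-stage refinement |
| Cascade Mask R-CNN | | | Via Cascade R-CNN + mask head |
| Fast R-CNN | | | |
| RPN | | | Region Proposal Network |
| HTC (Hybrid Task Cascade) | | | |
| MS R-CNN (Mask Scoring) | | | |
| SCNet | | | |
| TridentNet | | | |
| Sparse R-CNN | | | |
| QueryInst | | | |
| Grid R-CNN | | | |
| Double Heads | | | |
| Dynamic R-CNN | | | |
| Libra R-CNN | | | |
| GRoIE | | | |
| DetectoRS | | | |
| **One-Stage Detectors** | | | |
| RetinaNet | | | |
| SSD | | | |
| FCOS | | | |
| ATSS | | | |
| GFL | | | |
| VFNet | | | |
| YOLACT | | | |
| YOLOv3 | | | |
| YOLOX | | | |
| YOLOF | | | |
| RTMDet | | | |
| TOOD | | | |
| PAA | | | |
| DDOD | | | |
| FSAF | | | |
| FreeAnchor | | | |
| FoveaBox | | | |
| CornerNet | | | |
| CenterNet | | | |
| CentripetalNet | | | |
| RepPoints | | | |
| GHM | | | |
| NAS-FPN | | | |
| NAS-FCOS | | | |
| AutoAssign | | | |
| SABL | | | |
| **Transformer-Based Detectors** | | | |
| DETR | | | |
| Deformable DETR | | | |
| Conditional DETR | | | |
| DAB-DETR | | | |
| DINO | | | |
| DDQ | | | |
| Grounding DINO | | | |
| MM-Grounding-DINO | | | |
| GLIP | | | |
| **Panoptic/Instance Segmentation** | | | |
| MaskFormer | | | |
| Mask2Former | | | |
| Panoptic FPN | | | |
| SOLO | | | |
| SOLOv2 | | | |
| CondInst | | | |
| BoxInst | | | |
| PointRend | | | |
| **Tracking** | | | |
| ByteTrack | | | |
| QDTrack | | | |
| SORT | | | |
| DeepSORT | | | |
| OC-SORT | | | |
| StrongSORT | | | |
| MaskTrack R-CNN | | | |
| **Knowledge Distillation** | | | |
| LAD | | | |
| LD | | | |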

Backbones

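| Backbone | MMDetection | VisDet | Notes |
| --- | --- | --- | --- |
| ResNet | | | ResNet-18/34/50/101/152 |
| ResNeXt | | | |
| Res2Net | | | |
| ResNeSt | | | |
| RegNet | | | |
| HRNet | | | |
| Swin Transformer | | | |
| ConvNeXt | | | |
| PVT | | | |
| EfficientNet | | | |
| VGG | | | |
| MobileNet | | | |
| DetectoRS ResNet | | | |
| CSPDarknet | | | |
| CSPNeXt | | | |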

Necks

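| Neck | MMDetection | VisDet | Notes |
| --- | --- | --- | --- |
| FPN | | | Feature Pyramid Network |
| PAFPN | | | |
| BiFPN | | | |
| NAS-FPN | | | |
| CARAFE FPN | | | |
| FPG | | | |
| RFNext | | | |
| DyHead | | | |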
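
To make the detector, backbone, and neck entries concrete, here is a minimal sketch of how Faster R-CNN, ResNet, and FPN are combined in MMDetection's Python config style. It assumes VisDet accepts the same config keys, which this document does not confirm. `RESNET_STAGE_CHANNELS` and `with_resnet_depth` are hypothetical helpers written for this example, not part of either library.

```python
import copy

# MMDetection-style config fragment built from plain Python dicts.
# Assumption: VisDet reads the same keys; check its own configs before use.
model = dict(
    type="FasterRCNN",
    backbone=dict(
        type="ResNet",
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type="BN", requires_grad=True),
        norm_eval=True,
        style="pytorch",
    ),
    neck=dict(
        type="FPN",
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5,
    ),
    # rpn_head, roi_head, train_cfg and test_cfg are omitted for brevity.
)

# Output channels of the four ResNet stages. ResNet-18/34 use basic blocks,
# deeper variants use bottleneck blocks with 4x wider outputs.
RESNET_STAGE_CHANNELS = {
    18: [64, 128, 256, 512],
    34: [64, 128, 256, 512],
    50: [256, 512, 1024, 2048],
    101: [256, 512, 1024, 2048],
    152: [256, 512, 1024, 2048],
}


def with_resnet_depth(cfg, depth):
    """Return a copy of cfg with a different ResNet depth and matching FPN inputs."""
    if depth not in RESNET_STAGE_CHANNELS:
        raise ValueError(f"unsupported ResNet depth: {depth}")
    new_cfg = copy.deepcopy(cfg)
    new_cfg["backbone"]["depth"] = depth
    new_cfg["neck"]["in_channels"] = list(RESNET_STAGE_CHANNELS[depth])
    return new_cfg


model_r18 = with_resnet_depth(model, 18)
```

The helper keeps the neck's `in_channels` in step with the backbone, since a mismatch between the two is a common source of shape errors when changing ResNet depth.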

Techniques & Modules

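| Technique | MMDetection | VisDet | Notes |
| --- | --- | --- | --- |
| DCN (Deformable Conv) | | | |
| DCNv2 | | | |
| Group Normalization | | | |
| Weight Standardization | | | |
| Guided Anchoring | | | |
| CARAFE | | | |
| InstaBoost | | | |
| Albumentations | | | Via configs |
| Simple Copy-Paste | | | |
| Seesaw Loss | | | |
| PISA | | | |
| Soft Teacher | | | |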
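
As an illustration of what "Via configs" can look like for Albumentations, the sketch below follows the `Albu` pipeline step used in MMDetection's example configs. It assumes VisDet keeps the same wrapper name and keys, which is not stated in this document. The chosen transforms and their parameter values are illustrative only.

```python
# Albumentations transforms, each described as a dict whose "type" is the
# Albumentations class name and whose other keys are its arguments.
albu_transforms = [
    dict(
        type="ShiftScaleRotate",
        shift_limit=0.0625,
        scale_limit=0.0,
        rotate_limit=0,
        p=0.5,
    ),
    dict(
        type="RandomBrightnessContrast",
        brightness_limit=[0.1, 0.3],
        contrast_limit=[0.1, 0.3],
        p=0.2,
    ),
]

# MMDetection-style pipeline step wrapping the transforms above.
# Assumption: VisDet exposes the same "Albu" step with the same keys.
albu_step = dict(
    type="Albu",
    transforms=albu_transforms,
    bbox_params=dict(
        type="BboxParams",
        format="pascal_voc",
        label_fields=["gt_bboxes_labels", "gt_ignore_flags"],
        min_visibility=0.0,
        filter_lost_elements=True,
    ),
    # Map the pipeline's field names to the names Albumentations expects.
    keymap={"img": "image", "gt_masks": "masks", "gt_bboxes": "bboxes"},
    skip_img_without_anno=True,
)
```

Such a step would normally sit in the training pipeline after annotation loading and before the final packing step.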

Datasets

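| Dataset | MMDetection | VisDet | Notes |
| --- | --- | --- | --- |
| COCO | | | |
| PASCAL VOC | | | |
| LVIS | | | |
| Objects365 | | | |
| OpenImages | | | |
| Cityscapes | | | |
| WIDER Face | | | |
| DeepFashion | | | |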

Roadmap

We plan to add support for:

- [ ] RetinaNet (one-stage anchor-based)
- [ ] FCOS (anchor-free)
- [ ] DETR family (transformer-based)
- [ ] RTMDet (real-time)
- [ ] More backbones (ConvNeXt, EfficientNet)

Contributing

Want to help add support for a model? See our contribution guide or open an issue to discuss!