MMDetection Model Support in VisDet

This document compares the models, backbones, necks, techniques, and datasets available in MMDetection with their support status in VisDet.

Summary

VisDet currently focuses on two-stage detectors with a clean, typed, and well-tested codebase. We prioritize quality over quantity, ensuring each supported model works reliably.

Detector Architectures

Model	MMDetection	VisDet	Notes
Two-Stage Detectors
Faster R-CNN	✅	✅	Core detector, fully supported
Mask R-CNN	✅	✅	Instance segmentation supported
Cascade R-CNN	✅	✅	Multi-stage refinement
Cascade Mask R-CNN	✅	✅	Via Cascade R-CNN + mask head
Fast R-CNN	✅	❌
RPN	✅	✅	Region Proposal Network
HTC (Hybrid Task Cascade)	✅	❌
MS R-CNN (Mask Scoring)	✅	❌
SCNet	✅	❌
TridentNet	✅	❌
Sparse R-CNN	✅	❌
QueryInst	✅	❌
Grid R-CNN	✅	❌
Double-Head R-CNN	✅	❌
Dynamic R-CNN	✅	❌
Libra R-CNN	✅	❌
GRoIE	✅	❌
DetectoRS	✅	❌
One-Stage Detectors
RetinaNet	✅	❌
SSD	✅	❌
FCOS	✅	❌
ATSS	✅	❌
GFL	✅	❌
VFNet	✅	❌
YOLACT	✅	❌
YOLOv3	✅	❌
YOLOX	✅	❌
YOLOF	✅	❌
RTMDet	✅	❌
TOOD	✅	❌
PAA	✅	❌
DDOD	✅	❌
FSAF	✅	❌
FreeAnchor	✅	❌
FoveaBox	✅	❌
CornerNet	✅	❌
CenterNet	✅	❌
CentripetalNet	✅	❌
RepPoints	✅	❌
GHM	✅	❌
NAS-FPN	✅	❌
NAS-FCOS	✅	❌
AutoAssign	✅	❌
SABL	✅	❌
Transformer-Based Detectors
DETR	✅	❌
Deformable DETR	✅	❌
Conditional DETR	✅	❌
DAB-DETR	✅	❌
DINO	✅	❌
DDQ	✅	❌
Grounding DINO	✅	❌
MM-Grounding-DINO	✅	❌
GLIP	✅	❌
Panoptic/Instance Segmentation
MaskFormer	✅	❌
Mask2Former	✅	❌
Panoptic FPN	✅	❌
SOLO	✅	❌
SOLOv2	✅	❌
CondInst	✅	❌
BoxInst	✅	❌
PointRend	✅	❌
Tracking
ByteTrack	✅	❌
QDTrack	✅	❌
SORT	✅	❌
DeepSORT	✅	❌
OC-SORT	✅	❌
StrongSORT	✅	❌
MaskTrack R-CNN	✅	❌
Knowledge Distillation
LAD	✅	❌
LD	✅	❌

Backbones

Backbone	MMDetection	VisDet	Notes
ResNet	✅	✅	ResNet-18/34/50/101/152
ResNeXt	✅	✅
Res2Net	✅	✅
ResNeSt	✅	✅
RegNet	✅	✅
HRNet	✅	✅
Swin Transformer	✅	✅
ConvNeXt	✅	❌
PVT	✅	❌
EfficientNet	✅	❌
VGG	✅	❌
MobileNet	✅	❌
DetectoRS ResNet	✅	❌
CSPDarknet	✅	❌
CSPNeXt	✅	❌

Necks

Neck	MMDetection	VisDet	Notes
FPN	✅	✅	Feature Pyramid Network
PAFPN	✅	❌
BiFPN	✅	❌
NAS-FPN	✅	❌
CARAFE FPN	✅	❌
FPG	✅	❌
RFNext	✅	❌
DyHead	✅	❌
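
The detector, backbone, and neck tables combine into a single model definition. The following is a minimal sketch, assuming VisDet keeps MMDetection's dict-based config convention and registry names (`FasterRCNN`, `ResNet`, `FPN`), which this page does not confirm. The `SUPPORTED` mapping and `unsupported_parts` helper are hypothetical and belong to neither library; they only restate the ✅ rows above so a config can be checked against them.

```python
# Sketch of an MMDetection-style model config built only from components
# marked as supported in the tables above.
# Assumption: VisDet accepts the same `type` names as MMDetection.
model = dict(
    type="FasterRCNN",
    backbone=dict(
        type="ResNet",
        depth=50,                  # ResNet-50
        num_stages=4,
        out_indices=(0, 1, 2, 3),  # pass all four stages to the neck
    ),
    neck=dict(
        type="FPN",
        in_channels=[256, 512, 1024, 2048],  # ResNet-50 stage widths
        out_channels=256,
        num_outs=5,
    ),
    # rpn_head / roi_head omitted for brevity
)

# Hypothetical lookup: MMDetection registry names for the ✅ rows above.
SUPPORTED = {
    "detector": {"FasterRCNN", "MaskRCNN", "CascadeRCNN", "RPN"},
    "backbone": {
        "ResNet", "ResNeXt", "Res2Net", "ResNeSt",
        "RegNet", "HRNet", "SwinTransformer",
    },
    "neck": {"FPN"},
}


def unsupported_parts(cfg):
    """Return the (role, type) pairs in cfg that the tables mark as unsupported."""
    parts = {
        "detector": cfg["type"],
        "backbone": cfg["backbone"]["type"],
        "neck": cfg["neck"]["type"],
    }
    return [(role, name) for role, name in parts.items()
            if name not in SUPPORTED[role]]


print(unsupported_parts(model))  # → []
```

Swapping the neck for `PAFPN`, for example, would make the helper report `("neck", "PAFPN")`, matching the ❌ in the Necks table.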

Techniques & Modules

Technique	MMDetection	VisDet	Notes
DCN (Deformable Conv)	✅	❌
DCNv2	✅	❌
Group Normalization	✅	❌
Weight Standardization	✅	❌
Guided Anchoring	✅	❌
CARAFE	✅	❌
InstaBoost	✅	❌
Albumentations	✅	✅	Via configs
Simple Copy-Paste	✅	❌
Seesaw Loss	✅	❌
PISA	✅	❌
Soft Teacher	✅	❌
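
The "Via configs" note on Albumentations refers to enabling it in the data pipeline rather than in code. As a rough sketch, this is how MMDetection 3.x wires Albumentations into a training pipeline through its `Albu` wrapper. Whether VisDet uses the same transform names and keys is an assumption, and the chosen augmentation and its parameters are purely illustrative.

```python
# MMDetection-style training pipeline with an Albumentations step.
# Assumption: VisDet follows the same `Albu` wrapper convention.
albu_transforms = [
    # Any Albumentations transform can be listed by class name.
    dict(
        type="RandomBrightnessContrast",
        brightness_limit=0.2,
        contrast_limit=0.2,
        p=0.5,
    ),
]

train_pipeline = [
    dict(type="LoadImageFromFile"),
    dict(type="LoadAnnotations", with_bbox=True),
    dict(
        type="Albu",
        transforms=albu_transforms,
        # Tells Albumentations how boxes are encoded and which label
        # fields must stay aligned with them.
        bbox_params=dict(
            type="BboxParams",
            format="pascal_voc",  # (x_min, y_min, x_max, y_max) in pixels
            label_fields=["gt_bboxes_labels", "gt_ignore_flags"],
            min_visibility=0.0,
            filter_lost_elements=True,
        ),
        # Maps pipeline keys to the argument names Albumentations expects.
        keymap={"img": "image", "gt_bboxes": "bboxes"},
        skip_img_without_anno=True,
    ),
    dict(type="PackDetInputs"),
]
```

Keeping the augmentation list in its own variable makes it easy to swap transforms without touching the box handling.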

Datasets

Dataset	MMDetection	VisDet
COCO	✅	✅
PASCAL VOC	✅	✅
LVIS	✅	✅
Objects365	✅	✅
OpenImages	✅	✅
Cityscapes	✅	✅
WIDER Face	✅	✅
DeepFashion	✅	❌

Roadmap

We plan to add support for:

- [ ] RetinaNet (one-stage anchor-based)
- [ ] FCOS (anchor-free)
- [ ] DETR family (transformer-based)
- [ ] RTMDet (real-time)
- [ ] More backbones (ConvNeXt, EfficientNet)

Contributing

Want to help add support for a model? See our contribution guide or open an issue to discuss!