
Here Be Dragons

Beware, here be dragons! This project is highly unstable right now, and probably will be until early January.

Introduction

VisDet is an open-source object detection toolbox based on PyTorch. It is a fork of the original MMDetection project, providing an enhanced and modernized detection framework for research and production use.

The master branch works with PyTorch 1.5+.

Major features

- **Modular Design.** We decompose the detection framework into different components, so you can easily construct a customized object detector by combining different modules (see the config sketch below).
- **Support of multiple frameworks out of the box.** The toolbox directly supports popular and contemporary detection frameworks, *e.g.* Faster R-CNN, Mask R-CNN, RetinaNet, etc.
- **High efficiency.** All basic bbox and mask operations run on GPUs. The training speed is faster than or comparable to other codebases, including [Detectron2](https://github.com/facebookresearch/detectron2).
- **State of the art.** Built on a codebase originally developed by the MMDet team (winners of the 2018 COCO Detection Challenge), this fork continues pushing the boundaries forward with modern improvements and enhancements.
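
To make the modular design concrete, here is a minimal sketch of how a detector can be assembled from named components. It assumes VisDet keeps MMDetection-style registry configs; the keys below mirror MMDetection's RetinaNet config and are illustrative, not verified against VisDet:

```python
# Hypothetical, MMDetection-style composition of a detector from modules.
# Each `type` names a registered component; swapping one entry swaps the module.
model = dict(
    type='RetinaNet',                      # one-stage detector built from the parts below
    backbone=dict(
        type='ResNet',                     # feature extractor; could be swapped for e.g. 'Swin'
        depth=50,
        out_indices=(0, 1, 2, 3),
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
    neck=dict(
        type='FPN',                        # fuses multi-scale backbone features
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    bbox_head=dict(
        type='RetinaHead',                 # dense prediction head with focal loss
        num_classes=80,
        in_channels=256,
        loss_cls=dict(type='FocalLoss', use_sigmoid=True, gamma=2.0, alpha=0.25)))
```

Because every slot is resolved through a registry, replacing the backbone entry changes the feature extractor without touching the neck or head.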

Roadmap

See our Roadmap for planned features, including:

- SPDL Integration: thread-based data loading for 74% faster training
- Kornia: GPU-accelerated augmentations (illustrated in the sketch below)
- Python 3.13t: free-threaded Python support
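
As a taste of the planned Kornia integration, here is a minimal sketch of batched, GPU-resident augmentation. The Kornia API calls are real; how VisDet will wire them into its data pipeline is an open roadmap item, so treat this as illustration only:

```python
import torch
import kornia.augmentation as K

# Compose augmentations that operate on whole batches of image tensors,
# so they can run on the GPU instead of per-sample on CPU workers.
augment = K.AugmentationSequential(
    K.RandomHorizontalFlip(p=0.5),
    K.ColorJitter(brightness=0.2, contrast=0.2, p=0.8),
    same_on_batch=False,  # sample augmentation parameters per image
)

device = 'cuda' if torch.cuda.is_available() else 'cpu'
images = torch.rand(8, 3, 800, 800, device=device)  # a batch of normalized images
augmented = augment(images)  # all transforms execute on `device`
```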

What's New

For the latest updates and improvements to VisDet, please refer to the changelog.

Installation

Please refer to Installation for installation instructions.

Getting Started

Please see the Getting Started guide for the basic usage of VisDet.
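
As a quick orientation, the snippet below sketches single-image inference, assuming VisDet preserves MMDetection's high-level `init_detector` / `inference_detector` API; the config and checkpoint paths are placeholders:

```python
from mmdet.apis import init_detector, inference_detector

# Placeholder paths: any config/checkpoint pair from the model zoo works.
config_file = 'configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py'
checkpoint_file = 'checkpoints/faster_rcnn_r50_fpn_1x_coco.pth'

# Build the model from its config and load trained weights.
model = init_detector(config_file, checkpoint_file, device='cuda:0')

# Run inference on one image; returns per-class detection results.
result = inference_detector(model, 'demo/demo.jpg')
```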

Overview of Benchmark and Model Zoo

Results and models are available in the model zoo.

Architectures

Object Detection

- Fast R-CNN (ICCV'2015)
- Faster R-CNN (NeurIPS'2015)
- RPN (NeurIPS'2015)
- SSD (ECCV'2016)
- RetinaNet (ICCV'2017)
- Cascade R-CNN (CVPR'2018)
- YOLOv3 (ArXiv'2018)
- CornerNet (ECCV'2018)
- Grid R-CNN (CVPR'2019)
- Guided Anchoring (CVPR'2019)
- FSAF (CVPR'2019)
- CenterNet (ArXiv'2019)
- Libra R-CNN (CVPR'2019)
- TridentNet (ICCV'2019)
- FCOS (ICCV'2019)
- RepPoints (ICCV'2019)
- FreeAnchor (NeurIPS'2019)
- CascadeRPN (NeurIPS'2019)
- FoveaBox (TIP'2020)
- Double-Head R-CNN (CVPR'2020)
- ATSS (CVPR'2020)
- NAS-FCOS (CVPR'2020)
- CentripetalNet (CVPR'2020)
- AutoAssign (ArXiv'2020)
- Side-Aware Boundary Localization (ECCV'2020)
- Dynamic R-CNN (ECCV'2020)
- DETR (ECCV'2020)
- PAA (ECCV'2020)
- VarifocalNet (CVPR'2021)
- Sparse R-CNN (CVPR'2021)
- YOLOF (CVPR'2021)
- YOLOX (ArXiv'2021)
- Deformable DETR (ICLR'2021)
- TOOD (ICCV'2021)
- DDOD (ACM MM'2021)

Instance Segmentation

- Mask R-CNN (ICCV'2017)
- Cascade Mask R-CNN (CVPR'2018)
- Mask Scoring R-CNN (CVPR'2019)
- Hybrid Task Cascade (CVPR'2019)
- YOLACT (ICCV'2019)
- InstaBoost (ICCV'2019)
- SOLO (ECCV'2020)
- PointRend (CVPR'2020)
- DetectoRS (CVPR'2021)
- SOLOv2 (NeurIPS'2020)
- SCNet (AAAI'2021)
- QueryInst (ICCV'2021)
- Mask2Former (CVPR'2022)

Panoptic Segmentation

- Panoptic FPN (CVPR'2019)
- MaskFormer (NeurIPS'2021)
- Mask2Former (CVPR'2022)

Other

- Contrastive Learning
  - SwAV (NeurIPS'2020)
  - MoCo (CVPR'2020)
  - MoCov2 (ArXiv'2020)
- Distillation
  - Localization Distillation (CVPR'2022)
  - Label Assignment Distillation (WACV'2022)
- Receptive Field Search
  - RF-Next (TPAMI'2022)

Components

Backbones

- VGG (ICLR'2015)
- ResNet (CVPR'2016)
- ResNeXt (CVPR'2017)
- MobileNetV2 (CVPR'2018)
- HRNet (CVPR'2019)
- Generalized Attention (ICCV'2019)
- GCNet (ICCVW'2019)
- Res2Net (TPAMI'2020)
- RegNet (CVPR'2020)
- ResNeSt (CVPRW'2022)
- PVT (ICCV'2021)
- Swin (ICCV'2021)
- PVTv2 (CVMJ'2022)
- ResNet strikes back (NeurIPSW'2021)
- EfficientNet (ICML'2019)
- ConvNeXt (CVPR'2022)

Necks

- PAFPN (CVPR'2018)
- NAS-FPN (CVPR'2019)
- CARAFE (ICCV'2019)
- FPG (ArXiv'2020)
- GRoIE (ICPR'2020)
- DyHead (CVPR'2021)

Loss

- GHM (AAAI'2019)
- Generalized Focal Loss (NeurIPS'2020)
- Seesaw Loss (CVPR'2021)

Common

- OHEM (CVPR'2016)
- Group Normalization (ECCV'2018)
- DCN (ICCV'2017)
- DCNv2 (CVPR'2019)
- Weight Standardization (ArXiv'2019)
- Prime Sample Attention (CVPR'2020)
- Strong Baselines (CVPR'2021)
- ResNet strikes back (NeurIPSW'2021)
- RF-Next (TPAMI'2022)

See the model zoo for a complete list of supported methods.

FAQ

Please refer to the documentation for frequently asked questions.

Contributing

We appreciate all contributions to improve VisDet and welcome community users to participate in development. Please refer to the contributing guide for the contributing guidelines.

Acknowledgement

VisDet is a fork of the original MMDetection project. We acknowledge the original MMDetection team and the broader open source community for their contributions to detection research. We appreciate all the contributors who implement their methods or add new features, as well as users who give valuable feedback.

We hope the toolbox and benchmark serve the growing research community by providing a flexible toolkit to reimplement existing methods and develop new detectors.

Citation

If you use this toolbox or benchmark in your research, please cite the original MMDetection paper:

    @article{mmdetection,
      title   = {{MMDetection}: Open MMLab Detection Toolbox and Benchmark},
      author  = {Chen, Kai and Wang, Jiaqi and Pang, Jiangmiao and Cao, Yuhang and
                 Xiong, Yu and Li, Xiaoxiao and Sun, Shuyang and Feng, Wansen and
                 Liu, Ziwei and Xu, Jiarui and Zhang, Zheng and Cheng, Dazhi and
                 Zhu, Chenchen and Cheng, Tianheng and Zhao, Qijie and Li, Buyu and
                 Lu, Xin and Zhu, Rui and Wu, Yue and Dai, Jifeng and Wang, Jingdong
                 and Shi, Jianping and Ouyang, Wanli and Loy, Chen Change and Lin, Dahua},
      journal = {arXiv preprint arXiv:1906.07155},
      year    = {2019}
    }
    

License

This project is released under the Apache 2.0 license.