
Here Be Dragons

Beware, here be dragons! This project is highly unstable right now, and probably will be until early January.

Introduction

VisDet is an open-source object detection toolbox based on PyTorch. It is a fork of the original MMDetection project, providing an enhanced and modernized detection framework for research and production use.

The master branch works with PyTorch 1.5+.

Major features

- **Modular Design.** We decompose the detection framework into different components, so you can easily construct a customized object detector by combining different modules (see the config sketch below).
- **Support of multiple frameworks out of the box.** The toolbox directly supports popular and contemporary detection frameworks, *e.g.* Faster R-CNN, Mask R-CNN, RetinaNet, etc.
- **High efficiency.** All basic bbox and mask operations run on GPUs. The training speed is faster than or comparable to other codebases, including [Detectron2](https://github.com/facebookresearch/detectron2).
- **State of the art.** Built on a codebase originally developed by the MMDet team (winners of the 2018 COCO Detection Challenge), this fork continues pushing the boundaries forward with modern improvements and enhancements.
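
To make the modular design concrete, here is a minimal sketch of how a detector can be assembled from named components. It assumes VisDet keeps MMDetection-style registry configs; the keys below mirror MMDetection's RetinaNet config and are illustrative, not verified against VisDet:

```python
# Hypothetical, MMDetection-style composition of a detector from modules.
# Each `type` names a registered component; swapping one entry swaps the module.
model = dict(
    type='RetinaNet',                      # one-stage detector built from the parts below
    backbone=dict(
        type='ResNet',                     # feature extractor; could be swapped for e.g. 'Swin'
        depth=50,
        out_indices=(0, 1, 2, 3),
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
    neck=dict(
        type='FPN',                        # fuses multi-scale backbone features
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    bbox_head=dict(
        type='RetinaHead',                 # dense prediction head with focal loss
        num_classes=80,
        in_channels=256,
        loss_cls=dict(type='FocalLoss', use_sigmoid=True, gamma=2.0, alpha=0.25)))
```

Because every slot is resolved through a registry, replacing the backbone entry changes the feature extractor without touching the neck or head.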

Roadmap

See our Roadmap for planned features, including:

- SPDL Integration: thread-based data loading for 74% faster training
- Kornia: GPU-accelerated augmentations (illustrated in the sketch below)
- Python 3.13t: free-threaded Python support
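
As a taste of the planned Kornia integration, here is a minimal sketch of batched, GPU-resident augmentation. The Kornia API calls are real; how VisDet will wire them into its data pipeline is an open roadmap item, so treat this as illustration only:

```python
import torch
import kornia.augmentation as K

# Compose augmentations that operate on whole batches of image tensors,
# so they can run on the GPU instead of per-sample on CPU workers.
augment = K.AugmentationSequential(
    K.RandomHorizontalFlip(p=0.5),
    K.ColorJitter(brightness=0.2, contrast=0.2, p=0.8),
    same_on_batch=False,  # sample augmentation parameters per image
)

device = 'cuda' if torch.cuda.is_available() else 'cpu'
images = torch.rand(8, 3, 800, 800, device=device)  # a batch of normalized images
augmented = augment(images)  # all transforms execute on `device`
```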

What's New

For the latest updates and improvements to VisDet, please refer to the changelog.

Installation

Please refer to Installation for installation instructions.

Getting Started

Please see the Getting Started guide for the basic usage of VisDet.
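
As a quick orientation, the snippet below sketches single-image inference, assuming VisDet preserves MMDetection's high-level `init_detector` / `inference_detector` API; the config and checkpoint paths are placeholders:

```python
from mmdet.apis import init_detector, inference_detector

# Placeholder paths: any config/checkpoint pair from the model zoo works.
config_file = 'configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py'
checkpoint_file = 'checkpoints/faster_rcnn_r50_fpn_1x_coco.pth'

# Build the model from its config and load trained weights.
model = init_detector(config_file, checkpoint_file, device='cuda:0')

# Run inference on one image; returns per-class detection results.
result = inference_detector(model, 'demo/demo.jpg')
```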

Overview of Benchmark and Model Zoo

Results and models are available in the model zoo.

Architectures

Object Detection

- Fast R-CNN (ICCV'2015)
- Faster R-CNN (NeurIPS'2015)
- RPN (NeurIPS'2015)
- SSD (ECCV'2016)
- RetinaNet (ICCV'2017)
- Cascade R-CNN (CVPR'2018)
- YOLOv3 (ArXiv'2018)
- CornerNet (ECCV'2018)
- Grid R-CNN (CVPR'2019)
- Guided Anchoring (CVPR'2019)
- FSAF (CVPR'2019)
- CenterNet (ArXiv'2019)
- Libra R-CNN (CVPR'2019)
- TridentNet (ICCV'2019)
- FCOS (ICCV'2019)
- RepPoints (ICCV'2019)
- FreeAnchor (NeurIPS'2019)
- CascadeRPN (NeurIPS'2019)
- FoveaBox (TIP'2020)
- Double-Head R-CNN (CVPR'2020)
- ATSS (CVPR'2020)
- NAS-FCOS (CVPR'2020)
- CentripetalNet (CVPR'2020)
- AutoAssign (ArXiv'2020)
- Side-Aware Boundary Localization (ECCV'2020)
- Dynamic R-CNN (ECCV'2020)
- DETR (ECCV'2020)
- PAA (ECCV'2020)
- VarifocalNet (CVPR'2021)
- Sparse R-CNN (CVPR'2021)
- YOLOF (CVPR'2021)
- YOLOX (ArXiv'2021)
- Deformable DETR (ICLR'2021)
- TOOD (ICCV'2021)
- DDOD (ACM MM'2021)

Instance Segmentation

- Mask R-CNN (ICCV'2017)
- Cascade Mask R-CNN (CVPR'2018)
- Mask Scoring R-CNN (CVPR'2019)
- Hybrid Task Cascade (CVPR'2019)
- YOLACT (ICCV'2019)
- InstaBoost (ICCV'2019)
- SOLO (ECCV'2020)
- PointRend (CVPR'2020)
- DetectoRS (CVPR'2021)
- SOLOv2 (NeurIPS'2020)
- SCNet (AAAI'2021)
- QueryInst (ICCV'2021)
- Mask2Former (CVPR'2022)

Panoptic Segmentation

- Panoptic FPN (CVPR'2019)
- MaskFormer (NeurIPS'2021)
- Mask2Former (CVPR'2022)

Other

- Contrastive Learning
  - SwAV (NeurIPS'2020)
  - MoCo (CVPR'2020)
  - MoCov2 (ArXiv'2020)
- Distillation
  - Localization Distillation (CVPR'2022)
  - Label Assignment Distillation (WACV'2022)
- Receptive Field Search
  - RF-Next (TPAMI'2022)

Components

Backbones

- VGG (ICLR'2015)
- ResNet (CVPR'2016)
- ResNeXt (CVPR'2017)
- MobileNetV2 (CVPR'2018)
- HRNet (CVPR'2019)
- Generalized Attention (ICCV'2019)
- GCNet (ICCVW'2019)
- Res2Net (TPAMI'2020)
- RegNet (CVPR'2020)
- ResNeSt (CVPRW'2022)
- PVT (ICCV'2021)
- Swin (ICCV'2021)
- PVTv2 (CVMJ'2022)
- ResNet strikes back (NeurIPSW'2021)
- EfficientNet (ICML'2019)
- ConvNeXt (CVPR'2022)

Necks

- PAFPN (CVPR'2018)
- NAS-FPN (CVPR'2019)
- CARAFE (ICCV'2019)
- FPG (ArXiv'2020)
- GRoIE (ICPR'2020)
- DyHead (CVPR'2021)

Loss

- GHM (AAAI'2019)
- Generalized Focal Loss (NeurIPS'2020)
- Seesaw Loss (CVPR'2021)

Common

- OHEM (CVPR'2016)
- Group Normalization (ECCV'2018)
- DCN (ICCV'2017)
- DCNv2 (CVPR'2019)
- Weight Standardization (ArXiv'2019)
- Prime Sample Attention (CVPR'2020)
- Strong Baselines (CVPR'2021)
- ResNet strikes back (NeurIPSW'2021)
- RF-Next (TPAMI'2022)

See the model zoo for a complete list of supported methods.

FAQ

Please refer to the documentation for frequently asked questions.

Contributing

We appreciate all contributions to improve VisDet and welcome community users to participate in development. Please refer to the contributing guide for the contributing guidelines.

Acknowledgement

VisDet is a fork of the original MMDetection project. We acknowledge the original MMDetection team and the broader open source community for their contributions to detection research. We appreciate all the contributors who implement their methods or add new features, as well as users who give valuable feedback.

We hope the toolbox and benchmark serve the growing research community by providing a flexible toolkit to reimplement existing methods and develop new detectors.

Citation

If you use this toolbox or benchmark in your research, please cite the original MMDetection paper:

    @article{mmdetection,
      title   = {{MMDetection}: Open MMLab Detection Toolbox and Benchmark},
      author  = {Chen, Kai and Wang, Jiaqi and Pang, Jiangmiao and Cao, Yuhang and
                 Xiong, Yu and Li, Xiaoxiao and Sun, Shuyang and Feng, Wansen and
                 Liu, Ziwei and Xu, Jiarui and Zhang, Zheng and Cheng, Dazhi and
                 Zhu, Chenchen and Cheng, Tianheng and Zhao, Qijie and Li, Buyu and
                 Lu, Xin and Zhu, Rui and Wu, Yue and Dai, Jifeng and Wang, Jingdong
                 and Shi, Jianping and Ouyang, Wanli and Loy, Chen Change and Lin, Dahua},
      journal = {arXiv preprint arXiv:1906.07155},
      year    = {2019}
    }
    

License

This project is released under the Apache 2.0 license.