Training

This guide covers training models in visdet.

Quick Start

To train a model with a single GPU:

python tools/train.py ${CONFIG_FILE}
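
For example, using the Faster R-CNN config that also appears in the multi-GPU example below:

python tools/train.py configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py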

Multi-GPU Training

For distributed training across multiple GPUs:

bash tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM}

Example with 8 GPUs:

bash tools/dist_train.sh configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py 8

Datasets

COCO 2017 (local)

python tools/misc/download_dataset.py --dataset-name coco2017 --save-dir data/coco --unzip --delete
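
Assuming the script mirrors mmdetection's download_dataset.py, this leaves the standard COCO 2017 layout under data/coco/:

data/coco/
├── annotations/   # instances_train2017.json, instances_val2017.json, ...
├── train2017/     # training images
├── val2017/       # validation images
└── test2017/      # test images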

COCO 2017 (Modal Volume)

Populate a persistent Modal Volume with COCO under /root/data/coco/:

VISDET_COCO_VOLUME=visdet-coco modal run tools/modal/download_coco2017_to_volume.py

Mount the same volume at /root/data in your training app so configs using data/coco/ work unchanged.
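
A minimal sketch of the training side, assuming Modal's current Python API and an image that already contains the visdet repo (the app name, GPU type, and image setup are illustrative, not part of visdet):

import modal

# Volume populated by the download script above.
vol = modal.Volume.from_name("visdet-coco")

app = modal.App("visdet-train")  # app name is illustrative

@app.function(gpu="A100", volumes={"/root/data": vol})
def train():
    import subprocess
    # Configs referencing data/coco/ resolve to /root/data/coco/
    # when run from /root.
    subprocess.run(
        ["python", "tools/train.py",
         "configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py"],
        check=True,
    )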

Configuration

Training behavior is controlled through configuration files. See the Configuration Guide for details.

Key Configuration Options

  • Learning Rate: scale with the effective batch size (samples per GPU × number of GPUs); the common rule is linear scaling, as in the sketch below
  • Epochs: total number of passes over the training set
  • Batch Size: samples per GPU; the effective batch size is this multiplied by the number of GPUs
  • Optimizer: the optimization algorithm, e.g. SGD, Adam, or AdamW
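
A minimal config sketch covering these options, in the same dict-based style as the logging and checkpoint snippets below (the values are illustrative defaults, not visdet recommendations):

# 8 GPUs × 2 samples/GPU = effective batch size 16; lr scaled accordingly
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
runner = dict(type='EpochBasedRunner', max_epochs=12)  # total epochs
data = dict(samples_per_gpu=2, workers_per_gpu=2)      # per-GPU batch size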

Monitoring Training

TensorBoard

Enable TensorBoard logging in your config:

log_config = dict(
    interval=50,  # log every 50 iterations
    hooks=[
        dict(type='TextLoggerHook'),        # plain-text console/file logs
        dict(type='TensorboardLoggerHook')  # TensorBoard event files
    ])

Then run:

tensorboard --logdir=work_dirs/

Checkpoints

Model checkpoints are saved to work_dirs/ by default. Configure checkpoint behavior:

checkpoint_config = dict(interval=1)  # Save every epoch
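
If the checkpoint hook follows the mmcv convention (an assumption, not confirmed for visdet), you can also cap how many checkpoints are kept on disk:

checkpoint_config = dict(
    interval=1,        # save every epoch
    max_keep_ckpts=3,  # delete all but the 3 most recent checkpoints
)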

Advanced Topics

Mixed Precision Training

Enable FP16 (mixed precision) training to speed up training and reduce memory usage:

python tools/train.py ${CONFIG_FILE} --fp16
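
In mmdetection-style codebases the same thing can be set in the config instead of on the command line; whether visdet honors this key is an assumption:

fp16 = dict(loss_scale=512.)  # static loss scale for mixed precision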

Resume Training

Resume from a checkpoint:

python tools/train.py ${CONFIG_FILE} --resume-from ${CHECKPOINT_FILE}
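
For example, picking up the latest checkpoint from the default work_dirs layout (a work directory named after the config is an mmdetection-style convention, assumed here for visdet):

python tools/train.py configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py --resume-from work_dirs/faster_rcnn_r50_fpn_1x_coco/latest.pth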

Fine-tuning

Load pretrained weights and train:

python tools/train.py ${CONFIG_FILE} --load-from ${CHECKPOINT_FILE}
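
Unlike --resume-from, which also restores the optimizer state and epoch counter, --load-from initializes only the model weights and starts a fresh training schedule (the mmdetection convention, assumed to carry over to visdet). The same effect can typically be had in the config; the path below is illustrative:

load_from = 'checkpoints/faster_rcnn_r50_fpn_1x_coco.pth'  # pretrained weights (illustrative path)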

See Also