# Training
This guide covers training models in visdet.
## Quick Start
To train a model with a single GPU:
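A minimal invocation might look like the sketch below; the `tools/train.py` entry point and the example config path are assumptions based on mmdetection-style projects, so substitute whatever script and config your checkout actually provides:

```shell
# Train with one GPU (entry point and config path are assumed):
python tools/train.py configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py \
    --work-dir work_dirs/faster_rcnn_r50
```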
## Multi-GPU Training
For distributed training across multiple GPUs:
Example with 8 GPUs:
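PyTorch's `torchrun` launcher is one common way to start a single-node distributed job; whether visdet ships its own launch wrapper is not confirmed here, so this is a generic sketch with an assumed config path:

```shell
# Launch 8 worker processes on one node with torchrun:
torchrun --nproc_per_node=8 tools/train.py \
    configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py \
    --launcher pytorch
```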
## Datasets
### COCO 2017 (local)
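If you do not already have COCO 2017 locally, it can be fetched from the official download URLs; the target layout under `data/coco/` is an assumption based on the paths used elsewhere in this guide:

```shell
# Download and unpack COCO 2017 into data/coco/ (official cocodataset.org URLs):
mkdir -p data/coco && cd data/coco
wget http://images.cocodataset.org/zips/train2017.zip
wget http://images.cocodataset.org/zips/val2017.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
unzip '*.zip'
```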
### COCO 2017 (Modal Volume)
Populate a persistent Modal Volume with COCO under `/root/data/coco/`:
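One way to populate the volume is a one-off Modal function; the app and volume names below are assumptions (only the COCO val split is shown to keep the sketch short):

```python
import modal

app = modal.App("coco-populate")  # app name is arbitrary
# Volume name "coco-data" is an assumption; reuse the same name when training.
vol = modal.Volume.from_name("coco-data", create_if_missing=True)

image = modal.Image.debian_slim().apt_install("wget", "unzip")

@app.function(image=image, volumes={"/root/data": vol}, timeout=60 * 60)
def download_coco():
    import subprocess
    subprocess.run(
        "mkdir -p /root/data/coco && cd /root/data/coco "
        "&& wget -q http://images.cocodataset.org/zips/val2017.zip "
        "&& unzip -q val2017.zip",
        shell=True, check=True,
    )
    vol.commit()  # persist the writes to the volume
```

Run it once with `modal run` against the file containing this app.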
Mount the same volume at `/root/data` in your training app so configs using `data/coco/` work unchanged.
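A training-side sketch of that mount, again with assumed app/volume names, GPU type, and entry point:

```python
import modal

app = modal.App("visdet-train")
vol = modal.Volume.from_name("coco-data")  # same volume name as when populating (assumed)

@app.function(gpu="A100", volumes={"/root/data": vol}, timeout=4 * 60 * 60)
def train():
    # With the volume mounted at /root/data and the working directory at /root,
    # relative config paths like data/coco/ resolve correctly.
    import subprocess
    subprocess.run(["python", "tools/train.py", "configs/my_config.py"], check=True)
```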
## Configuration
Training behavior is controlled through configuration files. See the Configuration Guide for details.
### Key Configuration Options
- Learning Rate: scale with the total batch size (samples per GPU × number of GPUs); the linear scaling rule is a common starting point
- Epochs: number of full passes over the training set
- Batch Size: samples per GPU; the effective batch size is this value times the number of GPUs
- Optimizer: SGD, Adam, AdamW, etc.
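As a sketch, these options map onto mmdetection-style config fields; the exact field names in visdet are assumptions here, not confirmed:

```python
# Illustrative config fragment; field names follow mmdetection conventions.
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
# Linear scaling rule: lr=0.02 corresponds to a total batch size of 16
# (8 GPUs x 2 samples per GPU); halve lr if you halve the total batch size.
runner = dict(type='EpochBasedRunner', max_epochs=12)
data = dict(samples_per_gpu=2, workers_per_gpu=2)
```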
## Monitoring Training

### TensorBoard
Enable TensorBoard logging in your config:
```python
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])
```
Then run:
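TensorBoard itself is launched outside the training process, pointed at the default work directory:

```shell
# Serve TensorBoard (defaults to port 6006):
tensorboard --logdir work_dirs/
```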
### Checkpoints
Model checkpoints are saved to work_dirs/ by default. Configure checkpoint behavior:
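For example, a checkpoint config fragment; the field names follow the mmcv `CheckpointHook` convention and are assumed, not confirmed, for visdet:

```python
# Checkpoint settings (mmcv CheckpointHook-style field names, assumed):
checkpoint_config = dict(
    interval=1,           # save a checkpoint every epoch
    max_keep_ckpts=3,     # keep only the 3 most recent checkpoints
    save_optimizer=True,  # store optimizer state so runs can be resumed
)
```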
## Advanced Topics

### Mixed Precision Training
Enable FP16 (mixed precision) training for higher throughput and lower GPU memory usage:
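In mmdetection-style configs this is a single top-level switch; whether visdet uses the same `fp16` field is an assumption:

```python
# mmdetection-style mixed-precision switch (assumed for visdet):
fp16 = dict(loss_scale=512.0)          # fixed loss scale
# fp16 = dict(loss_scale='dynamic')    # or let the scaler adapt at runtime
```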
### Resume Training
Resume from a checkpoint:
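A resumed run restores optimizer state and the epoch counter; the flag name below follows the mmdetection convention and is assumed here:

```shell
# Resume from the latest checkpoint of a previous run (flag name assumed):
python tools/train.py configs/my_config.py \
    --resume-from work_dirs/my_config/latest.pth
```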
### Fine-tuning
Load pretrained weights and train:
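A config sketch with assumed mmdetection-style fields: `load_from` initializes model weights only, without restoring optimizer state or the epoch counter, which is what distinguishes fine-tuning from resuming:

```python
# Assumed mmdetection-style fields; the checkpoint path is hypothetical.
load_from = 'checkpoints/pretrained_model.pth'
resume_from = None  # ensure this is a fresh fine-tuning run, not a resume
# Fine-tuning typically uses a smaller learning rate than training from scratch:
optimizer = dict(type='SGD', lr=0.002, momentum=0.9, weight_decay=0.0001)
```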