Losses¶
visdet uses a registry-based system for losses. In configs, you typically specify a loss by its class name:
loss_cls = dict(type="FocalLoss", gamma=2.0, alpha=0.25, loss_weight=1.0)
loss_bbox = dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0)
All losses share a common API: reduction and loss_weight are set when the loss is configured, while the optional weight and avg_factor arguments are passed per call for flexible loss computation.
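How these parameters interact can be sketched in plain Python. This is a minimal sketch that assumes visdet follows the MMDetection-style convention (element-wise weight first, then reduction, with avg_factor replacing the element count under "mean"). The helper name reduce_loss_sketch is hypothetical and not part of visdet.

```python
def reduce_loss_sketch(losses, weight=None, reduction="mean",
                       avg_factor=None, loss_weight=1.0):
    """Illustrative only: combine per-element losses into one value.

    Assumed order (not confirmed against the visdet source): apply the
    element-wise weight, reduce, then scale by loss_weight.
    """
    if weight is not None:
        losses = [x * w for x, w in zip(losses, weight)]
    if reduction == "none":
        return [loss_weight * x for x in losses]
    total = sum(losses)
    if reduction == "sum":
        return loss_weight * total
    if reduction == "mean":
        # avg_factor (e.g. the number of positive samples) replaces the
        # plain element count as the denominator when it is given.
        denom = avg_factor if avg_factor is not None else len(losses)
        return loss_weight * total / denom
    raise ValueError(f"unknown reduction: {reduction!r}")


per_element = [0.2, 0.4, 0.6, 0.8]
print(reduce_loss_sketch(per_element))                       # plain mean
print(reduce_loss_sketch(per_element, weight=[1, 1, 0, 0]))  # masked, divided by 4
print(reduce_loss_sketch(per_element, weight=[1, 1, 0, 0], avg_factor=2))
```

In the last call the two masked-out elements no longer dilute the average, which is the usual reason for passing avg_factor in detection heads.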
Classification Losses¶
CrossEntropyLoss¶
Standard cross-entropy loss for classification tasks [1].
With use_sigmoid=True, it uses binary cross-entropy instead.
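The two modes can be sketched for a single sample in plain Python. This is an illustration of the textbook formulas rather than visdet's implementation. The function names softmax_ce and sigmoid_bce are hypothetical.

```python
import math


def softmax_ce(logits, label):
    """Softmax cross-entropy for one sample: -log(softmax(logits)[label])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def sigmoid_bce(logit, target):
    """Binary cross-entropy on one logit; target is 0.0 or 1.0 (or a soft value)."""
    # Stable form of -[t*log(s) + (1-t)*log(1-s)] with s = sigmoid(logit).
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


logits = [2.0, 0.5, -1.0]
print(softmax_ce(logits, label=0))

# In sigmoid mode each class is an independent binary problem; the
# per-class terms are summed here purely for illustration.
one_hot = [1.0, 0.0, 0.0]
print(sum(sigmoid_bce(z, t) for z, t in zip(logits, one_hot)))
```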
Key Hyperparameters:
- use_sigmoid (default: False): Use sigmoid instead of softmax
- use_mask (default: False): Use mask cross-entropy for instance segmentation
- class_weight (optional): Per-class weights for imbalanced datasets
- ignore_index (default: -100): Label index to ignore in loss computation
- avg_non_ignore (default: False): Average only over non-ignored elements
Characteristics:
- Standard choice for multi-class classification
- Works well when classes are relatively balanced
- Can handle class imbalance with class_weight
- Supports mask predictions for instance segmentation
FocalLoss¶
Focal Loss addresses class imbalance by down-weighting easy examples and focusing training on hard examples [2].
p_t = sigmoid(pred) if target == 1 else (1 - sigmoid(pred))
alpha_t = alpha if target == 1 else (1 - alpha)
focal_weight = alpha_t * (1 - p_t)^gamma
loss = -focal_weight * log(p_t)
Key Hyperparameters:
- gamma (default: 2.0): Focusing parameter that reduces loss for well-classified examples
- alpha (default: 0.25): Balancing factor for positive vs negative samples
- use_sigmoid (default: True): Only sigmoid mode is supported
Characteristics:
- Designed for extreme class imbalance (e.g., object detection with many background samples)
- gamma=0 reduces to standard (alpha-weighted) cross-entropy
- Higher gamma values focus more on hard examples
- Standard classification loss for one-stage detectors like RetinaNet
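The effect of gamma can be checked numerically with a small scalar sketch. Following Lin et al. [2], positives are weighted by alpha and negatives by 1 - alpha. The function below is illustrative and is not visdet's implementation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def focal_loss(logit, target, gamma=2.0, alpha=0.25):
    """Sigmoid focal loss for one logit; target is 1 (positive) or 0."""
    p = sigmoid(logit)
    p_t = p if target == 1 else 1.0 - p
    alpha_t = alpha if target == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy negative (confidently background) vs. a hard negative.
easy = focal_loss(-4.0, target=0)
hard = focal_loss(2.0, target=0)
print(f"easy negative: {easy:.6f}, hard negative: {hard:.6f}")

# With gamma=0 the modulating factor is 1, leaving alpha-weighted BCE.
print(focal_loss(2.0, target=1, gamma=0.0), -0.25 * math.log(sigmoid(2.0)))
```

The easy negative contributes several orders of magnitude less than the hard one, which is how the many background samples are kept from dominating the total loss.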
QualityFocalLoss (GFL)¶
Quality Focal Loss extends Focal Loss to jointly represent classification and localization quality [3].
# For negatives (background):
loss = BCE(pred, 0) * sigmoid(pred)^beta
# For positives:
scale_factor = |quality_score - sigmoid(pred)|
loss = BCE(pred, quality_score) * scale_factor^beta
Key Hyperparameters:
- beta (default: 2.0): Modulating factor similar to gamma in Focal Loss
- use_sigmoid (default: True): Only sigmoid mode is supported
Characteristics:
- Target is a tuple of (category_label, quality_score) where quality_score is typically IoU
- Unifies classification and localization quality into a single prediction
- Eliminates the need for separate centerness prediction
- Used in GFL (Generalized Focal Loss) detector
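The negative and positive branches above collapse into one expression once negatives are given a quality score of 0. A minimal scalar sketch, with hypothetical function names and no claim to match visdet's tensor implementation:

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bce_with_logit(logit, target):
    """Numerically stable binary cross-entropy against a soft target."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def quality_focal_loss(logit, quality_score, beta=2.0):
    """QFL for one (sample, class) logit.

    quality_score is 0.0 for negatives and, for the positive class,
    the IoU between the predicted box and its ground-truth box.
    """
    scale = abs(quality_score - sigmoid(logit))
    return bce_with_logit(logit, quality_score) * scale ** beta


print(quality_focal_loss(0.0, 0.0))  # background, predicted score 0.5
print(quality_focal_loss(1.5, 0.8))  # positive whose box has IoU 0.8
```

The loss vanishes when the predicted score equals the quality target, so the classifier is pushed toward predicting the IoU itself rather than a hard 0/1 label.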
Regression Losses¶
L1Loss¶
Simple L1 (Mean Absolute Error) loss for regression tasks.
Key Hyperparameters:
- reduction (default: "mean"): Options are "none", "mean", "sum"
- loss_weight (default: 1.0): Weight multiplier for the loss
Characteristics:
- More robust to outliers than L2 loss
- Gradient magnitude is constant regardless of error magnitude
- Can cause instability when errors are very small, because the gradient does not shrink near zero
SmoothL1Loss¶
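Smooth L1 loss (closely related to Huber loss, and identical to it when beta=1) is quadratic for small errors and linear for large ones, combining the benefits of L2 and L1 [4].
if |pred - target| < beta:
    loss = 0.5 * (pred - target)^2 / beta
else:
    loss = |pred - target| - 0.5 * beta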
Key Hyperparameters:
- beta (default: 1.0): Threshold between L2 and L1 behavior
- reduction (default: "mean"): Options are "none", "mean", "sum"
Characteristics:
- Smooth gradient near zero (like L2) prevents instability
- Linear for large errors (like L1) for robustness to outliers
- Standard regression loss for two-stage detectors like Faster R-CNN
- beta controls the transition point between the quadratic and linear regions
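The piecewise definition and its gradient can be written out for scalars as follows. This is a sketch of the formula above; the function names are illustrative.

```python
def smooth_l1(pred, target, beta=1.0):
    """Element-wise Smooth L1 loss for scalars."""
    diff = abs(pred - target)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta


def smooth_l1_grad(pred, target, beta=1.0):
    """Derivative of smooth_l1 with respect to pred."""
    d = pred - target
    if abs(d) < beta:
        return d / beta  # shrinks toward zero as the error shrinks
    return 1.0 if d > 0 else -1.0  # constant magnitude, like L1


for err in (0.01, 0.5, 1.0, 5.0):
    print(err, smooth_l1(err, 0.0), smooth_l1_grad(err, 0.0))
```

The two branches meet at |pred - target| = beta with the same value and slope, so the loss has no kink at the transition.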
MSELoss¶
Mean Squared Error (L2) loss for regression.
Key Hyperparameters:
- reduction (default: "mean"): Options are "none", "mean", "sum"
- loss_weight (default: 1.0): Weight multiplier for the loss
Characteristics:
- Penalizes large errors more heavily than L1
- Sensitive to outliers
- Smooth gradient everywhere
- Commonly used for centerness prediction
BalancedL1Loss¶
Balanced L1 Loss from Libra R-CNN promotes balanced learning between classification and localization [5].
diff = |pred - target|
b = e^(gamma/alpha) - 1
if diff < beta:
    loss = (alpha/b) * (b*diff + 1) * log(b*diff/beta + 1) - alpha*diff
else:
    loss = gamma*diff + gamma/b - alpha*beta
Key Hyperparameters:
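- alpha (default: 0.5): Shapes the gradient for inliers; a smaller alpha increases the gradient of small errors
- gamma (default: 1.5): Upper bound of the gradient, reached for outliers (errors at or above beta)
- beta (default: 1.0): Threshold between regions (like Smooth L1)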
Characteristics:
- Designed to balance the contribution of samples at different levels
- Promotes equal training for samples with different IoU values
- Improves localization accuracy especially for high-IoU samples
- More complex but can provide better accuracy than SmoothL1Loss
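The formula above can be evaluated for scalars to see how the two regions fit together. A minimal sketch; the function name is illustrative and the code is not taken from visdet.

```python
import math


def balanced_l1(pred, target, alpha=0.5, gamma=1.5, beta=1.0):
    """Scalar Balanced L1 loss following the piecewise formula above."""
    diff = abs(pred - target)
    b = math.exp(gamma / alpha) - 1.0
    if diff < beta:
        return ((alpha / b) * (b * diff + 1.0)
                * math.log(b * diff / beta + 1.0) - alpha * diff)
    return gamma * diff + gamma / b - alpha * beta


# The two branches agree at the threshold diff == beta.
print(balanced_l1(1.0 - 1e-9, 0.0), balanced_l1(1.0, 0.0))

# In the outlier region the loss grows linearly with slope gamma.
print(balanced_l1(3.0, 0.0) - balanced_l1(2.0, 0.0))
```

The choice b = e^(gamma/alpha) - 1 is what makes the branches join continuously at the threshold.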
IoU-Based Losses¶
IoULoss¶
IoU Loss directly optimizes Intersection over Union for bounding box regression [6].
iou = intersection(pred, target) / union(pred, target)
if mode == "linear":
    loss = 1 - iou
elif mode == "square":
    loss = 1 - iou^2
elif mode == "log":
    loss = -log(iou)
Key Hyperparameters:
- mode (default: "log"): Loss scaling mode - "linear", "square", or "log"
- eps (default: 1e-6): Small value for numerical stability
Characteristics:
- Scale-invariant: treats small and large boxes equally
- Directly optimizes the evaluation metric
- Log mode provides larger gradients for low IoU predictions
- Used in anchor-free detectors like FCOS
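The three modes and the scale-invariance property can be illustrated with a single pair of boxes. This sketch assumes boxes in (x1, y1, x2, y2) corner format, which the text above does not specify, and clamps the IoU at eps so the log stays finite. The function names are hypothetical.

```python
import math


def box_iou(a, b, eps=1e-6):
    """IoU of two boxes in (x1, y1, x2, y2) format (assumed format)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / max(union, eps)


def iou_loss(pred, target, mode="log", eps=1e-6):
    iou = max(box_iou(pred, target, eps), eps)  # clamp so log() is finite
    if mode == "linear":
        return 1.0 - iou
    if mode == "square":
        return 1.0 - iou ** 2
    if mode == "log":
        return -math.log(iou)
    raise ValueError(f"unknown mode: {mode!r}")


pred, target = (0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)
for mode in ("linear", "square", "log"):
    print(mode, iou_loss(pred, target, mode))

# Scaling both boxes by 10 leaves the IoU, and hence the loss, unchanged.
big_pred = tuple(10 * v for v in pred)
big_target = tuple(10 * v for v in target)
print(iou_loss(big_pred, big_target, "log"))
```

Here the boxes overlap with an IoU of 1/7, a low-IoU case where the log mode produces the largest loss of the three.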
GIoULoss¶
Generalized IoU Loss extends IoU to handle non-overlapping boxes [7].
# C is the smallest enclosing box containing both pred and target
giou = iou - (area(C) - union) / area(C)
loss = 1 - giou
Key Hyperparameters:
- eps (default: 1e-6): Small value for numerical stability
- reduction (default: "mean"): Options are "none", "mean", "sum"
Characteristics:
- Works even when boxes don't overlap (IoU = 0)
- Provides gradient signal for non-overlapping boxes
- GIoU ranges from -1 to 1, where 1 is perfect overlap
- Better convergence than IoU loss for boxes that are far apart
- Standard regression loss for modern detectors like ATSS
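The behaviour for non-overlapping boxes is easiest to see with numbers. The sketch below assumes (x1, y1, x2, y2) corner format and uses illustrative function names. It is not visdet's implementation.

```python
def giou(a, b, eps=1e-6):
    """Generalized IoU of two boxes in (x1, y1, x2, y2) format (assumed)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    iou = inter / max(union, eps)
    # C: the smallest axis-aligned box enclosing both a and b.
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    area_c = cw * ch
    return iou - (area_c - union) / max(area_c, eps)


def giou_loss(pred, target):
    return 1.0 - giou(pred, target)


target = (0.0, 0.0, 2.0, 2.0)
near = (3.0, 0.0, 5.0, 2.0)   # no overlap, one unit away
far = (8.0, 0.0, 10.0, 2.0)   # no overlap, six units away
print(giou_loss(near, target), giou_loss(far, target))
```

Both predictions have an IoU of 0, so a plain IoU loss cannot tell them apart, while the GIoU loss is about 1.2 for the near box and 1.6 for the far one.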
Distribution Losses¶
DistributionFocalLoss¶
Distribution Focal Loss learns a discretized distribution over box offsets instead of direct regression [3].
# Label is a continuous value, discretized to neighboring integers
left = floor(label)
right = left + 1
weight_left = right - label
weight_right = label - left
loss = CE(pred, left) * weight_left + CE(pred, right) * weight_right
Key Hyperparameters:
- reduction (default: "mean"): Options are "none", "mean", "sum"
- loss_weight (default: 1.0): Weight multiplier for the loss
Characteristics:
- Models localization uncertainty as a distribution
- Allows the network to express ambiguous boundary locations
- Works with General Distribution representation for bbox regression
- Used together with QualityFocalLoss in GFL detector
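The discretization above can be sketched for one box side in plain Python. The function names are hypothetical. Decoding the offset as the expectation over the bins follows the General Distribution idea in Li et al. [3] and is included only to show how the learned distribution maps back to a number.

```python
import math


def log_softmax(logits):
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def distribution_focal_loss(logits, label):
    """DFL for one box side.

    logits: scores over the integer bins 0..n, where n = len(logits) - 1.
    label:  continuous target offset with 0 <= label < n.
    """
    left = int(math.floor(label))
    right = left + 1
    weight_left = right - label
    weight_right = label - left
    logp = log_softmax(logits)
    # Cross-entropy against the two neighbouring bins, linearly weighted.
    return -(logp[left] * weight_left + logp[right] * weight_right)


def expected_offset(logits):
    """Decode an offset as the expectation of the bin index."""
    probs = [math.exp(lp) for lp in log_softmax(logits)]
    return sum(i * p for i, p in enumerate(probs))


good = [-5.0, -5.0, 2.0, 1.0, -5.0]  # mass on bins 2 and 3
bad = [2.0, 1.0, -5.0, -5.0, -5.0]   # mass on bins 0 and 1
print(distribution_focal_loss(good, 2.3), expected_offset(good))
print(distribution_focal_loss(bad, 2.3), expected_offset(bad))
```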
Loss Combinations for Popular Detectors¶
RetinaNet¶
RetinaNet uses Focal Loss for classification to handle extreme class imbalance, paired with Smooth L1 for box regression [2].
loss_cls = dict(
    type="FocalLoss",
    use_sigmoid=True,
    gamma=2.0,
    alpha=0.25,
    loss_weight=1.0,
)
loss_bbox = dict(
    type="SmoothL1Loss",
    beta=1.0,
    loss_weight=1.0,
)
FCOS¶
FCOS (Fully Convolutional One-Stage) uses Focal Loss for classification, IoU Loss for box regression, and CrossEntropyLoss for centerness [8].
loss_cls = dict(
    type="FocalLoss",
    use_sigmoid=True,
    gamma=2.0,
    alpha=0.25,
    loss_weight=1.0,
)
loss_bbox = dict(
    type="IoULoss",
    mode="log",
    loss_weight=1.0,
)
loss_centerness = dict(
    type="CrossEntropyLoss",
    use_sigmoid=True,
    loss_weight=1.0,
)
ATSS¶
ATSS (Adaptive Training Sample Selection) uses Focal Loss for classification, GIoU Loss for box regression, and CrossEntropyLoss for centerness [9].
loss_cls = dict(
    type="FocalLoss",
    use_sigmoid=True,
    gamma=2.0,
    alpha=0.25,
    loss_weight=1.0,
)
loss_bbox = dict(
    type="GIoULoss",
    loss_weight=2.0,
)
loss_centerness = dict(
    type="CrossEntropyLoss",
    use_sigmoid=True,
    loss_weight=1.0,
)
GFL (Generalized Focal Loss)¶
GFL unifies classification and localization quality with QualityFocalLoss, and uses DistributionFocalLoss for learning box distributions [3].
loss_cls = dict(
    type="QualityFocalLoss",
    use_sigmoid=True,
    beta=2.0,
    loss_weight=1.0,
)
loss_bbox = dict(
    type="GIoULoss",
    loss_weight=2.0,
)
loss_dfl = dict(
    type="DistributionFocalLoss",
    loss_weight=0.25,
)
Listing Available Losses¶
Because the available set can change depending on installed optional dependencies, you can list what your environment has registered:
from visdet.registry import MODELS
# Filter for loss modules
losses = [name for name in sorted(MODELS.module_dict.keys()) if "Loss" in name]
print(losses)
Note
All losses inherit from nn.Module and follow the same forward signature: forward(pred, target, weight=None, avg_factor=None, reduction_override=None).
1. Murphy, K. (2012), Machine Learning: A Probabilistic Perspective. MIT Press.
2. Lin et al. (2017), Focal Loss for Dense Object Detection. https://arxiv.org/abs/1708.02002
3. Li et al. (2020), Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection. https://arxiv.org/abs/2006.04388
4. Girshick (2015), Fast R-CNN. https://arxiv.org/abs/1504.08083
5. Pang et al. (2019), Libra R-CNN: Towards Balanced Learning for Object Detection. https://arxiv.org/abs/1904.02701
6. Yu et al. (2016), UnitBox: An Advanced Object Detection Network. https://arxiv.org/abs/1608.01471
7. Rezatofighi et al. (2019), Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression. https://arxiv.org/abs/1902.09630
8. Tian et al. (2019), FCOS: Fully Convolutional One-Stage Object Detection. https://arxiv.org/abs/1904.01355
9. Zhang et al. (2020), Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection. https://arxiv.org/abs/1912.02424