Trainer settings
Most Trainer settings defined in native PyTorch Lightning will work as expected with streaming; the exceptions and our recommendations are listed below.
Multi GPU support
Training on multiple GPUs with streaming is handled by PyTorch Lightning. However, the native auto
option within the Trainer
class will not work (we omit the technical details here).
For the strategy
argument of the Trainer, we therefore recommend using ddp_find_unused_parameters_true
instead, which does not conflict with streaming and gradient checkpointing.
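As a minimal sketch, the same strategy can also be passed as an object via Lightning's DDPStrategy class, which makes the find_unused_parameters flag explicit (the device count here is illustrative):

import pytorch_lightning as pl
from pytorch_lightning.strategies import DDPStrategy

# Equivalent to strategy="ddp_find_unused_parameters_true":
# DDP with detection of parameters that receive no gradient.
trainer = pl.Trainer(
    accelerator="gpu",
    devices=2,
    strategy=DDPStrategy(find_unused_parameters=True),
)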
Gradient accumulation
Specifying higher batch sizes will not affect normalization layers during training, since they should be in eval()
mode. However, gradient accumulation is still possible and can
stabilize training under certain circumstances. It can easily be enabled via the accumulate_grad_batches
argument, as sketched below.
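A small sketch of the resulting effective batch size; the per-device batch size of 1 is an assumption for illustration, typical when streaming very large inputs:

import pytorch_lightning as pl

# With a per-device batch size of 1, 2 GPUs, and 8 accumulation steps,
# gradients are averaged over 1 * 2 * 8 = 16 samples before each
# optimizer step.
trainer = pl.Trainer(
    accelerator="gpu",
    devices=2,
    accumulate_grad_batches=8,
)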
Precision
We recommend training with mixed precision wherever possible and letting PyTorch handle the conversions. This can be set using the 16-mixed
option of the precision argument.
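For reference, a minimal Trainer configuration with mixed precision; the commented bf16-mixed line shows Lightning's bfloat16 variant, available on GPUs that support it:

import pytorch_lightning as pl

trainer = pl.Trainer(
    accelerator="gpu",
    precision="16-mixed",       # float16 autocast with gradient scaling
    # precision="bf16-mixed",   # bfloat16 alternative on supported GPUs
)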
Loggers and callbacks
Callbacks for a variety of training strategies (checkpointing, early stopping, etc.) are natively supported by PyTorch Lightning. Please consult the respective documentation for details. The same holds for standard logging solutions (TensorBoard, Weights & Biases).
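A brief sketch using standard Lightning callbacks together with the Weights & Biases logger; the project name and the monitored metric val_loss are placeholders to adapt to your LightningModule:

import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint, EarlyStopping
from pytorch_lightning.loggers import WandbLogger

# Placeholder metric name and project; adapt to your own setup.
checkpoint = ModelCheckpoint(monitor="val_loss", mode="min", save_top_k=1)
early_stop = EarlyStopping(monitor="val_loss", patience=10, mode="min")
wandb_logger = WandbLogger(project="my_project")

trainer = pl.Trainer(
    callbacks=[checkpoint, early_stop],
    logger=wandb_logger,
)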
Example
import pytorch_lightning as pl
from pytorch_lightning.loggers import WandbLogger

wandb_logger = WandbLogger(project="my_project")   # placeholder project name

trainer = pl.Trainer(
    default_root_dir="path_to_save_dir",           # where checkpoints and logs are saved
    accelerator="gpu",
    max_epochs=100,
    devices=2,                                     # number of GPUs
    strategy="ddp_find_unused_parameters_true",    # DDP variant compatible with streaming
    accumulate_grad_batches=8,                     # effective batch size: devices * 8
    precision="16-mixed",                          # mixed precision training
    logger=wandb_logger,
)