Best Practices

Minimize the Calls to sync

When using AsyncLoader, which already contains an internal sync, additional calls to sync() are generally unnecessary and can cause redundant synchronization. In other scenarios, avoid excessive calls to sync whenever possible.

Prefer AsyncLoader

Use AsyncLoader instead of manually transferring I/O tensors to lazy_device.

Avoid Evaluating Tensors

Evaluating tensors can impact performance. Operations that trigger tensor evaluation include:

  • Printing tensors

  • Calling the item method on a tensor

  • Using tensor values in dynamic control flow for branch logic

Coordinate Gradient Accumulation with sync and AsyncLoader

When using Gradient Accumulation, adjust the batches_per_execution parameter in AsyncLoader to match the GA minibatch count N. This ensures sync is executed once after N minibatches. Additionally, consider the memory overhead in this scenario; if it’s too high, you may need to execute sync after each minibatch.

Model Saving

For robust model reloading during continued training, save the model by first transferring it to CPU with model.to('cpu') before calling the save operation.