Best Practices

Minimize the Calls to mark_step

When using AsyncLoader, which already contains an internal mark_step, additional calls to mark_step() are generally unnecessary and can cause redundant synchronization. In other scenarios, avoid excessive calls to mark_step whenever possible.

Prefer AsyncLoader

Use AsyncLoader instead of manually transferring I/O tensors to lazy_device.

Avoid Evaluating Tensors

Evaluating tensors can impact performance. Operations that trigger tensor evaluation include:

  • Printing tensors

  • Calling the item method on a tensor

  • Using tensor values in dynamic control flow for branch logic

Coordinate Gradient Accumulation with mark_step and AsyncLoader

When using Gradient Accumulation, adjust the batches_per_execution parameter in AsyncLoader to match the GA minibatch count N. This ensures mark_step is executed once after N minibatches. Additionally, consider the memory overhead in this scenario; if it’s too high, you may need to execute mark_step after each minibatch.

Model Saving

For robust model reloading during continued training, save the model by first transferring it to CPU with model.to('cpu') before calling the save operation.