Best Practices
Minimize the Calls to mark_step
AsyncLoader already calls mark_step internally, so additional calls to mark_step() are generally unnecessary and can cause redundant synchronization. When you are not using AsyncLoader, call mark_step as rarely as possible.
Prefer AsyncLoader
Use AsyncLoader instead of manually transferring I/O tensors to lazy_device.
Avoid Evaluating Tensors
Evaluating a tensor forces its pending lazy computation to execute, which can hurt performance. Operations that trigger tensor evaluation include:
Printing tensors
Calling the item() method on a tensor
Using tensor values in dynamic control flow for branch logic
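One common way to limit evaluation is to keep running statistics as tensors and read them out only occasionally. The sketch below uses plain PyTorch on the CPU for illustration; the model, the `train_steps` helper, and the logging interval are hypothetical, and on a lazy device the tensors would live on that device instead.

```python
import torch


def train_steps(num_steps: int = 20, log_every: int = 10) -> list[float]:
    """Train a toy model, reading the loss value only once per logging interval."""
    torch.manual_seed(0)
    model = torch.nn.Linear(4, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    running_loss = torch.zeros(())  # kept as a tensor so nothing is evaluated yet
    logged = []
    for step in range(1, num_steps + 1):
        x = torch.randn(8, 4)
        y = torch.randn(8, 1)
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()
        # Accumulate with tensor arithmetic instead of calling loss.item() every step.
        running_loss += loss.detach()
        if step % log_every == 0:
            # The only place a Python float is requested: once per interval.
            logged.append((running_loss / log_every).item())
            running_loss = torch.zeros(())
    return logged


logged_losses = train_steps()
```

With the defaults above, the loss is converted to a Python number twice in 20 steps rather than 20 times.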
Coordinate Gradient Accumulation with mark_step and AsyncLoader
When using gradient accumulation (GA), set the batches_per_execution parameter of AsyncLoader to the GA minibatch count N. This ensures mark_step is executed once after every N minibatches. Also consider the memory overhead of this configuration; if it is too high, you may need to execute mark_step after each minibatch instead.
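The effect of this setting on the execution schedule can be illustrated without the library itself. The sketch below is pure Python and does not call AsyncLoader; the helper name `mark_step_schedule` is hypothetical, and it assumes only what the text states, namely that mark_step runs once after every batches_per_execution minibatches.

```python
def mark_step_schedule(num_minibatches: int, batches_per_execution: int) -> list[int]:
    """Return the 1-based minibatch indices after which mark_step would run."""
    if batches_per_execution < 1:
        raise ValueError("batches_per_execution must be at least 1")
    return [
        i
        for i in range(1, num_minibatches + 1)
        if i % batches_per_execution == 0
    ]


# GA with N = 4: one execution per accumulated group of four minibatches.
schedule_accumulated = mark_step_schedule(8, 4)

# Fallback when memory overhead is too high: execute after every minibatch.
schedule_per_batch = mark_step_schedule(8, 1)
```

The first schedule trades fewer executions for more work held per execution, while the second keeps each execution small at the cost of more synchronization points.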
Model Saving
To reload a model reliably when resuming training, transfer it to CPU with model.to('cpu') before calling the save operation.
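A minimal sketch of this pattern with standard PyTorch calls follows. It assumes the save operation is torch.save applied to the model's state_dict, which the text does not specify; the `save_on_cpu` helper and the toy model are hypothetical.

```python
import os
import tempfile

import torch


def save_on_cpu(model: torch.nn.Module, path: str) -> None:
    """Move the model to CPU, then save its parameters."""
    model.to("cpu")  # nn.Module.to moves the parameters in place
    torch.save(model.state_dict(), path)


model = torch.nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp:
    ckpt_path = os.path.join(tmp, "model.pt")
    save_on_cpu(model, ckpt_path)

    # Reload into a fresh model, as would happen when resuming training.
    restored = torch.nn.Linear(4, 2)
    restored.load_state_dict(torch.load(ckpt_path, map_location="cpu"))

weights_match = torch.equal(model.weight, restored.weight)
```

Because model.to('cpu') moves the parameters in place, a training loop that continues after saving would presumably need to move the model back to its device afterwards.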