# Best Practices

## Minimize the Calls to `sync`

When using `AsyncLoader`, which already contains an internal `sync`, additional calls to `sync()` are generally unnecessary and can cause redundant synchronization. In other scenarios, avoid excessive calls to `sync` whenever possible.

## Prefer `AsyncLoader`

Use `AsyncLoader` instead of manually transferring I/O tensors to `lazy_device`.

## Avoid Evaluating Tensors

Evaluating tensors can impact performance. Operations that trigger tensor evaluation include:

- Printing tensors
- Calling the `item` method on a tensor
- Using tensor values in dynamic control flow for branch logic

## Coordinate `Gradient Accumulation` with `sync` and `AsyncLoader`

When using `Gradient Accumulation`, adjust the `batches_per_execution` parameter in `AsyncLoader` to match the GA minibatch count N. This ensures `sync` is executed once after N minibatches. Additionally, consider the memory overhead in this scenario; if it's too high, you may need to execute `sync` after each minibatch.

## Model Saving

For robust model reloading during continued training, save the model by first transferring it to CPU with `model.to('cpu')` before calling the save operation.
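
The model-saving advice above can be sketched in plain PyTorch. This is a minimal illustration, not the library's own save routine: the helper name `save_checkpoint_on_cpu`, the `device` argument, and the temporary checkpoint path are assumptions made for the example, and `"cpu"` stands in for whatever training device is actually used.

```python
import os
import tempfile

import torch
import torch.nn as nn


def save_checkpoint_on_cpu(model: nn.Module, path: str, device: str = "cpu") -> None:
    """Hypothetical helper: move the model to CPU, save it, then move it back."""
    model.to("cpu")                       # parameters now live in host memory
    torch.save(model.state_dict(), path)  # checkpoint holds only CPU tensors
    model.to(device)                      # return to the training device


# Small stand-in model; a real training script would use its own module.
model = nn.Linear(4, 2)
ckpt_path = os.path.join(tempfile.mkdtemp(), "model.pt")
save_checkpoint_on_cpu(model, ckpt_path)

# Reload into a fresh module to confirm the checkpoint is usable on CPU.
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(ckpt_path, map_location="cpu"))
```

Saving the `state_dict` from CPU keeps the checkpoint independent of the device it was trained on, so continued training can reload it and then move the model to the target device.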