# Best Practices

## Minimize the Calls to `sync`

`AsyncLoader` already calls `sync` internally, so additional `sync()` calls in a loop driven by it are generally unnecessary and only add redundant synchronization. When `AsyncLoader` is not in use, keep `sync` calls to the minimum the workload requires.
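The toy model below shows why an extra `sync` adds nothing. `ToyLazyRuntime`, `toy_async_loader`, and `train` are hypothetical stand-ins, not the real `AsyncLoader` API. They assume a runtime that queues operations until `sync` is called and a loader that syncs once after each batch is consumed.

```python
class ToyLazyRuntime:
    """Hypothetical stand-in for a lazy device: operations queue up until sync()."""

    def __init__(self) -> None:
        self.pending: list = []    # operations recorded since the last sync
        self.sync_calls = 0        # how many times sync() was called
        self.graphs_executed = 0   # how many syncs actually had work to run

    def record(self, op) -> None:
        self.pending.append(op)

    def sync(self) -> None:
        self.sync_calls += 1
        if self.pending:           # only a sync with pending work executes a graph
            self.graphs_executed += 1
            self.pending.clear()


def toy_async_loader(runtime: ToyLazyRuntime, batches):
    """Hypothetical loader that syncs once after the consumer finishes each batch."""
    for batch in batches:
        yield batch
        runtime.sync()  # runs when the loop asks for the next batch (or ends)


def train(batches, extra_sync: bool):
    runtime = ToyLazyRuntime()
    for batch in toy_async_loader(runtime, batches):
        runtime.record(("forward", batch))
        runtime.record(("backward", batch))
        if extra_sync:
            runtime.sync()  # redundant: the loader is about to sync anyway
    return runtime.sync_calls, runtime.graphs_executed


print(train(range(4), extra_sync=False))  # (4, 4)
print(train(range(4), extra_sync=True))   # (8, 4): twice the syncs, same work
```

The toy treats a sync with nothing pending as free. On a real device each call may still be a synchronization point, which is the cost worth avoiding.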


## Prefer `AsyncLoader`

Use `AsyncLoader` instead of manually transferring I/O tensors to `lazy_device`.
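The internals of `AsyncLoader` are not described here. Loaders of this kind commonly beat manual transfers because they overlap the copy with computation. The sketch below shows that generic background-prefetch pattern under this assumption; whether `AsyncLoader` works the same way is not confirmed. `prefetching_loader` is hypothetical, and `transfer` stands in for the host-to-`lazy_device` copy.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def prefetching_loader(
    batches: Iterable[T], transfer: Callable[[T], U], depth: int = 2
) -> Iterator[U]:
    """Run `transfer` on each batch in a background thread, keeping at most
    `depth` finished batches queued, and yield results in the original order."""
    q: queue.Queue = queue.Queue(maxsize=depth)  # bounds how far the producer runs ahead

    def producer() -> None:
        try:
            for batch in batches:
                q.put(("item", transfer(batch)))  # blocks while the queue is full
        except Exception as exc:
            q.put(("error", exc))  # hand the failure to the consumer thread
            return
        q.put(("done", None))

    threading.Thread(target=producer, daemon=True).start()
    while True:
        kind, payload = q.get()
        if kind == "item":
            yield payload
        elif kind == "error":
            raise payload
        else:
            return


# Usage: the training loop consumes ready batches instead of copying them inline.
for batch in prefetching_loader(range(3), transfer=lambda x: x * 10):
    print(batch)  # 0, 10, 20 on separate lines
```

The bounded queue keeps memory in check: the producer stalls once `depth` transferred batches are waiting, instead of copying the whole dataset ahead of time.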


## Avoid Evaluating Tensors

Evaluating a tensor forces the pending lazy computation to run so that its concrete value can be read, which can hurt performance. Operations that trigger tensor evaluation include:

- Printing tensors
- Calling the `item` method on a tensor
- Using tensor values in data-dependent control flow, such as the condition of an `if` statement
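The three triggers listed above can be seen in a toy lazy scalar. `ToyLazyTensor` is a hypothetical illustration, not the tensor type of any real library. Arithmetic only records work, while `item()`, printing, and use in an `if` condition each force a full evaluation.

```python
class ToyLazyTensor:
    """Hypothetical lazy scalar: arithmetic records work; reading the value forces it."""

    evaluations = 0  # class-wide count of forced evaluations

    def __init__(self, thunk):
        self._thunk = thunk  # zero-argument callable that computes the value

    @classmethod
    def of(cls, value):
        return cls(lambda: value)

    # Arithmetic and comparison only build a bigger thunk; nothing runs yet.
    def __add__(self, other):
        return ToyLazyTensor(lambda: self._thunk() + other._thunk())

    def __mul__(self, other):
        return ToyLazyTensor(lambda: self._thunk() * other._thunk())

    def __gt__(self, other):
        return ToyLazyTensor(lambda: self._thunk() > other._thunk())

    def _evaluate(self):
        ToyLazyTensor.evaluations += 1
        return self._thunk()  # re-runs the whole recorded computation

    def item(self):          # trigger: calling item()
        return self._evaluate()

    def __repr__(self):      # trigger: printing (print() -> str() -> __repr__)
        return f"ToyLazyTensor({self._evaluate()})"

    def __bool__(self):      # trigger: using the tensor in an if/while condition
        return bool(self._evaluate())


a, b = ToyLazyTensor.of(2.0), ToyLazyTensor.of(3.0)
c = a * b + a                         # builds the graph only
print(ToyLazyTensor.evaluations)      # 0
print(c)                              # ToyLazyTensor(8.0)
value = c.item()                      # value == 8.0
branch = "big" if c > b else "small"  # data-dependent branch
print(ToyLazyTensor.evaluations)      # 3
```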


## Coordinate Gradient Accumulation with `sync` and `AsyncLoader`

When using gradient accumulation (GA) over N minibatches, set the `batches_per_execution` parameter of `AsyncLoader` to N. This way `sync` runs once after every N minibatches rather than after each one. Deferring `sync` also means more work is held between synchronizations, so check the memory overhead; if it is too high, you may need to execute `sync` after each minibatch instead.
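A toy simulation can check the arithmetic of matching `batches_per_execution` to N. `simulate_ga` is hypothetical. It assumes the loader syncs after every `batches_per_execution` minibatches, and it uses "minibatches pending since the last sync" as a rough proxy for memory. That proxy is a simplification, not a measurement.

```python
def simulate_ga(num_minibatches: int, accumulation_steps: int, batches_per_execution: int):
    """Toy model of gradient accumulation on a lazy device.

    Returns the minibatch indices after which the optimizer steps, the indices
    after which sync runs, and the peak number of minibatches whose work was
    pending at once (a rough proxy for memory held between syncs).
    """
    step_after, sync_after = [], []
    pending = peak_pending = 0
    for i in range(num_minibatches):
        pending += 1                            # forward/backward for minibatch i recorded lazily
        if (i + 1) % accumulation_steps == 0:
            step_after.append(i)                # optimizer step once every N minibatches
        peak_pending = max(peak_pending, pending)
        if (i + 1) % batches_per_execution == 0:
            sync_after.append(i)                # pending work executes here
            pending = 0
    return step_after, sync_after, peak_pending


N = 4
# Matched: batches_per_execution == N -> one sync per optimizer step, 4 minibatches pending.
print(simulate_ga(8, N, batches_per_execution=N))  # ([3, 7], [3, 7], 4)
# Fallback when memory is tight: sync after every minibatch, 1 minibatch pending.
print(simulate_ga(8, N, batches_per_execution=1))  # ([3, 7], [0, 1, 2, 3, 4, 5, 6, 7], 1)
```

The trade-off is visible in the last value. The matched setting needs a quarter of the syncs but holds four minibatches of work at once.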


## Model Saving

To make a saved model reliably reloadable when training is resumed, first move it to the CPU with `model.to('cpu')`, then call the save operation.