Training Setup
Argument Parsing
DeepSpeed uses the argparse library to
supply command-line configuration to the DeepSpeed runtime. Use deepspeed.add_config_arguments()
to add DeepSpeed’s built-in arguments to your application’s parser.
import argparse
import deepspeed

parser = argparse.ArgumentParser(description='My training script.')
parser.add_argument('--local_rank', type=int, default=-1,
                    help='local rank passed from distributed launcher')
# Include DeepSpeed configuration arguments
parser = deepspeed.add_config_arguments(parser)
cmd_args = parser.parse_args()
- deepspeed.add_config_arguments(parser)[source]
- Update the argument parser to enable parsing of DeepSpeed command-line arguments.
The set of DeepSpeed arguments includes the following: 1) --deepspeed: boolean flag to enable DeepSpeed 2) --deepspeed_config <json file path>: path of a JSON configuration file to configure the DeepSpeed runtime.
- Parameters
parser – argument parser
- Returns
Updated Parser
- Return type
parser
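To make the effect of the two documented flags concrete, the sketch below registers them on a plain argparse parser and parses a simulated command line. The helper add_config_arguments_sketch is a hypothetical stand-in written for illustration, not DeepSpeed's implementation; the defaults (False and None) and the file name ds_config.json are assumptions. In a real script you would call deepspeed.add_config_arguments(parser) instead.

```python
import argparse


def add_config_arguments_sketch(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Hypothetical stand-in: registers only the two flags documented above."""
    group = parser.add_argument_group("DeepSpeed", "DeepSpeed configurations")
    # Boolean flag to enable DeepSpeed (default of False is an assumption).
    group.add_argument("--deepspeed", default=False, action="store_true",
                       help="boolean flag to enable DeepSpeed")
    # Path to the JSON configuration file (default of None is an assumption).
    group.add_argument("--deepspeed_config", default=None, type=str,
                       help="path of a JSON configuration file")
    return parser


parser = argparse.ArgumentParser(description="My training script.")
parser.add_argument("--local_rank", type=int, default=-1,
                    help="local rank passed from distributed launcher")
parser = add_config_arguments_sketch(parser)

# Simulates: python train.py --deepspeed --deepspeed_config ds_config.json
cmd_args = parser.parse_args(["--deepspeed", "--deepspeed_config", "ds_config.json"])
print(cmd_args.deepspeed, cmd_args.deepspeed_config, cmd_args.local_rank)
```

The resulting namespace carries deepspeed, deepspeed_config, and local_rank as attributes, which is the shape of object the args parameter of deepspeed.initialize() is described as expecting.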
Training Initialization
The entry point for all training with DeepSpeed is deepspeed.initialize(). It initializes the distributed backend if it has not been initialized already.
Example usage:
model_engine, optimizer, _, _ = deepspeed.initialize(args=cmd_args,
                                                     model=net,
                                                     model_parameters=net.parameters())
- deepspeed.initialize(args=None, model: Optional[Module] = None, optimizer: Optional[Union[Optimizer, Callable[[Union[Iterable[Parameter], Dict[str, Iterable]]], Optimizer]]] = None, model_parameters: Optional[Module] = None, training_data: Optional[Dataset] = None, lr_scheduler: Optional[Union[_LRScheduler, Callable[[Optimizer], _LRScheduler]]] = None, distributed_port: int = 29500, mpu=None, dist_init_required: Optional[bool] = None, collate_fn=None, config=None, mesh_param=None, config_params=None)[source]
Initialize the DeepSpeed Engine.
- Parameters
args – an object containing local_rank and deepspeed_config fields. This is optional if config is passed.
model – Required: the nn.Module to train, before any wrappers are applied
optimizer – Optional: a user defined Optimizer or Callable that returns an Optimizer object. This overrides any optimizer definition in the DeepSpeed json config.
model_parameters – Optional: An iterable of torch.Tensors or dicts. Specifies what Tensors should be optimized.
training_data – Optional: Dataset of type torch.utils.data.Dataset
lr_scheduler – Optional: Learning Rate Scheduler Object or a Callable that takes an Optimizer and returns a Scheduler object. The scheduler object should define get_lr(), step(), state_dict(), and load_state_dict() methods
distributed_port – Optional: Master node (rank 0)’s free port that needs to be used for communication during distributed training
mpu – Optional: A model parallelism unit object that implements get_{model,data}_parallel_{rank,group,world_size}()
dist_init_required – Optional: None will auto-initialize torch distributed if needed, otherwise the user can force it to be initialized or not via boolean.
collate_fn – Optional: Merges a list of samples to form a mini-batch of Tensor(s). Used when using batched loading from a map-style dataset.
config – Optional: Instead of requiring args.deepspeed_config, you can pass your DeepSpeed config directly as an argument, either as a path or as a dictionary.
config_params – Optional: Same as config, kept for backwards compatibility.
- Returns
A tuple of engine, optimizer, training_dataloader, lr_scheduler
engine: DeepSpeed runtime engine which wraps the client model for distributed training.
optimizer: Wrapped optimizer if a user-defined optimizer is supplied, or if an optimizer is specified in the JSON config; otherwise None.
training_dataloader: DeepSpeed dataloader if training_data was supplied, otherwise None.
lr_scheduler: Wrapped lr scheduler if a user lr_scheduler is passed, or if lr_scheduler is specified in the JSON configuration. Otherwise None.
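The config parameter accepts either a dictionary or a path, so the args object can be skipped entirely. The sketch below shows one way this might look; build_ds_config, write_ds_config, and init_engine are hypothetical helper names, and the config values (batch size, Adam with a given learning rate) are illustrative assumptions rather than recommended settings. The deepspeed import is placed inside init_engine so the config helpers can be run on a machine without DeepSpeed installed.

```python
import json
import tempfile
from pathlib import Path


def build_ds_config(batch_size: int = 8, lr: float = 1e-3) -> dict:
    """Return a small DeepSpeed-style config dict (values are illustrative)."""
    return {
        "train_batch_size": batch_size,
        "optimizer": {"type": "Adam", "params": {"lr": lr}},
    }


def write_ds_config(config: dict, path: Path) -> Path:
    """Serialize the config to JSON so it can also be passed as a path."""
    path.write_text(json.dumps(config, indent=2))
    return path


def init_engine(net, config):
    """Call deepspeed.initialize() with config (dict or path string) instead of args."""
    import deepspeed  # lazy import: only needed when actually training

    model_engine, optimizer, _, _ = deepspeed.initialize(
        model=net,
        model_parameters=net.parameters(),
        config=config,
    )
    return model_engine, optimizer


ds_config = build_ds_config(batch_size=16, lr=0.01)
with tempfile.TemporaryDirectory() as tmp:
    config_path = write_ds_config(ds_config, Path(tmp) / "ds_config.json")
    round_trip = json.loads(config_path.read_text())
```

With DeepSpeed and PyTorch installed, init_engine(net, ds_config) or init_engine(net, str(config_path)) would then replace the args-based call in the example usage. Because no optimizer is passed in this sketch, the optimizer returned would come from the config's optimizer section, as described in the return value above.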
Distributed Initialization
Optional distributed backend initialization separate from deepspeed.initialize(). Useful in scenarios where the user wants to use torch distributed calls before calling deepspeed.initialize(), such as when using model parallelism, pipeline parallelism, or certain data loader scenarios.
- deepspeed.init_distributed(dist_backend=None, auto_mpi_discovery=True, distributed_port=29500, verbose=True, timeout=datetime.timedelta(seconds=1800), init_method=None, dist_init_required=None, config=None, rank=-1, world_size=-1)[source]
Initialize dist backend, potentially performing MPI discovery if needed
- Parameters
dist_backend – Optional (str). torch distributed backend, e.g., nccl, mpi, gloo, hccl
auto_mpi_discovery – Optional (bool). If distributed environment variables are not set, attempt to discover them from MPI
distributed_port – Optional (int). torch distributed backend port
verbose – Optional (bool). verbose logging
timeout – Optional (timedelta). Timeout for operations executed against the process group. The default value of 30 minutes can be overridden by the environment variable DEEPSPEED_TIMEOUT.
init_method – Optional (string). Torch distributed, URL specifying how to initialize the process group. Default is “env://” if no init_method or store is specified.
config – Optional (dict). DeepSpeed configuration for setting up comms options (e.g. Comms profiling)
rank – Optional (int). The current manually specified rank. Some init_method like “tcp://” need the rank and world_size as well (see: https://pytorch.org/docs/stable/distributed.html#tcp-initialization)
world_size – Optional (int). Desired world_size for the TCP or Shared file-system initialization.
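As a rough illustration of the rank and world_size parameters, the sketch below assembles keyword arguments for a “tcp://” initialization and forwards them to deepspeed.init_distributed(). The helpers tcp_init_kwargs and init_distributed_early are hypothetical names, and the address, the gloo backend choice, and the explicit 30-minute timeout are assumptions picked for a CPU-only example. The deepspeed import is deferred so the argument-building part runs without DeepSpeed installed.

```python
from datetime import timedelta


def tcp_init_kwargs(master_addr: str, port: int, rank: int, world_size: int) -> dict:
    """Build keyword arguments for a tcp:// style initialization."""
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} must be in [0, {world_size})")
    return {
        "dist_backend": "gloo",  # assumption: CPU-friendly backend for the example
        "init_method": f"tcp://{master_addr}:{port}",
        "rank": rank,  # tcp:// needs the rank passed explicitly
        "world_size": world_size,
        "timeout": timedelta(minutes=30),  # same value as the documented default
    }


def init_distributed_early(**kwargs) -> None:
    """Initialize the backend before deepspeed.initialize() is called."""
    import deepspeed  # lazy import: only needed on a machine with DeepSpeed

    deepspeed.init_distributed(**kwargs)


kwargs = tcp_init_kwargs("127.0.0.1", 29500, rank=0, world_size=2)
print(kwargs["init_method"])
```

Each process would call init_distributed_early(**kwargs) with its own rank; after that, torch distributed calls can be issued before deepspeed.initialize() runs.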