Training Setup

Argument Parsing

DeepSpeed uses the argparse library to supply command-line configuration to the DeepSpeed runtime. Use deepspeed.add_config_arguments() to add DeepSpeed’s built-in arguments to your application’s parser.

import argparse

import deepspeed

parser = argparse.ArgumentParser(description='My training script.')
parser.add_argument('--local_rank', type=int, default=-1,
                    help='local rank passed from distributed launcher')
# Include DeepSpeed configuration arguments
parser = deepspeed.add_config_arguments(parser)
cmd_args = parser.parse_args()

deepspeed.add_config_arguments(parser)

Update the argument parser to enable parsing of DeepSpeed command-line arguments. The set of DeepSpeed arguments includes the following: 1) --deepspeed: boolean flag to enable DeepSpeed 2) --deepspeed_config <json file path>: path of a JSON configuration file to configure the DeepSpeed runtime.

Parameters: parser – argument parser
Returns: Updated Parser
Return type: parser
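
The following is a minimal sketch of how the two documented flags behave once they are registered on a parser. It passes an explicit argument list to parse_args() so it can run outside a launcher. The build_parser helper, the file name ds_config.json, and the fallback branch are assumptions made for illustration: when DeepSpeed is not installed, the fallback registers only the two flags described above by hand, and it is not DeepSpeed's own implementation.

```python
import argparse


def build_parser():
    """Build the training script's parser with the DeepSpeed flags added."""
    parser = argparse.ArgumentParser(description="My training script.")
    parser.add_argument("--local_rank", type=int, default=-1,
                        help="local rank passed from distributed launcher")
    try:
        import deepspeed
        # Normal path: let DeepSpeed register its own arguments.
        parser = deepspeed.add_config_arguments(parser)
    except ImportError:
        # Illustrative stand-in (assumption): only the two documented flags.
        group = parser.add_argument_group("DeepSpeed (stand-in)")
        group.add_argument("--deepspeed", default=False, action="store_true",
                           help="enable DeepSpeed")
        group.add_argument("--deepspeed_config", default=None, type=str,
                           help="path of a JSON configuration file")
    return parser


# Explicit argv instead of sys.argv, so the sketch runs anywhere.
cmd_args = build_parser().parse_args(
    ["--deepspeed", "--deepspeed_config", "ds_config.json"])
print(cmd_args.deepspeed, cmd_args.deepspeed_config, cmd_args.local_rank)
```

The resulting cmd_args object carries the local_rank and deepspeed_config fields, so it can be passed on as the args value when initializing training.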

Training Initialization

The entry point for all training with DeepSpeed is deepspeed.initialize(). It initializes the distributed backend if it has not been initialized already.

Example usage:

model_engine, optimizer, _, _ = deepspeed.initialize(args=cmd_args,
                                                     model=net,
                                                     model_parameters=net.parameters())

deepspeed.initialize(args=None, model: Optional[Module] = None, optimizer: Optional[Union[Optimizer, Callable[[Union[Iterable[Parameter], Dict[str, Iterable]]], Optimizer]]] = None, model_parameters: Optional[Module] = None, training_data: Optional[Dataset] = None, lr_scheduler: Optional[Union[_LRScheduler, Callable[[Optimizer], _LRScheduler]]] = None, distributed_port: int = 29500, mpu=None, dist_init_required: Optional[bool] = None, collate_fn=None, config=None, mesh_param=None, config_params=None)

Initialize the DeepSpeed Engine.

Parameters

args – an object containing local_rank and deepspeed_config fields. This is optional if config is passed.
model – Required: the torch.nn.Module to train, before any wrappers are applied
optimizer – Optional: a user defined Optimizer or Callable that returns an Optimizer object. This overrides any optimizer definition in the DeepSpeed json config.
model_parameters – Optional: An iterable of torch.Tensors or dicts. Specifies what Tensors should be optimized.
training_data – Optional: Dataset of type torch.utils.data.Dataset
lr_scheduler – Optional: Learning Rate Scheduler Object or a Callable that takes an Optimizer and returns a Scheduler object. The scheduler object should define get_lr(), step(), state_dict(), and load_state_dict() methods
distributed_port – Optional: A free port on the master node (rank 0) to use for communication during distributed training
mpu – Optional: A model parallelism unit object that implements get_{model,data}_parallel_{rank,group,world_size}()
dist_init_required – Optional: None auto-initializes torch distributed if needed; pass True or False to force initialization on or off.
collate_fn – Optional: Merges a list of samples to form a mini-batch of Tensor(s). Used when using batched loading from a map-style dataset.
config – Optional: Instead of supplying args.deepspeed_config, you can pass your DeepSpeed config directly, as a path or a dictionary.
config_params – Optional: Same as config, kept for backwards compatibility.
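
As a hedged illustration of the parameters above, the sketch below passes the configuration as a dictionary and supplies the optimizer and learning rate scheduler as callables. The names ds_config, make_optimizer, make_scheduler, and setup_engine, as well as the batch size, learning rate, and scheduler settings, are assumptions chosen for the example. The optimizer is deliberately left out of the configuration dictionary because the user-defined one would override it anyway.

```python
# Illustrative configuration (assumed values), passed via `config`
# instead of args.deepspeed_config.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 1,
    "fp16": {"enabled": False},
}


def make_optimizer(params):
    """Callable form of `optimizer`: receives the parameters to optimize."""
    import torch
    return torch.optim.AdamW(params, lr=1e-3)


def make_scheduler(optimizer):
    """Callable form of `lr_scheduler`: receives the optimizer."""
    import torch
    return torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.5)


def setup_engine(net):
    """Initialize DeepSpeed for `net`; requires torch and deepspeed."""
    import deepspeed
    return deepspeed.initialize(
        model=net,
        model_parameters=net.parameters(),
        optimizer=make_optimizer,
        lr_scheduler=make_scheduler,
        config=ds_config,
    )
```

Calling setup_engine(net) with a torch.nn.Module returns the four-element tuple described under Returns. Because no training_data is passed here, the dataloader element of that tuple would be None.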

Returns

A tuple of engine, optimizer, training_dataloader, lr_scheduler

engine: DeepSpeed runtime engine which wraps the client model for distributed training.
optimizer: Wrapped optimizer if a user-defined optimizer is supplied or an optimizer is specified in the JSON config; otherwise None.
training_dataloader: DeepSpeed dataloader if training_data was supplied, otherwise None.
lr_scheduler: Wrapped lr scheduler if a user lr_scheduler is passed or an lr_scheduler is specified in the JSON configuration; otherwise None.
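
The rules for which returned elements are None can be summarized as a small predicate. This is only a restatement of the descriptions above in plain Python, not DeepSpeed code. The helper name expected_returns is hypothetical, and the sketch assumes that the configuration keys are named "optimizer" and "scheduler".

```python
def expected_returns(config, client_optimizer=False, training_data=False,
                     client_scheduler=False):
    """Report which elements of the returned tuple should be non-None.

    `config` is the DeepSpeed configuration as a dict. The key names
    "optimizer" and "scheduler" are assumptions about the config schema.
    """
    return {
        "engine": True,  # the engine is always returned
        "optimizer": bool(client_optimizer or "optimizer" in config),
        "training_dataloader": bool(training_data),
        "lr_scheduler": bool(client_scheduler or "scheduler" in config),
    }


# Optimizer defined only in the config, nothing else supplied.
only_config_optimizer = expected_returns(
    {"optimizer": {"type": "Adam", "params": {"lr": 1e-3}}})
print(only_config_optimizer)
```

In that case only the engine and the optimizer are expected to be set, which matches the example usage that discards the last two elements of the tuple.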

Distributed Initialization

The distributed backend can optionally be initialized separately from deepspeed.initialize(). This is useful when you want to make torch distributed calls before calling deepspeed.initialize(), such as when using model parallelism, pipeline parallelism, or certain data loader scenarios.

deepspeed.init_distributed(dist_backend=None, auto_mpi_discovery=True, distributed_port=29500, verbose=True, timeout=datetime.timedelta(seconds=1800), init_method=None, dist_init_required=None, config=None, rank=-1, world_size=-1)

Initialize the distributed backend, performing MPI discovery if needed.

Parameters

dist_backend – Optional (str). torch distributed backend, e.g., nccl, mpi, gloo, hccl
auto_mpi_discovery – Optional (bool). If the distributed environment variables are not set, attempt to discover them from MPI
distributed_port – Optional (int). torch distributed backend port
verbose – Optional (bool). verbose logging
timeout – Optional (timedelta). Timeout for operations executed against the process group. The default value of 30 minutes can be overridden by the environment variable DEEPSPEED_TIMEOUT.
init_method – Optional (string). Torch distributed URL specifying how to initialize the process group. Default is “env://” if no init_method or store is specified.
config – Optional (dict). DeepSpeed configuration for setting up comms options (e.g. Comms profiling)
rank – Optional (int). The manually specified rank of the current process. Some init_method values, such as “tcp://”, need the rank and world_size as well (see: https://pytorch.org/docs/stable/distributed.html#tcp-initialization)
world_size – Optional (int). Desired world_size for the TCP or Shared file-system initialization.
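
A minimal sketch of early initialization with an explicit “tcp://” init_method follows, assuming a single process on the local machine. The helpers tcp_init_kwargs and init_early, the address 127.0.0.1, and the choice of the gloo backend are assumptions for illustration. MPI discovery is switched off in this sketch because rank and world_size are given by hand.

```python
import datetime


def tcp_init_kwargs(master_addr, port, rank, world_size, timeout_minutes=30):
    """Collect keyword arguments for a TCP-based init_distributed call."""
    return {
        "dist_backend": "gloo",
        # rank and world_size are explicit, so skip MPI discovery.
        "auto_mpi_discovery": False,
        "init_method": f"tcp://{master_addr}:{port}",
        "rank": rank,
        "world_size": world_size,
        "timeout": datetime.timedelta(minutes=timeout_minutes),
    }


def init_early(**kwargs):
    """Initialize the backend before deepspeed.initialize(); needs deepspeed."""
    import deepspeed
    deepspeed.init_distributed(**kwargs)


kwargs = tcp_init_kwargs("127.0.0.1", 29500, rank=0, world_size=1)
print(kwargs["init_method"])
```

After init_early(**kwargs) has run, torch distributed calls can be made, and a later deepspeed.initialize() call should find the backend already initialized.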