Prerequisites

Understand how to select nodes for a job, create a PBS script, and run a batch or interactive job.

PyTorch is preinstalled. PyTorch jobs must be run inside a batch or interactive job.

PyTorch Examples

PyTorch requires a node with a CUDA-capable GPU, and only a limited number of GPU nodes are available. Several examples follow. Copy each script into a working directory. To run an example, enter the command “sh mpi_test.sh” or “sh rpc_test.sh”.
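
Before running the examples, it can help to confirm that the job landed on a node where PyTorch can see a GPU. The following is a minimal sketch and not part of the site's scripts: the helper name describe_gpus is hypothetical, and the check relies only on torch.cuda.is_available() and torch.cuda.device_count(). The import is guarded so the script still reports something useful when the pytorch module has not been loaded.

```python
def describe_gpus():
    """Return a one-line summary of the GPUs PyTorch can see on this node."""
    try:
        import torch
    except ImportError:
        # Typically means the pytorch module has not been loaded yet
        return "torch is not importable; load the pytorch module first"
    if not torch.cuda.is_available():
        return "torch imported, but no CUDA device is visible"
    return "torch sees {} CUDA device(s)".format(torch.cuda.device_count())


if __name__ == "__main__":
    print(describe_gpus())
```

Running this inside an interactive job on a GPU node should report at least one CUDA device. Any other message suggests the job is on the wrong node type or the module is not loaded.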

MPI Example:

This example illustrates PyTorch running in parallel on one or more nodes with MPI. Copy both scripts into a working directory.

PBS script “mpi_test.sh”:

#PBS -N pytorch_mpi_test
#PBS -l nodes=2:ppn=20:cuda
cd $PBS_O_WORKDIR
module load pytorch/cuda/mpi/2.5.1
mpiexec python3 mpi_test.py

PyTorch script “mpi_test.py”:

import os
import torch
import torch.distributed as dist

# Environment variables set by Open MPI's mpiexec for each process
LOCAL_RANK = int(os.environ['OMPI_COMM_WORLD_LOCAL_RANK'])
WORLD_SIZE = int(os.environ['OMPI_COMM_WORLD_SIZE'])
WORLD_RANK = int(os.environ['OMPI_COMM_WORLD_RANK'])

def run(backend):
    tensor = torch.zeros(1)
    # Need to put tensor on a GPU device for nccl backend
    if backend == 'nccl':
        device = torch.device("cuda:{}".format(LOCAL_RANK))
        tensor = tensor.to(device)

    if WORLD_RANK == 0:
        # Rank 0 sends the tensor to every other rank in turn
        for rank_recv in range(1, WORLD_SIZE):
            dist.send(tensor=tensor, dst=rank_recv)
            print('worker_{} sent data to Rank {}\n'.format(0, rank_recv))
    else:
        # Every other rank blocks until it receives the tensor from rank 0
        dist.recv(tensor=tensor, src=0)
        print('worker_{} has received data from rank {}\n'.format(WORLD_RANK, 0))

def init_processes(backend):
    # The mpi backend takes rank and world size from the MPI runtime
    if backend == 'mpi':
        dist.init_process_group(backend)
    else:
        dist.init_process_group(backend, rank=WORLD_RANK, world_size=WORLD_SIZE)
    run(backend)

if __name__ == "__main__":
    backend = 'mpi'
    init_processes(backend)
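
The script takes its rank information from environment variables that Open MPI's mpiexec sets for each process. Below is a minimal sketch of that lookup with a fallback, so the same file can also be started as a single process without mpiexec. The helper name read_mpi_env and the single-process defaults are assumptions for illustration, not part of the original script.

```python
import os


def read_mpi_env(env=None):
    """Return (local_rank, world_rank, world_size) from Open MPI variables.

    Falls back to a single-process layout (0, 0, 1) when a variable is
    missing. That default is an assumption for runs started without mpiexec.
    """
    env = os.environ if env is None else env
    local_rank = int(env.get("OMPI_COMM_WORLD_LOCAL_RANK", 0))
    world_rank = int(env.get("OMPI_COMM_WORLD_RANK", 0))
    world_size = int(env.get("OMPI_COMM_WORLD_SIZE", 1))
    return local_rank, world_rank, world_size


if __name__ == "__main__":
    local_rank, world_rank, world_size = read_mpi_env()
    print("rank {} of {} (local rank {})".format(world_rank, world_size, local_rank))
```

Passing a plain dictionary instead of os.environ makes the lookup easy to check without launching an MPI job.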

RPC Example

In this example, one copy of “rpc_test.py” runs on each node. Copy both scripts into a working directory.

PBS script “rpc_test.sh”:

#PBS -l nodes=2:ppn=20:cuda
#PBS -N rpc_test
cd $PBS_O_WORKDIR
module load pytorch/cuda/mpi/2.5.1
export MASTER_ADDR=$HOSTNAME
export MASTER_PORT=8394
# In this example, one "rpc_test.py" runs on each node.
mpiexec -npernode 1 python3 rpc_test.py
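
The two exported variables tell every process where to rendezvous: by default rpc.init_rpc uses the env:// initialization method, which reads MASTER_ADDR and MASTER_PORT from the environment. The following Python sketch shows a check that could be placed before the rpc.init_rpc call so that a missing or malformed setting fails with a clear message. The helper name rendezvous_endpoint is hypothetical and not part of the original scripts.

```python
import os


def rendezvous_endpoint(env=None):
    """Return (address, port) from MASTER_ADDR and MASTER_PORT, or raise."""
    env = os.environ if env is None else env
    missing = [name for name in ("MASTER_ADDR", "MASTER_PORT") if not env.get(name)]
    if missing:
        raise RuntimeError("not set: " + ", ".join(missing))
    port = int(env["MASTER_PORT"])  # raises ValueError if not a number
    if not 0 < port < 65536:
        raise ValueError("MASTER_PORT out of range: {}".format(port))
    return env["MASTER_ADDR"], port


if __name__ == "__main__":
    try:
        print("rendezvous at {}:{}".format(*rendezvous_endpoint()))
    except (RuntimeError, ValueError) as err:
        print("rendezvous settings problem:", err)
```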

 

PyTorch script “rpc_test.py”:

import os

import torch
import torch.distributed.rpc as rpc
from torch import Tensor

def remote_fn(x: Tensor, n: int) -> Tensor:
    # Runs on the worker that receives the RPC
    return x * n

if __name__ == "__main__":
    # Rank and world size are set by Open MPI's mpiexec
    rank = int(os.environ["OMPI_COMM_WORLD_RANK"])
    world_size = int(os.environ["OMPI_COMM_WORLD_SIZE"])
    name = "worker" + str(rank)

    rpc.init_rpc(
        name=name,
        rank=rank,
        world_size=world_size
    )

    if rank == 0:
        # Rank 0 calls remote_fn on every other worker and prints each result
        workers = [(f"worker{n}", n) for n in range(1, world_size)]
        for worker, worker_rank in workers:
            result = rpc.rpc_sync(worker, remote_fn, args=(torch.tensor(5), worker_rank + 1))
            print(result)
        print("I AM ALL DONE")

    # Waits for all processes to reach this point before shutting down
    rpc.shutdown()
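
To sanity-check the job output, the values rank 0 should print can be worked out without starting RPC at all. The sketch below mirrors the loop in “rpc_test.py” using plain integers instead of tensors. The helper name expected_results is hypothetical, and remote_fn here is a local stand-in for the real remote function.

```python
def remote_fn(x, n):
    """Plain-Python stand-in for the remote function in rpc_test.py."""
    return x * n


def expected_results(world_size):
    """Return {worker name: value} that rank 0 should receive."""
    results = {}
    for worker_rank in range(1, world_size):
        # Rank 0 sends 5 and (worker_rank + 1) to each worker
        results["worker{}".format(worker_rank)] = remote_fn(5, worker_rank + 1)
    return results


if __name__ == "__main__":
    print(expected_results(2))  # {'worker1': 10}
```

With two nodes and one process per node the world size is 2, so the real script should print tensor(10) followed by “I AM ALL DONE”.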