{"id":2042,"date":"2025-05-06T13:25:05","date_gmt":"2025-05-06T17:25:05","guid":{"rendered":"http:\/\/localhost:8280\/?p=2042"},"modified":"2025-05-06T13:27:50","modified_gmt":"2025-05-06T17:27:50","slug":"pytorch","status":"publish","type":"post","link":"https:\/\/knowledgebasev.kinsta.cloud\/?p=2042","title":{"rendered":"PyTorch"},"content":{"rendered":"<h3>Prerequisites<\/h3>\n<p>You should understand how to select\u00a0<a title=\"2) Getting Information About Processors, Compute Nodes, and Your Jobs\" href=\"https:\/\/knowledgebasev.kinsta.cloud\/?p=156\">nodes<\/a>\u00a0for a job, create a\u00a0<a title=\"4) The PBS script \u2013 Defining Your Job\" href=\"https:\/\/knowledgebasev.kinsta.cloud\/?p=233\">PBS script<\/a>, and run a\u00a0<a title=\"5) Running Jobs \u2013 Batch and Interactive\" href=\"https:\/\/knowledgebasev.kinsta.cloud\/?p=223\" target=\"_blank\">batch or interactive job<\/a>.<\/p>\n<p>PyTorch is preinstalled. You must run a\u00a0<a href=\"https:\/\/knowledgebasev.kinsta.cloud\/?p=233\">batch or interactive job<\/a>\u00a0to run\u00a0PyTorch.<\/p>\n<h3>PyTorch Examples<\/h3>\n<p>PyTorch requires a node with a CUDA-capable GPU. Only a limited number of GPU nodes are available.\u00a0Below are two examples.\u00a0Copy each script and place it in a working directory. To run an example, enter the command &#8220;sh mpi_test.sh&#8221; or &#8220;sh rpc_test.sh&#8221;.<\/p>\n<p><strong>MPI Example:<\/strong><\/p>\n<p>This example illustrates PyTorch running in parallel on one or more nodes with MPI. 
Copy each script and place it in a working directory.<\/p>\n<p>PBS script &#8220;mpi_test.sh&#8221;:<\/p>\n<pre>#PBS -N pytorch_mpi_test\r\n#PBS -l nodes=2:ppn=20:cuda\r\ncd $PBS_O_WORKDIR\r\nmodule load pytorch\/cuda\/mpi\/2.5.1\r\nmpiexec python3 mpi_test.py<\/pre>\n<p>PyTorch script &#8220;mpi_test.py&#8221;:<\/p>\n<pre>import os\r\nimport torch\r\nimport torch.distributed as dist\r\n\r\n# Environment variables set by Open MPI's mpiexec for each process\r\nLOCAL_RANK = int(os.environ['OMPI_COMM_WORLD_LOCAL_RANK'])\r\nWORLD_SIZE = int(os.environ['OMPI_COMM_WORLD_SIZE'])\r\nWORLD_RANK = int(os.environ['OMPI_COMM_WORLD_RANK'])\r\n\r\ndef run(backend):\r\n    tensor = torch.zeros(1)\r\n    # Need to put tensor on a GPU device for nccl backend\r\n    if backend == 'nccl':\r\n        device = torch.device(\"cuda:{}\".format(LOCAL_RANK))\r\n        tensor = tensor.to(device)\r\n\r\n    # Rank 0 sends the tensor to every other rank; the others receive it\r\n    if WORLD_RANK == 0:\r\n        for rank_recv in range(1, WORLD_SIZE):\r\n            dist.send(tensor=tensor, dst=rank_recv)\r\n            print('worker_{} sent data to Rank {}\\n'.format(0, rank_recv))\r\n    else:\r\n        dist.recv(tensor=tensor, src=0)\r\n        print('worker_{} has received data from rank {}\\n'.format(WORLD_RANK, 0))\r\n\r\ndef init_processes(backend):\r\n    # The mpi backend takes rank and world size from the MPI runtime\r\n    if backend == 'mpi':\r\n        dist.init_process_group(backend)\r\n    else:\r\n        dist.init_process_group(backend, rank=WORLD_RANK, world_size=WORLD_SIZE)\r\n    run(backend)\r\n\r\nif __name__ == \"__main__\":\r\n    backend = 'mpi'\r\n    init_processes(backend)<\/pre>\n<p><strong>RPC Example:<\/strong><\/p>\n<p>In this example, one copy of &#8220;rpc_test.py&#8221; runs on each node.\u00a0Copy each script and place it in a working directory.<br \/>\nPBS script &#8220;rpc_test.sh&#8221;:<\/p>\n<pre>#PBS -l nodes=2:ppn=20:cuda\r\n#PBS -N rpc_test\r\ncd $PBS_O_WORKDIR\r\nmodule load pytorch\/cuda\/mpi\/2.5.1\r\nexport MASTER_ADDR=$HOSTNAME\r\nexport MASTER_PORT=8394\r\n# In this example, one \"rpc_test.py\" runs on each node.\r\nmpiexec -npernode 1 python3 rpc_test.py<\/pre>\n<p>PyTorch script 
&#8220;rpc_test.py&#8221;:<\/p>\n<pre>import os\r\n\r\nimport torch\r\nimport torch.distributed.rpc as rpc\r\nfrom torch import Tensor\r\n\r\ndef remote_fn(x: Tensor, n: int) -&gt; Tensor:\r\n    # Executed on the remote worker that receives the RPC\r\n    return x * n\r\n\r\nif __name__ == \"__main__\":\r\n    # Rank and world size are provided by Open MPI's mpiexec\r\n    rank = int(os.environ[\"OMPI_COMM_WORLD_RANK\"])\r\n    world_size = int(os.environ[\"OMPI_COMM_WORLD_SIZE\"])\r\n    name = \"worker\" + str(rank)\r\n\r\n    rpc.init_rpc(\r\n        name=name,\r\n        rank=rank,\r\n        world_size=world_size\r\n    )\r\n\r\n    if rank == 0:\r\n        # Rank 0 calls remote_fn on every other worker and prints each result\r\n        workers = [(f\"worker{n}\", n) for n in range(1, world_size)]\r\n        for worker, worker_rank in workers:\r\n            result = rpc.rpc_sync(worker, remote_fn, args=(torch.tensor(5), worker_rank + 1))\r\n            print(result)\r\n        print(\"I AM ALL DONE\")\r\n\r\n    # Wait for all workers to finish outstanding work, then shut down\r\n    rpc.shutdown()<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Prerequisites You should understand how to select\u00a0nodes\u00a0for a job, create a\u00a0PBS script, and run a\u00a0batch or interactive job. PyTorch is preinstalled. You must run a\u00a0batch or interactive job\u00a0to run\u00a0PyTorch. PyTorch Examples PyTorch requires a node with a CUDA-capable GPU. 
Only a limited number of GPU nodes are available.\u00a0Below are two examples.\u00a0Copy each script and place [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[85],"tags":[],"class_list":["post-2042","post","type-post","status-publish","format-standard","hentry","category-software-specific-guides"],"_links":{"self":[{"href":"https:\/\/knowledgebasev.kinsta.cloud\/index.php?rest_route=\/wp\/v2\/posts\/2042","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/knowledgebasev.kinsta.cloud\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/knowledgebasev.kinsta.cloud\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/knowledgebasev.kinsta.cloud\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/knowledgebasev.kinsta.cloud\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2042"}],"version-history":[{"count":8,"href":"https:\/\/knowledgebasev.kinsta.cloud\/index.php?rest_route=\/wp\/v2\/posts\/2042\/revisions"}],"predecessor-version":[{"id":2050,"href":"https:\/\/knowledgebasev.kinsta.cloud\/index.php?rest_route=\/wp\/v2\/posts\/2042\/revisions\/2050"}],"wp:attachment":[{"href":"https:\/\/knowledgebasev.kinsta.cloud\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2042"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/knowledgebasev.kinsta.cloud\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2042"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/knowledgebasev.kinsta.cloud\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=2042"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}