Creating new horizons with rCUDA: the power of remote GPU virtualization. Federico Silla, Universitat Politècnica de València, Spain
GPUs are great! HPC Advisory Council Perth Conference 2017
Basics of GPU computing (single node): basic behavior of CUDA.
Basics of GPU computing (single node): basic behavior of CUDA. GPUs can only be used within the node they are attached to.
Using GPUs across the cluster. A GPU-enabled cluster is a set of independent, self-contained nodes that share nothing among them: MPI is required for aggregating resources within the cluster (GPUs included). [Figure: node 1, node 2, node 3, node n attached to an interconnection network]
GPUs are great! How can we make even better use of GPUs?
GPUs are great! How can we make even better use of GPUs? Which characteristics do we miss from GPUs?
Characteristics missing in GPUs. How can we make even better use of GPUs? Which characteristics do we miss from GPUs? 1. Many GPUs in a single box. 2. Easily sharing a given GPU (or GPUs).
Characteristics missing in GPUs. 1. Why many GPUs in a single box? Traditionally, in order to use many GPUs, applications had to use MPI: GPUs can only be used within the node they are attached to, and nothing is directly shared among nodes (MPI is required for aggregating computing resources within the cluster). A non-MPI application running in a given node can only use the GPUs in that node. [Figure: node 1, node 2, node 3, node n attached to an interconnection network]
Characteristics missing in GPUs. 1. Many GPUs in a single box.
Characteristics missing in GPUs. 1. Many GPUs in a single box. The number of GPUs is limited by the physical space inside the node.
Characteristics missing in GPUs. 1. Having many GPUs in a single box. [Figure: MonteCarlo multi-GPU program running on 10 NVIDIA Tesla K40 GPUs]
Characteristics missing in GPUs. 1. Having many GPUs in a single box. 64 GPUs!!
Characteristics missing in GPUs. 1. Many GPUs in a single box. How many GPUs are "many GPUs in a single box"?
Characteristics missing in GPUs. 1. Many GPUs in a single box. How many GPUs are "many GPUs in a single box"? As many GPUs as can be installed in the cluster. [Figure: node 1, node 2, node 3, node n attached to an interconnection network]
Characteristics missing in GPUs. 2. Easily sharing a given GPU. Why should we be interested in sharing GPUs among applications?
GPU usage of GPU-Blast. [Figure: GPU utilization on an NVIDIA Tesla K20; two intervals annotated "GPU assigned but not used"]
GPU usage of CUDA-MEME. [Figure: GPU utilization on an NVIDIA Tesla K20] GPU utilization is far from the maximum.
GPU usage of LAMMPS. [Figure: GPU utilization on an NVIDIA Tesla K20; one interval annotated "GPU assigned but not used"]
Sharing a GPU among jobs: GPU-Blast. One instance required about 51 seconds. [Figure: two concurrent instances of GPU-Blast]
Sharing a GPU among jobs: GPU-Blast. [Figure: two concurrent instances of GPU-Blast; label: first instance]
Sharing a GPU among jobs: GPU-Blast. [Figure: two concurrent instances of GPU-Blast; labels: first instance, second instance]
Sharing a GPU among applications. K20 (5 GB memory). LAMMPS: 876 MB; mCUDA-MEME: 151 MB; BarraCUDA: 3319 MB; MUMmer: 2104 MB; GPU-LIBSVM: 145 MB.
Sharing a GPU among applications. K20 (5 GB memory). LAMMPS: 876 MB; mCUDA-MEME: 151 MB; BarraCUDA: 3319 MB; MUMmer: 2104 MB; GPU-LIBSVM: 145 MB. The main concern when sharing a GPU is its limited memory.
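The memory figures above can be turned into a quick feasibility check. The following is a minimal sketch, assuming that the K20's 5 GB means 5 × 1024 MB of fully usable memory (in practice somewhat less is available) and that each application's footprint stays constant while it runs. The names `fits_together` and `largest_sharing_sets` are hypothetical helpers, not part of rCUDA or CUDA.

```python
from itertools import combinations

# Memory footprints taken from the slide, in MB.
FOOTPRINT_MB = {
    "LAMMPS": 876,
    "mCUDA-MEME": 151,
    "BarraCUDA": 3319,
    "MUMmer": 2104,
    "GPU-LIBSVM": 145,
}

# Assumption: 5 GB is treated as 5 * 1024 MB, all of it usable.
K20_MEMORY_MB = 5 * 1024


def fits_together(apps, capacity_mb=K20_MEMORY_MB):
    """Return True if the summed footprints of `apps` fit in the GPU memory."""
    return sum(FOOTPRINT_MB[a] for a in apps) <= capacity_mb


def largest_sharing_sets(capacity_mb=K20_MEMORY_MB):
    """Return the largest groups of applications that fit concurrently."""
    names = sorted(FOOTPRINT_MB)
    for size in range(len(names), 0, -1):
        groups = [c for c in combinations(names, size)
                  if fits_together(c, capacity_mb)]
        if groups:
            return groups
    return []


if __name__ == "__main__":
    print("All five fit:", fits_together(FOOTPRINT_MB))
    for group in largest_sharing_sets():
        print(group, sum(FOOTPRINT_MB[a] for a in group), "MB")
```

Under these assumptions the five applications together need 6595 MB and so cannot all share one K20, while four of them can, provided BarraCUDA and MUMmer are not placed on the same GPU.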
Easily sharing a GPU among VMs. A GPU is assigned to a VM by using PCI passthrough. The assignment is exclusive to a single virtual machine, so concurrent usage of the GPU is not possible.
Easily sharing a GPU among VMs. [Figure: two scenarios: high performance network available; low performance network available]
Characteristics missing in GPUs. Which characteristics do we miss from GPUs? 1. Many GPUs in a single box. 2. Easily sharing a given GPU (or GPUs).
Characteristics missing in GPUs. Which characteristics do we miss from GPUs? 1. Many GPUs in a single box. 2. Easily sharing a given GPU (or GPUs). The remote GPU virtualization technique can efficiently address these concerns.
Characteristics missing in GPUs. [Figure: node 1, node 2, node 3, node n attached to an interconnection network] The remote GPU virtualization technique can efficiently address these concerns.
Characteristics missing in GPUs. [Figure: interconnection network] The remote GPU virtualization technique can efficiently address these concerns.
Remote GPU virtualization. What is remote GPU virtualization?
Basics of GPU computing: basic behavior of CUDA.
Basics of GPU computing.
Remote GPU virtualization. [Figure label: No GPU]
rCUDA: remote CUDA. A software technology that enables a more flexible use of GPUs in computing facilities. [Figure label: No GPU] rCUDA is a development by Technical University of Valencia.
Basics of rCUDA. rCUDA is a development by Universitat Politècnica de València, Spain.
Remote GPU virtualization envision. Remote GPU virtualization allows a new vision of a GPU deployment, moving from the usual cluster configuration (physical configuration: node 1, node 2, node 3, node n attached to the interconnection network) to the following one (logical configuration: the same nodes and interconnection network, with logical connections between nodes and GPUs).
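The move from the physical to the logical configuration can be sketched as a small model. This is an illustrative sketch only, not rCUDA code: the node names, the GPU counts, and the helpers `local_gpus`, `pooled_gpus`, and `rcuda_env` are all hypothetical. The `RCUDA_DEVICE_COUNT` and `RCUDA_DEVICE_<n>` environment variable names are given as recalled from the rCUDA documentation and should be treated as an assumption to verify against the user guide of the installed version.

```python
# Hypothetical cluster: node name -> number of GPUs physically installed.
CLUSTER = {"node1": 2, "node2": 0, "node3": 4, "node4": 2}


def local_gpus(cluster, node):
    """Physical configuration: an application sees only its own node's GPUs."""
    return [(node, i) for i in range(cluster[node])]


def pooled_gpus(cluster):
    """Logical configuration: every GPU in the cluster is reachable from any node."""
    return [(node, i) for node in sorted(cluster) for i in range(cluster[node])]


def rcuda_env(gpus):
    """Build client-side environment variables for the chosen remote GPUs.

    Assumption: the variable names follow the RCUDA_DEVICE_COUNT /
    RCUDA_DEVICE_<n>=<server>:<gpu index> convention.
    """
    env = {"RCUDA_DEVICE_COUNT": str(len(gpus))}
    for n, (server, index) in enumerate(gpus):
        env[f"RCUDA_DEVICE_{n}"] = f"{server}:{index}"
    return env


if __name__ == "__main__":
    # A job on node2 has no local GPU in the physical configuration...
    print(local_gpus(CLUSTER, "node2"))
    # ...but can draw from the whole pool in the logical configuration.
    pool = pooled_gpus(CLUSTER)
    print(len(pool), "GPUs in the pool")
    for key, value in rcuda_env(pool[:3]).items():
        print(f"export {key}={value}")
```

In this model a job placed on a GPU-less node is no longer stuck: it simply selects entries from the cluster-wide pool, which is the "many GPUs in a single box" view described earlier.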
Performance of rCUDA. "Ideas Are Easy, Implementation Is Hard" (Guy Kawasaki, marketing specialist and Silicon Valley venture capitalist).
Performance of rCUDA. [Figure: two plots whose titles are truncated in the source, only the word "to" survives in each; higher is better]
Performance of rCUDA. [Figure: legend: CUDA, rCUDA; rCUDA scenario 1, rCUDA scenario 2]
Performance of rCUDA. [Figure: legend: CUDA, rCUDA; rCUDA scenario 1, rCUDA scenario 2; higher is better] "Ideas Are Easy, Implementation Is Hard" (Guy Kawasaki, marketing specialist and Silicon Valley venture capitalist).
Performance of applications using rCUDA. K40 GPUs and EDR InfiniBand. [Figure: MonteCarlo multi-GPU program running on 10 NVIDIA Tesla K40 GPUs; lower is better]
Performance of applications using rCUDA. 64 GPUs!!
Performance of applications using rCUDA. [Figure: two panels: K20 and FDR InfiniBand; K40 and EDR InfiniBand; lower is better]
Performance of applications using rCUDA. EDR InfiniBand and P100. [Figure: two panels: BarraCUDA, CUDA-MEME; lower is better]
Get a free copy of rCUDA at http://www.rcuda.net (more than 850 requests worldwide). @rcuda_
Tony Díaz, Pablo Higueras, Javier Prades, Carlos Reaño, Jaime Sierra, Federico Silla. rCUDA is a development by Technical University of Valencia.
Thanks! Questions?