Nvidia V100 32Gb on Dell PowerEdge R730 + ESXi (Part 2)

If you have read my previous article, I’m assuming (and hoping) that you have managed to install the Nvidia V100 card into your Dell PowerEdge server. The link is here:
https://www.mutface.com/nvidia-v100-32gb-on-dell-poweredge-r730-on-esxi/

For ESXi, I used version 8.0 in the Dell-customised variant. This is the exact image I used (it’s probably not the latest now, but it installed and worked for me):
VMware-VMvisor-Installer-8.0.0.update03-24022510.x86_64-Dell_Customized-A01.iso
Officially ESXi 8.0 is not supported on the R730, but in practice it works just fine; it just hasn’t been officially tested on this hardware. ESXi 8.0 supports GPU passthrough (or “passthru”).

Once ESXi is installed, there are a couple of things you need to do to enable passthru.

(1) Go to Host > Manage > Hardware > PCI Devices

Hopefully the V100 card will show up; it shows up like this on my machine.

[Screenshot: passthru setup in ESXi for the GPU]

You select the device and then click on “Toggle passthrough”, after which you should get the successful activation like this:-

[Screenshot: passthrough successfully enabled in ESXi]

BTW, before I forget: you don’t actually need to download the Nvidia VIB for this setup. In fact the Nvidia site was really useless and I couldn’t download the VIB at all, but I managed to get things working without it. Don’t bother signing up; you won’t get anywhere.

Assuming you are familiar with the ESXi Shell, you can log in at the command line and run the following to check whether the GPU is being detected:

lspci | grep NVIDIA

which should print a line for the V100 if the host can see the card.
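If you want the card's PCI address for later steps, the same listing can be filtered a little further. This is only a sketch: the function name is made up for illustration, and it assumes the address is the first whitespace-separated field of each line, as in standard lspci output.

```shell
# Reads an lspci listing on stdin and prints the PCI address
# (first field) of every line that mentions NVIDIA.
# Hypothetical helper, not part of ESXi.
nvidia_pci_addresses() {
    awk 'tolower($0) ~ /nvidia/ { print $1 }'
}
```

On the host you would run it as: lspci | nvidia_pci_addresses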

(2) When creating the Virtual Machine (VM), there are a number of things you need to do, otherwise it won’t boot.

After you have allocated memory to the VM, you also need to reserve all of it. I allocated 8GB of RAM and then had to make sure that the full amount was reserved, which you do by ticking “Reserve all guest memory (All locked)” in the VM’s memory settings.

If you don’t do this – you will get an error when starting the VM.
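If you prefer to double-check the reservation from the ESXi Shell, a rough sketch like the one below compares the two values in the VM's .vmx file. It assumes the file stores the configured memory as memSize and the reservation as sched.mem.min, both in MB and written as key = "value"; the function names are hypothetical, so verify the key names against your own .vmx before relying on it.

```shell
# Prints the quoted value of a key from a .vmx file.
# $1 = key name, $2 = path to the .vmx file
vmx_value() {
    awk -F'"' -v key="$1" '
        {
            k = $1
            gsub(/[ =\t]/, "", k)          # strip spaces, tabs and "=" from the key part
            if (tolower(k) == tolower(key)) { print $2; exit }
        }' "$2"
}

# Succeeds when the reservation equals the configured memory size.
# Assumption: memSize and sched.mem.min are both expressed in MB.
memory_fully_reserved() {
    size=$(vmx_value memSize "$1")
    reserved=$(vmx_value sched.mem.min "$1")
    [ -n "$size" ] && [ "$size" = "$reserved" ]
}
```

Usage would be something like: memory_fully_reserved /vmfs/volumes/DATASTORE/VMNAME/VMNAME.vmx && echo "memory fully reserved" (the path here is a placeholder).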

(3) Set the passthru parameters in the VM options; this is necessary for GPUs

VM Options > Advanced > Configuration Parameters > Edit Configuration

Two additional key/value pairs need to be added.

pciPassthru.use64bitMMIO = TRUE
pciPassthru.64bitMMIOSizeGB = 64

You actually have to click on Add to manually add these; they don’t exist in the config by default.

Once you’ve added these the VM should boot up.
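For anyone wondering where 64 comes from: the rule of thumb usually quoted in VMware's passthrough guidance is to add up the GPU memory being passed through and round up to the next power of two above that total. The sketch below just does that arithmetic; the function name is made up, and the rule itself is stated from memory, so treat the result as a starting point rather than a guarantee.

```shell
# Suggests a value for pciPassthru.64bitMMIOSizeGB.
# $1 = number of GPUs passed through, $2 = memory per GPU in GB
# Rule of thumb (assumption): total GPU memory, rounded up to the
# next power of two strictly above that total.
mmio_size_gb() {
    total=$(( $1 * $2 ))
    size=1
    while [ "$size" -le "$total" ]; do
        size=$(( size * 2 ))
    done
    echo "$size"
}
```

For a single 32GB V100, mmio_size_gb 1 32 lands on the same 64 used above.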

If you have the 32GB version of the V100 GPU then you should allocate at least 64GB of RAM to the VM if possible. As for the operating system: if you are planning on running Linux then Ubuntu Server is probably the best flavour to use. I did some research and it seems that the best supported version for this slightly older card is Ubuntu 22.04.

Here are the commands I ran to install the Nvidia drivers and CUDA:

sudo ubuntu-drivers autoinstall

This installs the generic Nvidia drivers that Ubuntu recommends for the card. To get the driver and CUDA packages from Nvidia itself, add Nvidia’s CUDA repository to apt. First import the signing key (the ubuntu2204 part of the path matches Ubuntu 22.04; change it if you are on a different release):

curl -fSsL https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/3bf863cc.pub | sudo gpg --dearmor | sudo tee /usr/share/keyrings/nvidia-drivers.gpg > /dev/null 2>&1

and then add the repo itself:

echo 'deb [signed-by=/usr/share/keyrings/nvidia-drivers.gpg] https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ /' | sudo tee /etc/apt/sources.list.d/nvidia-drivers.list

After you have added the repo, refresh the package lists:

sudo apt update
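The repository path in the commands above embeds a distro release (the ubuntuNNNN part), so it is worth making sure it matches the system you are actually on. A small sketch, with hypothetical function names, that builds the path from the ID and VERSION_ID fields found in /etc/os-release:

```shell
# Builds the repo directory name used by Nvidia's CUDA apt repos,
# e.g. "ubuntu" + "22.04" -> "ubuntu2204".
# $1 = distro ID, $2 = VERSION_ID (both as found in /etc/os-release)
cuda_repo_id() {
    printf '%s%s\n' "$1" "$(printf '%s' "$2" | tr -d '.')"
}

# Prints the full x86_64 repo URL for that release.
cuda_repo_url() {
    printf 'https://developer.download.nvidia.com/compute/cuda/repos/%s/x86_64/\n' \
        "$(cuda_repo_id "$1" "$2")"
}
```

On the VM you could call it as: . /etc/os-release && cuda_repo_url "$ID" "$VERSION_ID"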

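I installed the Nvidia 575 driver (the open kernel module version) together with the CUDA drivers and CUDA itself, as shown in the command that follows. One caveat: Nvidia documents the open kernel modules as supporting Turing and newer GPUs, so if nvidia-smi cannot find the Volta-based V100 afterwards, try the proprietary nvidia-driver-575 package in place of nvidia-driver-575-open: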

sudo apt install nvidia-driver-575-open cuda-drivers-575 cuda

Once everything has been installed, reboot the server:

sudo reboot

Hopefully after that when you run:-

nvidia-smi

then the V100 card should be listed in the output.
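For a shorter, script-friendly check, nvidia-smi can also print selected fields as CSV. The wrapper below is a minimal sketch (the function name is made up) that reports the GPU name, total memory and driver version, and fails cleanly if the tool is not installed:

```shell
# Prints "name, memory.total, driver_version" for each GPU,
# or returns 1 with a message if nvidia-smi is not on the PATH.
gpu_summary() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found: the driver packages are probably not installed" >&2
        return 1
    fi
    nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv,noheader
}
```

Running gpu_summary after the reboot should print one line per GPU that the driver can see.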

Really hope this information has been useful to anyone else doing this setup.