After years of working with industry on large-scale numerical simulations, with VFX studios on massive rendering jobs, and on other compute-intensive work such as AI-powered satellite image analysis, I now have one habit: I’m always looking for the biggest bang for the buck, preferably cloud-based since I hate operating hardware.
When it comes to HPC (high-performance computing), Xeon CPUs work great and are available in large supply, but for deep learning it’s currently better to have GPUs at hand (even though Intel is making great strides with its TensorFlow Xeon optimizations).
Question is, how to get powerful GPUs in the cloud, for cheap?
AWS, Gcloud & co are clearly not an option (just multiply their GPU instance hourly rates by 24h x 30 days and compare the resulting monthly cost to your day job wage… done screaming? Good, now read on).
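To make that multiplication concrete, here is a minimal Python sketch. The hourly rate in it is a made-up placeholder, not a quote from any provider, so plug in the current on-demand price of whichever GPU instance you are considering. The two flat monthly prices are the approximate Hetzner and Shadow figures quoted later in this post.

```python
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30


def monthly_cost(hourly_rate, hours_per_day=HOURS_PER_DAY, days=DAYS_PER_MONTH):
    """Cost of keeping one instance running all month at a flat hourly rate."""
    return hourly_rate * hours_per_day * days


# ASSUMPTION: placeholder rate in euros per hour, for illustration only.
# It is not a real price from AWS, Gcloud or anyone else.
example_hourly_rate = 1.00

always_on = monthly_cost(example_hourly_rate)
print(f"Always-on hourly instance: {always_on:.2f} EUR per month")

# Approximate flat monthly prices mentioned in this post, in euros.
flat_offers = {"Hetzner GTX 1080 server": 100.0, "Shadow PC": 40.0}
for name, price in flat_offers.items():
    ratio = always_on / price
    print(f"{name}: {price:.2f} EUR per month ({ratio:.1f}x less than hourly)")
```

The comparison only holds if the machine really runs around the clock; for a few hours of training per month, hourly billing can still come out ahead.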
So the most obvious solution is to find a dedicated server you can rent by the month. Surprisingly, a lot of providers that are usually very competitive (e.g. OVH) are not when it comes to GPUs, probably due to shortages or Nvidia datacenter licensing issues.
Hetzner stands out with a GTX 1080 GPU server at around 100€/month.
But recently at a wedding, I heard about Shadow PC, the cloud PC for gamers.
I went to their website, and when I saw the specs (Nvidia Quadro P5000, 12GB RAM) and the price tag (around 40€/month), I immediately knew I needed to get one and see for myself what the fuss was all about.
One problem though: the PC comes with Windows 10. You can’t set up a Linux dual boot, and GPU pass-through is not available on Windows desktop, so you can use neither nvidia-docker nor a Linux VM to reach the machine’s GPU (and the Shadow PC is already a VM anyway). This is a serious downside for anyone in machine learning.
So, we’ll have to stick with Windows.
Here is the step-by-step procedure to get up and running with a TensorFlow training run:
– Sign up on the Shadow website and follow the instructions
– Once you’re on your Windows 10 desktop, go to https://www.anaconda.com/downloads and install the Python 3.6 version, with all the extra options that are offered
– Then, in the Windows search bar (bottom left), type “Powershell”, and open the executable
– Then run these commands:
conda update conda
conda update anaconda
conda update python
conda update --all
conda create --name tf-gpu
conda install -c aaronzs tensorflow-gpu (note: this channel offers a recent TensorFlow binary for Windows)
conda install -c anaconda cudatoolkit
conda install -c anaconda cudnn
– Now TensorFlow and everything it needs are installed. Let’s check that everything works and launch a training run:
conda install -c anaconda git
git clone https://github.com/tensorflow/models.git
– We’re done! You should now see the TensorFlow training running on your Shadow PC. From what I could see, speed is decent and the hardware crunches through the work nicely.
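If you want to confirm that the install steps above left TensorFlow able to see the GPU, a short check along these lines can help. It is a sketch, not part of the original procedure: the helper name `list_gpus` is mine, and which branch runs depends on the TensorFlow version that the conda channel actually installed.

```python
def list_gpus():
    """Return the names of the GPUs TensorFlow can see.

    Returns an empty list if TensorFlow is not installed or no GPU is visible.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return []

    # Newer TensorFlow releases expose tf.config.list_physical_devices.
    config = getattr(tf, "config", None)
    if config is not None and hasattr(config, "list_physical_devices"):
        return [device.name for device in config.list_physical_devices("GPU")]

    # Fallback for older releases: enumerate all local devices and filter.
    from tensorflow.python.client import device_lib
    return [
        device.name
        for device in device_lib.list_local_devices()
        if device.device_type == "GPU"
    ]


gpus = list_gpus()
if gpus:
    print("TensorFlow sees these GPUs:", ", ".join(gpus))
else:
    print("No GPU visible to TensorFlow (or TensorFlow is not installed).")
```

An empty result usually points at a CPU-only TensorFlow build or a mismatch between the cudatoolkit/cudnn packages and the TensorFlow binary.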
Of course, additional work would be needed to use Shadow PCs in a production-ready pipeline. Since nvidia-docker is not available, you would have to write scripts to automate the steps above and install any other dependencies your use case requires. For VFX studios doing rendering, there should be fewer problems, since most render farm managers (Deadline, Qube…) offer Windows clients.
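As a starting point for that kind of automation, here is a minimal Python sketch that replays the conda commands from the procedure above. It assumes `conda` is on the PATH (one of the optional settings in the Anaconda installer); the `--yes` flag is added so that conda does not stop to ask for confirmation. The function names and the dry-run default are my own choices, and the script is untested on an actual Shadow PC.

```python
import subprocess


def build_commands():
    """Return the conda commands from the walkthrough as argument lists."""
    return [
        ["conda", "update", "--yes", "conda"],
        ["conda", "update", "--yes", "anaconda"],
        ["conda", "update", "--yes", "python"],
        ["conda", "update", "--yes", "--all"],
        ["conda", "create", "--yes", "--name", "tf-gpu"],
        ["conda", "install", "--yes", "-c", "aaronzs", "tensorflow-gpu"],
        ["conda", "install", "--yes", "-c", "anaconda", "cudatoolkit"],
        ["conda", "install", "--yes", "-c", "anaconda", "cudnn"],
        ["conda", "install", "--yes", "-c", "anaconda", "git"],
    ]


def run_all(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it.

    Stops at the first failing command (check=True raises an exception).
    Returns the number of commands processed.
    """
    processed = 0
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)
        processed += 1
    return processed


# Dry run by default: only prints what would be executed.
run_all(build_commands())
```

Calling `run_all(build_commands(), dry_run=False)` would actually execute the commands, so it is worth reading the dry-run output first.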
What is interesting is that gaming is very demanding on hardware, so Blade (the company behind Shadow) had to offer good hardware specs at a low price in order to compete in the cloud gaming market: the mass market is far more price-sensitive than corporations.
The thing is, this offering is just as interesting for anyone who needs cheap computing power in the cloud.
One can only hope Blade will soon consider Linux-powered Shadow PCs.