Supercrunch Blog

Markus Herrmann August 14, 2017 DevOps

Crunching with GPUs – finally fit for general purpose computing

Using a GPU for computationally intensive and embarrassingly parallel tasks is nothing new, neither in science nor in general IT. At GfK we have been experimenting with (CUDA-based) general purpose GPU computing (GPGPU) for several years, and we have to admit that GPUs are only now truly suitable for general purpose computing. Why? Because setting them up and using them has never been as easy and convenient as it is today.

A look back at the ‘good old days’

Rewind to the beginning of this decade, when data science was simply statistics and we talked about logistic regression rather than AI. Back then, we mainly used GPUs to speed up matrix operations in marketing science computing. In terms of coding, this was a (more or less) painful process: we either used one of the few available GPU methods (basically matrix algebra libraries) in R, Python or Matlab, or we had to dynamically link our own CUDA-compiled C/C++ code at run time.

Sometimes these approaches resulted in tangible speed-ups, but because the available GPU methods were so limited, our R and Python users had little motivation to go the extra mile of advanced C/C++ coding. In addition, setting up and operating the overall GPU stack (including dependencies) often turned into a manual build-and-configure nightmare, especially when updating. On top of that, the total capacity of GPU-attached RAM (1-2 GB at that time) was a bottleneck that required careful memory allocation.¹
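To make "careful memory allocation" a bit more concrete, here is a back-of-the-envelope calculation. It is a sketch under stated assumptions: dense float64 matrices, a multiplication C = AB that keeps A, B and C resident on the GPU at the same time, and memory sizes read as binary GiB. It ignores the memory taken by the CUDA context and by library workspaces, so the real limit on a device is lower. The helper `max_square_dim` is hypothetical, written only for this illustration.

```python
import math


def max_square_dim(budget_bytes: int, n_matrices: int, bytes_per_elem: int = 8) -> int:
    """Largest n such that `n_matrices` dense n x n arrays fit into `budget_bytes`.

    Hypothetical helper: ignores CUDA context, allocator and library
    workspace overhead, so the usable limit on a real GPU is smaller.
    """
    # n_matrices * n^2 * bytes_per_elem <= budget  <=>  n^2 <= budget // (n_matrices * bytes_per_elem)
    return math.isqrt(budget_bytes // (n_matrices * bytes_per_elem))


GIB = 2**30
for gib in (1, 2):
    # C = A @ B keeps three float64 matrices (A, B and C) on the device at once
    print(f"{gib} GiB: n <= {max_square_dim(gib * GIB, n_matrices=3)}")
# 1 GiB: n <= 6688
# 2 GiB: n <= 9459
```

In other words, on a 2 GiB card a single square double-precision product was capped at roughly 9,459 x 9,459 before any overhead, so anything larger had to be tiled or split across host and device by hand.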

The rise of neural networks in data science

Nowadays a huge number of ‘ready-to-go’ GPU-based applications, frameworks and libraries are available. This is largely due to the rise of neural networks in data science: neural network applications (e.g. deep learning) benefit tremendously from running on a GPU. At the same time, applications that harness GPUs have very quickly become much more comfortable to use. Modern neural network libraries and frameworks offer broad and scalable functionality and are well integrated into R (e.g. TensorFlow, MXNet, H2O, DeepNet) and Python (e.g. TensorFlow, Theano, Keras, Caffe). These integrated libraries enable our data scientists to shift tasks to the GPU seamlessly, inside their familiar environments, simply by setting a few device parameters.
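Which device parameters to set depends on the framework (TensorFlow, for instance, offers the `tf.device` context manager). One framework-independent knob, honoured by the CUDA runtime itself, is the `CUDA_VISIBLE_DEVICES` environment variable: it restricts which GPUs a process can see and renumbers them. The sketch below mimics the documented rules for integer indices only; the function `visible_devices` is hypothetical, and it treats UUID entries as invalid, which the real runtime does not.

```python
import os
from typing import List, Optional


def visible_devices(value: Optional[str], n_physical: int) -> List[int]:
    """Physical GPU indices a CUDA process would see, in logical order.

    Hypothetical sketch of the CUDA_VISIBLE_DEVICES rules, integer indices only:
      * unset            -> every physical device is visible
      * ""               -> no device is visible
      * "2,1"            -> physical 2 becomes logical 0, physical 1 becomes logical 1
      * the first invalid entry stops parsing, e.g. "0,2,-1,1" -> [0, 2]
    """
    if value is None:
        return list(range(n_physical))
    result: List[int] = []
    for token in value.split(","):
        try:
            idx = int(token.strip())
        except ValueError:
            break  # non-integer entry (UUIDs are out of scope here): treat as invalid
        if idx < 0 or idx >= n_physical:
            break  # invalid index: it and all later entries are ignored
        result.append(idx)
    return result


# What a process started in the current environment would see on a 2-GPU host
print(visible_devices(os.environ.get("CUDA_VISIBLE_DEVICES"), n_physical=2))
```

Because the runtime applies this mapping before any framework initialises, running a script with `CUDA_VISIBLE_DEVICES=1` makes physical GPU 1 appear as device 0 to R or Python code without touching the code itself.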

Even the manual installation and setup of drivers, frameworks and applications is no longer a big deal. The CUDA toolkit and its accompanying libraries can nowadays be installed easily via official NVIDIA packages (for the main Linux distributions), and most of the time our data scientists are happy to install additional libraries themselves via package managers under their own control.

The rise of containers in software engineering

In addition, the recently matured ability to use GPUs inside Docker containers is a strong argument for trying things out on a GPU. Getting a fresh, dedicated and fully configured GPU computing environment just by firing up an NVIDIA-Docker container: it has never been this easy!

Containerized CUDA environments also standardize and isolate infrastructure, which is why we believe they will own the future of GPU computing! Containerized GPU-based applications can now easily be shared, tested and deployed across different environments while giving full control over GPU resource allocation.
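As one illustration of that resource control, the sketch below builds a `docker run` command that exposes only selected GPUs to a container. It assumes a newer setup than the one described here: Docker 19.03 or later with the NVIDIA Container Toolkit, which provides the `--gpus` flag (the nvidia-docker 1.x wrapper of 2017 selected GPUs via the `NV_GPU` environment variable instead). The helper `gpu_run_command` and the image tag are illustrative assumptions; choose a tag that matches your driver.

```python
import shlex
from typing import List, Sequence


def gpu_run_command(image: str, device_ids: Sequence[int], cmd: Sequence[str]) -> List[str]:
    """Build a `docker run` argv that exposes only the given GPUs to the container.

    Hypothetical helper; assumes Docker >= 19.03 with the NVIDIA Container
    Toolkit. The device list is wrapped in literal double quotes because
    Docker parses the --gpus value as CSV, so an unquoted "device=0,1"
    would be split at the comma.
    """
    if not device_ids:
        raise ValueError("at least one GPU id is required")
    devices = ",".join(str(i) for i in device_ids)
    return ["docker", "run", "--rm", "--gpus", f'"device={devices}"', image, *cmd]


argv = gpu_run_command("nvidia/cuda:11.8.0-base-ubuntu22.04", [0], ["nvidia-smi"])
print(shlex.join(argv))
# docker run --rm --gpus '"device=0"' nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```

To actually start the container, pass `argv` to `subprocess.run(argv, check=True)` on a host with the toolkit installed. Building the argument list separately keeps the GPU selection testable on machines without a GPU.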

The status quo of GPGPU at SUPERCRUNCH

At SUPERCRUNCH we already make heavy use of containerized applications to bring the power of GPUs to our tools and applications. Be it “simple” matrix operations or advanced tensor operations, non-parametric learning of dependencies or general classification tasks, the GPU is already a constant companion to our data scientists.

As for the technical specifications of our data science edge nodes, we are currently betting on NVIDIA’s P100 in combination with two Intel Xeon E5 CPUs, 1.5 TB RAM and NVMe storage in a Dell PowerEdge R730xd chassis.



The future of GPU computing at SUPERCRUNCH

We are eagerly awaiting the “native support of GPU configuration, discovery, scheduling and isolation on YARN”². Once it is available, we would like to put the NVIDIA P100 not only into our data science edge nodes but also into our Hadoop worker nodes.

There are already some nice niche products available, such as BigDL, CaffeOnSpark or Deeplearning4j, but these solutions are still too specialized for general purpose GPU computing in the Hadoop/Spark ecosystem.

That said, GPGPU will open up huge opportunities for the future of data science: a new era will begin as soon as GPU-leveraging containers can be operated on YARN. Stay tuned and follow our GPGPU activities!


¹ Lilienthal/Herrmann – GPU-Computing mit R –

² Hadoop YARN JIRA Board: