It is a modern technological saying: data is the new oil. Oil revolutionized the world through its widespread use as an energy source; the energy hidden in oil is released only when it is combined with oxygen. Similarly, the hidden “intelligence” in big data is released only when it is combined with computing. Computing, then, is the new oxygen of AI.
A significant portion of that oxygen comes from a single source: Nvidia GPUs. Nvidia is the darling of the tech and stock-market world today, and its earnings announcements are among the most anticipated events on the financial calendar. On February 22, 2024, Nvidia’s stock surged to create the largest single-day market-capitalization increase ever: $277 billion! How did it reach such a dominant position?
Artificial intelligence’s insatiable thirst for computation caused this phenomenon. The real answer, though, lies in Nvidia’s mission to make high-performance computing (HPC) accessible even on modest budgets. It is a story of foresight, perseverance, and luck that favored the most prepared!
Nvidia was founded in 1993 to make add-in cards for 3D graphics and computer gaming; such cards came to be called GPUs (Graphics Processing Units). The demand for increased speed, resolution, and image quality required specialized hardware. “Moore’s Law” was in full force, with the number of transistors on a chip doubling roughly every two years. PC processors got faster and cheaper until about 2000, when they hit a wall: their complex designs couldn’t efficiently exploit the extra transistors. GPUs, however, had simpler architectures and could productively use more transistors to perform simple, nearly identical calculations on a large number of elements, such as the pixels of an image. Nvidia wasn’t the only company making GPUs; it competed with 3dfx, 3Dlabs, ATI (now part of AMD), S3, and others. Today, AMD and Nvidia are neck and neck in the gaming-GPU space, while the others have disappeared.
As GPUs became more powerful, parts of them became programmable, mostly to enable interesting visual effects. Clever researchers realized that GPUs resembled the specialized matrix processors built in the 1970s that used a SIMD (Single Instruction, Multiple Data) design. That earlier experience was effectively recycled in the mid-2000s to implement fundamental operations such as matrix multiplication, the FFT, and sorting on the GPU.
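The data-parallel pattern in question, one simple operation applied independently to a great many elements, can be sketched with NumPy's vectorized arrays. This is a toy illustration on the CPU, not GPU code; the array sizes and values are arbitrary:

```python
import numpy as np

# One instruction, many data elements: brighten a million "pixels"
# in a single vectorized step. This is the SIMD pattern that GPUs
# execute in hardware across thousands of elements at once.
pixels = np.linspace(0.0, 1.0, 1_000_000, dtype=np.float32)
brightened = np.clip(pixels * 1.2, 0.0, 1.0)

# Matrix multiplication decomposes the same way: every entry of the
# result is an independent dot product, so all of them can be
# computed in parallel.
A = np.arange(6, dtype=np.float32).reshape(2, 3)
B = np.ones((3, 2), dtype=np.float32)
C = A @ B  # C[i, j] = dot(A[i, :], B[:, j]), each one independent
```

Because no output element depends on any other, the work scales naturally across however many processing units the hardware provides.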
Jen-Hsun (Jensen) Huang, Nvidia’s founder and CEO, saw an opportunity to “democratize high-performance computing.” He envisioned providing specialized computing for specific applications, with 3D graphics being the first. GPUs released in 2006 had many identical processing units, instead of separate specialized units for processing vertices and pixels. Nvidia positioned GPUs as affordable, accessible parallel processors, offering around 350 GFLOPS for $400!
At the heart of the strategy was the CUDA parallel computing platform, which harnesses the power of the GPU through high-level APIs. It also included cutting-edge compilers, runtimes, debuggers, drivers, etc., to ease the GPU’s adoption as a parallel processor. This end-to-end, full-stack approach is the major factor in Nvidia’s phenomenal success in HPC. Success was far from guaranteed, but Jensen persisted. GPUs using CUDA began to power protein folding, oil and gas exploration, and more, in addition to graphics and multimedia processing. Many academics developed algorithms and techniques to apply GPUs to various problems: computer vision, ray tracing, graphics algorithms, sorting, etc. By 2012, GPUs were widely used as computational accelerators; 13 of the top 100 supercomputers used Nvidia GPUs, including 2 of the top 10. (Today, 53 of the top 100 and 6 of the top 10 use them. GPUs also compute more energy-efficiently, and now appear in 70 of the top 100 and 7 of the top 10 of the Green500 list of power-efficient supercomputers.) Nvidia’s focus on HPC would pay off handsomely in the years to come.
Life became more interesting, with luck meeting the prepared. Deep learning burst onto the scene to transform AI, and with it, the computing landscape and Nvidia. Artificial neural networks had existed for decades as shallow multi-layer perceptrons that did not scale to larger problems. A few researchers experimented with deep neural networks (DNNs) and multi-layer convolutional neural networks (CNNs), but these require massive amounts of data and computing power to train. Huge amounts of text, voice, and image data, captured under varied conditions, became available with the explosion of the internet, cheap sensors like cameras and microphones, smartphones, etc. Computation, however, remained a problem.
In 2012, AlexNet revolutionized the AI landscape by winning ImageNet’s recognition challenge by a wide margin! Images are large and require large networks to process them. Alex Krizhevsky trained a CNN of over 60 million parameters on two Nvidia GTX 580 GPUs in about 7 days; this could not have been attempted without GPUs. Deep networks then became the only game in town for most AI tasks, with GPUs providing the oxygen. Deeper and larger networks and new architectures emerged later, requiring even more data and computation.
Nvidia was quick to spot the potential of AI and pivoted to become an “AI-first” company, focusing heavily on AI compute. Architectural features tailored to AI, such as 16-bit and 8-bit floating-point arithmetic, were added to the hardware. Equally remarkable is the development of software tools and libraries to exploit the GPU for deep learning. CUDA libraries like cuBLAS and cuDNN integrate seamlessly with high-level frameworks like TensorFlow and PyTorch developed by others. With the advent of heavier networks like Transformers, the demand for compute has exploded. The huge ripple effect created by the introduction of LLMs like ChatGPT has made AI a global buzzword, and with it, the voracious demand for data and compute has grown further.
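The reduced-precision formats mentioned above trade accuracy for speed and memory. A quick way to see the trade-off, using NumPy's float16 as a stand-in for the GPU's half-precision hardware (a sketch of the precision effect, not of how any framework actually trains):

```python
import numpy as np

# FP16 carries only about 3 decimal digits of precision, so a small
# weight update can round away entirely. This is why mixed-precision
# training keeps a full-precision master copy of the weights.
update = np.float32(1e-4)          # a typical small gradient step

w32 = np.float32(1.0) + update              # survives in FP32
w16 = np.float16(1.0) + np.float16(update)  # rounds away in FP16

print(w32 > 1.0)   # the update registered
print(w16 == 1.0)  # the update was lost to rounding
```

Halving the bits per number doubles how many values fit in memory and move per second, which is why these formats matter so much for AI throughput.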
Advances in chip technology make GPUs faster every year. Nvidia combines this with improvements in architecture, number representation, memory, and more. Its recent Hopper GPUs feature a Transformer Engine to help train foundation models. The overall performance of Nvidia GPUs has doubled every year for the past decade; this is colloquially known as “Huang’s Law.”
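Doubling every year compounds dramatically. A one-line calculation (illustrative arithmetic only) shows what a decade of such doubling implies, compared with the classic two-year Moore's-Law cadence:

```python
years = 10

# "Huang's Law": overall GPU performance doubles every year.
huang_factor = 2 ** years          # 2^10 = 1024x over a decade

# Moore's Law cadence: transistor counts double every two years.
moore_factor = 2 ** (years // 2)   # 2^5 = 32x over the same decade

print(huang_factor, moore_factor)  # prints: 1024 32
```

A roughly thousandfold speedup in ten years is what turned yesterday's week-long training runs into routine jobs.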
Computing power is a critical resource in today’s world, and Nvidia sits comfortably with its dominant GPU offerings. Its data-center (hyperscaler) market is now 10x larger than its gaming market, with no credible competition. Other companies are playing catch-up: Google builds its own Tensor Processing Units (TPUs) to accelerate AI, and other large companies and several startups are building alternative AI processors. Cerebras is taking a radical approach, building wafer-scale engines with close to a million compute cores. None of these alternatives, however, is yet available in as easy-to-use a form.
PJ Narayanan
Professor PJ Narayanan is Director of IIIT Hyderabad.