What hardware runs AI?
AI workloads require specialized processors optimized for parallel computation. GPUs dominate, with custom AI chips from major manufacturers competing for the market.
Why can't AI just run on regular computers?
It can, slowly. A modern CPU can run a small language model. But training and running large models requires something different: massive parallelism.
Neural networks are built from fundamentally parallel operations: matrix multiplications across billions of parameters, repeated for every token. CPUs excel at complex sequential tasks; AI needs simple operations performed billions of times simultaneously.
This is why GPUs (Graphics Processing Units) dominate AI. They were designed for parallel computation (rendering millions of pixels), which happens to be exactly what neural networks need.
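A toy sketch (plain NumPy, illustrative only) makes the parallelism concrete: a neural network layer is essentially a matrix-vector product, and each output element is an independent dot product, so all of them can be computed at the same time.

```python
import numpy as np

def layer(W, x):
    # Each row of W yields one output element, independent of all the
    # others -- exactly the kind of work that maps onto thousands of
    # GPU cores running in parallel.
    return np.array([np.dot(row, x) for row in W])

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))
x = rng.standard_normal(3)

# The explicit loop matches the vectorized form a GPU would parallelize.
assert np.allclose(layer(W, x), W @ x)
```

On a CPU the rows are computed one after another; on a GPU the same independent dot products run simultaneously.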
CPU vs GPU vs TPU
CPU (Central Processing Unit):
- Few powerful cores (4-64)
- Optimized for sequential, complex tasks
- Flexible, general-purpose
- Can run AI, but slowly
GPU (Graphics Processing Unit):
- Thousands of simpler cores
- Optimized for parallel computation
- Originally for graphics, now AI workhorses
- Orders of magnitude faster than CPU for AI
TPU (Tensor Processing Unit):
- Google's custom AI chip
- Designed specifically for matrix operations
- Available via Google Cloud
- Competitive performance, different architecture
The NVIDIA dominance
NVIDIA GPUs power most AI training and inference. Their dominance comes from:
- CUDA: Software platform that makes GPU programming accessible. Years of ecosystem development.
- Performance: Leading hardware capabilities in each generation.
- Memory: High-bandwidth memory crucial for large models.
- First-mover advantage: They invested in AI early when others didn't see it.
Key products:
- H100: Current flagship for data centers ($25,000-40,000 each)
- A100: Previous generation, still widely used
- Consumer RTX cards: Accessible for researchers and hobbyists
The demand for NVIDIA hardware during the AI boom pushed their market cap past $1 trillion.
Memory is often the bottleneck
Large models need large memory. A 70-billion parameter model at 16-bit precision needs 140 GB just for weights. Then add activations, KV cache, and overhead.
A single H100 has 80 GB of memory. Running large models requires:
- Model parallelism: Split the model across multiple GPUs
- Quantization: Reduce precision to fit in less memory
- Offloading: Use CPU memory, accepting slowdowns
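The weight-memory arithmetic above is easy to reproduce as a back-of-envelope sketch (weights only; activations, KV cache, and overhead come on top):

```python
# Memory needed for model weights alone, in decimal gigabytes.
def weight_memory_gb(params_billion, bits_per_param):
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

# 70 billion parameters at 16-bit precision: 140 GB of weights.
print(weight_memory_gb(70, 16))  # 140.0
# Quantized to 4 bits, the same model needs only 35 GB.
print(weight_memory_gb(70, 4))   # 35.0
```

This also shows why quantization matters: halving or quartering the precision shrinks the footprint proportionally, at some cost in accuracy.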
Memory bandwidth matters too. Data must flow between memory and compute. High-bandwidth memory (HBM) is expensive but essential.
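One way to see why bandwidth matters: generating a single token requires streaming essentially every weight from memory once, so bandwidth divided by model size gives a rough upper bound on decode speed. The figures below (a 70 GB model, roughly H100-class HBM bandwidth of ~3.35 TB/s) are illustrative assumptions, not a benchmark.

```python
# Rough upper bound on single-GPU token generation speed:
# each token reads every weight from memory once, so
# tokens/sec <= bandwidth / model size.
def max_tokens_per_sec(bandwidth_tb_s, model_gb):
    return bandwidth_tb_s * 1000 / model_gb

# ~3.35 TB/s of HBM feeding a 70 GB model: under ~48 tokens/sec,
# regardless of how much raw compute sits idle.
print(round(max_tokens_per_sec(3.35, 70), 1))
```

This is why inference is often memory-bound rather than compute-bound, and why expensive HBM is essential rather than optional.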
Power and cooling
AI hardware is power-hungry. An H100 draws 700 watts under load. A rack of 8 draws 5.6 kilowatts. A cluster of thousands draws megawatts.
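The scaling is linear, which is what makes it alarming at cluster size. A quick sketch (700 W per H100 under load, as above; the 16,384-GPU cluster size is a hypothetical example):

```python
# Power draw scales linearly with GPU count.
H100_WATTS = 700

def cluster_power_kw(num_gpus, watts_per_gpu=H100_WATTS):
    return num_gpus * watts_per_gpu / 1000

print(cluster_power_kw(8))      # 5.6 kW for one 8-GPU server
print(cluster_power_kw(16384))  # ~11,469 kW: over 11 MW for GPUs alone
```

And that counts only the GPUs: CPUs, networking, and cooling add substantially on top.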
This creates challenges:
- Data centers need massive power connections
- Cooling: each GPU dissipates heat comparable to a space heater, all of which must be removed
- Geographic constraints: need cheap power and cooling capacity
- Environmental impact: significant electricity consumption
Companies locate data centers near cheap, ideally renewable, power. Nuclear and hydroelectric regions are attractive.
Consumer hardware
You don't need data center equipment for all AI work:
- RTX 3090/4090: Consumer GPUs ($1,500-2,000) can run smaller models
- Mac with Apple Silicon: Unified memory enables surprisingly large models
- Cloud GPUs: Rent by the hour without capital investment
Open-source models like Llama 3.3 and Mistral run on accessible hardware. You can't train GPT-5.1 at home, but you can run capable models locally.
The hardware lottery
Progress in AI is partly determined by what hardware can efficiently compute. Attention mechanisms dominate partly because GPUs handle them well. Alternative architectures might be better in principle but run worse on current hardware.
This is the "hardware lottery": ideas that fit current hardware win. Ideas that don't fit wait for future hardware or never get explored.
Hardware shapes what's possible. Understanding hardware helps understand why AI develops as it does.