How are LLMs customized?
Fine-tuning adapts a pre-trained model to specific tasks or behaviors. It's how base models become assistants, coders, or domain specialists.
Why doesn't a pre-trained model just work as an assistant?
Pre-training teaches a model to predict text. Give it "The capital of France is" and it predicts "Paris." But give it "What is the capital of France?" and it might continue with "This question is often asked by students who..."
The model learned to complete text, not to answer questions. Turning a text predictor into a useful assistant requires additional training: fine-tuning.
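The gap between completing text and answering questions shows up even in a toy next-word predictor. Here is a minimal sketch: a tiny hypothetical corpus and greedy most-frequent-follower decoding, nothing like a real LLM, but enough to show the continuation behavior described above.

```python
from collections import Counter, defaultdict

# Toy next-word predictor over a tiny made-up corpus. It always continues
# with the most frequent follower of the last word -- pure text completion.
corpus = (
    "the capital of france is paris . "
    "the capital of spain is madrid . "
    "what is the capital of france ? this question is often asked ."
).split()

follows = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    follows[a][b] += 1

def complete(prompt, steps=3):
    words = prompt.split()
    for _ in range(steps):
        candidates = follows[words[-1]].most_common(1)
        if not candidates:
            break
        words.append(candidates[0][0])
    return " ".join(words)

print(complete("the capital of france is", steps=1))
# -> the capital of france is paris
print(complete("what is the capital of france ?", steps=3))
# -> what is the capital of france ? this question is
```

Given a statement-shaped prompt, the predictor "answers"; given a question, it continues in question-adjacent text, exactly the failure mode fine-tuning addresses.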
What is fine-tuning?
Fine-tuning continues training on a smaller, specialized dataset. The pre-trained weights are the starting point. Additional training adjusts them toward new behavior.
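A toy sketch of the idea, with a linear model and synthetic data standing in for the network. This is illustrative only; real fine-tuning applies the same principle to billions of parameters. The point is that optimization starts from the pre-trained weights, not from scratch.

```python
import numpy as np

# "Continue training from pre-trained weights": a linear model with squared
# loss stands in for the network. All numbers here are synthetic.
rng = np.random.default_rng(1)
w_pretrained = rng.normal(size=3)      # stand-in for pre-trained weights
X = rng.normal(size=(32, 3))           # small, specialized dataset
y = X @ np.array([1.0, -2.0, 0.5])     # the new behavior we want

w = w_pretrained.copy()                # fine-tuning starts here, not at zero
lr = 0.05
for _ in range(500):
    grad = 2 * X.T @ (X @ w - y) / len(X)   # gradient of mean squared error
    w -= lr * grad

print(np.mean((X @ w - y) ** 2))       # loss on the new task: near zero
```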
Types of fine-tuning:
- Instruction tuning: Train on (instruction, response) pairs. The model learns to follow instructions.
- RLHF (reinforcement learning from human feedback): Train on human preferences. The model learns which responses humans prefer.
- Task-specific tuning: Train on examples of a specific task (summarization, translation, coding).
- Domain adaptation: Train on domain-specific text (medical, legal, scientific).
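For instruction tuning specifically, one common way to build training examples is to render each (instruction, response) pair with a template and mask the prompt tokens out of the loss. A sketch, using a stand-in character-level "tokenizer"; the `### Instruction:` template and the `-100` ignore label are widespread open-source conventions, not a fixed standard.

```python
# Turn (instruction, response) pairs into (input_ids, labels) examples.
IGNORE = -100  # positions with this label contribute no loss

def tokenize(text):
    # stand-in tokenizer: one "token id" per character, purely illustrative
    return [ord(c) for c in text]

def build_example(instruction, response):
    prompt = f"### Instruction:\n{instruction}\n\n### Response:\n"
    prompt_ids = tokenize(prompt)
    response_ids = tokenize(response)
    input_ids = prompt_ids + response_ids
    # mask the prompt so the model is trained to produce only the response
    labels = [IGNORE] * len(prompt_ids) + response_ids
    return input_ids, labels

ids, labels = build_example("Name the capital of France.", "Paris.")
```

Masking the prompt means gradients push the model toward generating good responses, not toward reproducing the instructions themselves.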
Fine-tuning is much cheaper than pre-training. You're adjusting existing capabilities, not building them from scratch.
The instruction tuning revolution
A key insight: if you fine-tune on diverse instruction-response pairs, the model learns to follow instructions generally, not just the specific ones in training.
Early models needed prompt engineering tricks. Instruction-tuned models just do what you ask. "Summarize this article" works. "Write a poem about dogs" works. The model generalized from training examples to novel instructions.
This is why ChatGPT felt so different from GPT-3: a closely related base model, but instruction tuning transformed the interface.
LoRA: efficient fine-tuning
Full fine-tuning updates all parameters. For a 70-billion-parameter model, that means storing and updating gradients and optimizer state for 70 billion numbers. Expensive.
LoRA (Low-Rank Adaptation) adds small trainable matrices alongside the frozen original weights. Only these small additions are trained. The original model stays fixed.
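A minimal numerical sketch of the mechanism. The shapes, the alpha / r scaling, and the initialization (A small random, B zero, so the adapted layer initially matches the frozen one) follow the common LoRA convention; the values are illustrative.

```python
import numpy as np

# LoRA forward pass: frozen weight W plus a low-rank trainable update.
rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 512, 512, 8, 16

W = rng.normal(size=(d_in, d_out))      # frozen pre-trained weight
A = 0.01 * rng.normal(size=(d_in, r))   # trainable, small random init
B = np.zeros((r, d_out))                # trainable, zero init

def lora_forward(x):
    # base path plus low-rank update; only A and B would receive gradients
    return x @ W + (x @ A) @ B * (alpha / r)

x = rng.normal(size=(1, d_in))
# With B still zero, the adapted layer reproduces the frozen layer exactly.
print(np.allclose(lora_forward(x), x @ W))
```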
Benefits:
- Much less memory (training millions instead of billions of parameters)
- Can store multiple LoRAs for different tasks
- Combine or swap LoRAs easily
- Nearly matches full fine-tuning quality for many tasks
LoRA democratized fine-tuning. Individuals can now customize large models on consumer hardware.
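The memory benefit is easy to see with back-of-the-envelope arithmetic for a single weight matrix; the 4096 x 4096 size and rank 8 are chosen for illustration.

```python
# Trainable-parameter count: full fine-tuning vs a rank-8 LoRA
# for one 4096 x 4096 weight matrix.
d, r = 4096, 8
full_params = d * d                 # every entry of W is trainable
lora_params = d * r + r * d         # A (d x r) plus B (r x d)
print(full_params)                  # 16777216  (~16.8M)
print(lora_params)                  # 65536     (~65K)
print(full_params // lora_params)   # 256x fewer trainable parameters
```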
The fine-tuning stack
A modern assistant model goes through multiple fine-tuning stages:
- Pre-training → Base model (text predictor)
- Instruction fine-tuning → Instruction follower
- RLHF / preference tuning → Helpful, harmless assistant
- Optional task-specific tuning → Specialized tool
Each stage refines behavior. The final model reflects accumulated choices about what "good" means.
Limits of fine-tuning
Fine-tuning can shape behavior but can't create capabilities from nothing. If the base model doesn't understand chemistry, no amount of chemistry examples will make it a chemistry expert. Fine-tuning unlocks and redirects; it doesn't fundamentally add.
This is why base model quality matters so much. Fine-tuning is powerful, but it's ultimately limited by what pre-training provided.