What is a model?

A model is a trained neural network, the actual artifact that processes your prompts and generates responses.

What exactly is a "model" in AI?

When people say "GPT-4" or "Claude" or "Llama," they're referring to a specific trained artifact: billions of numerical parameters arranged in a particular architecture, shaped by a particular training process.

The model is the thing that actually does the work. It's not software in the traditional sense (not code that runs step-by-step), but a mathematical structure that transforms inputs into outputs through learned patterns.
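
The idea of parameters transforming inputs into outputs can be sketched with a toy two-layer network. The weights below are hand-picked for illustration, not learned; a real model has billions of such numbers:

```python
# A toy illustration (not any real model): a "model" is just numbers.
# Two fixed weight matrices transform an input vector into an output vector.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def relu(vector):
    """Zero out negative values (a common nonlinearity)."""
    return [max(0.0, x) for x in vector]

# "Parameters" -- in a real model these are set by training, not by hand.
W1 = [[0.5, -0.2], [0.1, 0.9]]
W2 = [[1.0, -1.0]]

def tiny_model(inputs):
    hidden = relu(matvec(W1, inputs))
    return matvec(W2, hidden)

print(tiny_model([1.0, 2.0]))
```

Nothing here "runs step-by-step" in the sense of branching program logic; the same fixed arithmetic is applied to every input, and all the behavior lives in the numbers.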

How do current models compare?

The frontier models (as of late 2025) include GPT-5.1, Claude 4.5 (Opus/Sonnet/Haiku), Gemini 2.0, and Llama 3.3. They differ in architecture, training data, and the values instilled during alignment. But they're more similar than different: all are Transformer-based, all trained on web-scale data, all refined through human feedback.

GPT-5.1, released November 2025, introduced adaptive reasoning with "Instant" mode for quick responses and "Thinking" mode for complex tasks.

Anthropic's Claude 4.5 series rolled out over several months: Sonnet 4.5 (September 2025, "best coding model in the world"), Haiku 4.5 (October 2025, fastest and most cost-efficient), and Opus 4.5 (November 2025, most advanced with agentic abilities and enterprise focus). Opus 4.5 is described as Anthropic's most robustly aligned model, with improved resistance to prompt injections.

Gemini 2.0 brings improved agentic capabilities and native multimodal understanding.

What actually differs between models?

Beyond raw capability scores, models differ in personality:

  • GPT-5.1 offers two modes: "Instant" for quick, conversational responses, and "Thinking" for complex reasoning tasks. The model adapts its reasoning depth to the problem.
  • Claude 4.5 (Opus/Sonnet/Haiku) emphasizes nuance, safety, and coding excellence. Sonnet 4.5 claims "best coding model" status. Opus 4.5 leads in agentic tasks and enterprise features, including Microsoft Office integration. Haiku 4.5 offers near-frontier performance at dramatically lower cost.
  • Gemini 2.0 integrates deeply with Google's ecosystem and excels at agentic tasks: autonomously planning and executing multi-step actions.
  • Llama 3.3 is open-weights, meaning you can run it yourself, fine-tune it, and inspect it, with all the flexibility and responsibility that implies.

What is a "model family"?

A single training run produces one base model, but labs release multiple sizes and variants:

  • Size variants: Meta's Llama 3.1 shipped in 8B, 70B, and 405B versions (Llama 3.3 as a 70B model). Smaller models are faster and cheaper; larger ones are more capable.
  • Instruct/Chat variants: Base models just predict text. Instruct variants are fine-tuned to follow instructions and have conversations.
  • Reasoning variants: Models like GPT-5.1's "Thinking" mode are optimized for complex, multi-step reasoning over speed.
  • Specialized variants: Code models, vision models, long-context models, each optimized for specific use cases.
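
The cost of those size variants follows from simple arithmetic: as a rough rule of thumb (ignoring activations, the KV cache, and serving overhead), the memory needed just to hold the weights is parameter count times bytes per parameter:

```python
# Back-of-envelope memory to hold a model's weights at common precisions:
# 2 bytes per parameter for fp16/bf16, 1 byte for int8.

def weight_memory_gb(num_params, bytes_per_param):
    return num_params * bytes_per_param / 1e9  # decimal gigabytes

for name, params in [("8B", 8e9), ("70B", 70e9), ("405B", 405e9)]:
    fp16 = weight_memory_gb(params, 2)
    int8 = weight_memory_gb(params, 1)
    print(f"{name}: ~{fp16:.0f} GB at fp16, ~{int8:.0f} GB at int8")
```

This is why an 8B model fits on a single consumer GPU while a 405B model needs a multi-GPU server even before serving a single request.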

The base model is rarely what you interact with. ChatGPT uses an instruct-tuned, safety-aligned variant of GPT-5.1, not the raw base model.
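
The base/instruct split shows up in how the two variants are prompted. The chat markers below are a made-up generic format for illustration, not any real model's actual template:

```python
# Illustrative only: "<|system|>" etc. are invented markers, not a
# specific model's chat template.

def completion_prompt(text):
    # A base model just continues whatever raw text it is given.
    return text

def chat_prompt(system, user):
    # Instruct variants expect conversations marked up with roles;
    # the model is trained to generate the assistant turn that follows.
    return (
        f"<|system|>\n{system}\n"
        f"<|user|>\n{user}\n"
        f"<|assistant|>\n"
    )

print(completion_prompt("The capital of France is"))
print(chat_prompt("You are a helpful assistant.",
                  "What is the capital of France?"))
```

Given the first prompt, a base model might continue with "Paris" or might continue with more quiz questions; the instruct variant is tuned to answer.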

How do models improve over time?

New model releases typically improve through:

  1. More data: Larger and higher-quality training corpora.
  2. Better architecture: Attention improvements, longer context, more efficient computation.
  3. Refined alignment: Better instruction following, fewer harmful outputs, more helpful responses.
  4. Post-training improvements: RLHF, constitutional AI, and other techniques that shape behavior after initial training.

A deployed model isn't a static product. Even without retraining the weights, providers continuously improve the systems around them: better system prompts, retrieval augmentation, and safety filters.
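
One of those surrounding systems, retrieval augmentation, can be sketched in miniature. Real systems use embeddings and vector search; the word-overlap heuristic and sample documents below are stand-ins to show the shape of the idea:

```python
# A minimal retrieval-augmentation sketch: pick the most relevant document
# by word overlap and prepend it to the prompt as context.
import re

DOCS = [
    "The Transformer architecture was introduced in 2017.",
    "RLHF fine-tunes models using human preference data.",
    "Llama models are released with open weights.",
]

def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, docs):
    # Return the document sharing the most words with the query.
    return max(docs, key=lambda d: len(tokens(query) & tokens(d)))

def augmented_prompt(query):
    context = retrieve(query, DOCS)
    return f"Context: {context}\n\nQuestion: {query}\nAnswer:"

print(augmented_prompt("What is RLHF?"))
```

The model's weights never change; the prompt it sees gets better, which is often enough to improve answers in practice.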

Sources & Further Reading

  • GPT-5.1 in ChatGPT (OpenAI, 2025)
  • Claude Opus 4.5 (Anthropic, 2025)
  • Introducing Llama 3.3 (Meta AI, 2025)
  • Gemini 2.0 (Google DeepMind, 2025)