How you phrase requests shapes what you get back. Prompt engineering is the art of communicating with AI systems to get useful results.
Why do small wording changes produce dramatically different results?
LLMs are sensitive to phrasing. The same question asked two ways can yield answers of very different quality. Adding a single sentence can transform a mediocre response into an excellent one.
This sensitivity is a feature, not a bug. The model uses every token for context. Change the tokens, change the context, change the predictions.
Prompt engineering is the practice of crafting inputs to get better outputs. It's part art, part science, and essential for getting value from LLMs.
The basics: be clear and specific
Vague prompts get vague responses. Specific prompts get specific responses.
Weak: "Tell me about dogs"
Better: "Explain how dogs were domesticated from wolves, focusing on the timeline and genetic evidence"
Weak: "Write code for a website"
Better: "Write a Python Flask endpoint that accepts POST requests with JSON containing 'email' and 'message' fields, validates them, and returns a success response"
Specificity helps the model focus. It's not reading your mind; it's pattern-matching on your words.
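To see what specificity buys you, here is roughly what the "better" Flask prompt above could produce. This is a sketch, not a guaranteed output; the route path and error messages are choices the prompt left open.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/contact", methods=["POST"])  # route path assumed; the prompt left it open
def contact():
    data = request.get_json(silent=True) or {}
    email = data.get("email")
    message = data.get("message")

    # Validate the two fields the prompt named
    if not email or "@" not in email:
        return jsonify({"error": "A valid 'email' field is required"}), 400
    if not message:
        return jsonify({"error": "A 'message' field is required"}), 400

    return jsonify({"status": "success"}), 200
```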
Provide context
The model only knows what's in the context window. Background information helps:
"You are helping a beginner programmer" shapes the explanation level
"This is for a formal business email" shapes the tone
"The user is a domain expert in biology" shapes assumed knowledge
Don't assume the model knows your situation. State it explicitly.
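With chat-style APIs, the usual place for this background is a system message. A minimal sketch in the common role/content message format (the build_messages helper and its parameters are illustrative, not part of any particular SDK):

```python
def build_messages(user_question: str, audience: str, tone: str) -> list[dict]:
    """Assemble chat messages in the common role/content format.

    The system message carries context the model cannot infer on its own.
    """
    system_context = (
        f"You are helping {audience}. "
        f"Write in a {tone} tone and explain any jargon you use."
    )
    return [
        {"role": "system", "content": system_context},
        {"role": "user", "content": user_question},
    ]

messages = build_messages(
    "How does DNS resolution work?",
    audience="a beginner programmer",
    tone="friendly, concrete",
)
```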
Few-shot prompting: show examples
Instead of describing what you want, show it.
Convert these sentences to formal tone:
Casual: "Hey, can you send that over?"
Formal: "Would you please forward the document at your earliest convenience?"
Casual: "That's not gonna work for us."
Formal: "Unfortunately, that approach does not meet our requirements."
Casual: "Let's grab lunch and hash this out."
Formal: [model completes]
Examples communicate format, style, and expectations more precisely than a description can. Describing a format in words is ambiguous: does "formal" mean stiff legalese or just polished business prose? An example shows exactly what you mean. The model pattern-matches your examples and continues the pattern.
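Few-shot prompts are also easy to assemble programmatically. A small sketch that builds the formal-tone prompt above from example pairs (the helper name and layout are just one reasonable choice):

```python
def few_shot_prompt(examples: list[tuple[str, str]], new_input: str) -> str:
    """Build a few-shot prompt: instruction, worked examples, then the new case."""
    lines = ["Convert these sentences to formal tone:", ""]
    for casual, formal in examples:
        lines.append(f'Casual: "{casual}"')
        lines.append(f'Formal: "{formal}"')
        lines.append("")
    lines.append(f'Casual: "{new_input}"')
    lines.append("Formal:")  # the model continues the pattern from here
    return "\n".join(lines)

prompt = few_shot_prompt(
    [
        ("Hey, can you send that over?",
         "Would you please forward the document at your earliest convenience?"),
        ("That's not gonna work for us.",
         "Unfortunately, that approach does not meet our requirements."),
    ],
    "Let's grab lunch and hash this out.",
)
```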
Chain-of-thought: ask for reasoning
Complex problems benefit from explicit reasoning. Adding "think step by step" or "explain your reasoning" often improves accuracy.
Without chain-of-thought:
Q: A bat and a ball together cost $1.10. The ball costs $1 more than the bat.
How much does the bat cost?
A: $0.10 ✗ (Common wrong answer)
With chain-of-thought:
Q: [same question] Think step by step.
A: Let me work through this. If the bat costs x, then the ball costs x + 1.
Together they cost $1.10, so x + (x + 1) = 1.10.
That's 2x + 1 = 1.10, so 2x = 0.10, so x = $0.05. ✓
Why does chain-of-thought work?
Two hypotheses:
Computation through tokens: The model does more computation when generating more tokens. Reasoning steps are where thinking happens. Skip them and you skip the thinking.
Self-consistency: Stating intermediate steps makes errors visible to the model. It can catch inconsistencies as it generates.
Whatever the cause, the effect is real. Complex tasks benefit from explicit reasoning, even when you don't need to see the steps yourself.
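In code, chain-of-thought can be as simple as appending a reasoning instruction and pulling out the final answer afterward. A rough sketch, assuming a hypothetical generate() function that stands in for whatever model call you use:

```python
def ask_with_reasoning(question: str) -> str:
    """Add a chain-of-thought instruction and extract the final answer line."""
    prompt = (
        f"{question}\n\n"
        "Think step by step, then give your final answer on a line "
        "starting with 'Answer:'."
    )
    response = generate(prompt)  # generate() is a stand-in for your model call
    # Keep the reasoning in the response for inspection, but return only the answer line
    for line in reversed(response.splitlines()):
        if line.startswith("Answer:"):
            return line.removeprefix("Answer:").strip()
    return response.strip()  # fall back to the raw response if no marker was produced
```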
Structure your requests
Clear structure helps the model parse your intent:
## Task
Summarize the following article.
## Requirements
- Maximum 3 paragraphs
- Include key statistics
- Maintain neutral tone
## Article
[paste article here]
Headers, bullet points, and explicit sections reduce ambiguity. The model knows exactly what's task, what's constraint, and what's input.
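The same structure is easy to generate from parts. A small sketch that assembles the summarization prompt above (the helper name is illustrative):

```python
def structured_prompt(task: str, requirements: list[str], article: str) -> str:
    """Lay out task, constraints, and input under explicit headers."""
    requirement_lines = "\n".join(f"- {r}" for r in requirements)
    return (
        f"## Task\n{task}\n\n"
        f"## Requirements\n{requirement_lines}\n\n"
        f"## Article\n{article}"
    )

prompt = structured_prompt(
    "Summarize the following article.",
    ["Maximum 3 paragraphs", "Include key statistics", "Maintain neutral tone"],
    "[paste article here]",
)
```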
Iterate and refine
Prompt engineering is often iterative. Try something, see the result, adjust. Common refinements:
Add constraints if output is too broad
Remove constraints if output is too narrow
Provide examples if format is wrong
Ask for reasoning if accuracy is low
Adjust length instructions if too long/short
There's rarely a perfect prompt on the first try. Treat prompting as a conversation, not a one-shot command.
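If you refine programmatically rather than by hand, the loop looks the same: check the output, add one targeted constraint, and try again. A toy sketch, reusing the hypothetical generate() from earlier and an is_acceptable check you define yourself:

```python
def refine_until(prompt: str, is_acceptable, refinements: list[str]) -> str:
    """Re-prompt with one added constraint at a time until the output passes a check."""
    output = generate(prompt)  # generate() is a stand-in for your model call
    for refinement in refinements:
        if is_acceptable(output):
            break
        prompt += f"\n\nAdditional requirement: {refinement}"
        output = generate(prompt)
    return output

summary = refine_until(
    "Summarize the following article in plain language:\n[paste article here]",
    is_acceptable=lambda text: len(text.split()) <= 200,
    refinements=["Keep it under 200 words.", "Use at most 3 paragraphs."],
)
```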
The limits of prompting
Prompting can't make a model do what it fundamentally can't do. If the capability isn't in the model, no prompt will unlock it.
Prompting also can't guarantee consistency. Even with a perfect prompt, temperature adds variation. Different runs give different results.
Think of prompting as steering, not programming. You're guiding a capable system, not specifying exact behavior. Good prompts make desired outputs more likely, not certain.