1. Zero-shot vs few-shot, in one minute

Zero-shot prompting means you give the model an instruction and nothing else — no examples. You're trusting it to know what "good" looks like from its training: "Classify this review as positive, negative, or neutral." Most everyday prompting is zero-shot, and for common tasks it works fine.

Few-shot prompting means you include a handful of worked examples — input-output pairs — right inside the prompt, then give it the real input last. The model doesn't get re-trained; it just copies the pattern it sees in your examples. This is sometimes called in-context learning: the examples teach the model, for this one request, exactly what shape the answer should take. One-shot is the same idea with a single example.

The whole question of this guide is when the examples earn their place. Examples cost tokens and effort, and a sloppy example actively hurts — so the goal isn't "always add examples," it's knowing the moment they flip a result from unreliable to reliable.

The mental model: zero-shot tells the model the rule. Few-shot shows the model the rule being applied. When the rule is easy to state, telling is enough. When "good" is easier to demonstrate than to describe, showing wins.

2. The same task, both ways

Take a concrete job: tag incoming support messages with a category and an urgency. Here's the zero-shot version — pure instruction.

zero-shot.txt
# ZERO-SHOT
Categorize this support message as Billing, Bug, or Feature Request, and rate urgency High / Medium / Low.

Message: "The export button does nothing since today's update."

You'll get a reasonable answer — but the format is a gamble. One run returns a paragraph, the next a JSON object, the next a different label like "Technical Issue" that you never asked for. The model is guessing at your house style because you never showed it. Now the few-shot version, with two examples doing the work:

few-shot.txt
# FEW-SHOT (2 examples, then the real input)
Categorize each support message. Reply in exactly this format: category | urgency

Message: "I was double-charged this month." → Billing | High
Message: "Would love a dark mode someday." → Feature Request | Low
Message: "The export button does nothing since today's update." →

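The two examples pin down three things at once: the label vocabulary (Billing and Feature Request appear verbatim, which steers the model away from a surprise "Technical Issue"), the output format (category | urgency, every time), and the urgency judgment. A double-charge is High and a someday-wish is Low, so a broken core feature will most likely land High without you spelling out the rule. One gap remains: neither example shows the Bug label, so keep the label list in the instruction or add a Bug example if you need that exact word back. The model copies the demonstrated pattern instead of inventing one. Same model, far more consistent output, because you showed instead of only told.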

Why this works: the model finishes patterns. When the last line of your prompt is an unfinished example, it completes it in the shape of the examples above. That's the entire mechanism behind few-shot — and why your examples must be exactly right.
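If you assemble prompts in code, the "unfinished last example" idea is easy to sketch. The snippet below is a minimal illustration in Python, not a prescribed implementation: the helper name build_few_shot_prompt and the EXAMPLES list are hypothetical, and the layout simply mirrors the few-shot.txt prompt above.

```python
# Hypothetical helper: assembles a few-shot prompt that ends on an unfinished example.
EXAMPLES = [
    ("I was double-charged this month.", "Billing | High"),
    ("Would love a dark mode someday.", "Feature Request | Low"),
]

INSTRUCTION = (
    "Categorize each support message. "
    "Reply in exactly this format: category | urgency"
)


def build_few_shot_prompt(examples, new_message, instruction=INSTRUCTION):
    """Return the instruction, the worked examples, then the open-ended real input."""
    lines = [instruction, ""]
    for message, answer in examples:
        lines.append(f'Message: "{message}" → {answer}')
    # The last line stops right after the arrow, so the model completes the pattern.
    lines.append(f'Message: "{new_message}" →')
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    EXAMPLES, "The export button does nothing since today's update."
)
print(prompt)
```

Keeping the examples in a list rather than pasted into a string makes it easier to proofread them one by one and to swap one out when it turns out to teach the wrong thing.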

3. The decision rule: show or just tell?

Don't reach for examples by default — they cost tokens, and a bad one teaches the wrong thing. Start zero-shot, then add examples only when the task hits one of four triggers.

Trigger 1 · Format

Output must match a shape

You need a precise schema, table, label set, or layout every single time — JSON keys, a fixed CSV, a strict tag vocabulary. One or two examples lock the format far more reliably than describing it in prose.
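One way to tell whether a prompt has hit this trigger is to check replies against the shape you expect. Here's a rough Python sketch, assuming the category | urgency format and the label lists from the support-message example; the function name parse_reply is made up for illustration.

```python
# Label vocabulary taken from the support-message example in this guide.
ALLOWED_CATEGORIES = {"Billing", "Bug", "Feature Request"}
ALLOWED_URGENCIES = {"High", "Medium", "Low"}


def parse_reply(reply):
    """Return (category, urgency) if the reply matches 'category | urgency', else None."""
    parts = [part.strip() for part in reply.strip().split("|")]
    if len(parts) != 2:
        return None  # wrong shape: a paragraph, a JSON object, extra fields
    category, urgency = parts
    if category not in ALLOWED_CATEGORIES or urgency not in ALLOWED_URGENCIES:
        return None  # right shape, but a label outside the vocabulary
    return category, urgency


print(parse_reply("Bug | High"))              # ('Bug', 'High')
print(parse_reply("Technical Issue | High"))  # None
print(parse_reply('{"category": "Bug"}'))     # None
```

If a noticeable share of zero-shot replies comes back as None, that's a sign the format needs examples rather than more prose.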

Trigger 2 · Style

"Good" is easier to show than say

Tone, voice, and house style resist description. If you'd struggle to write the rule but can paste a perfect example, that's a few-shot job — the example carries the style you can't articulate.

Trigger 3 · Edge cases

The model keeps repeating a mistake

It mishandles one recurring case — abbreviations, negation, a boundary call. Add an example that demonstrates the right handling of exactly that case, and the error usually disappears.

When to stay zero-shot

Keep it zero-shot — and just tighten the instruction — when any of these is true:

  • The task is common and well-understood. Summarize, translate, rewrite, simple classification — modern models already nail these. Examples add tokens for no gain.
  • You can't write good examples. A wrong or off-pattern example is worse than none — the model will faithfully copy the flaw. No clean example? Don't fake one.
  • You want the shortest, cheapest prompt. For high-volume API calls, a tight zero-shot instruction that already works beats paying for example tokens on every request.
  • The instruction itself isn't sharp yet. Fix the wording first — a clear role and explicit constraints often solve the problem before examples are needed.
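To put a rough number on the "cheapest prompt" point, you can estimate what the examples cost per call and multiply by volume. The sketch below is back-of-the-envelope only: the four-characters-per-token ratio is a common rule of thumb for English text, not any model's real tokenizer, and the daily call volume is a hypothetical figure.

```python
def estimate_tokens(text, chars_per_token=4):
    """Very rough token estimate; real tokenizers vary by model and language."""
    return max(1, len(text) // chars_per_token)


def example_overhead(examples_text, calls_per_day):
    """Extra tokens per day spent resending the same examples on every call."""
    return estimate_tokens(examples_text) * calls_per_day


examples_text = (
    'Message: "I was double-charged this month." → Billing | High\n'
    'Message: "Would love a dark mode someday." → Feature Request | Low\n'
)
per_call = estimate_tokens(examples_text)
per_day = example_overhead(examples_text, calls_per_day=100_000)  # hypothetical volume
print(f"~{per_call} extra tokens per call, ~{per_day:,} per day")
```

For a real budget, swap the heuristic for your provider's own token counter; the point here is only that a small per-call overhead multiplies quickly.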

The fourth few-shot trigger is the catch-all: the task is niche or domain-specific — your internal jargon, an unusual labeling scheme, a format the model has never seen. There, examples aren't optional; they're how the model learns your world for the length of the prompt.

4. How many examples — and which ones

More examples is not better. Past a handful, extra examples mostly add cost, and they can even bias the model toward whatever pattern dominates your set. Here's the practical ladder.

1️⃣

One-shot

A single example. Great for locking format fast — the model now knows the shape. Weak at teaching variation, since it's seen only one case.

Use when: you just need the output structure pinned down.

3️⃣

Two to five (the sweet spot)

Enough to show the common case and a tricky one, without bloating the prompt. This is where few-shot earns its keep for most real tasks.

Use when: format + judgment both matter. Default here.

🛑

Ten-plus (rarely)

Diminishing returns and rising cost. If you genuinely need many examples, that's often a signal to fine-tune or restructure the task instead.

Use when: almost never for prompting. Reconsider the approach.

Which examples you pick matters more than how many. Two principles do most of the work:

  • Cover diversity, not repetition. Three different cases — including a boundary or tricky one — teach more than ten near-identical ones. The model generalizes from the range it sees.
  • Make every example flawless. The model copies your examples closely, mistakes included. One mislabeled example can resurface as that same error in the output. Proofread examples harder than you proofread the instruction.
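Part of that proofreading can be automated. The following is a small Python sketch, assuming the label lists from the support-message example; lint_examples is a hypothetical helper, and the third example (with its deliberate typo) is invented for the demo. It only catches mechanical problems such as typos, duplicates, and a lopsided set. A label that is valid but wrong still needs a human eye.

```python
from collections import Counter

ALLOWED_CATEGORIES = {"Billing", "Bug", "Feature Request"}
ALLOWED_URGENCIES = {"High", "Medium", "Low"}


def lint_examples(examples):
    """Return a list of human-readable problems found in a few-shot example set."""
    problems = []
    seen_messages = set()
    for message, category, urgency in examples:
        if category not in ALLOWED_CATEGORIES:
            problems.append(f"unknown category {category!r} in: {message}")
        if urgency not in ALLOWED_URGENCIES:
            problems.append(f"unknown urgency {urgency!r} in: {message}")
        key = message.strip().lower()
        if key in seen_messages:
            problems.append(f"duplicate message: {message}")
        seen_messages.add(key)
    # Flag a lopsided set: one category supplying more than half of the examples.
    counts = Counter(category for _, category, _ in examples)
    for category, count in counts.items():
        if len(examples) >= 3 and count > len(examples) / 2:
            problems.append(f"{category!r} dominates the set ({count}/{len(examples)})")
    return problems


examples = [
    ("I was double-charged this month.", "Billing", "High"),
    ("Would love a dark mode someday.", "Feature Request", "Low"),
    ("Refund still hasn't arrived.", "Billing", "Hgih"),  # typo on purpose
]
for problem in lint_examples(examples):
    print(problem)
```

On this set the check reports the misspelled urgency and notes that Billing supplies two of the three examples.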

And mind the order: models often weight the most recent examples a little more heavily, so put your cleanest, most representative example last, right before the real input.

5. Zero-shot vs few-shot, side by side

| Dimension | Zero-shot | Few-shot |
| --- | --- | --- |
| Prompt size & cost | Shortest: instruction only | Larger: every example costs tokens on every call |
| Format control | Inconsistent; the model guesses your style | Tight: examples pin the exact shape |
| Best for | Common, well-understood tasks | Specific formats, house style, niche or edge cases |
| Main risk | Drift in format and label vocabulary | A bad example teaches the wrong pattern |
| Effort to write | Low: one clear instruction | Higher: you must craft and verify examples |

The two aren't rivals so much as a progression: start zero-shot, and graduate a prompt to few-shot the moment it starts drifting in format, missing your style, or fumbling the same edge case twice. If even good examples don't make it reliable, the task may want a chain of focused prompts instead of one bigger prompt.
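"Starts drifting" can be made measurable. Here's a hedged Python sketch of one way to do it: collect a batch of replies, compute the share that follow the expected category | urgency shape, and compare it with a cutoff. The replies below are made up for illustration (not real model output), and the 0.95 threshold is an arbitrary assumption you'd tune for your own task.

```python
CATEGORIES = {"Billing", "Bug", "Feature Request"}
URGENCIES = {"High", "Medium", "Low"}
THRESHOLD = 0.95  # hypothetical cutoff; pick one that fits your tolerance


def follows_format(reply, categories, urgencies):
    """True if the reply is exactly 'category | urgency' with known labels."""
    parts = [part.strip() for part in reply.strip().split("|")]
    return len(parts) == 2 and parts[0] in categories and parts[1] in urgencies


def format_consistency(replies, categories, urgencies):
    """Fraction of replies that match the expected shape (0.0 for an empty batch)."""
    if not replies:
        return 0.0
    matching = sum(follows_format(reply, categories, urgencies) for reply in replies)
    return matching / len(replies)


# Made-up zero-shot replies to the same message, for illustration only.
zero_shot_replies = [
    "Bug | High",
    "Category: Bug, Urgency: High",
    "Technical Issue | High",
    "Bug | High",
]
rate = format_consistency(zero_shot_replies, CATEGORIES, URGENCIES)
print(f"format consistency: {rate:.0%}")
if rate < THRESHOLD:
    print("Drifting: graduate this prompt to few-shot.")
```

Rerunning the same check after adding examples gives you a before/after number instead of a hunch.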

6. The 30-second recap

  • Zero-shot = instruction, no examples. Shortest and cheapest; great for common tasks the model already nails.
  • Few-shot = a few input-output examples in the prompt, then the real input. The model copies the demonstrated pattern.
  • Add examples when you need a precise format, a style you can show but not describe, a fix for a recurring edge case, or for niche/domain tasks. Otherwise stay zero-shot.
  • Two to five examples is the sweet spot — diverse over repetitive, every one flawless, cleanest example last.
  • The trap: a wrong example is worse than none. The model reproduces the flaw exactly.

The fastest way to internalize this isn't reading it once — it's practicing until "show or just tell?" is an instant call. That's what PromptSharp is built for: one short lesson a day, real before/after examples, until sharp prompting comes out of your fingers without thinking.

Frequently asked questions

What is the difference between few-shot and zero-shot prompting?
Zero-shot prompting gives the model an instruction with no examples and relies on what it learned in training. Few-shot prompting includes a handful of worked input-output examples inside the prompt so the model copies the exact pattern. Zero-shot is shorter and cheaper; few-shot is more reliable when the task has a specific format, style, or edge-case behavior you need matched.
When should I use few-shot instead of zero-shot prompting?
Use few-shot when the output needs a precise format or schema, when "good" is a style you can show but struggle to describe, when the model keeps making the same edge-case mistake, or when the task is niche or domain-specific. Stay zero-shot when the task is common, when you want the shortest prompt, or when you can't write a genuinely clean example.
How many examples should a few-shot prompt include?
Start with two or three. One example (one-shot) locks format but shows little variation; two to five cover common and edge cases without bloating the prompt. Going beyond that rarely helps and costs tokens. Favor diverse examples, including a tricky one, over many similar ones, and put your cleanest example last, since models often weight the most recent examples slightly more.
Does few-shot prompting work with ChatGPT, Claude, and Gemini?
Yes. Few-shot and zero-shot are model-agnostic — they're just how you structure the prompt. They work with Claude, ChatGPT, Gemini, and any other LLM, in a chat window or via the API. For more techniques, see our prompt engineering guide and how to write better prompts.