How do I use Gemini's thinking mode?

Set the model to gemini-2.0-flash-thinking-exp and include a prompt that asks Gemini to reason step by step. The model will generate hidden thinking tokens before producing its final answer, substantially improving accuracy on complex reasoning tasks.

What is Gemini's context window size?

Gemini 1.5 Pro and Gemini 2.0 models support a 1 million token context window — enough for entire codebases, long legal documents, or hours of video transcripts. Use it by placing background material before your main question.

What is the difference between a system instruction and a user message in the Gemini API?

System instructions (system_instruction field) set persistent behavior, persona, and rules that apply to the entire conversation. User messages are the actual turn-by-turn requests. Always put constraints, tone, and output format rules in the system instruction so they aren't accidentally overridden.

How to Write Better Prompts for Gemini

What Makes Gemini Different — and Why It Changes How You Prompt

Prompting Gemini the same way you prompt Claude or ChatGPT leaves a significant amount of capability on the table. Gemini was built natively multimodal — meaning it doesn't just process text that describes an image, it processes the actual image alongside text, in the same pass. It has the largest publicly available context window of any production model (1 million tokens). And it runs on Google's infrastructure, which means it can be grounded directly against live Google Search results in a way no other model supports natively.

These aren't marketing claims. They're architectural differences that require different prompting strategies. What follows is a practical guide to each.

💡

Which Gemini are we talking about?

This guide covers Gemini 1.5 Pro (1M token context), Gemini 2.0 Flash (fast, multimodal, search grounding), and gemini-2.0-flash-thinking-exp (dedicated reasoning mode). Most tips apply to all three unless noted.

Multimodal Prompting: More Than "Describe This Image"

Gemini processes text, images, audio, video, and PDFs natively — not via plugins or external preprocessing. The practical implication: you can pass Gemini raw source material and ask it to reason about it directly, rather than first converting everything to text yourself.

What actually works

The most effective multimodal prompts in Gemini follow a simple structure: provide the media first, then ask a specific, narrowly scoped question about it. Broad questions like "What do you see?" get generic answers. Focused questions get analytical ones.

📷

Image + Analytical Question

Don't ask Gemini to "describe" an image. Give it a task. Ask it to compare two product screenshots for UX inconsistencies, identify accessibility issues in a UI, or extract all text visible in a photo.

Here are two versions of our signup flow [image1] [image2]. List every visual inconsistency between them — focus on typography, spacing, and button states.

🎥

Video Timestamp Queries

Gemini can process video files up to 1 hour long. Rather than asking for a general summary, specify timestamps or ask it to identify specific events — e.g., "At what point does the speaker transition from problem to solution?"

Watch this 40-minute lecture [video]. List every claim the speaker makes that could be fact-checked, with the approximate timestamp for each.

📄

PDF Document Reasoning

Upload a PDF directly rather than copying text. Gemini preserves table structure, headers, and layout context that gets lost in copy-paste. Then ask questions that require cross-referencing sections.

Here is our 80-page vendor contract [PDF]. Flag every clause that creates liability for us if the vendor misses an SLA. Cite the section number for each.

✅

Tip: Interleave media and instructions

In the Gemini API (and in Gemini Advanced), you can interleave multiple images with text in a single message. This is useful for before/after comparisons, step-by-step visual walkthroughs, or asking Gemini to reconcile data from multiple charts at once.

Thinking Mode: How to Activate and Direct Gemini's Reasoning

Gemini's thinking mode — available in gemini-2.0-flash-thinking-exp — generates hidden reasoning tokens before producing a final answer. This is similar in spirit to OpenAI's o1/o3 reasoning models and Claude's extended thinking, but with some Gemini-specific nuances in how you prompt it.

When thinking mode actually helps

Thinking mode has a real cost: it's slower and uses more tokens. Use it deliberately, not by default. The scenarios where it consistently outperforms standard Gemini:

Multi-step math and logic problems — especially ones where intermediate steps matter, not just the final answer
Strategic decisions with multiple conflicting constraints — e.g., "given these 6 requirements, rank the 4 implementation options"
Adversarial reasoning — asking Gemini to argue both sides of a position and then adjudicate
Code that requires non-obvious architectural decisions — not just syntax, but design tradeoffs

How to prompt thinking mode effectively

You don't need special syntax to trigger thinking — the model identifier handles it. But the quality of the reasoning depends on how clearly you structure the problem. Two techniques work best:

1. Decompose the problem explicitly. Instead of "solve this," tell Gemini to identify all the sub-problems first, then solve each one, then synthesize. This mirrors how the thinking model allocates its internal reasoning budget.

2. Ask for a confidence judgment. After the answer, append: "On a scale of 1–10, how confident are you in this answer, and what's the biggest assumption it depends on?" Thinking mode surfaces uncertainty more accurately than standard mode — use that signal.

🧠

Worked example: Thinking mode prompt

"Before giving me an answer, break this problem into its component parts and reason through each one. Then give your final answer. Finally, rate your confidence 1–10 and name the single biggest assumption your answer depends on."

System Instructions vs. User Messages in the Gemini API

If you're accessing Gemini via the API (Google AI Studio or Vertex AI), understanding the distinction between system_instruction and the user message is critical. They serve fundamentally different purposes and Gemini treats them differently.

System instructions are set once, persist across the conversation, and define Gemini's role, behavior constraints, output format, and persona. They are not part of the user's conversational turn. Think of them as the operating rules the model runs under throughout the session.

User messages are turn-by-turn requests. They should contain the actual task, data, and context for that specific request — not behavioral rules (those belong in the system instruction and will erode over a long conversation if placed in user turns).

Here's a complete example of a well-structured Gemini API call with a system instruction:

          Gemini API Call — Python
          python
        

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# System instruction: sets persistent behavior for the entire session
system_instruction = """You are a senior financial analyst specializing in SaaS metrics.
Your outputs must follow these rules:
- Always cite the specific metric you are analyzing (ARR, NRR, CAC, LTV, etc.)
- Present conclusions in a structured format: Observation → Implication → Recommendation
- Flag any data that appears anomalous before drawing conclusions
- Do not speculate beyond what the data supports — state the limit of your analysis explicitly.
Output format: Markdown with clear headers."""

model = genai.GenerativeModel(
    model_name="gemini-2.0-flash",
    system_instruction=system_instruction
)

# User message: the actual task for this turn
user_message = """Here is our Q1 2026 cohort data:
- January cohort: 120 customers, 12-month retention 74%, ACV $8,400
- February cohort: 95 customers, 12-month retention 81%, ACV $11,200
- March cohort: 140 customers, 12-month retention 68%, ACV $7,100

Analyze what these retention trends suggest about our ICP fit.
Flag anything that looks anomalous before drawing conclusions."""

response = model.generate_content(user_message)
print(response.text)

⚠

Common mistake: Putting behavior rules in user messages

Rules like "always respond in bullet points" or "never mention competitor X" placed in user messages will drift over long conversations. Gemini respects system instructions more durably. Keep constraints in system_instruction, not in conversational turns.

PromptSharp

Start Learning with PromptSharp

Model-specific optimization guides for Claude, ChatGPT, and Gemini — plus 30+ battle-tested templates. Works with every major AI model.

Get Started — $29/mo → See All Features

Using Gemini's 1 Million Token Context Window

Gemini 1.5 Pro and 2.0 models support a 1 million token context window — roughly 750,000 words, or the equivalent of a 3,000-page book. Most users treat this as a curiosity rather than a workflow-changing capability. It shouldn't be.

What you can actually fit in 1M tokens

An entire medium-sized codebase (50–100 files)
A year's worth of Slack conversation logs
Every email you've sent in the last 3 months
A full-length novel plus your notes and annotations
20–30 lengthy research papers simultaneously
Hours of meeting transcripts with full attribution

Prompting strategies for long context

Put the question at the end, not the beginning. Gemini (like all transformers) gives more weight to content close to the query. If you front-load 500 pages of documents and then ask your question at the end, performance is significantly better than if you ask first and then provide the material.

Use explicit anchoring. With long documents, tell Gemini exactly what to look for before it reads: "In the following contract, I need you to find every indemnification clause, every SLA with a penalty provision, and every reference to 'force majeure.' Catalog each by section number." This focuses attention before processing begins.

Chunk and compare within a single context. One underused pattern: load multiple documents into a single context and ask Gemini to cross-reference them. Conflicting claims between a sales deck, a contract, and a technical spec are hard to catch manually — trivial for Gemini with all three loaded at once.

Use document structure markers. When pasting multiple documents, separate them with clear delimiters and titles:

          Multi-document context structure
          prompt
        

## DOCUMENT 1: Vendor Contract (signed 2025-11-01)
[full contract text]

## DOCUMENT 2: Vendor SLA Addendum (signed 2026-01-15)
[full addendum text]

## DOCUMENT 3: Recent vendor email thread (2026-04-01 to 2026-04-20)
[email thread]

---
QUESTION: Based on the above, does the vendor's behavior in the email thread
constitute a breach of the SLA in Document 2? Cite specific clauses.

⚡

Context caching for repeat queries

If you're making multiple API calls against the same large document (e.g., a codebase you query repeatedly), use Gemini's context caching feature. It stores the processed representation of your long context so you don't re-process it on every call — dramatically reducing latency and cost.

Google Search Grounding and Workspace Integration

Gemini's native integration with Google Search is one of its most differentiated capabilities — and the one most people don't think to use deliberately.

Google Search grounding

When grounding is enabled (via the API or in Gemini Advanced), the model can query live Google Search results and cite them in its response. This is not a plugin or a separate tool call — it's built into the generation process. The practical effect: you can ask Gemini about recent events, current prices, live documentation, or breaking news and get responses grounded in real-time data rather than training data.

The key prompting insight: you need to signal clearly that recency matters. If your question could be answered from training data, Gemini may not invoke search. Phrases like "as of today," "current as of April 2026," or "check for the most recent information" trigger grounding behavior more reliably.

🔍

Grounding prompt pattern

"As of today (April 2026), what are the current pricing tiers for [Competitor X]'s enterprise plan? I need the most up-to-date information — please search for recent sources rather than relying on training data."

Google Workspace integration prompts

In Gemini for Workspace (Gmail, Docs, Sheets, Meet), the model has access to your actual files and emails — which changes what "good prompting" means. Generic prompts waste this access. Effective Workspace prompts are specific and context-aware:

Gmail: "Summarize all emails from [sender] this month and list any requests I haven't responded to yet" — not just "summarize my email"
Docs: "Review this document against our style guide [attach guide] and mark every sentence that violates it" — not just "improve my writing"
Sheets: "I have a dataset with columns A–G. Write a formula that flags any row where column C is empty but column E is not" — give Gemini the actual structure
Meet: "From this transcript, extract: (1) every decision made, (2) every action item with the assigned person, (3) any open questions that need follow-up" — structure the output you need

Gemini vs. Claude vs. ChatGPT: Key Prompting Differences

The biggest mistake prompt engineers make is treating all LLMs as interchangeable. The models have meaningfully different architectures, training objectives, and response characteristics. Here's what actually differs at the prompting level:

Gemini

Excels with multimodal inputs, massive context, and live data via Search grounding. Benefits from explicit section markers when loading multiple documents. Thinking mode unlocks reasoning depth. Less reliant on structured delimiters than Claude.

Claude (Anthropic)

Highly responsive to XML-style structure tags (<instructions>, <context>, <examples>). Strongest at nuanced writing, following complex multi-part instructions, and maintaining safety constraints under adversarial conditions.

ChatGPT (GPT-4o / o3)

Responds well to role + task framing ("You are a senior X, your task is Y"). Strong at tool use and code execution via the sandbox. o3's reasoning mode is competitive with Gemini thinking for math and logic.

The practical implication: you need a model-specific prompting strategy, not a one-size-fits-all approach. The structural markers that improve Claude's output (XML tags) have little effect in Gemini. The role framing that works in ChatGPT is useful in Gemini but less essential than context placement and explicit task decomposition.

🌟

The fastest way to improve across all three models

Learn what each model uniquely rewards — then build templates optimized for each. Don't edit one generic prompt. Maintain separate Gemini, Claude, and ChatGPT variants of your most-used prompts. The output quality difference is substantial.

Model-Specific Frameworks

Start Learning with PromptSharp — Works with Claude, ChatGPT, Gemini

30+ battle-tested templates, model-specific optimization guides, and a personal prompt library that grows with you. One framework for every model.

Get Started — $29/mo → Compare All Tools