1. System Messages: The First Thing ChatGPT Reads
When you use ChatGPT through the API, a conversation typically begins with a system message: a hidden instruction block that sets the model's persona, scope, constraints, and output style before the user types a word. The API does not require one, but omitting it leaves those choices to the model's defaults. In the ChatGPT web UI, Custom GPTs surface the same concept through the "Instructions" field. Writing a good system message is arguably the highest-leverage skill in ChatGPT prompting.
A well-structured system message has four components:
- Identity: Who the model is playing. Be specific — not "you are a helpful assistant" but "you are a senior financial analyst at a long-short equity hedge fund."
- Scope: What the model is and is not allowed to discuss. Negative constraints ("do not discuss competitors by name") are as important as positive ones.
- Format rules: How outputs should be structured — markdown vs. plain text, response length, whether to use headers, numbered lists, or prose.
- Tone and register: Formal/informal, technical depth, vocabulary level relative to the assumed audience.
Here is a concrete example of a well-structured API system message for a B2B SaaS support assistant:
# Role
You are a senior support engineer for Acme Analytics, a B2B SaaS platform
for financial reporting. You have deep knowledge of the product's REST API,
data pipeline architecture, and common integration failure modes.

# Scope
- Answer questions about Acme Analytics ONLY.
- Do not speculate about features not in the provided context.
- If a question is outside your scope, say so and suggest docs.acme.io.

# Response format
- Keep responses under 300 words unless the user explicitly asks for detail.
- Use numbered steps for any multi-step process.
- Code snippets in triple backtick blocks with language tag.
- Never use marketing language; be direct and technical.

# Tone
Professional but approachable. Address the user's underlying problem,
not just their literal question. If their approach seems wrong,
say so politely and explain why.

# Context injection
User's account tier: {{account_tier}}
Last error seen: {{last_error_code}}
Product version: {{product_version}}
Notice the last section: context injection. When calling the API programmatically, you populate placeholders at runtime — account data, session state, retrieved documents. This is the pattern behind retrieval-augmented generation (RAG). The system message is the container; the injected context is the payload.
Think of the system message as a contract between you and the model. Every ambiguity in the contract produces variance in the output. Tighten the contract, tighten the results.
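The context-injection step can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the shortened template, the helper names (render_system_message, build_messages), and the sample values are all assumptions made for the example. It fills each {{placeholder}} at request time and fails loudly when a value is missing, so a half-rendered prompt never reaches the model.

```python
import re

# Hypothetical, shortened template; the {{placeholder}} names mirror the example above.
SYSTEM_TEMPLATE = """# Role
You are a senior support engineer for Acme Analytics.

# Context injection
User's account tier: {{account_tier}}
Last error seen: {{last_error_code}}
Product version: {{product_version}}"""

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render_system_message(template: str, context: dict) -> str:
    """Replace every {{name}} with context[name]; raise if a value is missing."""
    def substitute(match):
        key = match.group(1)
        if key not in context:
            raise KeyError(f"No value supplied for placeholder '{key}'")
        return str(context[key])

    return PLACEHOLDER.sub(substitute, template)


def build_messages(context: dict, user_text: str) -> list:
    """Assemble the message list for a Chat Completions request."""
    return [
        {"role": "system", "content": render_system_message(SYSTEM_TEMPLATE, context)},
        {"role": "user", "content": user_text},
    ]


# Illustrative values only; in production these come from your own data store.
messages = build_messages(
    {"account_tier": "enterprise", "last_error_code": "E4012", "product_version": "3.8.1"},
    "My nightly export keeps failing.",
)
print(messages[0]["content"])
```

The resulting list can be passed as the messages argument of a Chat Completions call. Retrieved documents for RAG would be injected the same way, as one more placeholder.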
2. Role-Task-Format Framing
Role-Task-Format (RTF) is the most reliable prompt scaffold for conversational turns — the part the user writes, not the system. It applies whether you're using the API or typing directly into ChatGPT.
- Role: Who should the model behave as? Even in the user turn, naming a role steers the model toward the relevant vocabulary and conventions. "As a copy editor..." or "Responding as a regulatory attorney..." noticeably shifts the model's defaults.
- Task: What exactly should the model do? Use action verbs: draft, summarize, critique, rewrite, extract, classify. Avoid vague verbs like "help me with" or "tell me about."
- Format: What should the output look like? Specify length, structure, and any constraints. "Return a bulleted list of no more than 6 items" is better than "keep it short."
Weak: "Write something about onboarding emails."
Strong: "As a product-led growth copywriter, draft three subject line variants for a day-3 onboarding email targeting users who signed up but haven't connected a data source. Each variant under 50 characters. Tone: direct, no emojis."
The strong version gives GPT-4o a specific role (PLG copywriter), a concrete task (three subject line variants), a behavioral context (day 3, non-activated users), a hard format constraint (under 50 characters each), and a tone rule. Very little is left to chance.
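If you assemble user-turn prompts in code, the RTF scaffold maps naturally onto a small helper. The sketch below is one possible shape, with a hypothetical function name (build_rtf_prompt) and an assumed layout of one sentence followed by a list of format rules; adjust it to your own conventions.

```python
def build_rtf_prompt(role: str, task: str, format_rules: list) -> str:
    """Combine role, task, and format constraints into a single user-turn prompt."""
    if not role.strip() or not task.strip():
        raise ValueError("Both a role and a task are required")
    lines = [f"As {role}, {task}"]
    if format_rules:
        lines.append("Format:")
        lines.extend(f"- {rule}" for rule in format_rules)
    return "\n".join(lines)


# Rebuilds the "strong" example from the text above.
prompt = build_rtf_prompt(
    role="a product-led growth copywriter",
    task=(
        "draft three subject line variants for a day-3 onboarding email "
        "targeting users who signed up but haven't connected a data source."
    ),
    format_rules=["Each variant under 50 characters", "Tone: direct, no emojis"],
)
print(prompt)
```

Keeping the three parts as separate arguments makes it harder to ship a prompt that silently omits the task or the format constraints.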
3. Few-Shot vs. Zero-Shot in GPT-4o
GPT-4o is substantially better at zero-shot reasoning than earlier GPT versions, which means a well-specified instruction alone often produces high-quality output. But few-shot examples remain the fastest way to transfer an idiosyncratic style the model has never seen.
Use zero-shot when:
- The task is well-defined and the output format is standard (JSON, markdown, code).
- You want maximum instruction-following without the model anchoring to your examples.
- Token budget is tight — few-shot examples are expensive at scale.
Use few-shot when:
- You have a proprietary tone, vocabulary, or style guide the model has never seen.
- The output format is unconventional and hard to describe in text alone.
- You're classifying inputs into categories that don't map to common labels.
- Zero-shot output keeps drifting from your target even with detailed instructions.
For classification tasks, two to three examples per class is usually a good starting point. Going beyond about five examples per class in a chat turn tends to yield diminishing accuracy gains while consuming tokens. If you need more calibration, fine-tuning (via the OpenAI fine-tuning API) often outperforms few-shot prompting at scale.
GPT-4o follows explicit formatting instructions more reliably than GPT-4 Turbo. If you previously needed few-shot examples to enforce output structure, try a detailed zero-shot format instruction first — you may be able to remove the examples entirely and save tokens.
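A common way to supply few-shot examples through the API is to encode each one as a user turn followed by the assistant turn you would have wanted. The sketch below assumes a hypothetical two-class ticket classifier with two examples per class; the labels, example texts, and function name are illustrative, not taken from any real dataset.

```python
# Hypothetical labeled examples: two per class.
EXAMPLES = [
    ("The dashboard won't load after the update.", "bug"),
    ("Export crashes when the file exceeds 10 MB.", "bug"),
    ("Can you add dark mode?", "feature_request"),
    ("It would help to schedule reports weekly.", "feature_request"),
]


def build_few_shot_messages(examples, new_input):
    """Return a message list: instruction, example turn pairs, then the real input."""
    labels = sorted({label for _, label in examples})
    messages = [{
        "role": "system",
        "content": "Classify the support ticket. Reply with exactly one label: "
                   + ", ".join(labels) + ".",
    }]
    for text, label in examples:
        # Each example is a user turn plus the ideal assistant reply.
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": new_input})
    return messages


messages = build_few_shot_messages(EXAMPLES, "Login page shows a blank screen.")
print(len(messages))
```

To test whether the examples are still needed, pass an empty examples list with a more detailed instruction and compare the outputs on the same inputs.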
4. JSON Mode and Structured Outputs
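Two of the most underused features of the OpenAI API are JSON mode and the newer Structured Outputs feature. JSON mode works with GPT-4o, GPT-4o-mini, and several earlier models; Structured Outputs requires GPT-4o-mini or a GPT-4o snapshot from August 2024 (gpt-4o-2024-08-06) or later. Together they eliminate an entire category of prompt engineering problem: getting the model to return machine-parseable output reliably.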
JSON mode (older)
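Set response_format: { type: "json_object" } in your API call. The model is then constrained to emit syntactically valid JSON, with two caveats: the word "JSON" must appear somewhere in your messages or the API rejects the request, and a response cut off by the token limit can still be incomplete. You must also describe the schema in your prompt, because JSON mode enforces valid syntax only, not a specific structure.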
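Because JSON mode checks syntax only, it is worth validating the structure yourself. The sketch below shows one way to do that, under stated assumptions: the two-key schema, the function names, and the prompt wording are hypothetical, and json_mode_request only builds the keyword arguments you would pass to client.chat.completions.create rather than calling the API.

```python
import json

# Hypothetical schema: the keys this application expects in every reply.
REQUIRED_KEYS = {"company", "period"}


def json_mode_request(user_text: str) -> dict:
    """Build keyword arguments for a Chat Completions call with JSON mode on."""
    return {
        "model": "gpt-4o",
        "response_format": {"type": "json_object"},
        "messages": [
            # The schema is described in the prompt, and the word "JSON" appears in it.
            {"role": "system", "content": "Return a JSON object with the keys: company, period."},
            {"role": "user", "content": user_text},
        ],
    }


def parse_reply(raw: str) -> dict:
    """Parse the reply and verify the structure that JSON mode does not enforce."""
    data = json.loads(raw)  # raises json.JSONDecodeError if the output was cut off
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object at the top level")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"Reply is missing keys: {sorted(missing)}")
    return data


print(parse_reply('{"company": "Apple", "period": "Q1 2024"}'))
```

This manual check is the work that a schema-constrained response format is meant to take off your hands.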
Structured Outputs (preferred)
Structured Outputs, introduced in August 2024, go further. You provide a JSON Schema in the API request, and the model is constrained to return output that matches your schema: correct field names, correct types, no hallucinated keys. This is enforced during token generation (constrained decoding), not by post-hoc validation. The schema constrains structure, not the accuracy of the values, so extracted numbers still deserve a sanity check.
from openai import OpenAI
import json

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Extract the key entities from the user's text."},
        {"role": "user", "content": "Apple reported $94.9B revenue in Q1 2024, up 2% YoY."},
    ],
    # Structured Outputs: the reply must conform to this JSON Schema.
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "entity_extraction",
            "strict": True,  # strict mode: every property must be listed in "required"
            "schema": {
                "type": "object",
                "properties": {
                    "company": {"type": "string"},
                    "revenue_usd_billions": {"type": "number"},
                    "period": {"type": "string"},
                    "yoy_growth_pct": {"type": "number"},
                },
                "required": ["company", "revenue_usd_billions", "period", "yoy_growth_pct"],
                "additionalProperties": False,
            },
        },
    },
)

# The reply content is a JSON string that matches the schema; parse it before use.
entities = json.loads(response.choices[0].message.content)
print(entities)
The practical implication: with Structured Outputs, you can remove all the "respond only with valid JSON" and "do not include any other text" instructions from your prompts. The constraint is encoded in the API call, not the prompt, which is cleaner and more reliable.
5. Custom GPTs vs. System Prompts: When to Use Each
Custom GPTs (building one requires a paid plan such as ChatGPT Plus or Team) are essentially packaged system prompts with optional tool access: web browsing, code interpreter, DALL-E, and custom Actions (calls to external APIs, optionally authenticated with an API key or OAuth). Understanding the tradeoffs helps you choose the right container for your prompting work.
| Dimension | Custom GPT | API System Prompt |
|---|---|---|
| Shareability | Publish to GPT Store, share by link, set visibility | Lives in your codebase; not shareable without deployment |
| Tool access | Web, code interpreter, image gen, custom Actions | Tools defined by you via function calling in API |
| Prompt confidentiality | Weak: users can often extract the system prompt via jailbreaks | Stronger: the system prompt stays server-side, though prompt injection can still coax the model into repeating it |
| Programmability | Static config; no runtime variable injection | Full dynamic injection at request time |
| Cost model | Included in Plus subscription | Per-token API pricing |
| Best for | Personal workflows, team sharing, non-technical users | Production apps, RAG pipelines, dynamic context |
For personal productivity workflows — a research assistant, a writing coach, a code reviewer tuned to your style — Custom GPTs are faster to set up and easier to iterate. For production applications where context must be injected dynamically (user account data, retrieved documents, real-time state), the API with a programmatic system message is the right choice.
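On the API side, "tools defined by you via function calling" means you describe each tool as a JSON Schema and run the function yourself when the model asks for it. The sketch below illustrates that split. The tool name, its parameters, and the in-memory account data are hypothetical; only the outer {"type": "function", "function": {...}} shape follows the Chat Completions tools format.

```python
import json

# Hypothetical tool definition, passed as tools=TOOLS in a Chat Completions request.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_account_tier",
        "description": "Look up the subscription tier for a customer account.",
        "parameters": {
            "type": "object",
            "properties": {"account_id": {"type": "string"}},
            "required": ["account_id"],
        },
    },
}]

# Stand-in for a real data source; illustrative values only.
_FAKE_ACCOUNTS = {"acct_123": "enterprise"}


def get_account_tier(account_id: str) -> str:
    """Local implementation of the tool described in TOOLS."""
    return _FAKE_ACCOUNTS.get(account_id, "unknown")


HANDLERS = {"get_account_tier": get_account_tier}


def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Run the function the model requested; its arguments arrive as a JSON string."""
    if name not in HANDLERS:
        raise ValueError(f"Unknown tool: {name}")
    return HANDLERS[name](**json.loads(arguments_json))


print(dispatch_tool_call("get_account_tier", '{"account_id": "acct_123"}'))
```

The model never executes anything itself: it returns a tool name and arguments, your code runs the handler, and you send the result back in a follow-up message. That control is the main practical difference from a Custom GPT's built-in tools.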
6. The Assistants API: Conversation Structure and Thread Management
The OpenAI Assistants API (separate from the basic Chat Completions API) introduces a structured object model for multi-turn conversations: Assistants, Threads, Messages, and Runs.
- Assistant: A configured entity with a system prompt, model selection, and tool access. Created once, reused across many conversations. Analogous to a Custom GPT but accessible via API.
- Thread: A persistent conversation session. Messages accumulate in a Thread; the API handles context window management automatically (truncating older messages as needed).
- Message: An individual user or assistant turn within a Thread. Messages can include text, files, and images.
- Run: A single model invocation against a Thread. You create a Run to trigger the Assistant to respond; polling or streaming the Run gives you the result.
The key prompting implication of this architecture: your system prompt (the instructions field) lives on the Assistant object, not in each API call. You write it once, version it deliberately, and it applies consistently across all Runs on that Assistant. Changes to the Assistant's instructions take effect on the next Run. A Run can still deviate when needed: passing instructions when creating the Run overrides the Assistant's instructions for that Run only, and additional_instructions appends to them.
For applications requiring different personas or scopes for different user segments, separate Assistant objects are usually cleaner than per-Run overrides, because each persona can then be versioned and tested on its own. Design your Assistant taxonomy before building.
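The object model described in this section can be sketched as two small functions. This assumes the openai Python SDK v1.x, where the Assistants endpoints live under client.beta; the function names, the assistant name, and the instruction text are hypothetical, and the beta namespace may differ in other SDK versions. The client is passed in as an argument, so nothing here contacts the API until you call the functions with a real client.

```python
def create_support_assistant(client) -> str:
    """Create the Assistant once; its instructions act as the system prompt."""
    assistant = client.beta.assistants.create(
        name="Acme Support",  # hypothetical name
        instructions="You are a senior support engineer for Acme Analytics.",
        model="gpt-4o",
    )
    return assistant.id


def ask_assistant(client, assistant_id: str, user_text: str) -> str:
    """Open a Thread, add one user Message, create a Run, return its final status."""
    thread = client.beta.threads.create()
    client.beta.threads.messages.create(
        thread_id=thread.id, role="user", content=user_text
    )
    # create_and_poll blocks until the Run reaches a terminal state.
    run = client.beta.threads.runs.create_and_poll(
        thread_id=thread.id, assistant_id=assistant_id
    )
    return run.status


# Usage, assuming OPENAI_API_KEY is set in the environment:
# from openai import OpenAI
# client = OpenAI()
# assistant_id = create_support_assistant(client)
# print(ask_assistant(client, assistant_id, "Why does my export fail?"))
```

In a real application you would store the assistant ID and reuse it, and keep one Thread per conversation rather than opening a new one for every question.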
7. ChatGPT vs. Claude: Prompt Differences Worth Knowing
Both GPT-4o and Claude 3.5 Sonnet are capable models, but they have distinct tendencies that affect how you write prompts for each. Understanding the differences saves you from porting prompts directly and wondering why quality degraded.
| Dimension | ChatGPT (GPT-4o) | Claude (3.5 Sonnet) |
|---|---|---|
| Format instruction following | Strong; responds well to explicit structural rules | Strong; also infers appropriate structure from context |
| Default verbosity | Tends toward concise; may under-explain without prompting | Tends toward thorough; may over-explain without length constraints |
| Role activation | Responds well to persona framing in system message | Role framing works, but Claude may push back on constrained personas |
| System prompt placement | System message has highest priority; user turn can override with effort | Claude treats system and user turns as collaborative; less rigid hierarchy |
| Code generation | Strong; good at following code style from examples | Strong; tends to add more inline comments and explanations |
| Refusals | More willing to engage with edge-case requests when context is provided | More conservative on ambiguous content; more likely to clarify before proceeding |
The practical takeaway: prompts that rely on rigid persona constraints and strict output schemas work better with GPT-4o. Prompts that benefit from the model exercising judgment — nuanced writing tasks, open-ended analysis, complex multi-step reasoning — often produce higher-quality results with Claude. The best approach is to test both on your specific task, not to assume one is universally better.