What is prompt engineering?

Prompt engineering is the discipline of writing inputs to large language models so they produce useful, consistent outputs. It covers role, context, task, format, and constraints.

What is few-shot prompting?

Few-shot prompting gives the model two to five worked examples of input plus desired output, then asks it to follow the same pattern for new input. The examples teach the model the format, tone, and judgment criteria you want.
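A minimal sketch of how such a prompt could be assembled. The function name, the `Input:`/`Output:` labels, and the sample messages are illustrative assumptions, not part of the guide; only the two-to-five range comes from the text above.

```python
def build_few_shot_prompt(examples, new_input, instruction=""):
    """Format worked examples as Input/Output pairs, then leave the last output open."""
    if not 2 <= len(examples) <= 5:
        raise ValueError("this sketch expects two to five worked examples")
    parts = [instruction] if instruction else []
    for source, target in examples:
        parts.append(f"Input: {source}\nOutput: {target}")
    # The final block has no output, so the model completes the pattern.
    parts.append(f"Input: {new_input}\nOutput:")
    return "\n\n".join(parts)


# Hypothetical examples for a message classifier.
examples = [
    ("Meeting moved to Friday", "Scheduling"),
    ("Invoice 1042 is overdue", "Billing"),
    ("Login page returns an error", "Support"),
]
prompt = build_few_shot_prompt(examples, "Card was charged twice", "Classify each message.")
```

The resulting string goes to the model as a single prompt; the trailing open `Output:` is what asks it to continue the pattern.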

What is chain of thought prompting?

Chain of thought prompting asks the model to write out its reasoning steps before giving an answer. Adding 'let's think step by step' often improves accuracy on multi-step problems.

What is a system prompt?

A system prompt is the persistent instruction set at the start of a conversation that defines role, voice, forbidden behaviors, and quality rules. It stays in effect for the entire session.

Prompt Engineering Guide · Few-Shot, Chain of Thought, RAG, RLHF

TL;DR

Prompt engineering is writing inputs to large language models so they produce reliable, useful outputs · empirical, not theoretical. This guide gives you the CRAFT framework, a five-minute hands-on, and twenty prompt patterns (sales, content, outreach · gathered on the Prompt-Examples sub-page).

Where sources conflict, empirical evidence beats convention.

The four tools that matter in 2026 · ChatGPT, Claude, Perplexity, Grok

Four tools cover ninety percent of prompt-engineering work in 2026. Pick one as your primary, switch on workflow boundaries.

ChatGPT (OpenAI, GPT-5.5 since May 2026; previously GPT-5.4) is the breadth-first choice · widest plugin ecosystem, native image and voice, and the default for non-engineers. Pick ChatGPT for general drafting, brainstorming, and any workflow where speed matters more than precision.

Claude (Anthropic, Opus 4.7) is the precision-first choice · stronger on long-context reasoning, structured XML prompts, and code review. Claude Memory rolled out to all users in March 2026, closing the persistence gap with ChatGPT. Pick Claude for code, careful reasoning, or any task where the answer must hold up under inspection.

Perplexity is the research-first choice · routes prompts through live web search with citation footnotes by default. Pick Perplexity when you need cited sources, recent data, or a sanity check on facts that the training-cutoff models would only guess at.

Grok (xAI, Grok 4) is the real-time sidekick · the only LLM with direct X access, best for breaking news, social trends, and "what's happening right now" queries. Pick Grok when recency matters more than citation quality · Perplexity stays superior for cited research.

Three more frontier models complete the field: Gemini 3.1 Pro (Google · native Workspace integration, very long context window), Mistral Large 3 (Mistral AI · European jurisdiction, 675B-parameter MoE since December 2025), and Llama 4 (Meta · open weights for self-hosting). They share the same basic LLM architecture; differences are tone, factual reliability, and tool support.

Your first prompt in five minutes

Type a question, get an answer, change one thing, send again. That iteration loop is the whole discipline.

A prompt is the instruction you send to a language model; the model returns the statistically most likely useful continuation. Three mechanics matter:

Context is what the model sees. It does not know your product, your tone, or your customer until you write it in.
Memory is per-conversation, not persistent. A new tab starts at zero · except for opt-in features like Claude Memory or ChatGPT Memory.
The model wants to please. It defaults to plausible-sounding answers rather than admitting gaps. Tell it explicitly:
```
If unsure, flag rather than speculate.
```

Now try one CRAFT-structured prompt. Each line names one slot the model would otherwise guess:

I work at a SaaS company selling project-management
software to mid-market teams in DACH.                   (Context)
You are an experienced B2B sales copywriter.            (Role)
Write three subject-line variants for a cold email      (Action)
opening.
Each variant: maximum eight words, no exclamation       (Format)
marks, no em-dash.
Aim for a twenty-five percent open rate from agency     (Target)
owners in B2B services.

Send it. Read the output. Change one slot · for example tighten the Role, swap the Target, or replace the Format constraint with a different one. Send again. Repeat until the output is publishable, then save the working version as a template. The next section unpacks each CRAFT slot.
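One way to save a working prompt as a template is to keep the five slots as separate fields, so that a single slot can be swapped per iteration. This is a sketch under assumptions: the class name `CraftPrompt` and its `render` method are hypothetical, and the slot texts are taken from the example above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CraftPrompt:
    context: str
    role: str
    action: str
    format: str
    target: str

    def render(self) -> str:
        """Join the five slots in CRAFT order, one per line."""
        slots = (self.context, self.role, self.action, self.format, self.target)
        if any(not slot.strip() for slot in slots):
            raise ValueError("every CRAFT slot needs text; an empty slot is left to the model's guess")
        return "\n".join(slot.strip() for slot in slots)


template = CraftPrompt(
    context="I work at a SaaS company selling project-management software to mid-market teams in DACH.",
    role="You are an experienced B2B sales copywriter.",
    action="Write three subject-line variants for a cold email opening.",
    format="Each variant: maximum eight words, no exclamation marks, no em-dash.",
    target="Aim for a twenty-five percent open rate from agency owners in B2B services.",
)

# Change exactly one slot per iteration so a difference in output has one cause.
variant = replace(
    template,
    role="You are an experienced B2B sales copywriter who writes for skeptical agency owners.",
)
```

Because the dataclass is frozen, each variant is a new object and the saved template cannot be edited by accident.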

Failure modes: too loose, too tight, just right

A prompt can fail in two opposite directions. The skill is finding the middle.

With too little guidance, the model falls back on whatever is statistically average for your phrasing: copy that could have been written for any company in your industry. With too much specification, the model stops reasoning and starts ticking boxes. The result satisfies every rule on paper and misses the actual point, because the goal got buried under the constraints. The sweet spot is enough context to orient the model without micromanaging the path.

[Interactive prompt example omitted. Illustrative composite; behaviour validated against our own Opus 4.8 runs (Corset n=5, Sweet/Anchor n=1).]

If you do need an example, pull it from a different domain. A reference output for a pellet-stove product page will not lock the model onto pellet-stoves when the real job is decluttering services. Same-domain examples almost always over-anchor.

CRAFT · the five-slot framework

Context, Role, Action, Format, Target. Five slots that close the gaps the model would otherwise fill with statistical default.

C · Context

Context is the situational background: who you are, what your product does, who reads the output, and which constraints exist. The model treats context as the highest-priority signal in the prompt. Concrete beats abstract every time · compare a specific Context-slot:

SaaS company in DACH selling to mid-market teams

versus a vague one:

my company

The first activates a different set of patterns than the second. Context belongs at the top of the prompt, before role and action, because it shapes how the model interprets the rest.

R · Role

Role assigns the model a persona that controls tone, perspective, and stylistic register. A role does not control factual accuracy; it controls voice. Example:

You are an experienced B2B sales copywriter who writes for skeptical agency owners.

That single line pulls the model out of generic-mode and into a domain-tuned voice. Keep roles specific and one-dimensional · stacking three roles produces averaged output, not the sum of their strengths.

A · Action

Action is the verb that names the task: write, summarise, compare, classify, draft. The clearer the verb, the cleaner the output. Two different actions produce two different artefacts:

Brainstorm headlines.

versus:

Choose the strongest of these headlines and explain why.

If you mean both, write both, in order. A prompt with no clear action verb is a prompt without a goal.

F · Format

Format is the structural shape of the output: length, layout, allowed characters, forbidden phrases. Typical format constraints: three subject-line variants, max eight words each, no exclamation marks; a markdown table with these four columns; a numbered list of exactly five items. Format constraints turn the model from open-ended generator into structured tool. The tighter the format, the less editing afterwards.

T · Target

Target names the effect you want on the reader: open rate, click-through, scheduling a call, internal alignment, agreement. Target is what separates a useful prompt from a vanity prompt. A vanity Action produces a vanity output:

Write three subject lines.

A Target-aware Action produces output aimed at a specific outcome the model can optimise for:

Write three subject lines that get agency owners to open the email and forward it to a colleague.

The five slots are not an equal checklist. Each controls a different layer, and not every one is as strong as it is said to be.

[Interactive chart: each slot is placed on two axes, one running from "overrated, weak" to "strong evidence, effective", the other from "controls style and form" to "controls substance". Legend labels: Lever, Fine-tuning. Selecting a slot shows what it really controls and how well that is evidenced.]

Findings from our own tests (Opus 4.8) plus current research (2025/26); the source sits in each slot detail.

Reliability · what the prompt fixes, what the run decides

You wrote a clean CRAFT prompt. Run it five times and some properties of the output come back nearly identical, while others swing wildly. The difference is not noise: it is measurable, and it splits cleanly.

The same prompt does not give you the same output, even at temperature 0. But that variability is not uniform. We did not stop at a single run: six versions of the prompt (the full one and five with a single building block removed), across two models (Opus 4.8 and Haiku 4.5), five runs each, sixty real outputs in all, with four properties measured on every one. Two of them, information density and rhythm, barely move from run to run: the prompt controls them. The other two, formulaicity and how the text addresses the reader, are a lottery at five runs: the run decides, not the prompt.

On top of that sits a second, stronger signal: a model fingerprint. Across 9 of 9 test conditions, Haiku writes more predictably (lower surprisal) and more fragmented (higher burstiness) than Opus, with very large effect sizes (d_z 1.7 to 2.8). That gap is reproducible where the run-to-run lottery is not. The practical lesson: pin down what the prompt actually controls, and stop fighting the run for what it does not.
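The two kinds of measurement described above can be sketched in a few lines. The guide does not spell out its formulas, so this assumes d_z is the standard paired effect size (mean of the paired differences divided by their sample standard deviation) and uses the coefficient of variation as a stand-in for run-to-run spread. The function names and all numbers are invented for illustration; they are not the reported measurements.

```python
from statistics import mean, stdev


def paired_effect_size(a, b):
    """Cohen's d_z: mean of the paired differences over their sample standard deviation."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    spread = stdev(diffs)
    if spread == 0:
        raise ValueError("all differences are identical; d_z is undefined")
    return mean(diffs) / spread


def run_spread(values):
    """Relative run-to-run spread: sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


# Invented values, one pair per test condition (same prompt variant, two models).
burstiness_model_a = [0.62, 0.58, 0.66, 0.61, 0.64]
burstiness_model_b = [0.41, 0.44, 0.40, 0.47, 0.43]
d_z = paired_effect_size(burstiness_model_a, burstiness_model_b)

# A property the prompt controls should show a small spread across runs.
density_spread = run_spread([0.71, 0.70, 0.72, 0.71, 0.70])
```

A large d_z with a small spread points to a stable model difference; a large spread on a single property suggests the run, not the prompt, decides it.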

[Interactive panel: the prompt with removable building blocks sits on top; below it, side by side, the output of a real run and the assessment across the 5 runs (position = model-typical, width = run spread). Switching the model or paging through the runs updates text and measurement together.]

Both models are real runs (Opus 4.8 high effort, Haiku 4.5 standard, thinking off). The prompt and outputs are in German, the measured corpus; the effects are language-agnostic. Evidence: Haiku is more predictable (lower surprisal, d_z=1.72) and more fragmented (higher burstiness, d_z=2.76), 9 of 9 conditions each. Non-determinism even at temperature 0 (arXiv 2408.04667).

PAS · Problem, Agitate, Solve

PAS (Problem · Agitate · Solve) is a content pattern that sits inside CRAFT, not next to it. Use CRAFT to define how the prompt is structured; use PAS to define what the resulting sales copy is structured around.

P · Problem

The Problem stage names the buyer's current situation as a friction they recognise: a deadline they keep missing, a process that breaks under load, a number that won't move. State it concretely and without judgment · the reader confirms "yes, that's me" before any solution lands. In the CRAFT shell: the Context slot describes the buyer's reality so the model can pull the right pain out.

A · Agitate

The Agitate stage makes the pain visceral by surfacing its second-order costs: lost revenue, escalating effort, internal credibility damage, the workaround that becomes the permanent process. Agitation is not exaggeration · it names the downstream consequences the buyer rationalises away. In the CRAFT shell: the Target slot describes which consequence to surface most sharply.

S · Solve

The Solve stage introduces the offer as a resolution to the named pain · concrete, scoped, and immediately actionable. The strongest Solve sentences answer three implicit reader questions in one line: what changes, how quickly, with what evidence. In the CRAFT shell: the Action slot tells the model to write a PAS-structured cold email; the Format slot constrains length so Solve does not drift into feature-listing.

PAS works because buyers move primarily to avoid pain, secondarily to capture gain · a PAS-aware prompt builds that mechanism into every output.

Hallucinations · why models invent and how prompts reduce it

A hallucination is an output that sounds confident and is factually wrong. The cause is structural: an LLM produces the statistically most likely continuation, not the verified-true continuation. When the training pattern runs out · niche facts, recent events, internal data · the model fills the gap with the next most plausible-sounding text. A confident tone is therefore no reliable signal of accuracy.

Three prompt-level reductions. First, give the model explicit permission to flag gaps:

If unsure, say so rather than guess.

Second, require source citations for any factual claim:

Each number must include a source URL.

Third, verify load-bearing numbers manually before publishing. The failure domains worth knowing in advance are statistics, dates, legal citations, and any niche fact outside the model's training window · these are the places where the next-token machine sounds most confident and is most likely to be wrong. The cluster post KI-Halluzinationen covers detection patterns, common failure domains, and how grounding via RAG changes the picture.
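The second and third reductions can be backed by a simple pre-publication check that lists every sentence containing a number but no source URL. This is a rough sketch: the function name is hypothetical, the sentence splitting is naive, and a URL being present says nothing about whether the source supports the claim.

```python
import re

URL = re.compile(r"https?://\S+")
DIGIT = re.compile(r"\d")


def unsourced_claims(text):
    """Return sentences that contain a digit outside a URL but carry no URL."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    flagged = []
    for sentence in sentences:
        without_urls = URL.sub("", sentence)
        if DIGIT.search(without_urls) and not URL.search(sentence):
            flagged.append(sentence)
    return flagged


# Invented draft text; example.com is a placeholder domain.
draft = (
    "Churn fell 12% (https://example.com/report). "
    "Revenue grew 8% last year. "
    "The team is happy."
)
to_verify = unsourced_claims(draft)
```

Everything in `to_verify` is a candidate for manual verification; sentences that passed still need a human to open the link.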

Prompt versioning · why prompts behave like code

Working prompts drift. A prompt that produced clean output last month produces noise this month because the model version changed, the workflow expanded, or someone tweaked a constraint and forgot to write it down. The fix is structural, not heroic: treat prompts like code · version them, diff them, attach them to the workflow that consumes them.

Three minimum mechanics: (1) semantic versioning (a 0.x prompt is experimental, a 1.x prompt is production), (2) regression tests against golden outputs (a small set of canonical inputs whose expected output you check on every model upgrade), and (3) a single source of truth that the workflow reads, not a copy in someone's Notion. Without these three, drift accumulates silently until a stakeholder asks why last quarter's emails were better.
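A minimal sketch of the first two mechanics, assuming the model call is injected as a plain function so the check runs without any vendor SDK. `PromptVersion`, `regression_failures`, and the stand-in model are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptVersion:
    name: str
    version: str  # "0.x" experimental, "1.x" production, as described above
    text: str

    @property
    def is_production(self) -> bool:
        return int(self.version.split(".")[0]) >= 1


def regression_failures(prompt, golden_cases, run_model, passes):
    """Run every golden input and return the inputs whose output fails its check."""
    failures = []
    for case_input, expected in golden_cases:
        output = run_model(prompt.text, case_input)
        if not passes(output, expected):
            failures.append(case_input)
    return failures


def fake_model(prompt_text, case_input):
    """Stand-in for a real model call; it only upper-cases the input."""
    return case_input.upper()


subject_prompt = PromptVersion("subject-lines", "1.2.0", "Write one subject line.")
golden = [("launch", "LAUNCH"), ("renewal", "Renewal")]
failed = regression_failures(subject_prompt, golden, fake_model, lambda out, exp: out == exp)
```

In practice `passes` would rarely be exact equality, since outputs vary between runs; a check on length, required phrases, or forbidden characters is a more realistic comparison.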

The cluster post Prompt-Versionierung covers concrete patterns and why a small Prompt Management System replaces the typical drift-prone Notion document.

Advanced workflows · agents, custom GPTs, vibe coding, automation

Four directions in which a single working prompt scales into a production workflow.

A working prompt is the seed. Most production AI work is what grows around that seed · chaining prompts into a flow, packaging them as a reusable assistant, scaffolding them with tools, or embedding them into a workflow that runs without you. The four directions below cover the patterns that scale the cleanest.

Build an AI agent · move from one prompt to a chained workflow with human-in-the-loop checkpoints. Start with the smallest possible MVP, then add tools and memory only where the workflow demands it.
Build a custom GPT · package a working prompt as a reusable assistant with its own system prompt, knowledge files, and conversation style. The setup is mostly system-prompt engineering plus careful scope.
Vibe coding with AI · use Cursor, Claude Code, or Lovable to build working apps without writing the code yourself. The skill is prompt and architecture; the model writes the syntax.
AI automation in business · prompts as the foundation under company workflows that previously required either a person or a brittle macro. The trade-off is observability versus throughput.
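The first direction, a chained workflow with human-in-the-loop checkpoints, can be sketched without any agent framework. Everything here is an assumption for illustration: `run_chain`, the step tuples, and the stand-in `call_model` and `approve` functions are hypothetical.

```python
def run_chain(steps, initial_input, call_model, approve):
    """Feed each step's output into the next; stop when a checkpoint rejects an output."""
    current = initial_input
    history = []
    for name, template, needs_review in steps:
        current = call_model(template.format(input=current))
        history.append((name, current))
        # A human (or any reviewer function) can halt the chain at marked steps.
        if needs_review and not approve(name, current):
            return history, False
    return history, True


steps = [
    ("outline", "Outline: {input}", False),
    ("draft", "Draft from: {input}", True),
]


def echo_model(prompt):
    """Stand-in for a real model call."""
    return prompt.upper()


history, finished = run_chain(steps, "pricing page", echo_model, lambda name, text: True)
```

Starting with two steps and one checkpoint matches the smallest-possible-MVP advice; tools and memory would be added as further steps only where the workflow demands them.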

Advanced techniques · Chain-of-Thought, Self-Consistency, Calibration

Three techniques that raise output quality on multi-step or load-bearing tasks. Use them when a plain prompt is too thin.

Chain-of-Thought prompting

Ask the model to write its reasoning steps before the answer. Append a single phrase at the end of a multi-step prompt:

Think step by step.

That phrase often improves accuracy on math, logic, and structured-analysis tasks. The model uses each generated step as context for the next, which turns one-shot prediction into a small written-out monologue. The cost is more tokens and slightly slower output; the benefit is fewer plausible-sounding wrong answers. On 2026 reasoning models the picture shifts, see Reasoning effort below.

Self-Consistency

Run the same prompt multiple times with non-zero temperature, then pick the answer that appears most often. This works because hallucinations tend to be inconsistent · the model invents differently each time · while correct reasoning paths converge. Self-Consistency is the cheapest reliability lever on any prompt that returns a discrete answer: a classification, a number, a yes/no, a category. For continuous output (an essay, a paragraph) it's less useful · majority-vote breaks down when there is no shared "answer" to vote on.
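The majority vote described above fits in a few lines. This is a sketch that assumes the sampling call is passed in as a function returning one short answer per call; `self_consistent_answer` and the canned replies are hypothetical.

```python
from collections import Counter


def self_consistent_answer(sample, prompt, runs=5):
    """Sample the prompt several times; return the most frequent answer and its vote share."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    # Normalise case and whitespace so "Yes" and "yes " count as the same vote.
    answers = [sample(prompt).strip().lower() for _ in range(runs)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / runs


# Canned replies stand in for five model calls at non-zero temperature.
replies = iter(["Yes", "no", "yes ", "YES", "no"])
answer, share = self_consistent_answer(lambda p: next(replies), "Is this lead qualified? Answer yes or no.")
```

A low vote share is itself a signal: when no answer clearly dominates, the question is a candidate for manual review rather than for trusting the winner.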

Calibration

Calibration is the discipline of pulling the model out of overconfident mode. Three patterns. First, ask for a confidence rating with every answer:

Rate your confidence on a 1-5 scale and explain the rating.

Second, ask for the strongest counter-argument before the conclusion:

What is the best objection to this answer?

Third, ask the model to list the assumptions it made:

Which assumptions did you treat as given?

Calibration adds tokens but exposes the shape of the model's certainty · which is exactly the information you need to decide whether to trust the output.

Reasoning effort: more thinking is not always better

"Think step by step" was the 2022 playbook. On 2026 reasoning models the lever is a parameter, not a phrase, and accuracy versus reasoning effort behaves like an inverted U: it helps up to a budget, then it hurts. Where the peak sits depends on the difficulty.

Pick your next move · nine topics

Each tile names a concrete pain · pick the one that matches what you are actually stuck on.

The cluster posts below each take one of this guide's high-level ideas and unpack it into a thousand-plus-word working post. Read the tile labels first; the rest of the page is reference material that you can return to when one of those nine pains becomes the next thing to fix in your workflow.

Twenty prompt patterns, ready to copy · Sales, content, outreach, universal · with one fully-filled showcase per category.
Your prompts drift when nobody is looking · Why prompt management is the missing layer between Notion and production.
Why the context window is not memory · Context engineering, the stateless context window, and why context rot crowds out the goal.
The model sounded sure · and it was wrong · Causes of hallucinations and the three prompt-level levers that reduce them.
From one prompt to a working agent · Step-by-step from use case to MVP with human-in-the-loop checkpoints.
Package a prompt as a reusable assistant · Custom GPT setup with system-prompt templates and scope discipline.
Build apps without writing the code yourself · Cursor, Claude Code, and Lovable in practical comparison.
Prompts as workflow infrastructure · Where automation pays off and where observability becomes the real bottleneck.
Which model fits which job in 2026? · Claude, ChatGPT, Gemini, and Perplexity compared per use case.
Term unclear · check the glossary · Fifty plus prompt-engineering terms with bilingual definitions.

FAQ · the most-asked questions

Brand questions, vendor comparisons, taxonomy, and the full capability catalogue. Refer back as needed · none of the foundation chapters depends on these. The questions below are written to also work as standalone snippets for sharing or for AI-search citation.

What is ChatGPT?

ChatGPT is a conversational AI assistant developed by OpenAI that understands and answers requests in natural language. It was released in November 2022 and runs on OpenAI's GPT model series; in 2026 the underlying model is GPT-5.5 (since May 2026; prior default GPT-5.4).

ChatGPT is a product, not the technology category itself. The category is AI; ChatGPT is one product inside it. Direct competitors include Claude (Anthropic), Gemini (Google), Mistral (Mistral AI), Perplexity, and Microsoft Copilot · all of them are assistants built on LLMs. They work on the same basic idea (type a question, get a written answer) but differ in tone, factual reliability, available tools, and price.

Does it work in German? Yes. ChatGPT understands and answers German with the same fluency as English. The same applies to Claude and Gemini. The output quality in German is slightly below English for very stylistic tasks (because the training data is 60–90% English), but for everyday work the difference is small.

What is Generative AI?

Generative AI describes systems that produce new content · text, images, code, audio, video · instead of merely classifying or sorting existing data. The output is generated step by step (token by token for text, typically by iterative denoising for images), based on the patterns learned during training.

The split by output type:

Text-generative: ChatGPT, Claude, Gemini, Mistral, Perplexity.
Image-generative: Midjourney, DALL-E (OpenAI), Stable Diffusion, Flux.
Code-generative: GitHub Copilot, Cursor, Claude Code.
Audio- and video-generative: ElevenLabs (voice), Suno (music), Sora (OpenAI, video), Veo (Google, video).

This guide is about text generation for sales, marketing and content work. The patterns transfer to other domains, but the templates assume text output to humans.

Which AI tools matter in 2026?

The relevant Generative AI tools in 2026 are ChatGPT, Claude, Gemini, Perplexity and Microsoft Copilot · each with a free and a paid tier. For images, Midjourney is the strongest dedicated option. You do not need all of them; start with one and switch only when you hit its limits.

Tool	Vendor	Strength	Free tier
ChatGPT	OpenAI	Broadest availability, large plugin ecosystem, image and voice	Yes, with limits
Claude	Anthropic	Long context, precise instruction-following, strong for stylistic writing	Yes, with limits
Gemini	Google	Native integration with Google Workspace (Docs, Sheets, Gmail)	Yes
Perplexity	Perplexity AI	Web search with cited sources, good for research	Yes
Microsoft Copilot	Microsoft	Office and Windows integration (Word, Excel, Outlook, Teams)	With M365 subscription
Midjourney	Midjourney Inc.	Image generation, strong aesthetics	Paid only

For the templates in this guide it does not matter which of the three text tools you pick: ChatGPT, Claude and Gemini all handle these prompts well in 2026. Differences in tone are smaller than differences caused by your own context and constraints.

What can AI actually do?

Modern AI handles in seconds tasks that used to take hours, provided the task is clearly stated and rests on patterns the training data already contains. Outside that envelope the model degrades · sometimes silently, by inventing plausible-sounding nonsense.

Realistic everyday use for sales, marketing and content teams:

Writing and revising text (emails, landing pages, blog posts, LinkedIn updates, internal memos).
Research and summarisation (long PDFs, transcripts, articles).
Translation, especially between English and German.
Brainstorming and idea generation (headlines, angles, objection responses).
Comparison tables and structured analyses from unstructured input.
Generating, explaining and debugging code snippets.
Drafting interview questions, customer-research scripts and outreach sequences.

The limits, equally important:

No reliable current facts without web search · the training data has a cut-off date.
No independent judgement about your market · the model does not know your competitors, your customers, or your pricing unless you tell it.
No memory between separate conversations unless you use Projects or Memory features.
Hallucination on niche facts. Verify load-bearing numbers before publishing.

For a fuller treatment of these limits, see Hallucinations.