What is conversation design?

Conversation design is the discipline of deciding how AI systems behave in language — not just what they say, but how they structure dialogue, handle misunderstanding, manage uncertainty, and determine when to escalate to a human. It is a coordination discipline, not a writing discipline. The medium is language, but the work is behavioural engineering.

Why does conversation design matter more with generative AI?

Generative AI amplifies design decisions. A well-bounded, well-designed system becomes dramatically more helpful. A poorly designed one becomes dramatically more dangerous. Because generative AI sounds confident and coherent even when it is wrong, the failures are harder to detect, harder to audit, and more damaging to customer trust. Conversation design is the discipline that prevents fluent-sounding failure.

Doesn't generative AI make conversation design obsolete?

No. Generative AI has made it easier to build systems that sound natural. It has not made it easier to build systems that behave correctly under uncertainty, manage risk responsibly, or protect customers when something goes wrong. Those remain human design decisions. The better the underlying model, the more important it is that the design surrounding it is sound.

What does an AI conversation designer actually do?

An AI conversation designer decides how a system behaves when understanding is partial, confidence is low, and consequences matter. This includes: designing dialogue structure and turn-taking; defining how the system signals uncertainty; setting escalation triggers and handoff logic; establishing tone, persona, and language constraints; and governing the system in production as behaviour drifts from design intent.

Conversation Design in the Age of Generative AI

The misconception that's costing organisations real money

There is a belief spreading through enterprise AI programmes right now that goes something like this: generative AI has made conversation design less important because the model just handles it now.

It's understandable. Modern large language models produce remarkably natural-sounding dialogue. They adapt tone, handle unexpected input, and don't require the laborious intent mapping that dominated the previous generation of conversational AI work. Compared to writing hundreds of intent utterances for a rule-based bot, deploying a generative agent feels almost effortless.

That perception is the problem.

What generative AI has done is solve the language problem. It has not solved the conversation problem. Those are different things, and conflating them is one of the most expensive mistakes you can make when deploying AI in customer experience.

Conversation design exists because conversation is not a language problem. It is a coordination problem.

Language is the medium. But what makes a conversation work — or fail — is the structure underneath it. The sequencing of turns. The signals the system sends about what it understands and what it doesn't. The decisions it makes when the customer says something unexpected. The logic that determines when to ask, when to confirm, when to proceed, and when to stop and hand to a human. None of that comes free with the model. All of it must be designed.

What conversation design actually is

The easiest way to misunderstand conversation design is to treat it as copywriting for chatbots. It is not. Writing is the medium. Judgement is the work.

A conversation designer's core responsibility is deciding how an AI system behaves under uncertainty: what it does when understanding is partial, how it signals limits, how it asks for clarification, how it manages the transition to a human agent, and how it avoids abandoning the customer in the middle of something that matters to them.

That work involves linguistics, yes. But it also involves systems thinking, behavioural psychology, risk management, and deep familiarity with the operational environment the AI is being deployed into. A conversation designer is part writer, part architect, part risk analyst — and increasingly, part governance practitioner.

The discipline emerged because a consistent pattern kept repeating across every generation of AI technology: systems that could handle language at the sentence level still failed at the conversation level. They got the words right but lost the thread. They understood individual utterances but couldn't track what the customer actually needed across a multi-turn exchange. They sounded capable right up until the moment they weren't.

From Designing AI Conversations at Scale

"Conversation design exists because conversation is not a language problem. It is a coordination problem. The structures [of well-designed dialogue] are not optional. The systems that work in production are the ones whose designers built these structures deliberately, regardless of how capable the underlying model was."

That pattern hasn't changed with generative AI. It has intensified.

What generative AI actually changes — and what it doesn't

Generative AI has genuinely changed parts of the conversation design landscape. The improvements are real and worth acknowledging.

Classical conversational AI required explicit structure. Every intent had to be defined. Every entity had to be extracted into a predefined slot. Every response was selected from a library. The system was only as good as the intents it had been trained on and the flows that had been built for it. On anything outside that scope, the system would fail visibly.

Generative AI handles unexpected input far more gracefully. It can engage with phrasing the designer never anticipated. It adapts tone. It handles context across a conversation in ways that older systems couldn't. From the customer's perspective, interactions feel more natural and less constrained.

These are real improvements. But here's what hasn't changed:

Ambiguity. Customers still say things that could mean multiple things. The system still has to decide which meaning to act on.
Missing information. Customers still omit critical details. The system still has to decide how to ask for them without creating friction.
Emotional context. Customers still arrive frustrated, anxious, or grieving. The system still has to decide how to respond to emotional load, not just semantic content.
Risk. Transactions, escalations, and commitments still carry consequences. The system still has to decide how much to confirm before acting.
Escalation. Humans still need to receive the customer when automation isn't appropriate. The handoff still has to work.

None of these are language problems. They are design problems. Generative AI doesn't resolve them. It relocates them — from visible, explicit rules to invisible, implicit assumptions baked into prompts, system instructions, and orchestration logic.

Generative AI doesn't eliminate the need for conversation design. It makes the consequences of skipping it much harder to see.

The fluency trap

One of the most dangerous properties of modern generative AI is that it sounds confident and coherent even when it's wrong.

A rule-based bot that hit the edge of its capability said "I'm sorry, I don't understand." Frustrating, but honest. The customer knew they'd hit a wall. They could rephrase, or ask for a human, or try a different channel. The failure was visible.

A generative system that hits the edge of its knowledge doesn't usually say "I'm sorry, I don't understand." It generates a fluent, well-structured answer that sounds authoritative — and may be completely wrong. The refund policy it describes might not exist. The process it outlines might have changed. The exception it claims is possible might not be available to that customer. The customer has no easy way to know. They act on the answer. The system has failed, but it has failed persuasively.

In customer experience contexts, this is a serious problem. Customers tend to trust systems that sound human. The fluent answer gets taken at face value. By the time the error surfaces, it surfaces somewhere else entirely (in churn data, complaint queues, or regulatory submissions), and the damage has already compounded.

This is what my book describes as fluent failure. Broken failure is obvious and fixable. Fluent failure is subtle, invisible on the dashboard, and deeply corrosive to trust.

Conversation design is the discipline that prevents fluent failure. It does this through deliberate structure: confirmation strategies, hedging language, escalation triggers, and refusal patterns that constrain where the model can commit without sufficient grounds. These aren't censorship mechanisms. They are the responsible exercise of the same judgement a senior human agent would exercise when they realise they're out of their depth.
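As a rough illustration of what "constraining where the model can commit" can look like in orchestration logic, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the Turn fields, the 0.75 threshold, and the four actions are hypothetical, and a real system would derive grounding and confidence from its own retrieval and evaluation layers.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ANSWER = "answer"      # commit to the answer as generated
    HEDGE = "hedge"        # answer, but signal uncertainty and offer a check
    CONFIRM = "confirm"    # restate and ask the customer to confirm before acting
    ESCALATE = "escalate"  # stop and hand over to a human


@dataclass
class Turn:
    grounded: bool     # hypothetical: answer is backed by a retrieved policy or record
    confidence: float  # hypothetical: 0..1 score supplied by the orchestration layer
    high_stakes: bool  # hypothetical: refunds, eligibility, commitments and similar


def choose_action(turn: Turn, min_confidence: float = 0.75) -> Action:
    """Decide how far the system may commit on this turn."""
    shaky = (not turn.grounded) or turn.confidence < min_confidence
    if turn.high_stakes and shaky:
        return Action.ESCALATE  # never improvise where consequences matter
    if shaky:
        return Action.HEDGE     # low stakes: answer, but say what is uncertain
    if turn.high_stakes:
        return Action.CONFIRM   # solid ground, but confirm before acting
    return Action.ANSWER


# An ungrounded answer about a refund is handed to a human, not generated.
decision = choose_action(Turn(grounded=False, confidence=0.9, high_stakes=True))
```

The point of the sketch is the ordering of the checks: stakes and grounding are examined before fluency ever gets a chance to decide the outcome.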

Structure doesn't disappear. It moves.

The shift from classical to generative conversational AI is often described as a shift from structure to freedom. That framing is wrong.

In classical systems, structure is explicit and visible. Every flow has a defined path. The designer can point to a specific decision rule and trace exactly how the system responds to a given input. The structure is auditable.

In generative systems, structure lives in prompts, system instructions, orchestration logic, and conversation history. It's still there. But it's no longer easy to inspect. Two different prompts can produce dramatically different behaviours. Three months after launch, a system that started with tight constraints can drift into looser behaviour because someone updated a prompt without thinking through the implications. The structure hasn't disappeared. It has become implicit, and implicit structure is much harder to govern.
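One way to make that implicit structure inspectable again is to treat every prompt as a reviewed artefact. The sketch below is a hypothetical illustration, not a description of any particular tool: the PromptRegistry class, its method names, and the 12-character fingerprint are all assumptions. It uses only Python's standard hashlib module.

```python
import hashlib


def fingerprint(prompt: str) -> str:
    """Short, stable identifier for a prompt, ignoring whitespace-only changes."""
    normalised = " ".join(prompt.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()[:12]


class PromptRegistry:
    """Hypothetical record of which prompt versions have been reviewed."""

    def __init__(self) -> None:
        self._approved: dict[str, str] = {}  # fingerprint -> review note

    def approve(self, prompt: str, reviewer: str, note: str) -> str:
        fp = fingerprint(prompt)
        self._approved[fp] = f"{reviewer}: {note}"
        return fp

    def is_approved(self, prompt: str) -> bool:
        return fingerprint(prompt) in self._approved


registry = PromptRegistry()
registry.approve(
    "Never promise a refund. Escalate if unsure.",
    reviewer="risk-team",
    note="initial launch constraints",
)

# A quiet wording change three months later no longer matches a reviewed version.
edited = "Offer a refund where reasonable. Escalate if unsure."
needs_review = not registry.is_approved(edited)
```

A check like this does not judge whether a change is good. It only ensures that a change to behaviour cannot happen without someone noticing that it happened.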

This is why conversation designers working on generative systems need to think differently about where the design actually lives. The dialogue scripts and flow diagrams of the previous era are not enough. The design must extend into prompt architecture, instruction design, context management, and the ongoing observation of how the system behaves in production across thousands of real conversations.

The design disciplines that matter most in GenAI systems

Prompt architecture. System instructions that constrain and guide model behaviour — what the AI should and shouldn't commit to, how it should signal uncertainty, where it must escalate.

Context management. How conversation history is maintained, what information carries across turns, and what gets dropped.

Escalation design. The triggers, logic, and handoff mechanics that determine when the AI stops and a human starts.

Production governance. The ongoing observation and modification of a live system based on what it actually does, not what the design said it would do.
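Context management and escalation design meet at the handoff: the human should receive the customer together with what the system already knows. A minimal Python sketch follows, assuming a hypothetical Conversation structure and a hypothetical build_handoff helper; the field names and the four-turn window are illustrative choices, not a standard.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    turns: list[tuple[str, str]] = field(default_factory=list)  # (speaker, text)
    facts: dict[str, str] = field(default_factory=dict)         # details the customer confirmed

    def add(self, speaker: str, text: str) -> None:
        self.turns.append((speaker, text))


def build_handoff(conv: Conversation, reason: str, max_turns: int = 4) -> dict:
    """Package what a human agent needs so the customer does not start over."""
    return {
        "reason": reason,                         # why the AI stopped
        "confirmed_facts": dict(conv.facts),      # carried across turns
        "recent_turns": conv.turns[-max_turns:],  # older turns are dropped
    }


conv = Conversation()
conv.add("customer", "My refund still hasn't arrived.")
conv.add("assistant", "I can look into that. Which order is it for?")
conv.add("customer", "The one from last month.")
conv.facts["issue"] = "missing refund"

packet = build_handoff(conv, reason="refund status could not be verified")
```

What carries across and what gets dropped is a design decision here, visible in two lines of code rather than buried in model behaviour.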

Conversation design as a risk discipline

One frame that's often missing from how organisations think about conversation design is risk.

Every decision a conversation designer makes is implicitly a risk-pricing decision. How much to confirm before acting. How often to escalate. How much latitude to give the model. How much to ask of the customer. How much ambiguity to tolerate before seeking clarification. These are all decisions about how to allocate risk between the system, the customer, and the organisation.
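The risk-pricing framing can be made concrete with a simple expected-cost comparison. This is a deliberately crude sketch under stated assumptions: the should_confirm function and all the numbers are hypothetical, and real costs (customer effort, regulatory exposure, trust) are rarely this easy to quantify.

```python
def should_confirm(p_wrong: float, cost_of_error: float, cost_of_confirmation: float) -> bool:
    """Confirm when the expected cost of acting wrongly exceeds the cost of asking."""
    if not 0.0 <= p_wrong <= 1.0:
        raise ValueError("p_wrong must be a probability between 0 and 1")
    return p_wrong * cost_of_error > cost_of_confirmation


# Hypothetical figures: a 2% chance of misunderstanding the request,
# with one extra confirmation turn costing 1 unit of customer effort.
refund = should_confirm(p_wrong=0.02, cost_of_error=500, cost_of_confirmation=1)
balance_check = should_confirm(p_wrong=0.02, cost_of_error=5, cost_of_confirmation=1)
```

With the same 2% chance of misunderstanding, the refund warrants a confirmation step (an expected cost of about 10 units against 1) while the balance check does not (about 0.1 against 1). The model's confidence is identical in both cases. What changes the decision is the price of being wrong.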

In high-stakes domains such as financial services, healthcare, and government, the cost of a wrong answer is not a bad NPS score. It's a misrouted refund, a misunderstood eligibility decision, a medical instruction interpreted in a context the system couldn't have anticipated. The cost of getting it wrong far outweighs the cost of getting it right slowly.

Designers under pressure to make systems efficient will reach for speed and brevity. They'll minimise confirmation steps. They'll allow the model to commit with lower confidence. Each of these decisions looks reasonable in isolation. Cumulatively, they produce systems that hold up under normal conditions and fail badly in the situations that matter most.

The best conversation designers I've worked with treat restraint as a design virtue. Knowing when not to automate an interaction — because it's emotionally charged, ambiguous, or dependent on nuanced human judgement — is as important as knowing how to automate one well. Choosing not to automate is not a failure of capability. It is a judgement that the cost of getting this wrong, at scale, exceeds the cost of routing it to a human.

The conversation designer as mediator and steward

There's a meeting that happens in almost every AI-in-CX programme. A product manager wants efficiency. A risk team wants more confirmations. Marketing wants stronger brand voice. Customer research wants emotional resonance. Engineering wants implementation simplicity.

The conversation designer is the person in that room whose job is to find the design that does enough of each without breaking the experience for the customer. That is a position of significant influence that doesn't always come with formal authority. Learning to operate in it effectively is a real professional skill — and one that becomes more valuable, not less, as the systems grow more capable.

Over time, the role shifts from builder to steward. Building a conversational system is finite work. Stewarding it across years, as the organisation changes, as the model updates, as new policies arrive, as customer expectations evolve — that is ongoing work. Stewardship means maintaining coherence, protecting customers from optimisation pressures that would erode the experience, and resisting changes that look efficient on a slide and harmful in practice.

This is where the best conversation designers spend most of their careers. Not in the initial build, but in the sustained, unglamorous work of keeping systems honest as everything around them changes.

Why this moment matters

Every generation of language technology has produced the same temporary impression: natural conversation is now solved. NLU models improved. Then large language models improved further. Each leap created a window in which it felt like the hard work of conversation design might finally be unnecessary. And each time that window closed, the same coordination failures appeared again — in slightly more fluent form.

We are in that window right now with generative AI. The models are genuinely impressive. The instinct to skip the hard conversation design work in favour of faster deployment is understandable. The consequences of acting on that instinct are the same as they have always been — systems that sound good in the demo and disappoint in production.

The organisations that will build conversational AI that genuinely works are the ones that understand a simple thing: the model is not the design. The model is the medium. The design is the judgement applied to how that medium is used — how it's bounded, structured, governed, and improved over time.

Conversation design isn't the work you do instead of using good AI. It's the work that determines whether good AI becomes a good system.

Generative AI amplifies that truth. It doesn't change it.

Fluency isn't design. It never was.