For a long time the AI conversation has been “is it good or bad.” The honest answer, for high-stakes knowledge work, is that it depends entirely on the task. The same model, the same operator, the same week — confidently better on one task and measurably worse on the next.
The most-cited controlled study of generative AI in professional services is the BCG × Harvard “Navigating the Jagged Technological Frontier” experiment. 758 BCG consultants were randomized: some were given GPT-4, and a control group worked the way they always had. The results split sharply by task type.
Inside AI's frontier: +40%
Higher-quality output on creative, ideation, and synthesis tasks. Consultants using AI completed 12.2% more tasks, 25.1% faster.
Outside AI's frontier: −19 percentage points
Less likely to reach the correct answer on judgment-heavy quantitative reasoning. Same operator. Same week. Different task. Worse than no AI at all.
The 40% rework problem most firms are seeing is not a sign that AI is bad. It arises because the same person reaches for the same tool on tasks where it lifts them and on tasks where it sinks them, and cannot tell which is which until the deliverable goes out the door. The skill that separates senior practitioners who get durable value from AI from those who don't is not better prompting. It is task triage.
The framework that follows has three parts: two questions, four zones, and a task-by-task map. You can run the whole thing in under sixty seconds before any task you are about to hand to a model.
The two questions that decide everything
Before you reach for AI on any task, ask:
Question 01
Is the output verifiable in under five minutes?
Not “is it possible to verify it” — almost any output is verifiable given infinite time. The question is whether someone could spot-check it in less time than it would have taken to do the task without AI. If verification takes longer than the original work, AI is a productivity loss even when the output is correct.
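The comparison in Question 01 comes down to simple arithmetic, which can be sketched as below. This is illustrative only: the function name `net_minutes_saved` and every timing in it are hypothetical, not figures from the study.

```python
def net_minutes_saved(manual_minutes: float, ai_minutes: float, verify_minutes: float) -> float:
    """Minutes saved by using AI once verification time is counted."""
    return manual_minutes - (ai_minutes + verify_minutes)


# Hypothetical timings: a 45-minute task that the model drafts in 5 minutes.
quick_check = net_minutes_saved(manual_minutes=45, ai_minutes=5, verify_minutes=5)
slow_check = net_minutes_saved(manual_minutes=45, ai_minutes=5, verify_minutes=50)

print(quick_check)  # 35: a five-minute spot-check keeps most of the gain
print(slow_check)   # -10: verifying took longer than doing the work by hand
```

A negative result is the case described above: the output may be correct, and AI was still a productivity loss.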
Question 02
What is the asymmetric cost of being wrong?
If a wrong answer costs you ten minutes of editing, the asymmetry is small. If a wrong answer ends up in a press release, a regulatory filing, a quote attributed to a real human, or a client deliverable, the asymmetry can be career-defining. The verification rung you climb is set by the downside, not the upside.
The asymmetry is the part most teams underweight. Senior practitioners intuitively price the upside of a saved hour but routinely under-price the downside of a fabricated quote, a confabulated statistic, or an attribution that isn't quite right. The two sides of the ledger are not the same shape.
The asymmetry, visualized
Verification effort is calibrated to the bottom bar (the downside), not the top one (the upside). That single move, pricing the downside honestly, is most of the work.
Together, the two questions classify almost any task into one of four zones.
The four-zone map
Plot any knowledge-work task on two axes — verifiability on the horizontal, stakes on the vertical — and four meaningfully different zones emerge. Each zone has its own operating discipline.
The decision map: Stakes × Verifiability
Rose · Don't AI alone (high stakes, hard to verify)
Quote attribution, legal interpretation, regulatory math, source-of-truth research. Verification approaches the cost of doing it by hand. Asymmetric downside.
Amber · AI + verification (high stakes, verifiable)
Client-facing drafts, summary statistics, framework-driven analysis, cited research. AI is faster, but verification is non-negotiable, every time, no exceptions.
Slate · AI carefully (low stakes, hard to verify)
Brainstorm directions, generated options, exploratory analysis, internal ideation. Output is suggestive rather than load-bearing. Be skeptical; the downside is bounded.
Emerald · Use AI freely (low stakes, verifiable)
Reformatting, simple translation, code refactoring, first-draft brainstorms, table cleanup. Output is obviously right or wrong. A wrong answer costs minutes. Most real productivity gains live here.
The error most firms make is reaching for AI on rose-zone tasks because the time savings look attractive. The math rarely works. A 30-minute saving on a quote that turns out to be fabricated costs a week of damage control, the relationship with the source, and a permanent dent in trust. The asymmetry is doing the work, not the model.
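The rose-zone math can be made explicit with a rough expected-value check. This is a sketch under stated assumptions: a “week of damage control” is taken as 40 working hours, the relationship and trust costs are left out (so the real downside is larger), and the function names and the 5% error rate are hypothetical.

```python
def break_even_error_rate(hours_saved: float, hours_lost_if_wrong: float) -> float:
    """Error probability above which the expected loss exceeds the time saved."""
    return hours_saved / hours_lost_if_wrong


def expected_net_hours(hours_saved: float, hours_lost_if_wrong: float, p_wrong: float) -> float:
    """Expected hours gained per task if a wrong output slips through with probability p_wrong."""
    return hours_saved - p_wrong * hours_lost_if_wrong


HOURS_SAVED = 0.5     # the 30-minute saving from the text
DAMAGE_HOURS = 40.0   # assumption: one working week of damage control

threshold = break_even_error_rate(HOURS_SAVED, DAMAGE_HOURS)
print(f"{threshold:.2%}")  # 1.25%

# Hypothetical 5% chance that a fabricated quote goes out unnoticed.
print(expected_net_hours(HOURS_SAVED, DAMAGE_HOURS, p_wrong=0.05))  # -1.5
```

Under these assumptions the shortcut only pays if fewer than about one in eighty quotes is wrong and undetected, and that is before counting the damage that cannot be measured in hours.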
Done well, the four-zone map is not a rulebook. It is a 60-second pre-flight check that runs before any AI-assisted task and re-runs whenever the stakes or the verification cost changes mid-task.
Task by task: where each kind of work lives
The map is abstract; the work is concrete. The categories below cover most of what knowledge workers actually do in a week. The zone shifts within a category depending on whether the output is internal, client-facing, or attributed to a third party.
The most important category is quote attribution. The fastest way to do real damage with AI is to put words in someone's mouth they did not say. Even with attribution-style language scrubbed from the prompt, the model can confabulate quotes that read plausibly enough to slip past a casual review. There is no version of this task that lives anywhere but rose. If you need a quote, get it from the human.
In Hone Studio
The Knowledge Base, combined with retrieval-augmented generation, shifts the verifiability axis meaningfully. When the model is grounded in your own documents and serves outputs with citations attached, many amber-zone research tasks become emerald: the model is not generating plausible-sounding text from training data; it is quoting from your sources and pointing at them. Verification collapses from “look up every claim” to “click the citation.”
The verification ladder
For any amber-zone task, the verification step decides whether AI was a net positive. Different tasks call for different rungs of the ladder. The discipline is matching the rung to the stakes — not defaulting to the easiest rung because the output looked good.
Spot-check · 10 seconds
Read it and ask “does this look approximately right?” Sufficient for emerald and low-stakes amber work: reformatting, refactoring familiar code.
Cross-reference · 60 seconds
Compare against one independent source. Sufficient for moderate amber: framework-driven analysis, structural drafts.
Original-source check · 5 minutes
For every load-bearing claim, verify against the primary source. Floor for client-facing deliverables.
Expert review · variable
A human with subject-matter authority signs off. Required for rose-adjacent tasks where AI is scaffolding, not the deliverable.
Rung 1 on a rung-3 task is how the 40% rework problem gets generated. Rung 4 on a rung-1 task is how AI productivity gains get eaten by verification overhead. Calibration is the whole game.
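The calibration rule can be sketched as a comparison between the rung a task needs and the rung it actually got. The `Rung` enum and the `calibration` function are hypothetical names introduced for illustration; the durations in the comments are the ones listed on the ladder.

```python
from enum import IntEnum


class Rung(IntEnum):
    SPOT_CHECK = 1        # about 10 seconds
    CROSS_REFERENCE = 2   # about 60 seconds
    ORIGINAL_SOURCE = 3   # about 5 minutes
    EXPERT_REVIEW = 4     # variable


def calibration(applied: Rung, required: Rung) -> str:
    """Compare the rung actually used with the rung the stakes call for."""
    if applied < required:
        return "under-verified: rework risk"
    if applied > required:
        return "over-verified: overhead eats the gain"
    return "calibrated"


print(calibration(Rung.SPOT_CHECK, Rung.ORIGINAL_SOURCE))
# under-verified: rework risk
print(calibration(Rung.EXPERT_REVIEW, Rung.SPOT_CHECK))
# over-verified: overhead eats the gain
print(calibration(Rung.CROSS_REFERENCE, Rung.CROSS_REFERENCE))
# calibrated
```

Deciding which rung a given task requires is still a judgment about stakes; the sketch only checks whether the effort spent matches that judgment.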
When memory and retrieval are in the loop
Most of this calculus assumes vanilla AI: a chat window, no document context, no memory of your past work. The four-zone map is built around that default. When the model is operating with retrieval against your own documents and persistent memory of your firm's voice and prior work, the geometry shifts.
The shift is asymmetric. The emerald and amber zones widen meaningfully, because the failure mode that creates the worst rework — confabulation of plausible-but-wrong specifics — is the failure mode that retrieval most directly suppresses. Citations move verification from “look up every claim” to “click the link.” Memory means the model has already absorbed your firm's voice, positioning, and prior client context, so the output arrives in-house-shaped rather than generic. The rose zone does not dissolve — quote attribution, legal interpretation, and the decision itself are still your work — but it gets smaller.
In Hone Studio
Hone Studio is built around this shift. The Assistant runs against a Knowledge Base of your firm's actual documents — playbooks, past deliverables, client materials — using retrieval-augmented generation and hypothetical-document expansion to find the right context before answering. Memory carries your firm's voice and accumulated institutional knowledge across sessions. The result is that the everyday user sends short, conversational prompts and gets back work that lives in the emerald or low-amber zone for tasks that would have been amber or rose without grounding. The triage discipline still matters. The number of tasks that survive it goes up.
The decision in one diagram
Every framework above collapses to four leaves. Before you reach for AI on your next task, walk this tree in your head:
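The triage in one diagram
Not verifiable in under five minutes, high cost of being wrong: Rose · Don't AI alone
Not verifiable in under five minutes, low cost of being wrong: Slate · AI carefully
Verifiable in under five minutes, high cost of being wrong: Amber · AI + verify
Verifiable in under five minutes, low cost of being wrong: Emerald · AI freely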
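The two questions and four leaves can be sketched as a small function. The names `Zone` and `triage` are hypothetical, and reducing each question to a yes/no answer is a simplification made for illustration; the example placements follow the tasks listed in the four-zone map.

```python
from enum import Enum


class Zone(Enum):
    ROSE = "Don't AI alone"
    SLATE = "AI carefully"
    AMBER = "AI + verify"
    EMERALD = "AI freely"


def triage(verifiable_in_five_minutes: bool, high_cost_of_error: bool) -> Zone:
    """Question 01 picks the branch; Question 02 picks the leaf."""
    if verifiable_in_five_minutes:
        return Zone.AMBER if high_cost_of_error else Zone.EMERALD
    return Zone.ROSE if high_cost_of_error else Zone.SLATE


# Example placements, answered the way the four-zone map places these tasks.
print(triage(False, True).name)   # ROSE: quote attribution
print(triage(True, True).name)    # AMBER: client-facing draft
print(triage(False, False).name)  # SLATE: internal ideation
print(triage(True, False).name)   # EMERALD: table cleanup
```

The answers are inputs, not constants: the same task can move between leaves when the audience, the stakes, or the cost of verification changes mid-task.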
The skill that lasts
The wrong way to read this piece is as a set of rules. Two years from now the model will be smarter, the verifiability of certain tasks will be different, and the 2×2 will need to be re-drawn. New tools will pull tasks from rose into amber, from amber into emerald. Some tasks that look emerald today will turn out to have hidden downside that pulls them back into amber.
The skill that lasts is pattern recognition: looking at any task, asking instinctively “where does this live on verifiability and stakes?” and adjusting accordingly. People who are durably good at AI for high-stakes knowledge work have internalized that question. They reach for AI confidently on some tasks, suspiciously on others, and not at all on the rest. They calibrate verification to the downside, not the upside. They build their own task taxonomies for their own work, and they update them as the models change.
That is not a skill about AI. It is a skill about deciding where to spend your attention. Which is the actual job.