The line I keep repeating, the one that reorganised everything I build: UI is going to disappear, or become something else, and whoever keeps building UI-heavy apps is going to regret it.

I still believe it. But the lazy version of that sentence, “everything becomes a chat box,” is wrong, and it’s wrong for reasons worth understanding. Chat is not the endpoint. So here is the actual case, including the parts that argue against me, and the one place I had to narrow the claim to keep it honest.

What a UI is even for

An interface exists to cross two gaps Don Norman named forty years ago. The gulf of execution: how do I get the machine to do what I want? The gulf of evaluation: did it actually do it? Every button, form, menu, and flow is scaffolding over one of those two gulfs.[1]

[Figure: you (intent) on one side, the machine (state) on the other. Gulf of execution: turn what you want into what it needs. Gulf of evaluation: turn what it did into what you understand.]
The two gulfs. Every interface exists to close one of them.

If you want to know what agents do to the interface, ask what they do to each gulf. They don’t touch them equally, and that asymmetry is the whole story.

The part I’m confident about: execution collapses

For twenty years the interface was the product. Not the computation underneath. Generic CRUD scaffolding has been a commodity for a decade. What people paid for was the choreography: the sequence of screens that walked a human from “I want a flight” to a booked ticket. That choreography exists because the human had to translate intent into the machine’s terms, click by click.

An agent absorbs that translation. You say what you want; it does the walking. Jakob Nielsen, the closest thing usability has to a founding figure, calls this the third interface paradigm in sixty years: “intent-based outcome specification,” where “the user no longer tells the computer what to do… [it] completely reverses the locus of control.”[2] Karpathy calls the same shift Software 3.0: “the hottest new programming language is English.”[3]

When intent is the input, the click-choreography becomes overhead. That is the part of the UI that dies. Even a16z, in “Good News: AI Will Eat Application Software,” which argues against the maximalist version of this, concedes the target: “frontend tools that serve primarily as thin wrappers around commodity functionality… are vulnerable.”[4]

So far the thesis holds. Now the naive version falls apart.

Why “everything becomes chat” is wrong

Collapsing the input gulf doesn’t collapse the whole interface. The interface was doing three other jobs that language is bad at.

Output. Vision is parallel; language is serial. Shneiderman’s line, from his 1997 debate with the agents camp: an interface can put “4,000 or more items on the screen… that enables people to see all of the possibilities,” and “it would be hard to see how you could program an agent to anticipate all of the possibilities that your eye can pick up in 1/10th of a second.”[5]

[Figure: vision takes in ~4,000 items in one glance, in parallel; language arrives one word at a time, serially. You can glance at a dense table. You cannot listen to one.]
Output bandwidth. Dense results have to be rendered, not narrated.

Discoverability. A blank prompt has no affordances. It can’t show you what it’s capable of. Norman, in 2010, long before this was fashionable: “The strength of the graphical user interface (GUI) has little to do with its use of graphics: It has to do with the ease of remembering actions, both in what actions are possible and how to invoke them.”[6] We have a decade-long, eleven-figure natural experiment in ignoring that. It’s called Alexa.[7]

The evaluation gulf widens. This is the one that gets skipped. When an opaque agent does ten steps on your behalf, you now have to verify ten steps you didn’t watch. Karpathy, a believer, puts the constraint plainly in his Software 3.0 talk: he is “still the bottleneck” who “has to verify it for bugs and security issues.”[8] The model drafts faster than a person can check. Automating execution doesn’t remove the interface. It shifts the interface’s job from input to inspection.

[Figure: execution, making it do the thing, was the product and is now overhead; evaluation, checking what it did, was an afterthought and is now the main job.]
The asymmetry. Outline is the old size; fill is the new one. This is the whole argument in one picture.

Why “this time” has to earn it

Every few years someone announces the death of the GUI. Conversational commerce in 2016. Voice before that. They failed, and the failure is worth studying, because it was specific.

  • $25B lost by Amazon's devices unit, 2017 to 2021 (WSJ)
  • <2% of Alexa users bought by voice in 2018
  • ~90% of those never did it again
  • 160k+ skills shipped · most owners never tried a third-party one

Alexa had over 160,000 skills available,[9] and even so a Voicebot survey found fewer than half of owners had ever tried a third-party one: a discovery problem more than a usage one.[10] Fewer than 2% of Alexa users bought anything by voice in 2018, and of the few who did, about 90% never repeated it (The Information).[11] Nielsen Norman Group watched real people use these assistants and called it “a return to the dark ages of the 1970s: the need to memorize cryptic commands, oppressive modes.”[12]

The honest part: Alexa had real language problems (multi-step, context, third-party skills). But comprehension isn’t primarily what killed it. It died of discoverability, because nobody knew what to say, and of there being no reason to speak when a tap was faster. Julian Lehr’s version: “Hey Google, what’s the weather in San Francisco today?” takes ten times longer than tapping the weather app.[13]

LLMs closed most of the comprehension gap. Not all of it; they still fumble ambiguity, grounding, and long-horizon constraints. So “this time is different” only gets to mean something narrow and honest.

What LLMs fixed

  • Understanding messy, open-ended intent
  • Mapping words to real actions and tools
  • Holding context across a conversation

What they didn't

  • Output bandwidth: dense results still need a screen
  • Discoverability: the blank box still can't tell you what it can do
  • Input speed: for a known task, a tap still beats a sentence

The language half went from broken to mostly working. That is enough to change the input side, and it does nothing for output bandwidth or discoverability. Any prediction that forgets the other two is selling you the demo.

So what actually happens: the interface splits in two

Put it together and you don’t get “no UI.” You get a split, with the two halves moving in opposite directions.

[Figure: the old hand-authored task UI, the product for 20 years, pulled apart. Trust interface, promoted: authored, stable; permissions, review, provenance, rollback. Task interface, demoted: generated per request, rendered, thrown away. Substrate, the moat: memory, sandboxes, reasoning, routing.]
The split. One interface pulled into two, over the layer that outlives both.

The task interface gets demoted, and composed at runtime. Why per request, and not pre-authored screens? Because once the input is open-ended intent, the things a user might ask for stop being enumerable. You can’t hand-build a screen for every request when the requests are unbounded. So the matching surface gets composed on demand, sometimes snapped together from templates and components, at the far end generated from scratch, then discarded. The components being snapped together are still authored, and carefully. What is disposable is the arrangement: assembled for one intent, then gone. Authored primitives, disposable screens. This isn’t speculative anymore. In November 2025 Google shipped generative UI on Gemini 3, interfaces produced per prompt, and its own caveat gives it away: it “can sometimes take a minute or more to generate results, and there are occasional inaccuracies.”[14] OpenAI put apps inside ChatGPT.[15] The task UI didn’t vanish; it moved to being rendered on demand and thrown away.
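To make “authored primitives, disposable screens” concrete, here is a minimal sketch, not how Google or OpenAI do it. Every name in it (`REGISTRY`, `Block`, `compose_screen`) is hypothetical, and the hard-coded `plan` stands in for whatever a model would emit for one intent. The components are written once by hand; the arrangement is built per request, returned, and never stored.

```python
from dataclasses import dataclass
from typing import Callable


# Authored primitives: hand-built, reviewed, reused across every request.
def render_heading(props: dict) -> str:
    return f"== {props['text']} =="


def render_table(props: dict) -> str:
    header = " | ".join(props["columns"])
    rows = [" | ".join(str(cell) for cell in row) for row in props["rows"]]
    return "\n".join([header, *rows])


# Hypothetical registry: the only components a generated plan may use.
REGISTRY: dict[str, Callable[[dict], str]] = {
    "heading": render_heading,
    "table": render_table,
}


@dataclass(frozen=True)
class Block:
    component: str  # must name an entry in REGISTRY
    props: dict


def compose_screen(plan: list[Block]) -> str:
    """Assemble a one-off screen from authored components only."""
    parts = []
    for block in plan:
        if block.component not in REGISTRY:
            # The model may arrange components, but not invent them.
            raise ValueError(f"unknown component: {block.component}")
        parts.append(REGISTRY[block.component](block.props))
    return "\n".join(parts)


# Assumption: this plan stands in for model output for "find me a flight".
plan = [
    Block("heading", {"text": "Flights to Lisbon"}),
    Block("table", {"columns": ["Airline", "Price"],
                    "rows": [["TAP", 120], ["Ryanair", 95]]}),
]
screen = compose_screen(plan)  # rendered for this intent, then discarded
print(screen)
```

The design choice worth noticing is the registry check: the arrangement is improvised, but nothing unreviewed can reach the screen.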

The trust interface gets promoted, and authored. The wider evaluation gulf doesn’t dissolve into chat; it demands the opposite. When agents act on money, health, contracts, or production data, the surface that survives is the one that lets you check and steer: permissions, review, provenance, diffs, rollback, the exception queue. That UI has to be stable, legible, and deliberately designed. You can’t verify a high-stakes action through an interface the model improvised half a second ago. This is the durable front-end. Audit logs, permission screens, and approval queues already exist, of course, but built for a human operator. Almost nobody is building the version where the thing you’re supervising is an agent acting on its own.
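A minimal sketch of what sits behind such a trust surface, under loud assumptions: `ProposedAction`, `TrustSurface`, and the refund example are all hypothetical names, and a real system would persist the log and authenticate the reviewer. It shows the pieces named above in one place: provenance on every action, a review queue for high-stakes ones, an audit log, and rollback.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ProposedAction:
    agent: str                     # provenance: which agent proposed this
    summary: str                   # legible description shown to the reviewer
    apply: Callable[[dict], None]  # performs the change on shared state
    undo: Callable[[dict], None]   # reverses it, so rollback is possible
    high_stakes: bool = False


@dataclass
class TrustSurface:
    state: dict
    review_queue: list = field(default_factory=list)
    applied: list = field(default_factory=list)
    audit_log: list = field(default_factory=list)

    def submit(self, action: ProposedAction) -> str:
        # High-stakes actions wait for a human; the rest run under policy.
        if action.high_stakes:
            self.review_queue.append(action)
            return "queued for review"
        self._run(action, approved_by="policy")
        return "applied"

    def approve(self, index: int, reviewer: str) -> None:
        self._run(self.review_queue.pop(index), approved_by=reviewer)

    def rollback_last(self, reviewer: str) -> None:
        action = self.applied.pop()
        action.undo(self.state)
        self.audit_log.append(("rolled back", action.agent, action.summary, reviewer))

    def _run(self, action: ProposedAction, approved_by: str) -> None:
        action.apply(self.state)
        self.applied.append(action)
        self.audit_log.append(("applied", action.agent, action.summary, approved_by))


# Hypothetical example: an agent proposes a refund against a balance.
surface = TrustSurface(state={"balance": 100})
refund = ProposedAction(
    agent="billing-agent",
    summary="Refund 40 to customer 17",
    apply=lambda s: s.update(balance=s["balance"] - 40),
    undo=lambda s: s.update(balance=s["balance"] + 40),
    high_stakes=True,
)
status = surface.submit(refund)                 # nothing has changed yet
balance_while_queued = surface.state["balance"]
surface.approve(0, reviewer="dana")             # a human lets it through
balance_after_approval = surface.state["balance"]
surface.rollback_last(reviewer="dana")          # and can take it back
balance_after_rollback = surface.state["balance"]
```

None of this is generated per request. The queue, the log, and the undo path are the stable, authored part.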

So the bet isn’t “UI dies.” It’s that the generic, choreographed, hand-authored task UI that was the product gets commoditised into a disposable projection, while value migrates to the substrate underneath and the trust surface on top. mozilla.ai put the substrate half well: “The deck, the doc, the dashboard. None of them are the source of truth. They are projections.”[16]

Whoever owns the state, not whatever app happens to render it this second, owns the defensible layer. The improvised pixels are just a view.
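A toy sketch of “state is the source of truth, views are projections.” The invoice data and both function names are made up for illustration. One piece of state, two throwaway renderings: one for a person, one for an agent. Change the state and both views follow; neither view is ever edited directly.

```python
import json

# The source of truth. Hypothetical data, for illustration only.
state = {
    "invoices": [
        {"id": "A1", "amount": 120, "paid": True},
        {"id": "A2", "amount": 80, "paid": False},
    ]
}


def project_for_human(s: dict) -> str:
    """A dashboard-style view: derived from state, never stored."""
    unpaid = [inv for inv in s["invoices"] if not inv["paid"]]
    total = sum(inv["amount"] for inv in unpaid)
    return f"{len(unpaid)} unpaid, {total} outstanding"


def project_for_agent(s: dict) -> str:
    """A machine-readable view of the same state."""
    unpaid_ids = [inv["id"] for inv in s["invoices"] if not inv["paid"]]
    return json.dumps({"unpaid_ids": unpaid_ids}, sort_keys=True)


before_human = project_for_human(state)
before_agent = project_for_agent(state)

# Only the state changes; both projections are simply recomputed.
state["invoices"][1]["paid"] = True

after_human = project_for_human(state)
after_agent = project_for_agent(state)
```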

Where I could be wrong, and how it narrows the claim

This isn’t a box I check for credibility. The counter-evidence bounds the claim.

People use apps more, not less. 5.3 trillion hours in 2025, an all-time high (Sensor Tower),[17] and the AI wave is itself consumed as a downloaded app full of buttons. ChatGPT, the flagship of the movement, keeps adding more graphical UI: interactive apps, widgets, checkout.[15] That’s not a refutation. It’s exactly the trust-and-output half being promoted, inside the chat product. Generative UI is slow and unreliable today. Computer-use agents, the ones meant to drive the old apps for us, still top out around 66% on the OSWorld benchmark (Anthropic’s Claude Opus 4.5, late 2025), well short of a person’s roughly 72%;[18] the real-world number on messy multi-step tasks is lower. And Nielsen notes a large share of people write far less fluently than they read,[19] so “just type what you want” is its own wall.

So the claim narrows, and I think it survives: not that screens disappear, but that the hand-authored task interface stops being where the value is, pulled apart into a generated bottom and an authored, verification-first top. It bites hardest on generic task choreography, the CRUD and forms and flows; expert, spatial, and creative tools, the IDEs and design canvases and DAWs where dragging beats describing, keep their hand-built surfaces longest. If you’re building your moat in the middle, in the choreography, the timeline is not your friend.

The bet, and what it makes me build

If the task interface is a disposable projection, pouring your value into it is a mistake with a delay on it. The durable value sits in two places the generation can’t reach: the intent-and-verification loop, and the substrate that doesn’t care whether a human or an agent is driving.

So that’s what I build:

  • Memory: so the system knows what happened and what matters
  • Sandboxes: so agents can act without breaking the world
  • Reasoning: a sensible path, not just a plausible one
  • Routing: the right capability at the right moment

None of it screenshots well. That’s the point. When the front-end is generated per request and thrown away, the moat is the memory, the guardrails, and the audit trail behind it.

I could be early. I’ve been early before, and early feels identical to wrong for an uncomfortably long time. I’m a builder making a bet, not a researcher with a proof. But I’d rather be early on the layer that lasts than perfectly on time for the one that’s evaporating. This site dogfoods the split, not the death: a designed face for people, because a personal site is a trust surface, and an agent-readable twin at /llms.txt for every reader that isn’t one. I’ll keep sharpening the argument.

Footnotes

  1. Don Norman, “Cognitive Engineering,” in Norman & Draper (eds.), User Centered System Design, 1986; popularized in The Design of Everyday Things (1988). https://www.nngroup.com/articles/two-ux-gulfs-evaluation-execution/

  2. Jakob Nielsen, “AI: First New UI Paradigm in 60 Years,” Nielsen Norman Group, 18 June 2023. https://www.nngroup.com/articles/ai-paradigm/

  3. Andrej Karpathy, on X, 24 January 2023. https://x.com/karpathy/status/1617979122625712128

  4. Alex Immerman & Santiago Rodriguez, “Good News: AI Will Eat Application Software,” Andreessen Horowitz, 2 March 2026. https://a16z.com/good-news-ai-will-eat-application-software/

  5. Ben Shneiderman & Pattie Maes, “Direct Manipulation vs. Interface Agents,” interactions 4(6), Nov/Dec 1997. https://www.cs.umd.edu/users/ben/papers/Shn-Maes-v4n6-1997.pdf

  6. Don Norman, “Natural User Interfaces Are Not Natural,” interactions 17(3), May–June 2010. https://jnd.org/natural-user-interfaces-are-not-natural/

  7. Dana Mattioli, “Amazon’s Devices Unit…,” The Wall Street Journal, July 2024 (the devices unit lost more than $25B, 2017–2021). https://www.wsj.com/tech/amazon-alexa-devices-echo-losses-strategy-25f2581a

  8. Andrej Karpathy, “Software Is Changing (Again)” (Software 3.0), YC AI Startup School, June 2025. https://www.youtube.com/watch?v=LCEmiRjPEtQ

  9. Amazon’s own figure of over 160,000 available Alexa skills, cited in 2024. https://www.aftvnews.com/amazon-to-stop-paying-alexa-skill-developers/

  10. Voicebot.ai, Smart Speaker Consumer Adoption Report, 2018–2019 (fewer than half of owners had ever tried a third-party skill: a discovery problem). https://voicebot.ai/2019/03/12/smart-speaker-owners-agree-that-questions-music-and-weather-are-killer-apps-what-comes-next/

  11. Priya Anand, The Information, 8 August 2018, as reported by Retail Dive (fewer than 2% of Alexa owners bought by voice in 2018; about 90% did not repeat). https://www.retaildive.com/news/the-information-only-2-of-alexa-users-reportedly-purchase-off-the-devices/529608/

  12. Raluca Budiu & Page Laubheimer, “Intelligent Assistants Have Poor Usability,” Nielsen Norman Group, 22 July 2018. https://www.nngroup.com/articles/intelligent-assistant-usability/

  13. Julian Lehr, “The case against conversational interfaces,” 27 March 2025. https://julian.digital/2025/03/27/the-case-against-conversational-interfaces/

  14. Google Research, “Generative UI,” 18 November 2025 (built on Gemini 3). https://research.google/blog/generative-ui-a-rich-custom-visual-interactive-user-experience-for-any-prompt/

  15. OpenAI, “Introducing apps in ChatGPT and the Apps SDK,” 6 October 2025. https://openai.com/index/introducing-apps-in-chatgpt/

  16. Alejandro Gonzalez, “The Interface Is No Longer the Product,” mozilla.ai, 19 May 2026. https://blog.mozilla.ai/the-interface-is-no-longer-the-product/

  17. Sensor Tower, “State of Mobile 2026,” January 2026 (5.3 trillion hours in 2025). https://sensortower.com/blog/state-of-mobile-2026

  18. Anthropic, “Claude Opus 4.5 System Card,” 24 November 2025 (66.3% on OSWorld-Verified; human baseline ~72%). https://www.anthropic.com/claude-opus-4-5-system-card

  19. Jakob Nielsen, “The Articulation Barrier: Prompt-Driven AI UX Hurts Usability,” UX Tigers, 2023. https://jakobnielsenphd.substack.com/p/prompt-driven-ai-ux-hurts-usability