The line I keep repeating, the one that reorganised everything I build: UI is going to disappear, or become something else, and whoever keeps building UI-heavy apps is going to regret it.
I still believe it. But the lazy version of that sentence, “everything becomes a chat box,” is wrong, and it’s wrong for reasons worth understanding. Chat is not the endpoint. So here is the actual case, including the parts that argue against me, and the one place I had to narrow the claim to keep it honest.
What a UI is even for
An interface exists to cross two gaps Don Norman named forty years ago. The gulf of execution: how do I get the machine to do what I want? The gulf of evaluation: did it actually do it? Every button, form, menu, and flow is scaffolding over one of those two gulfs.1
If you want to know what agents do to the interface, ask what they do to each gulf. They don’t touch them equally, and that asymmetry is the whole story.
The part I’m confident about: execution collapses
For twenty years the interface was the product. Not the computation underneath. Generic CRUD scaffolding has been a commodity for a decade. What people paid for was the choreography: the sequence of screens that walked a human from “I want a flight” to a booked ticket. That choreography exists because the human had to translate intent into the machine’s terms, click by click.
An agent absorbs that translation. You say what you want; it does the walking. Jakob Nielsen, the closest thing usability has to a founding figure, calls this the third interface paradigm in sixty years: “intent-based outcome specification,” where “the user no longer tells the computer what to do… [it] completely reverses the locus of control.”2 Karpathy calls the same shift Software 3.0: “the hottest new programming language is English.”3
When intent is the input, the click-choreography becomes overhead. That is the part of the UI that dies. Even a16z, in Good News: AI Will Eat Application Software, which argues against the maximalist version of this, concedes the target: “frontend tools that serve primarily as thin wrappers around commodity functionality… are vulnerable.”4
So far the thesis holds. Now the naive version falls apart.
Why “everything becomes chat” is wrong
Collapsing the execution gulf doesn’t collapse the whole interface. The interface was doing three other jobs that language is bad at.
Output. Vision is parallel; language is serial. Shneiderman’s line, from his 1997 debate with the agents camp: an interface can put “4,000 or more items on the screen… that enables people to see all of the possibilities,” and “it would be hard to see how you could program an agent to anticipate all of the possibilities that your eye can pick up in 1/10th of a second.”5
Discoverability. A blank prompt has no affordances. It can’t show you what it’s capable of. Norman, in 2010, long before this was fashionable: “The strength of the graphical user interface (GUI) has little to do with its use of graphics: It has to do with the ease of remembering actions, both in what actions are possible and how to invoke them.”6 We have a decade-long, eleven-figure natural experiment in ignoring that. It’s called Alexa.7
The evaluation gulf widens. This is the one that gets skipped. When an opaque agent does ten steps on your behalf, you now have to verify ten steps you didn’t watch. Karpathy, a believer, puts the constraint plainly in his Software 3.0 talk: he is “still the bottleneck” who “has to verify it for bugs and security issues.”8 The model drafts faster than a person can check. Automating execution doesn’t remove the interface. It shifts the interface’s job from input to inspection.
Why “this time” has to earn it
Every few years someone announces the death of the GUI. Conversational commerce in 2016. Voice before that. They failed, and the failure is worth studying, because it was specific.
Alexa had over 160,000 skills available,9 and even so a Voicebot survey found fewer than half of owners had ever tried a third-party one: a discovery problem more than a usage one.10 Fewer than 2% of Alexa users bought anything by voice in 2018, and of the few who did, about 90% never repeated it (The Information).11 Nielsen Norman Group watched real people use these assistants and called it “a return to the dark ages of the 1970s: the need to memorize cryptic commands, oppressive modes.”12
The honest part: Alexa had real language problems (multi-step requests, conversational context, third-party skills). But comprehension wasn’t primarily what killed it. It died of discoverability, because nobody knew what to say, and of there being no reason to speak when a tap was faster. Julian Lehr’s version: “Hey Google, what’s the weather in San Francisco today?” takes ten times longer than tapping the weather app.13
LLMs closed most of the comprehension gap. Not all of it; they still fumble ambiguity, grounding, and long-horizon constraints. So “this time is different” only gets to mean something narrow and honest.
What LLMs fixed
- Understanding messy, open-ended intent
- Mapping words to real actions and tools
- Holding context across a conversation
What they didn't
- Output bandwidth: dense results still need a screen
- Discoverability: the blank box still can't tell you what it can do
- Input speed: for a known task, a tap still beats a sentence
The language half went from broken to mostly working. That is enough to change the input side, and it does nothing for output bandwidth or discoverability. Any prediction that forgets the other two is selling you the demo.
So what actually happens: the interface splits in two
Put it together and you don’t get “no UI.” You get a split, with the two halves moving in opposite directions.
The task interface gets demoted, and composed at runtime. Why per request, and not pre-authored screens? Because once the input is open-ended intent, the things a user might ask for stop being enumerable. You can’t hand-build a screen for every request when the requests are unbounded. So the matching surface gets composed on demand, sometimes snapped together from templates and components, at the far end generated from scratch, then discarded. The components being snapped together are still authored, and carefully. What is disposable is the arrangement: assembled for one intent, then gone. Authored primitives, disposable screens. This isn’t speculative anymore. In November 2025 Google shipped generative UI on Gemini 3, interfaces produced per prompt, and its own caveat gives it away: it “can sometimes take a minute or more to generate results, and there are occasional inaccuracies.”14 OpenAI put apps inside ChatGPT.15 The task UI didn’t vanish; it moved to being rendered on demand and thrown away.
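One way to picture “authored primitives, disposable screens” is the minimal sketch below. It is an illustration under assumptions, not a description of how Google’s or OpenAI’s systems work: the component names, the `Node` type, and the `compose` function are all hypothetical, and the plan that a model would normally produce from the user’s intent is hard-coded here.

```python
from dataclasses import dataclass
from typing import Callable

# Authored primitives: a small, hand-built, reviewed set of components.
# Every name in this sketch is hypothetical.
def heading(props: dict) -> str:
    return f"== {props['text']} =="

def option_list(props: dict) -> str:
    return "\n".join(f"[{i + 1}] {item}" for i, item in enumerate(props["items"]))

def confirm_button(props: dict) -> str:
    return f"( {props['label']} )"

REGISTRY: dict[str, Callable[[dict], str]] = {
    "heading": heading,
    "option_list": option_list,
    "confirm_button": confirm_button,
}

@dataclass(frozen=True)
class Node:
    component: str  # must name an entry in REGISTRY
    props: dict

def compose(plan: list[Node]) -> str:
    """Render a one-off screen from a plan, refusing unauthored components."""
    unknown = [n.component for n in plan if n.component not in REGISTRY]
    if unknown:
        raise ValueError(f"plan uses unauthored components: {unknown}")
    return "\n".join(REGISTRY[n.component](n.props) for n in plan)

# Assumption: in a real system a model would emit this plan per request.
plan = [
    Node("heading", {"text": "Flights to Lisbon"}),
    Node("option_list", {"items": ["08:10 direct", "13:45 one stop"]}),
    Node("confirm_button", {"label": "Book selected"}),
]
screen = compose(plan)  # assembled for one intent, then thrown away
print(screen)
```

The arrangement lives only in `plan` and `screen`; what persists between requests is the registry. Rejecting component names outside the registry is one possible way to keep the generated half bounded by the authored half.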
The trust interface gets promoted, and authored. The wider evaluation gulf doesn’t dissolve into chat; it demands the opposite. When agents act on money, health, contracts, or production data, the surface that survives is the one that lets you check and steer: permissions, review, provenance, diffs, rollback, the exception queue. That UI has to be stable, legible, and deliberately designed. You can’t verify a high-stakes action through an interface the model improvised half a second ago. This is the durable front-end. Audit logs, permission screens, and approval queues already exist, of course, but built for a human operator. Almost nobody is building the version where the thing you’re supervising is an agent acting on its own.
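To make the trust surface concrete, here is a rough sketch of its core loop, assuming a deliberately simple model: an agent proposes an action as a before/after pair, high-stakes actions wait in a review queue, and every step lands in an audit log. The class names, the `high_stakes` flag, and the example refund are hypothetical; a real system would also need authentication, persistence, and rollback.

```python
from dataclasses import dataclass, field

@dataclass
class ProposedAction:
    agent: str
    description: str
    before: dict       # state the agent read
    after: dict        # state the agent wants to write
    high_stakes: bool  # assumption: set by policy, not by the agent itself

    def diff(self) -> dict:
        """Fields the action would change, as {key: (old, new)}."""
        keys = self.before.keys() | self.after.keys()
        return {
            k: (self.before.get(k), self.after.get(k))
            for k in sorted(keys)
            if self.before.get(k) != self.after.get(k)
        }

@dataclass
class TrustSurface:
    review_queue: list = field(default_factory=list)
    audit_log: list = field(default_factory=list)

    def submit(self, action: ProposedAction) -> str:
        """Queue high-stakes actions for a human; log everything."""
        if action.high_stakes:
            self.review_queue.append(action)
            status = "pending_review"
        else:
            status = "auto_applied"
        self.audit_log.append((action.agent, action.description, status))
        return status

    def decide(self, action: ProposedAction, reviewer: str, approve: bool) -> str:
        """Record a human decision on a queued action."""
        self.review_queue.remove(action)
        status = "approved" if approve else "rejected"
        self.audit_log.append((reviewer, action.description, status))
        return status

surface = TrustSurface()
refund = ProposedAction(
    agent="billing-agent",
    description="refund an order",
    before={"balance": 120, "status": "paid"},
    after={"balance": 0, "status": "refunded"},
    high_stakes=True,
)
status = surface.submit(refund)
print(status, refund.diff())  # what the reviewer would be shown
surface.decide(refund, reviewer="ana", approve=True)
```

The reviewer sees a diff rather than a transcript, which is the point of the paragraph above: the surface that checks the agent is stable and authored, even when the task screen is not.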
So the bet isn’t “UI dies.” It’s that the generic, choreographed, hand-authored task UI that was the product gets commoditised into a disposable projection, while value migrates to the substrate underneath and the trust surface on top. mozilla.ai put the substrate half well: “The deck, the doc, the dashboard. None of them are the source of truth. They are projections.”16
Whoever owns the state, not whatever app happens to render it this second, owns the defensible layer. The improvised pixels are just a view.
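A small sketch of “state is the source of truth, views are projections,” under the assumption that the state is a plain record and each view is a pure function of it. The field names and both projection functions are made up for illustration.

```python
import json

# The substrate: one authoritative record. Field names are hypothetical.
state = {"trip": "Lisbon", "travellers": 2, "fare_eur": 310}

def project_for_person(s: dict) -> str:
    """A human-facing view, derived from state and never stored."""
    total = s["fare_eur"] * s["travellers"]
    return f"{s['trip']}: {s['travellers']} travellers, EUR {total} total"

def project_for_agent(s: dict) -> str:
    """A machine-facing view of the same state."""
    return json.dumps(s, sort_keys=True)

before = project_for_person(state)
state["travellers"] = 3  # the only write happens on the state
after = project_for_person(state)

print(before)
print(after)
print(project_for_agent(state))
```

Neither view is edited directly; both are recomputed from the record, so a human and an agent can drive the same substrate without disagreeing about what is true.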
Where I could be wrong, and how it narrows the claim
This isn’t a box I check for credibility. The counter-evidence bounds the claim.
People use apps more, not less. 5.3 trillion hours in 2025, an all-time high (Sensor Tower),17 and the AI wave is itself consumed as a downloaded app full of buttons. ChatGPT, the flagship of the movement, keeps adding more graphical UI: interactive apps, widgets, checkout.15 That’s not a refutation. It’s exactly the trust-and-output half being promoted, inside the chat product.
Generative UI is slow and unreliable today. Computer-use agents, the ones meant to drive the old apps for us, still top out around 66% on the OSWorld benchmark (Anthropic’s Claude Opus 4.5, late 2025), well short of a person’s roughly 72%;18 the real-world number on messy multi-step tasks is lower.
And Nielsen notes a large share of people write far less fluently than they read,19 so “just type what you want” is its own wall.
So the claim narrows, and I think it survives: not that screens disappear, but that the hand-authored task interface stops being where the value is, pulled apart into a generated bottom and an authored, verification-first top. It bites hardest on generic task choreography, the CRUD and forms and flows; expert, spatial, and creative tools, the IDEs and design canvases and DAWs where dragging beats describing, keep their hand-built surfaces longest. If you’re building your moat in the middle, in the choreography, the timeline is not your friend.
The bet, and what it makes me build
If the task interface is a disposable projection, pouring your value into it is a mistake with a delay on it. The durable value sits in two places the generation can’t reach: the intent-and-verification loop, and the substrate that doesn’t care whether a human or an agent is driving.
So that’s what I build:
None of it screenshots well. That’s the point. When the front-end is generated per request and thrown away, the moat is the memory, the guardrails, and the audit trail behind it.
I could be early. I’ve been early before, and early feels identical to wrong for an uncomfortably long time. I’m a builder making a bet, not a researcher with a proof. But I’d rather be early on the layer that lasts than perfectly on time for the one that’s evaporating. This site dogfoods the split, not the death: a designed face for people, because a personal site is a trust surface, and an agent-readable twin at /llms.txt for every reader that isn’t one. I’ll keep sharpening the argument.
Footnotes
1. Don Norman, “Cognitive Engineering,” in Norman & Draper (eds.), User Centered System Design, 1986; popularized in The Design of Everyday Things (1988). https://www.nngroup.com/articles/two-ux-gulfs-evaluation-execution/
2. Jakob Nielsen, “AI: First New UI Paradigm in 60 Years,” Nielsen Norman Group, 18 June 2023. https://www.nngroup.com/articles/ai-paradigm/
3. Andrej Karpathy, on X, 24 January 2023. https://x.com/karpathy/status/1617979122625712128
4. Alex Immerman & Santiago Rodriguez, “Good News: AI Will Eat Application Software,” Andreessen Horowitz, 2 March 2026. https://a16z.com/good-news-ai-will-eat-application-software/
5. Ben Shneiderman & Pattie Maes, “Direct Manipulation vs. Interface Agents,” interactions 4(6), Nov/Dec 1997. https://www.cs.umd.edu/users/ben/papers/Shn-Maes-v4n6-1997.pdf
6. Don Norman, “Natural User Interfaces Are Not Natural,” interactions 17(3), May–June 2010. https://jnd.org/natural-user-interfaces-are-not-natural/
7. Dana Mattioli, “Amazon’s Devices Unit…,” The Wall Street Journal, July 2024 (the devices unit lost more than $25B, 2017–2021). https://www.wsj.com/tech/amazon-alexa-devices-echo-losses-strategy-25f2581a
8. Andrej Karpathy, “Software Is Changing (Again)” (Software 3.0), YC AI Startup School, June 2025. https://www.youtube.com/watch?v=LCEmiRjPEtQ
9. Amazon’s own figure of over 160,000 available Alexa skills, cited in 2024. https://www.aftvnews.com/amazon-to-stop-paying-alexa-skill-developers/
10. Voicebot.ai, Smart Speaker Consumer Adoption Report, 2018–2019 (fewer than half of owners had ever tried a third-party skill: a discovery problem). https://voicebot.ai/2019/03/12/smart-speaker-owners-agree-that-questions-music-and-weather-are-killer-apps-what-comes-next/
11. Priya Anand, The Information, 8 August 2018, as reported by Retail Dive (fewer than 2% of Alexa owners bought by voice in 2018; about 90% did not repeat). https://www.retaildive.com/news/the-information-only-2-of-alexa-users-reportedly-purchase-off-the-devices/529608/
12. Raluca Budiu & Page Laubheimer, “Intelligent Assistants Have Poor Usability,” Nielsen Norman Group, 22 July 2018. https://www.nngroup.com/articles/intelligent-assistant-usability/
13. Julian Lehr, “The case against conversational interfaces,” 27 March 2025. https://julian.digital/2025/03/27/the-case-against-conversational-interfaces/
14. Google Research, “Generative UI,” 18 November 2025 (built on Gemini 3). https://research.google/blog/generative-ui-a-rich-custom-visual-interactive-user-experience-for-any-prompt/
15. OpenAI, “Introducing apps in ChatGPT and the Apps SDK,” 6 October 2025. https://openai.com/index/introducing-apps-in-chatgpt/
16. Alejandro Gonzalez, “The Interface Is No Longer the Product,” mozilla.ai, 19 May 2026. https://blog.mozilla.ai/the-interface-is-no-longer-the-product/
17. Sensor Tower, “State of Mobile 2026,” January 2026 (5.3 trillion hours in 2025). https://sensortower.com/blog/state-of-mobile-2026
18. Anthropic, “Claude Opus 4.5 System Card,” 24 November 2025 (66.3% on OSWorld-Verified; human baseline ~72%). https://www.anthropic.com/claude-opus-4-5-system-card
19. Jakob Nielsen, “The Articulation Barrier: Prompt-Driven AI UX Hurts Usability,” UX Tigers, 2023. https://jakobnielsenphd.substack.com/p/prompt-driven-ai-ux-hurts-usability
