Karpathy's Scale & Solving Horseless Carriages

September 2025

In AI Horseless Carriages, Pete Koomen argues that most AI apps today are designed incorrectly - people tack AI features onto existing software rather than redesigning the software from the ground up with AI at its core. He makes an example of Gmail, which released an AI assistant that was supposed to help with emails.

Pete basically said

"it takes longer to use the Gmail assistant than to actually write the draft"

That got me thinking - the true unlock that AI gives you is the ability to skip the things you would otherwise have to do yourself. In the quest to make apps truly AI-native, there should be only one metric that matters: TTC (time to completion). This is a measure of how long the human needs to spend completing a task. In Pete's Gmail example, the task is to write emails. Humans use software to finish tasks quicker (e.g., think snail mail vs email). It's why the software revolution took place: it made it easier and quicker to achieve the same goals. The AI revolution is no different.

TTC is the ONLY metric that should matter, and it should drive all design decisions. For any app, you can imagine a scale of automation, or agency. The left side of this scale is traditional software and the right is complete AGI - at the left end, you're using the software to do something (like we always have), and at the right end, the software is on complete autopilot. Other ways to think about this scale are a manual car (left) versus a Waymo (right), or software as a service (left) versus service as a software (right). Let's call this Karpathy's scale, alluding to his talk at YC's startup school. The goal should be to move rightwards. Not coincidentally, as you do this, TTC approaches 0.

Everything else must follow from this goal. And so, if you were to over-index on minimizing TTC, possibly the only way to succeed is through hyper-personalization - i.e., your software should seem magical, as if you handcrafted it specifically to complete your user's tasks.

To elaborate, I'll extend the Gmail example. The user is Gary, and Gary is a subcontractor for a concrete pouring company out in Indiana.

Gary's inbox is made up of the following (this is factual):

inquiries about his service
jobs from people that need concrete poured
internal emails
billing
marketing spam / other

When Gary reads a new email, his brain decides which of these categories to stick this email into. And each case warrants a specific set of actions.

respond to inquiries
check if the job is viable, log it somewhere, then respond
internal emails - respond ad hoc
billing → move to accounting software
ignore / archive / spam
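The mapping from category to actions above can be sketched in a few lines of Python. This is only an illustration: the keyword rules, the `example.com` internal domain, and the action names are made-up stand-ins, and a real client would likely use a model rather than keyword matching to do the sorting Gary's brain does.

```python
from enum import Enum


class Category(Enum):
    INQUIRY = "inquiry"
    JOB = "job"
    INTERNAL = "internal"
    BILLING = "billing"
    OTHER = "other"


# One entry per category, mirroring the list of actions above.
# The action names are hypothetical labels, not calls into a real system.
ACTIONS = {
    Category.INQUIRY: ["respond"],
    Category.JOB: ["check_viability", "log_job", "respond"],
    Category.INTERNAL: ["respond_ad_hoc"],
    Category.BILLING: ["move_to_accounting"],
    Category.OTHER: ["archive"],
}

# Hypothetical keyword rules standing in for a real classifier.
KEYWORDS = [
    (Category.BILLING, ("invoice", "payment")),
    (Category.JOB, ("pour", "driveway", "slab")),
    (Category.INQUIRY, ("quote", "do you offer")),
]


def classify(subject: str, sender: str, internal_domain: str = "example.com") -> Category:
    """Pick a category for an email from its subject and sender."""
    if sender.lower().endswith("@" + internal_domain):
        return Category.INTERNAL
    text = subject.lower()
    for category, words in KEYWORDS:
        if any(word in text for word in words):
            return category
    return Category.OTHER


def plan(subject: str, sender: str) -> list:
    """Return the list of actions warranted by an incoming email."""
    return ACTIONS[classify(subject, sender)]
```

The point of the sketch is the shape, not the rules: once every email lands in a category and every category has a fixed set of actions, each action becomes a candidate for automation.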

In order to minimize Gary's TTC, which I argue would yield the perfect solution for Gary, you would first have to figure out where he spends most of his time (so you can quantify his TTC). Then, you would automate what you could, even if it required him verifying the output every time. (We are assuming reliability is a given here).

You would find that Gary spends most of his time dealing with incoming jobs and drafting responses. Now you have an insight into his TTC.
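One way to quantify this, as a rough sketch: log how many minutes Gary spends per email, tagged by category, then total and rank. The function name and the log format here are assumptions for illustration, and the numbers in the usage note are invented, not measurements.

```python
from collections import defaultdict


def ttc_by_category(log):
    """Total the minutes spent per category, largest first.

    `log` is an iterable of (category, minutes) pairs, one per handled email.
    """
    totals = defaultdict(float)
    for category, minutes in log:
        totals[category] += minutes
    # Sort so the biggest time sink comes first.
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))
```

Calling it with an invented log such as `[("job", 20), ("inquiry", 5), ("job", 15), ("billing", 3)]` puts "job" at the top of the result, which is the category you would try to automate first.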

And you would then design a client that connects into the software HE uses - the old construction software - to get context from there, and to execute on these workflows. And even if it's not reliable, and you need him to verify before you trigger such workflows, you've done something powerful → you've put yourself on Karpathy's scale.

You know you're on Karpathy's scale because your software is doing things for Gary - it's agentic in nature. And so over time, as models and MCPs and context engineering get better, you can push that slider to the right.
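A minimal sketch of what "verify before you trigger" and "push that slider to the right" could look like in code, assuming a hypothetical `run_with_gate` helper: every action asks Gary for approval unless it is in a pre-approved set, and moving right on the scale just means growing that set as trust builds.

```python
from typing import Callable, Set


def run_with_gate(
    action: str,
    perform: Callable[[str], None],
    ask_human: Callable[[str], bool],
    auto_approved: Set[str],
) -> bool:
    """Run `action` if it is pre-approved or the human says yes.

    Returns True if the action ran, False if the human declined.
    The human is only asked when the action is not pre-approved.
    """
    if action in auto_approved or ask_human(action):
        perform(action)
        return True
    return False
```

With an empty `auto_approved` set, the software sits at the left of the scale and Gary confirms everything. Each action added to the set is one less interruption, so his TTC shrinks without any change to the workflows themselves.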

The trade-off here is significant - hyper-personalizing software means you shrink your audience. You cannot expect that a tech founder will use the same client that Gary will. In general, the smaller the bubble, the better the software. There are probably several local maxima that serve as good startup ideas - points where the audience is small enough to keep the software focused but large enough to warrant starting a company. I think this is related to why creating agentic software for a vertical seems so lucrative (think Harvey.ai, Trunktools, Rogo.ai). And it's why I think generic software will struggle, and eventually be replaced. For instance, manufacturing companies and bedsheet companies both use SAP as their ERP platform. In this new paradigm, I don't see how SAP competes with an ERP that's specific to the trucking industry.

[Figure: The Personalization Trade-off. Axes: Market Size and Personalization. Labeled points: Excel, SAP, Gary's Tool.]

The last thought I have on this is that software development is heading down the commoditization path. Soon my grandmother will be able to ask Siri to add levels to Candy Crush after finishing the game. I've spoken to construction guys like Gary who use Lovable to create Power BI-like dashboards. It's insane to think that guys out in tiny towns doing construction know and use Lovable.

The bar for software has risen dramatically. At the same time, development has become much easier. Since you can recreate software that does 80% of Microsoft Excel in one evening, Excel itself is no longer as valuable as it once was. The value of any given piece of software has eroded, because the barrier to entry has been shattered.

AI might be the worst thing that has happened to many of the giants, with the only shields they have left being distribution and stickiness. And by focusing on TTC, newcomers can and will disrupt a lot of these giants.

The move clearly seems to be - focus on understanding the user, start automating their tasks to minimize TTC, build a UI around that, and traverse Karpathy's scale. Odds are, you'd be building a rocketship, not a horseless carriage.

Kush Bhuwalka