RL Your Life

January 2026

For years my father has told me to write a 1-year plan, a 5-year plan, and so on. To pen down my goals and then relentlessly work towards them. "If you really believe it, it will happen."

Growing up, I never actually took that advice. In fact, it annoyed me. The exercise seemed futile - I had no idea what I wanted to achieve except for the fact that I wanted to be rich, healthy, and happy. And those are too arbitrary anyways. I knew my world would drastically change as I grew up, and I'd figure it out then. Well, now I'm grown up and things are as arbitrary as they were. If anything, it's even more arbitrary because I'm more aware of how infinite the action space is.

It wasn't until recently that I really started to understand how important it is to have goals and actively manifest them. It clicked for me on one of my thinking days, when three ideas converged in my mind - reinforcement learning (RL), definite optimism, and my father's value proposition.

I'll start with RL. I was convinced that fine-tuning construction-specific models would be a game changer. In other words, I thought that if I built a construction-specific version of ChatGPT, construction companies would go crazy for it. So I started studying model development in depth, reading hundreds of research papers and a lot of code to understand how I could do this. RL was one of the ideas I had to wrap my head around.

The basic idea behind reinforcement learning is actually pretty simple - when you don't know the exact answer to a question but you can get hints from trying things, try a bunch of stuff, see what happens, reward the good outcomes and punish the bad ones.

A good example of this is brick breaker - you let the computer play the game, and after it tries some random moves, you reward it if it hits some bricks and scores some points, and punish it if it loses a life. Do this thousands of times and the computer will start to play the game near perfectly.
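As a rough sketch of that loop (not real brick breaker), here is a toy "catch the ball" game in Python. Everything in it is an assumption made for illustration: the 5-column board, the 4 moves per drop, the +1/-1 rewards, the function names, and the choice of tabular Q-learning as the RL method. The computer plays purely random moves and keeps a table of how good each move turned out to be.

```python
import random

WIDTH = 5              # columns the paddle and ball can occupy (assumed)
DROP_STEPS = 4         # moves the paddle gets before the ball lands (assumed)
ACTIONS = (-1, 0, 1)   # move left, stay, move right


def step(ball, paddle, steps_left, action):
    """Apply one move; return (paddle, steps_left, reward, done)."""
    paddle = min(WIDTH - 1, max(0, paddle + action))
    steps_left -= 1
    if steps_left == 0:
        # The ball has landed: reward a catch, punish a miss.
        return paddle, steps_left, (1.0 if paddle == ball else -1.0), True
    return paddle, steps_left, 0.0, False


def train(episodes=5000, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning from purely random play."""
    rng = random.Random(seed)
    q = {}  # (ball, paddle, steps_left, action) -> estimated value
    for _ in range(episodes):
        ball = rng.randrange(WIDTH)
        paddle, steps_left, done = WIDTH // 2, DROP_STEPS, False
        while not done:
            action = rng.choice(ACTIONS)  # try a random move
            nxt_paddle, nxt_steps, reward, done = step(
                ball, paddle, steps_left, action)
            # Value of the best known follow-up move (0 once the game ends).
            best_next = 0.0 if done else max(
                q.get((ball, nxt_paddle, nxt_steps, a), 0.0) for a in ACTIONS)
            key = (ball, paddle, steps_left, action)
            old = q.get(key, 0.0)
            # Nudge the estimate towards what actually happened.
            q[key] = old + alpha * (reward + gamma * best_next - old)
            paddle, steps_left = nxt_paddle, nxt_steps
    return q


def play_greedy(q, ball):
    """Play one drop using the best known move at every step."""
    paddle, steps_left, done, reward = WIDTH // 2, DROP_STEPS, False, 0.0
    while not done:
        action = max(
            ACTIONS, key=lambda a: q.get((ball, paddle, steps_left, a), 0.0))
        paddle, steps_left, reward, done = step(
            ball, paddle, steps_left, action)
    return reward


q_table = train()
results = [play_greedy(q_table, ball) for ball in range(WIDTH)]
print(results)
```

After training, `results` holds the reward the learned policy earns for each starting column of the ball, where +1.0 means a catch. The learning rate, discount, and episode count are arbitrary picks that should be plenty for a game this small; a real game would need far more play and a smarter way to explore.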

Mathematically, there does exist a perfect way to play brick breaker. The game is deterministic - every action has a perfectly predictable consequence, so in theory you could play through every single combination (I'm assuming, of course, that the paddle moves in a discrete space). But finding that perfect game is computationally infeasible - there are simply too many combinations.

RL is a great way to approximate the perfect outcome. In some ways, evolution is a very sophisticated RL algorithm - it cycles through generations and mutations (the moves), and picks the best ones over time as an effect of what happened (i.e., who survived, who died). It is not concerned with the why, just what happened. This is the strongest form of reality and ground truth - no hypotheses, no conjectures, no predictions. Just try random stuff, see what happens, iterate. Nature's feedback loops are long, although those time periods are a blip when you consider how old the Earth is. The big point here is that in order to do RL, you need a goal, and you need to make moves that you can then validate against the goal. And as you shorten the time it takes to execute each move, you will reach your approximately perfect answer quicker.

Applied to startups, this means: give an idea a go, shorten the feedback loop, and fail fast. Every time you cycle through an idea, you learn something that makes your next set of moves better. With startups, you can have a well-defined goal - value accrual. With life, however, the only way to apply this is to have goals in the first place, after which you can begin to RL them.

The second idea comes from Peter Thiel. In Zero to One, he outlines the idea of definite optimism. He argues that thinking the future will be great is not enough - you have to envision what it looks like and then build that out. To drive life instead of letting life drive you. To exercise agency and make things happen, instead of banking on the idea of fate alone. I'm not doing justice to his arguments, but I think these simple heuristics are hard to argue against.

And the third idea is my father's value proposition for his startup. He has a goal of reaching a certain valuation with his company - they build electric trucks. He started with the goal, then walked all the way backwards to break that goal down into the logical subgoals needed. The exercise goes as follows. Let's say you decide, I want to build a billion dollar electric trucking business. The P/E ratio that electric truck companies get is near 100. Which means if you make $10M in profit after tax, the market will value you at $1 billion. The profit margin on a truck is 10%, so you need to sell $100M worth of trucks in a year to reach this valuation. Each truck sells for $250,000, which means if you sell 400 trucks, you've sold $100M worth of trucks and will reach your goal. So you need to sell 400 trucks in a year. Now that you know what you need to sell, figure out who will buy and how you will make it (these are surprisingly detailed but I won't get into it here). Before starting, he had all of this mapped out. Then suddenly it doesn't seem that impossible. Strategy precedes execution, and clarity of thought is power. And every day, that is his north star. Or at least, that's what he says.

Note: these figures are not accurate.
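The walk backwards from the goal is just three divisions, sketched here in Python with the same illustrative figures. The function and parameter names are made up for this example, and exact fractions are used only to avoid floating-point rounding.

```python
from fractions import Fraction
import math


def reverse_engineer(target_valuation, pe_ratio, profit_margin, truck_price):
    """Walk backwards from a target valuation to trucks sold per year."""
    # Valuation = P/E ratio x profit, so the profit needed is valuation / P/E.
    required_profit = Fraction(target_valuation) / pe_ratio
    # Profit = margin x revenue, so the revenue needed is profit / margin.
    required_revenue = required_profit / profit_margin
    # Revenue = price x units; round up because you can't sell part of a truck.
    trucks = math.ceil(required_revenue / truck_price)
    return required_profit, required_revenue, trucks


profit, revenue, trucks = reverse_engineer(
    target_valuation=1_000_000_000,   # the $1 billion goal
    pe_ratio=100,                     # illustrative P/E ratio
    profit_margin=Fraction(10, 100),  # illustrative 10% margin
    truck_price=250_000,              # illustrative price per truck
)
print(f"profit needed:  ${float(profit):,.0f}")
print(f"revenue needed: ${float(revenue):,.0f}")
print(f"trucks to sell: {trucks}")
```

With these inputs the chain is $10M in profit, $100M in revenue, and 400 trucks. Change any one assumption and every subgoal downstream moves with it.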

I wasn't wrong in thinking the exercise was a futile one - saying I wanted to be rich, healthy, and happy is not enough. It's a good starting point, but these goals have to be reverse-engineered. The steps to reach them need to be outlined, in a way that makes them achievable. Because once you have that, you can remove all doubt and focus on exactly what needs to be done.

Kush Bhuwalka