Curiosity-driven Exploration
Extrinsic rewards, low-entropy days, and a small RL experiment
Curiosity. It’s a strange word or maybe a personality trait, one that I’ve recently started to value a lot more. To me it’s the desire to learn more but it’s also sometimes seen as something peculiar or strange. I used to automatically poke at things, ask too many questions, go down rabbit holes without thinking twice. Somewhere along the way that has slowed down. I stopped trying new things. I don’t know when it started but I noticed it somewhat recently. It’s not that I’m afraid of new things. I just... don’t. I eat the same food, take the same routes, work on what’s in front of me. It works. But something feels off about it.
I relate curiosity to entropy (as would a lot of other RL folks, but we’ll get to that in a bit). High entropy means unpredictability, randomness, not knowing what comes next. Low entropy means patterns, routines, the same Tuesday as last Tuesday. I’ve been living low entropy for a while now. But it’s sort of a chicken-and-egg problem. I don’t know if I stopped being curious because my life got predictable, or if my life got predictable because I stopped being curious. I’m not saying this is bad. Low entropy works. It’s comfortable in a way and you know what you’re getting. No surprises, no disappointments. I get things done and enjoy my day. But it makes me wonder what I’m really optimizing for, and whether discovery or exploration is the point of it all.
Over the last few months I’ve gotten pretty RL-pilled. Reinforcement Learning, for the uninitiated, is a way of learning to make decisions by trying things: you get a reward when an action works out and a penalty when it doesn’t, and you gradually figure out the best actions from those signals. (For a more detailed explanation you should check out my previous blog.) It’s also how most of life works. But standard RL has a problem. The whole system depends on what you, or the environment, count as a reward. Getting a job, that’s a reward. Finishing a project, reward. What about the things that don’t have a clear payoff? What’s the reward for wandering down a street you’ve never been on, reading a book in a genre you’ve never tried, or the quiet “oh, that’s interesting” when you stumble onto a new idea at 1 a.m.? Standard RL doesn’t have a good answer for that.
People in RL call those missing pieces intrinsic rewards. The more I think about it, the more it feels like I have been over-weighting extrinsic rewards for a while. My days are full of things that “count”. Very few of them are there purely because I am curious.
For my Deep RL course, my team and I had to implement this paper called Curiosity-driven Exploration by Self-Supervised Prediction. Usually in RL, an agent only gets rewarded when it achieves some external goal. Reach the flag, get points. But this paper asks: what if the agent also got rewarded for encountering things it didn’t expect? What if surprise itself was the prize?
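In rough terms, the idea can be written as a formula. The notation below follows the general shape used in the paper and is included as a sketch, not as a description of our course implementation: $r^e_t$ is the usual extrinsic reward, $r^i_t$ is the curiosity bonus, $\phi$ is a learned feature encoding of a state, $f$ is a forward model that predicts the next state's features from the current features and action, and $\eta$ is a scaling factor.

```latex
r_t = r^e_t + r^i_t,
\qquad
r^i_t = \frac{\eta}{2}\,\bigl\lVert f\bigl(\phi(s_t), a_t\bigr) - \phi(s_{t+1}) \bigr\rVert_2^2
```

The bonus is large when the agent's prediction of what happens next is badly wrong, and it shrinks as the forward model improves.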
We used a simple environment called Frozen Lake. There is a little elf standing at one corner of a frozen grid and a gift in the opposite corner. Some tiles are safe. Some are holes. If he steps into a hole he disappears into the water. If he reaches the gift he wins. That makes the game easy to score: the only reward is the gift. This is a purely extrinsic setup.
If you watch it play out, you see a few patterns. Sometimes he gets lucky early, finds a path to the gift and then clings to it, taking the exact same route every time.
Heartwarming. But watch what happens when we run it without curiosity.
In this pure extrinsic setup, he just walks until he either falls into a hole (death) or reaches the gift (win). And there he stays, frozen in the lake. Sometimes he never finds a good path at all and just keeps walking into the lake. Sometimes he discovers a tiny safe patch and paces around there forever because it feels familiar. Either way, once something works, the behavior collapses into repeating it.
Then we turned on curiosity. We kept the gift in the same place, but now he also got a small bonus whenever he landed on a tile that surprised his internal picture of the world. At the beginning, when everything was unfamiliar, he bounced around a lot, visiting new tiles, falling into holes, getting confused. Over time he began to build a rough map of the lake. The curiosity bonus slowly faded, because fewer things were surprising, but by then he had already covered more of the space. The gift was no longer the only interesting thing. The path there was interesting too.
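Here is a minimal sketch of that idea in code. It is not our course implementation and not the paper's neural-network module: the learned feature encoder and forward network are swapped for a simple table that predicts the next tile for each (tile, action) pair. The 4x4 map, the non-slippery moves, and every number (`eta`, `lr`, the 0.1 exploration rate, the 0.95 discount, the 50-step cap) are assumptions chosen for illustration.

```python
import numpy as np

# Assumed layout: S = start, F = frozen (safe), H = hole, G = gift.
LAKE = ["SFFF", "FHFH", "FFFH", "HFFG"]
N = 4
MOVES = [(0, -1), (1, 0), (0, 1), (-1, 0)]  # left, down, right, up as (d_row, d_col)


def step(state, action):
    """Deterministic move. Returns (next_state, extrinsic_reward, done)."""
    row, col = divmod(state, N)
    d_row, d_col = MOVES[action]
    row = min(max(row + d_row, 0), N - 1)  # walking into a wall leaves him in place
    col = min(max(col + d_col, 0), N - 1)
    tile = LAKE[row][col]
    return row * N + col, float(tile == "G"), tile in "HG"


class TabularCuriosity:
    """Hypothetical stand-in for a forward model: one predicted
    next-tile distribution per (tile, action) pair."""

    def __init__(self, n_states, n_actions, eta=0.5, lr=0.5):
        self.pred = np.full((n_states, n_actions, n_states), 1.0 / n_states)
        self.eta, self.lr = eta, lr

    def bonus(self, state, action, next_state):
        target = np.zeros(self.pred.shape[-1])
        target[next_state] = 1.0
        error = target - self.pred[state, action]
        reward = 0.5 * self.eta * float(error @ error)  # squared prediction error
        self.pred[state, action] += self.lr * error     # learning makes the surprise fade
        return reward


def train(episodes=300, use_curiosity=True, seed=0):
    """Tabular Q-learning on extrinsic reward plus an optional curiosity bonus."""
    rng = np.random.default_rng(seed)
    q = np.zeros((N * N, len(MOVES)))
    model = TabularCuriosity(N * N, len(MOVES))
    visited, bonus_per_episode = {0}, []
    for _ in range(episodes):
        state, total_bonus = 0, 0.0
        for _ in range(50):  # step cap so an episode always ends
            if rng.random() < 0.1:
                action = int(rng.integers(len(MOVES)))
            else:  # greedy action, ties broken at random
                best = np.flatnonzero(q[state] == q[state].max())
                action = int(rng.choice(best))
            nxt, r_ext, done = step(state, action)
            r_int = model.bonus(state, action, nxt) if use_curiosity else 0.0
            target = r_ext + r_int + (0.0 if done else 0.95 * q[nxt].max())
            q[state, action] += 0.1 * (target - q[state, action])
            visited.add(nxt)
            total_bonus += r_int
            state = nxt
            if done:
                break
        bonus_per_episode.append(total_bonus)
    return visited, bonus_per_episode
```

Running `train(use_curiosity=True)` and `train(use_curiosity=False)` and comparing the visited tiles and the per-episode bonus is one way to poke at the effect. Because the table moves toward whatever it just saw, the bonus for any given (tile, action) pair shrinks every time that pair is repeated, which is the fading described above.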
I’ve been this elf before. When I was younger I had this huge curiosity spike. I’d spend less time exploiting known paths, and instead explore new ones. Now most of my reward comes from “getting it right”. My intrinsic reward curve has quietly flattened.
I don’t have a neat ending for this. I haven’t suddenly become a high-entropy person. I still order the same coffee. I still take the same routes. But I’ve been thinking about it differently. The low-entropy life isn’t bad because it’s boring. It’s bad because it means I’ve stopped thinking. Maybe curiosity isn’t something you have. It’s something you do. And I’ve just... stopped doing it. I don’t know how to start again. But I think noticing is the first step. The little elf on the frozen lake didn’t know where the holes were either. He just started walking.

