A.I. Learns to Drive From Scratch in Trackmania

Published 2022-03-12
I made an A.I. that teaches itself to drive in the racing game Trackmania using machine learning. Specifically, I used Deep Q-Learning, a reinforcement learning algorithm.

Again, a big thanks to Donadigo for TMInterface!

Contact:
Discord – yosh_tm
Twitter – twitter.com/yoshtm1
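
For anyone curious what the Deep Q-Learning update mentioned above looks like in code, here is a minimal, generic sketch (PyTorch). The names q_net and target_net and the batch layout are illustrative assumptions, not the video's actual implementation:

    import torch
    import torch.nn.functional as F

    def dqn_loss(q_net, target_net, batch, gamma=0.99):
        """One Deep Q-Learning update on a batch of replayed transitions."""
        states, actions, rewards, next_states, dones = batch

        # Q(s, a) for the actions that were actually taken.
        q_taken = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

        # Bellman target: r + gamma * max_a' Q_target(s', a'),
        # with no bootstrapping past terminal states.
        with torch.no_grad():
            next_q = target_net(next_states).max(dim=1).values
            target = rewards + gamma * next_q * (1.0 - dones)

        return F.smooth_l1_loss(q_taken, target)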

All Comments (21)
  • @bgosl
    Great video - I think your explanations and illustrations explain some tricky concepts in a super understandable way!

    As to your issues at the end: I think maybe it's related to the way your rewards are structured? Looking at the illustration around 3:30, there's a massive reward associated with cutting a corner: while going along a straight bit of road, it's getting rewards like 1.4, 1.6, 1.7 - but once it cuts a corner, suddenly you get 8.7 in one step. So it makes a lot of sense that it learns to always cut corners aggressively, since that increases reward by a lot. But going quickly on the straights, which it seems it doesn't like to do, doesn't in itself carry all that much more positive reward. Since you're using discounted rewards to evaluate the expected rewards of each action, you will see a slightly higher reward since you're moving further along - but relative to the rewards seen if it finds another corner to cut a little more, it's quite small. So it might just be favoring minor improvements to a corner-cut over basically anything else, including just pushing the forward button on a straight.

    I think restructuring your rewards could help. An obvious improvement would be to give rewards not relative to the midline of each block, but to place rewards along the optimal racing line - but at that point, are you even learning anything? You're just saying "you will get an increased reward if you follow my predetermined path", which to me isn't really learning. An intermediate step would be to place the rewards for each 90-degree corner at the inside corner of that block (maybe with a small margin from the actual edge): that should reduce the extreme impact of cutting corners vs. going fast on straights, while still staying quite far from just indirectly providing the solution.

    Also: unless you just didn't mention it, I don't think you have a negative reward at each timestep? That's typical for a "win, but as fast as possible" scenario, which is the case here. It would make sense, too: going in the right direction but super slowly is kind of like going backwards, so it should also be penalized. I think that would even eliminate the need for a negative reward for going backwards: by proxy, going backwards always leads to taking more time, which leads to more negative reward. You might even have to remove the explicit backwards penalty, as going backwards and going slowly might otherwise see the same net reward, which would leave the agent puzzled/indifferent between the two. In the end, getting to the finish with less time spent will lead to the maximum reward. (A rough sketch of this reward structure follows the comment list.)

    Finally, of course: introducing the brake button would give you possibly improved times - and might even let the agent learn some cool Trackmania tricks like drifting (tapping brake while steering) to go around corners faster. It does increase the action space, though, which of course means longer training time. But something to consider, if you want to iterate on this!

    Regards, someone who went to YouTube to procrastinate from his reinforcement learning course, and ended up using some of that knowledge anyway. I guess the algorithm now knows my interests a little _too well_.

    PS: really well done on introducing exploring starts! When you got to that part of the video, I almost yelled "exploring starts!" at the screen, and then that's exactly what you decided to do. I'm curious whether that came from knowing that exploring starts are a thing in RL, or whether you came up with the concept just from thinking about it?
  • Turning AI: "I got this." Straights AI: "🤷‍♂️ Guess I'll die"
  • @LeoJay
    The AI getting scared and slowing down is kinda adorable lol
  • @SelevanRsC
    I love how at 7:01 the one car made such a good run that it seemed shocked at the end by how good it was, and got totally confused, lol
  • @sygneg7348
    Never have I felt so much emotion for a programmed robot, but here we are.
  • @p3mikka709
    "but then, the AI got this run" music starts playing
  • @eL3ctric
    Oh god yes, finally someone tackles the "my AI just learns the track layout" problem by adjusting the layout/starting position. Nice!
  • I'm so happy that you did the randomized spawn points and speeds. I was worried that you might simply be teaching the AI how to play a single map by having it learn pure input sequences rather than seeing the actual turns and figuring out what to do. I was incredibly impressed with how many made it through the map with all sorts of jumps and terrain types.
  • @Aoi_Haru763
    "At one point, it even stops, as of it's afraid to continue. After a long minute, it finally decides to continue, and dies". Story of my life. I feel a connection between me and the AI. Empathy.
  • @lebimas
    The fact that the model with random starting points achieved far more in 53 hours of training than the one with a single starting point did in 100 hours shows the value of randomizing the training samples across iterations
  • @DarkValorWolf
    "and after 53 hours of learning, the AI gets this run" nice Wirtual reference there
  • @funx24X7
    There have been instances of AI finding exploits in games that humans have not found or are incapable of performing. I would love to see a Trackmania AI trained to find insane shortcuts
  • @SixDigitOsu
    This shows how sticking to the same thing doesn't make you improve; you just memorize it. But trying different things makes you improve.
  • I think trackmania is a great game to practice machine learning. It has very basic inputs and the game is 100% deterministic. Most importantly it's just satisfying to see.
  • @BryceNewbury
    I really enjoyed the explanations of the different training methods paired with the excellent visuals. Keep up the good work, and I can’t wait to see what you try next!
  • This made me feel better about the machine learning course I dropped out of a year ago. While I don't think I'll ever understand the actual construction or inner workings of machine learning models, it was nice to notice the overfitting problem before the script mentioned it. That's always a pet peeve in machine learning videos, like there's one where someone plays through a game with an ML model, but retrains from the start at each new level because the neural network won't generalize.
  • Honestly this recaps humanity, learning, logic, trial & error, problem solving, anticipation, testing, deduction, and so much more. I loved it. I learned so many things that are way beyond the scope of the video. Keep it up. 💪
  • @the_break1
    I really wonder how fast this AI would pass A01 and what its reaction would be on the final jump. Really cool stuff!
  • What a fun way to learn about machine learning and its variants! Very good video and editing! Very clear and accessible English! The return of yoshtm is more than a pleasure!
  • @karmavil4034
    What a lovely story 😍. I'm not just jealous of what you accomplished, but also of how you did it: starting from a simple idea, then the goal, the experimentation, the evaluations and improvements, and an outstanding audio-visual documentation. This is pure gold! Thank you for sharing this topic and the inspiration
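
The reward restructuring proposed in @bgosl's comment above can be sketched roughly as follows. All names here (shaped_reward, TIME_PENALTY, FINISH_BONUS, the progress measure) are hypothetical illustrations, not anything taken from the video:

    # Per-step reward = progress made along the track, minus a constant
    # time penalty, as suggested in @bgosl's comment. All names are
    # hypothetical.
    TIME_PENALTY = 0.1   # small negative reward every timestep
    FINISH_BONUS = 10.0  # one-off reward for reaching the finish

    def shaped_reward(progress_before: float, progress_after: float,
                      finished: bool) -> float:
        # Driving slowly (or backwards) keeps accumulating the time
        # penalty, so no separate backwards penalty is needed: the
        # fastest route to the finish maximizes the total reward.
        reward = (progress_after - progress_before) - TIME_PENALTY
        if finished:
            reward += FINISH_BONUS
        return reward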