![](https://crypto4nerd.com/wp-content/uploads/2023/07/092Ym20WVB0gG2kbo.jpeg)
Picture this: you are 10 years old and your family decides to take you to this amazing restaurant. You are skeptical about how the dish is going to taste, but you set all your reservations aside and trust your father against your better judgement. He keenly looks over the menu, peeking at you every now and then with a quizzical look (almost as if he is trying to figure out what you would like), and finally orders roasted chicken. The dish arrives and you dig in to your first bite … and then voila, you realize that his selection was actually quite good!
Score one for Dad for knowing your taste!
The next day, you tell your friend about this. You tell her, “I had this roasted chicken at this amazing restaurant that my dad took me to.” And your friend is listening intently to what you are telling her. She starts making a mental note, kind of jotting things down on her imaginary RAM. After taking a few moments, she drags you downtown and takes you to another restaurant. She takes a peek at the menu and orders roasted turkey, hoping that she has picked just the perfect dish for you. You dig into that first bite … and voila! It is good as well.
Looks like your friend knows you well!
Now, for some odd reason, your friend is on a roll and she decides to try her luck once again. But this time she decides to switch the genre. She takes you to her favorite seafood restaurant in hopes that she can sweep you off your feet with yet another brilliant selection. You enter the restaurant and you immediately feel your stomach rumble. You can feel it in your bones that this one is going to hit the spot and your friend is going to get commended. The food is ordered and delivered to your table, your mouth is watering and you can’t wait to have your first bite. Your friend, being the religious person that she is, starts praying. You don’t close your eyes (eyes on the prize). As soon as she says amen, you dig into your first bite … AAAND it is terrible.
Unfortunate :/
You’re probably wondering, “Why did I waste 5 minutes of my precious time reading this absurd story that has absolutely nothing to do with the title?” Hold your horses, there is a point to this, and I’ll be getting to it shortly.
Take a good look at the image. If you skipped the story because you thought it was boring, then try reading it again and come back here.
Good? Alright. Let’s explore the Monte Carlo Tree Search (MCTS) algorithm now.
When your dad took you to the first restaurant, he made the first selection. He wasn’t sure if it was going to work, but he had some prior knowledge of what you might like and dislike, so he went ahead and ordered roasted chicken for you. This worked out brilliantly in your case! Maybe he had noticed how you always loved it when your mom cooked chicken for you. The selection was based on what he already knew about you.
Exploitation in reinforcement learning involves making decisions based on known information or past experiences to maximize the immediate rewards.
In the first restaurant, your dad drew on how much you enjoyed chicken whenever your mom cooked it for you. He likely inferred that roasted chicken would be a safe and satisfying choice for you based on your past reactions to chicken meals. This decision was made by exploiting the knowledge he had about your taste preferences.
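To make this concrete, here is a minimal sketch of what pure exploitation could look like in code. The dish names and the 0 to 10 ratings are made up for illustration, and `exploit` is a hypothetical helper, not part of any library.

```python
def exploit(history):
    """Return the dish with the highest average rating seen so far."""
    # Ignore dishes that have no ratings yet: pure exploitation
    # only trusts what it has already observed.
    averages = {dish: sum(r) / len(r) for dish, r in history.items() if r}
    return max(averages, key=averages.get)


# Hypothetical ratings your dad has quietly collected over the years.
history = {
    "roasted chicken": [9, 8, 9],
    "pasta": [6, 7],
    "salad": [4],
}

print(exploit(history))  # roasted chicken
```

Notice that a dish with no ratings can never be picked here. That is exactly the blind spot that exploration is meant to fix.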
Now coming to your friend: she had some information about you, but she wasn’t with you for your entire childhood. She had some idea of what you might like and what you might dislike. Based on the information that you gave her, she decided to try something different: a slight variation on your first dish, roasted chicken, or rather its cousin, roasted turkey.
Exploration involves taking actions that are less predictable or not based solely on past knowledge. While your friend had some idea of your likes and dislikes from the information you gave her, she decided to try something different, moving away from the known choice of Roasted Chicken.
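One common way to add exploration in reinforcement learning is an epsilon-greedy rule: most of the time you pick the best-known option, but with a small probability you try something at random. The sketch below is only an illustration of exploration in general (MCTS itself usually relies on a different rule), and the dish names, estimates, and the `epsilon_greedy` helper are all assumptions made for this example.

```python
import random


def epsilon_greedy(estimates, epsilon, rng):
    """Pick a random dish with probability epsilon, else the best-known one."""
    if rng.random() < epsilon:
        # Explore: ignore what we know and try anything on the menu.
        return rng.choice(sorted(estimates))
    # Exploit: go with the highest estimate so far.
    return max(estimates, key=estimates.get)


# Hypothetical estimates: turkey and seafood have never been tried, so they start at 0.
estimates = {"roasted chicken": 8.7, "roasted turkey": 0.0, "seafood": 0.0}

rng = random.Random(0)
orders = [epsilon_greedy(estimates, epsilon=0.2, rng=rng) for _ in range(10)]
print(orders)
```

With `epsilon=0` this behaves like your dad (always the safe choice); with a larger `epsilon` it behaves more like your friend.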
In the third try, when she took you out for seafood, you had an instinct that told you that you were going to like it (based on just the smell). How does this relate to MCTS? In reinforcement learning, and in MCTS in particular, an agent (in this case, you) faces the challenge of balancing exploitation, which involves choosing actions that are known to have high rewards based on previous experiences, and exploration, which involves trying out new actions to gather more information about their potential rewards.
When your friend took you to the seafood restaurant, you had an instinct or a hunch that you were going to like it, based on your previous good experiences with your father’s and your friend’s restaurant choices. This corresponds to the exploitation aspect: you relied on the knowledge from your past successes and expected a good reward (delicious food) in the short run. Unfortunately, this didn’t turn out well for you.
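MCTS implementations commonly strike this balance with the UCB1 score: the average reward of an option (exploitation) plus a bonus that grows when the option has been tried only a few times (exploration). Here is a small sketch of that score. The reward totals and visit counts are invented, rewards are assumed to lie between 0 and 1, and the constant `c` is a tunable assumption rather than a fixed rule.

```python
import math


def ucb1(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """Average reward plus an exploration bonus for rarely tried options."""
    if visits == 0:
        return math.inf  # never tried: always worth one look
    average = total_reward / visits
    bonus = c * math.sqrt(math.log(parent_visits) / visits)
    return average + bonus


# Hypothetical stats after 4 meals: (total reward, times ordered).
stats = {
    "roasted chicken": (2.6, 3),
    "roasted turkey": (0.8, 1),
}

for dish, (reward, visits) in stats.items():
    print(dish, round(ucb1(reward, visits, parent_visits=4), 3))
```

Even though chicken has the higher average, turkey gets the higher score here because it has only been ordered once. Setting `c=0` removes the bonus and turns the rule back into pure exploitation.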
It wasn’t the outcome that you hoped for, and the dish turned out to be terrible.
This highlights the importance of exploration in reinforcement learning. Even though you had positive experiences before, it’s still essential to explore new options (new restaurants) to avoid getting stuck with a limited set of choices and risking losing in the long run (ending up with a bad experience).
The Monte Carlo Tree Search algorithm is one of the core ideas used in reinforcement learning. A lot of systems build on this concept (AlphaGo, for example, paired MCTS with neural networks), so understanding it is pivotal, and I hope this article helped you get a basic idea of its inner workings. The actual implementation is a lot more rigorous, but having a basic understanding takes you a long way.
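If you want a taste of what an implementation looks like, here is a toy sketch of the four steps MCTS is usually described with: selection, expansion, simulation, and backpropagation. Everything specific in it is an assumption made for this example: the two-step `MENU`, the made-up probabilities of enjoying each dish, and the helper names. Treat it as a simplified illustration, not a production implementation.

```python
import math
import random

# Hypothetical two-step decision: pick a restaurant, then a dish.
# Each number is the assumed chance that you enjoy the meal.
MENU = {
    "grill": {"roasted chicken": 0.9, "roasted turkey": 0.8},
    "seafood": {"fish curry": 0.2, "prawns": 0.3},
}


class Node:
    def __init__(self, path=(), parent=None):
        self.path = path      # choices made so far
        self.parent = parent
        self.children = []
        self.visits = 0
        self.reward = 0.0


def options(path):
    """Choices available after `path` (empty once a dish is chosen)."""
    if len(path) == 0:
        return list(MENU)
    if len(path) == 1:
        return list(MENU[path[0]])
    return []


def ucb1(node, c=math.sqrt(2)):
    if node.visits == 0:
        return math.inf
    bonus = c * math.sqrt(math.log(node.parent.visits) / node.visits)
    return node.reward / node.visits + bonus


def rollout(path, rng):
    """Finish the meal with random choices, then sample enjoyed (1) or not (0)."""
    path = list(path)
    while options(path):
        path.append(rng.choice(options(path)))
    return 1.0 if rng.random() < MENU[path[0]][path[1]] else 0.0


def mcts(iterations, rng):
    """Return the restaurant visited most often (assumes iterations >= 2)."""
    root = Node()
    for _ in range(iterations):
        node = root
        # 1. Selection: follow the best UCB1 score down the tree.
        while node.children:
            node = max(node.children, key=ucb1)
        # 2. Expansion: grow the tree below an already visited leaf.
        if node.visits > 0 and options(node.path):
            node.children = [Node(node.path + (o,), node) for o in options(node.path)]
            node = node.children[0]
        # 3. Simulation: play out the rest at random.
        reward = rollout(node.path, rng)
        # 4. Backpropagation: update every node on the way back up.
        while node is not None:
            node.visits += 1
            node.reward += reward
            node = node.parent
    return max(root.children, key=lambda n: n.visits).path[0]


print(mcts(2000, random.Random(0)))
```

With these made-up numbers the search spends most of its visits on the grill, while still checking in on the seafood place every now and then in case its first impression was wrong.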
If you liked this article, then make sure to follow me! Reach out to me on LinkedIn: https://www.linkedin.com/in/amos-eda-839870185/
As always, if you have any constructive criticism, I would love to hear all about it!
To infinity and beyond …