Machine Learning & Neural Networks are powerful tools that can solve many types of complicated problems. With proper training and fine-tuning, these systems can perform certain tasks with incredible consistency and accuracy. In my previous article, we went over an example where we trained a neural network on building data so that it could learn to classify different types of buildings. In other words, if you gave the neural network some information about a random building, it could predict whether the building was an apartment, a house, or anything else. This is known as a classification problem and requires what is known as “Supervised Learning” to be solved. All supervised learning means is that we, as humans, provide the neural network with relevant training data. By relevant I mean a lot of good and clear examples. You can imagine that if we provided a neural network with examples full of missing data or random information, it would not be able to properly discover patterns in the dataset and therefore would not learn anything useful.

Because the training data matters so much, neural networks will learn not only the patterns in the dataset but also the biases that come with it. I’ve mentioned the bias term before in “Fundamentals of Machine Learning & AI”, where it is just the y-intercept in a model, but the biases I’m talking about here are different: like biases in critical thinking, they are generalized assumptions made while learning from or processing data. For most problems, like classification, this is usually not an issue; in fact, these learned assumptions are important for making accurate models. But when the problems become more complicated and intricate, for instance chat bots trained on user inputs, biases start to stand out and reveal issues with the training data.
You might have heard stories of chat bots using racial or derogatory slurs; that behavior is a direct result of biases learned during their training. This is why the learning of neural networks needs to be supervised. Another approach is to toss out the initial training data entirely, avoiding its inherent biases, and instead let the neural network learn from the ground up: Unsupervised Learning.
For classification problems, unsupervised learning wouldn’t be a very good idea, as setting up a neural network to learn how to identify building types with no examples to work off of would be silly. The neural network would have to create random datasets of buildings and then train itself on them, which would have no reflection of the real world. Unsupervised learning is usually better suited to optimization problems, where the neural network is provided an environment in which it is allowed to make different decisions that result in some outcome. In other words, a game. To keep things simple, let’s use the game of Tic Tac Toe as an example. Tic Tac Toe is by no means a complicated game, and training a neural network or AI to play it is somewhat redundant and inefficient, but you can use the exact same process for games that are orders of magnitude more complicated, like Chess or Go. This is exactly what Google’s DeepMind did when they created AlphaZero, which to this day is most likely still the strongest Chess and Go AI agent.
To start, we will begin in the same fashion we did with the classification problem: we need to design a simple neural network architecture with a couple of layers of nodes and activation functions. But instead of the first layer of nodes representing the different fields in the buildings dataset, the inputs will now represent the state of the Tic Tac Toe game, and the outputs will represent the decisions that can be made. To determine what decisions can be made, we first have to define the rules of the game for the neural network. For Tic Tac Toe these rules are: the board is a 3×3 grid of spaces; one player starts by placing an X on one of the spaces; the next player then places an O on an unoccupied space; the game continues like this until all the spaces on the board are filled, which results in a tie, or until a player gets 3 X’s or O’s in a row, column, or diagonal. These rules can be coded relatively easily and provided to the neural network as its environment. Based on the rules, there are 9 spaces on the board, so a natural design is 9 input nodes, one encoding the contents of each space (empty, X, or O), and 9 output nodes, one scoring each possible move, with illegal moves simply filtered out. Conceptually, you can imagine the layers in between dividing up the work: some parts of the network picking moves, others analyzing the state of the board (which moves are allowed and whether the game is still ongoing or over), and others understanding the opponent’s moves and implementing learned strategies. In reality it’s not actually this simple, where each part is responsible for just one thing; the network is much more of a black box, and we don’t fully understand which node does what.
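The rules above can be sketched as a small Python environment. This is a minimal, hypothetical implementation (function names like `legal_moves` and `play` are my own inventions, not from any library): the board is a tuple of 9 cells, which keeps positions immutable and hashable so they can later serve as keys in a Q-table.

```python
# All 8 ways to get three in a row on a 3x3 board, by cell index 0-8.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

def new_board():
    """An empty board: a tuple of 9 blank cells."""
    return (" ",) * 9

def legal_moves(board):
    """Indices of all unoccupied spaces."""
    return [i for i, cell in enumerate(board) if cell == " "]

def play(board, index, mark):
    """Return a new board with `mark` ('X' or 'O') placed at `index`."""
    assert board[index] == " ", "space already occupied"
    return board[:index] + (mark,) + board[index + 1:]

def winner(board):
    """Return 'X' or 'O' if someone has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def game_over(board):
    """The game ends on a win or when the board is full (a tie)."""
    return winner(board) is not None or not legal_moves(board)
```

Returning a fresh tuple from `play` instead of mutating the board makes it trivial to record every position a game passed through.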
Still, compartmentalizing the processes this way makes it easier to understand why we need a neural network of a certain size. Now that we have defined the rules of the game and designed the neural network architecture, we can move on to how we will train our AI agent.
In unsupervised learning there are various methods and techniques used to train neural networks. For our example we will use one of the simplest but most important: Reinforcement Learning (which, strictly speaking, is often treated as its own paradigm alongside supervised and unsupervised learning). Just like when you were a kid and were punished by your parents for doing something bad or rewarded for doing something good, we will implement the same strategy with our neural network. We can define a simple reward system: +1 when the AI wins a game, -1 when it loses, 0 when the game ends in a tie, and -0.1 for each move while the game is ongoing. During training the neural network will continuously try to maximize its accumulated score. The reason we add a small penalty while the game is ongoing is so that the AI keeps making purposeful moves instead of stagnating and wasting unnecessary time. You can think of this as avoiding procrastination.

Although it might seem like we have all the ingredients for the neural network to start training, there is one final step we need to include: Q-learning. In order to get the neural network to make decisions and keep track of all those decisions, we set up what is known as a Q-table. You can think of a Q-table as an Excel spreadsheet that records every board position encountered during play and associates a value, or weight, with each move available from that position. When we finally train the neural network, the AI will play games against itself and update the Q-table as it goes. This way it can keep track of which moves resulted in wins and which resulted in losses. Since the neural network gets rewarded for winning, it will learn which moves are best in certain situations. When it first starts playing it will have no idea which move is best and will instead make random decisions. This is known as exploring the solution space and is an important component of Q-learning. Just like a child learning something new, making mistakes and trying new things is crucial in reinforcement learning.
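The heart of Q-learning is a simple update applied to the Q-table after each move. Here is a minimal sketch, assuming a tabular setup; the names `q_update`, `ALPHA`, and `GAMMA` are my own, and the learning-rate and discount values are arbitrary illustrative choices. Each (state, move) entry is nudged toward the reward received plus the discounted value of the best move available in the next state.

```python
from collections import defaultdict

# The Q-table: maps a (state, move) pair to its learned value.
# Unseen pairs start at 0.0.
Q = defaultdict(float)

ALPHA = 0.5   # learning rate: how strongly new outcomes override old values
GAMMA = 0.9   # discount factor: how much future rewards count today

def q_update(state, move, reward, next_state, next_moves):
    """One-step Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    `next_moves` is the list of legal moves in `next_state`
    (empty if the game is over, so the future term drops to 0)."""
    best_next = max((Q[(next_state, m)] for m in next_moves), default=0.0)
    Q[(state, move)] += ALPHA * (reward + GAMMA * best_next - Q[(state, move)])
```

Because the update only needs the current transition, it can run after every single move rather than waiting for the game to finish, which is what lets credit flow backwards from a winning final move to the earlier moves that set it up.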
During the early parts of training, this is exactly how the neural network learns which moves lead to wins and which lead to losses. Of course, once your AI has trained long enough, you no longer want it to make random decisions; you want it to rely instead on what it has learned to be the best moves. To do this you can use an epsilon-greedy strategy, which simply adjusts the ratio of random moves to learned moves. At first the ratio can be something like 90% exploration, and over time it can be lowered to 10% or less. For Tic Tac Toe, training won’t take very long, and by the end the neural network will have essentially perfected the game, so finishing at 0% exploration is perfectly fine. This is only because Tic Tac Toe is a solved game. For more complicated games it can actually be useful to keep some amount of randomness, or chaos, in play to create a sense of unpredictability.
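An epsilon-greedy policy is only a few lines of code. This is a hypothetical sketch (the names `choose_move` and `decay` are mine): with probability epsilon the agent explores by picking a random legal move, and otherwise it exploits by picking the move with the highest learned value; epsilon itself is shrunk a little after every game.

```python
import random

def choose_move(Q, state, moves, epsilon):
    """Epsilon-greedy selection over the legal `moves` in `state`."""
    if random.random() < epsilon:
        # Explore: try a random legal move.
        return random.choice(moves)
    # Exploit: pick the move with the highest Q-value seen so far.
    return max(moves, key=lambda m: Q.get((state, m), 0.0))

def decay(epsilon, rate=0.9995, floor=0.0):
    """Shrink epsilon toward `floor`, shifting from exploration
    (e.g. 90% random moves) toward pure exploitation over time."""
    return max(floor, epsilon * rate)
```

For a solved game like Tic Tac Toe a `floor` of 0 is fine; for harder games you would leave the floor slightly above 0 to keep some unpredictability, as described above.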
We are finally ready for training. As a review of the steps we went through, let’s take a look at what’s happening under the hood of the neural network during the learning process. We start off with the game, which in this case is Tic Tac Toe, and all the rules that come with it. The neural network, connected to a Q-table, then starts playing games against itself, taking note of all the moves it makes both as itself and as its opponent. In the early stages the AI agent will randomly choose moves; it will explore its environment. After every game we reward or penalize the neural network based on the outcome. The Q-table is updated accordingly, adjusting the values of all the moves made during the game. This process is then repeated for every new game. As the AI agent plays more games, it slowly starts to rely more on the moves it deems valuable for winning instead of picking moves at random. After playing thousands of games, we get an AI agent that has maximized the rewards it receives, and therefore an AI that has learned to play Tic Tac Toe perfectly from nothing but the rules of the game.
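Putting the pieces together, here is one compact, hypothetical way the whole self-play loop might look in Python. All names and hyperparameters are illustrative, and for brevity this sketch backs up only the terminal reward (+1 winner, -1 loser, 0 tie) through each game’s move history with alternating signs, rather than applying the per-move penalty and one-step updates move by move.

```python
import random
from collections import defaultdict

# The 8 winning lines on a 3x3 board, by cell index 0-8.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(b):
    for i, j, k in LINES:
        if b[i] != " " and b[i] == b[j] == b[k]:
            return b[i]
    return None

def train(episodes=20000, alpha=0.5, gamma=0.9, epsilon=0.9):
    Q = defaultdict(float)
    for _ in range(episodes):
        board, mark = (" ",) * 9, "X"
        history = []  # (state, move) pairs in the order they were played
        while True:
            moves = [i for i, c in enumerate(board) if c == " "]
            if random.random() < epsilon:          # explore
                move = random.choice(moves)
            else:                                  # exploit learned values
                move = max(moves, key=lambda m: Q[(board, m)])
            history.append((board, move))
            board = board[:move] + (mark,) + board[move + 1:]
            win = winner(board)
            if win or " " not in board:
                # Back up the terminal reward through the game's moves:
                # +1 for the winner's moves, -1 for the loser's, 0 on a tie,
                # discounted so decisive late moves get the most credit.
                reward = 0.0 if win is None else 1.0
                for s, a in reversed(history):
                    Q[(s, a)] += alpha * (reward - Q[(s, a)])
                    reward = -gamma * reward       # alternate players, discount
                break
            mark = "O" if mark == "X" else "X"
        epsilon = max(0.01, epsilon * 0.9995)      # explore less over time
    return Q
```

After enough episodes, looking up `Q[(board, move)]` for the current position and taking the highest-valued legal move plays the game well, with no human examples ever provided.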
This has been a very simplified overview of how Reinforcement Learning and AI agents work. There are lots of nuances I have avoided that would have overcomplicated the topic. For instance, Q-learning is just one method; there are many others with different pros and cons. Also, as mentioned before, the neural network is very much a black box, so understanding how the nodes interact with each other and their individual roles in decision making is a field of research in its own right. Generally speaking, though, this is how setting up and training these kinds of AI agents works. If you found this interesting, I suggest doing further research into these topics for a more in-depth understanding. In future articles, I will go over how Deep Learning and Natural Language Processing work, which are at the core of Large Language Models like ChatGPT.