First attempt at a computer vision convolutional neural network with custom training loop.
This is my first computer vision project, so I wanted to start with something I already have good foundational knowledge of. This led me to building a convolutional network to detect and classify champions on the League of Legends minimap. If you don't know much about League (consider yourself lucky), you don't really need to: there are ~171 "champions" (characters) in the game, and during a match the 10 chosen by the players are displayed on the minimap at all times, assuming they aren't waiting to respawn.
Choosing a task like this made the project a bit easier, as I'm able to leverage the constraints of the situation and turn them into advantages. For example, champion icons on the minimap are always the same size, so the network won't need to adjust the size of its predicted bounding boxes. This led me to an anchor-based approach: I divide the minimap into a 29x29 feature map and draw a single fixed-size anchor in each cell. I then calculate the intersection over union (IOU, how much two boxes overlap) between each champion actually on the minimap and every anchor, and assign each champion to the anchor it has the highest IOU with (a sketch of this matching step is below). Traditional anchor-based object detectors usually draw multiple anchor boxes per cell to account for different-sized objects, but since we know the size of our champion icons, this isn't needed. The minimap icons are also (almost) always visually identical, which eliminates a lot of the complexity the network would otherwise have to contend with when classifying objects.
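To make the matching concrete, here's a rough sketch of the assignment step. The pixel values (`MAP_PX`, `ICON_PX`) and function names are illustrative placeholders, not the exact values from my code:

```python
import numpy as np

GRID = 29      # 29x29 feature map over the minimap
MAP_PX = 290   # minimap resolution in pixels (placeholder value)
ICON_PX = 26   # fixed champion icon size in pixels (placeholder value)

def anchor_boxes():
    """One fixed-size anchor centered in each grid cell, as (x1, y1, x2, y2)."""
    cell = MAP_PX / GRID
    centers = (np.arange(GRID) + 0.5) * cell
    cx, cy = np.meshgrid(centers, centers)
    cx, cy = cx.ravel(), cy.ravel()
    half = ICON_PX / 2
    return np.stack([cx - half, cy - half, cx + half, cy + half], axis=1)

def iou(box, boxes):
    """IOU between one (x1, y1, x2, y2) box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def match_champions(gt_boxes):
    """Assign each champion box to the anchor it overlaps most (its "foreground anchor")."""
    anchors = anchor_boxes()
    return [int(np.argmax(iou(gt, anchors))) for gt in gt_boxes]
```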
Another advantage of this environment is that we almost always know what the minimap will look like. There will be some differences over the course of an actual game, such as minions moving down the lanes, towers being destroyed, and other map objectives spawning or being taken, but these are small changes compared to the 10 champion icons drifting across the map with each player's movements. I realized this constraint could be exploited, so I opted to generate my own dataset for the model rather than scraping a Riot Games professional game API during live matches or recording actual minimaps in-game or from videos/streams. To accomplish this I took an in-game screenshot of the actual minimap, at both 1440p and 1080p, to use as the background, then downloaded all champion icons from Riot.
In doing this I realized that some champions can have multiple icons (Kayn, Kayle), but there aren't many like this, so I decided to treat each icon as its own class (e.g. Kayn1/Kayn2) rather than lumping them into one. To create the dataset, I overlay 10 icons onto the minimap at random positions, with some buffer near the edges. This process can certainly be improved (and I most likely will attempt to), for example by adding random pings, minions, and other objectives that account for more of the minimap's actual complexity. At present it's a bit rudimentary but gets the job done. Champion icons are often stacked or partially overlaid on one another when players are close together in-game, and since the current generator produces such overlaps fairly frequently at random, I believe it should be sufficient. Because the minimap images are created programmatically, the annotations/labels for each image can be generated programmatically too, drastically reducing the time that would otherwise be needed to create such a dataset. I've included the 200 training/validation images and annotations I've been using for testing.
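For reference, a minimal sketch of what the generator does, assuming Pillow and a directory of icon PNGs. The paths, icon size, and edge buffer are placeholders, and real class ids would come from a fixed champion list rather than `enumerate`:

```python
import random
from pathlib import Path
from PIL import Image

ICON_DIR = Path("icons")              # one PNG per champion class (placeholder layout)
MINIMAP = Path("minimap_1080p.png")   # background screenshot (placeholder name)
ICON_PX = 26                          # fixed icon size on the minimap (placeholder)
EDGE_BUFFER = 10                      # keep icons away from the map edges

def generate_sample(out_img: Path, out_txt: Path):
    """Overlay 10 random champion icons on the minimap and write YOLO-format labels."""
    minimap = Image.open(MINIMAP).convert("RGBA")
    w, h = minimap.size
    icons = random.sample(sorted(ICON_DIR.glob("*.png")), 10)
    lines = []
    for class_id, icon_path in enumerate(icons):  # real ids come from the full champion list
        icon = Image.open(icon_path).convert("RGBA").resize((ICON_PX, ICON_PX))
        x = random.randint(EDGE_BUFFER, w - ICON_PX - EDGE_BUFFER)
        y = random.randint(EDGE_BUFFER, h - ICON_PX - EDGE_BUFFER)
        minimap.alpha_composite(icon, (x, y))
        # YOLO format: class x_center y_center width height, all normalized to [0, 1]
        cx, cy = (x + ICON_PX / 2) / w, (y + ICON_PX / 2) / h
        lines.append(f"{class_id} {cx:.6f} {cy:.6f} {ICON_PX / w:.6f} {ICON_PX / h:.6f}")
    minimap.convert("RGB").save(out_img)
    out_txt.write_text("\n".join(lines))
```

Because icon positions are drawn independently at random, overlapping and stacked icons fall out of this process for free, which is what gives the dataset its partial-occlusion examples.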
The network currently uses a custom loss function: the categorical cross-entropy loss between the predicted champion class and the actual one, plus a smooth L1 loss on the bounding box coordinates at each champion's "foreground anchor" (the anchor it has the highest IOU with). It also has a pretty rudimentary custom training loop at the moment, something I'd like to improve upon. I've been benchmarking my model's performance against Ultralytics YOLO models, which is also why I created the dataset in the format YOLO expects: an individual annotation file for each image.
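For clarity, here's roughly what that loss looks like in PyTorch. The tensor shapes, argument names, and `box_weight` term are my own placeholders, and `gt_offsets` stands in for however the actual code parameterizes the regression targets:

```python
import torch
import torch.nn.functional as F

def detection_loss(cls_logits, box_preds, fg_idx, gt_classes, gt_offsets, box_weight=1.0):
    """
    Combined loss over one image, following the description above:
      - cross-entropy between the predicted and actual champion class at each
        foreground anchor,
      - smooth L1 on the box coordinates at those same anchors.

    cls_logits: (A, C) class scores for all A anchors
    box_preds:  (A, 4) predicted box coordinates/offsets per anchor
    fg_idx:     (10,) indices of the foreground anchors (one per champion)
    gt_classes: (10,) class id of the champion matched to each foreground anchor
    gt_offsets: (10, 4) regression targets for those anchors
    """
    cls_loss = F.cross_entropy(cls_logits[fg_idx], gt_classes)
    box_loss = F.smooth_l1_loss(box_preds[fg_idx], gt_offsets)
    return cls_loss + box_weight * box_loss

# A minimal training loop around it (model, optimizer, and loader are assumptions):
# for images, fg_idx, gt_classes, gt_offsets in loader:
#     cls_logits, box_preds = model(images)
#     loss = detection_loss(cls_logits, box_preds, fg_idx, gt_classes, gt_offsets)
#     optimizer.zero_grad()
#     loss.backward()
#     optimizer.step()
```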
This project is far from complete and I'm still actively working on it, so any suggestions/critiques are more than welcome.