Street Fighter II Playing Neural Network
If it wasn’t for my hacking homie, I wouldn’t have turned this into a blog post. I honestly thought it was just one of those little projects my github is filled with that I would put together and then shelve, never really thinking about again. I suppose this is cool and does deserve a revisit.
Never in my life have I ever finished this bloody game. I have played it on every console known to man and am STILL shit at it. I know the issue isn’t me; I kick ass at tekken. I cleaned house at an arcade in Tokyo playing randoms on t7. King is OP if you know how to use him, and a childhood of playing t3 means I in fact do know how to. Case in point; arcade mode in under two and a half minutes.
Deciding I needed a project to keep the bad feefees away whilst everything was falling apart around me (at the same time the part of the country I was located in was in lockdown as it was getting fisted by corona BUT ALSO at risk of burning down from bushfires), I decided to try my hand at this.
I got inspiration from a half completed project someone else was selling that they demonstrated being adapted for street fighter 5. I wanted to have this in pure honesty to watch the ai play; it is fascinating to see non human entities play games and play them well. I also enjoy when games are completed via scripting; I have a collection of one shot scripts to pwn certain boot2roots lying around. Perhaps that can be another topic for another day. But this is literally what I looked like when I stumbled the mar.IO project, my first introduction to game playing AI.
The entire project took me a few hours to work out all the kinks and fiddle with cheat engine. This program is most def NOT required; in fact the gym UI has tools to do memory inspection, I just chose CE because I was using it to put cheats into shogun 2, and I just got used to it.
There was quite a bit of missing info I had to collate to get this working. Much like the Diablo on Linux post, this looks simple as and shouldn’t have taken any time at all. The reality was it took most of a day.
Lets have a quick discussion about gym and why this project is inherently different to the JRPG music maker.
Gym works using reinforcement learning. You define things it should reward, and things it’s punished for. In sf, what should we be using? I decided to use the score; each hit against the enemy makes it go up, and in the interest in seeing more and more complex moves and combos, I set the score increasing being the reward trigger rather than the enemies health going down. To me this seems like a catch all; don’t die (cause you lose your score when you die) and do gooder combos.
I may tweak this a little in the future to simply have enemy health decrease as a reward. But then again, that is the methodology I as a human was utilising to try clock the fucker and it didn’t work. Perhaps score IS the better reward!
Backpropagation is a component in reinforcement learning; at each step (function call) the results of the reinforcement (rewards vs punishments) are looked at for how close they are to your goal (not hard defined, based on your variables reward/punishment) and the actions taken to get these results (the algorithm applied to the inputs [in this instance snes controls]) are adjusted to get you closer to what you want.
Whereas an rnn is used to predict the next sequence in a chain by being fed lots of data that is correct and correcting its guesses to align with the data, RL doesn’t need to be fed the data. All we are going to do is feed it the variables we care about, and let it loose to work out how to achieve its end goal (a good score and not getting a game over). Given this lack of information it may do things in a strange way [and it does, I have seen some hilarious matches], which we do not want for things like music, where strange mostly == bad.
Lets get started.
As we have discovered in the JRPG MIDI NN post, not all pythons are equal. You require a 64bit version of 3.7. Somehow, despite this project taking place before that one, the version caveat didnt catch me offguard. Perhaps I wasn’t as militant about keeping my shit up to date at that point, which ironically worked to my advantage. I don’t remember. A lot of beer has been consumed since this project.
pip3 install gym-retro
Get your own rom. I don’t care about anti piracy or any nonsense like that; simply put it was a goddamn mission for me to find the right type of rom, and now you get to enjoy the pleasure of it too. It needs to be an SFC.
Put it into a new folder in the following directory python_install_location\Lib\site-packages\retro\data\stable; the entire folder will be used as an argument when kicking things off so give it a useful name rather than just smashing the keyboard with your face like I would do to name all my work while at uni and would consequently lose track of everything. Copy in the scenario files from another of the game folders, or use the ones from my project. Rename the rom to rom.sfc
Get the sha1 of the rom, and paste it into a file called rom.sha I dropped my rom into a linux vm and simply ran sha1sum romname
Haha, you thought I was going to give you the rom name so you have a lead on where to find it didn’t you? It’s not hard bruss.
Pop open a shell in the folder you have put the rom, and run python -m retro.import
Have a guess what this does.
The install dir for gym is two folders back from where you put the rom; /retro and put the ui exe there. DL from here, because the link in the official docs isn’t working. I normally would send you off on a chase to retrieve it yourself, but I actually can’t find it anywhere!
Modify the agent script (\retro\examples\random_agent.py) so it doesn’t ask for input on each revert. This is EZPZ to do and if you don’t it becomes a huge pain in the ass. Comment out the input on line 51; input("press enter to continue")
or remove it, whatever you prefer. I also saved this as random_agent_modified because I wasn’t sure if it was going to work, but it did and I couldn’t be bothered cleaning up. With this change you can leave the script alone and it’ll keep running until it finishes the game. I picked this particular agent type for reasons I have long since forgotten, but rest assured it works, and that’s good enough for me.
Change the metadata.json in the folder you created to change the difficulty save state being run. I have a state for each difficulty in there already. If you would like to make your own, load up the integration program, feed it your rom, and when you get to the EXACT point you want your game to begin every time it restarts, ctrl+s to save a state. Incase you don’t like ryu or something and dont think my states are good enough. Whatever you name it, put it into the json file
Data.json contains all the different data points we may want to measure. This is where CE comes into play. When the ui is loaded up with your rom, it’ll actually play it. I am not sure which emulator is built into it, but it has one. Hook CE into the ui, and from there we can use it to find the data points we care about. As said above, you don’t have to use ce, I just found it easier than the gym application because I have used ce extensively for other things that may be documented in another post at some point.
This is where you get to be creative; what do you want the ai to care about? What pieces of data are you going to feed to a blind deaf and dumb program to understand in a loose way how to play sf? Break it down into its basic blocks; what do you prioritize when YOU are playing the game?
The addresses are the memory addresses converted from hex to decimal, with the rom base of 8257536 added to it. for example the memory address for player health is 000636, into decimal this is 1590, add 825736 and you get 8259126. this site will help your conversions. Use this reference to work out the “type”. It took a bit of tinkering to get the types right, but that reference has all the info you need to sort it.
I could go into detail on how to find addresses, but I won’t. I simply don’t want to, for many reasons including the fact this book will teach you better than I could. I suggest buying it, because no starch books got me to where I am today. It is no exaggeration that I owe my current lot in life to watching mr robot while at a desktop support job I hated very much and buying a hacking no starch book to elevate my position in life away from the plebs I worked with.
It is worth noting that I am NOT being paid to say nice things about them, but if they wanna chuck me some dollarydoos (or free books hint hint) I’ll gladly spruke for them.
I will give you some memory locations, but it is quite probable we won’t end up with the same base rom and they will probably be useless.
- 000636 for player health
- 000836 for enemy health
- 0005d0 for player matches won 0005d0 is base, 0005d1 for 1, 0005d2 for 2 wins
- 0006C2 score
- continue timer 0005c6 goes from 9 backward
Remember to turn them into decimals and add the rom base.
scenario.json is the rules of the program. I have only a single rule set; reward it when the score goes up. I both want to see this perform crazy moves (hence the emphasis on score than hp) and win, but I thought having it only focus on hp would make it play how I do, and I can’t win. I will spend some time tuning a new rule set to make it even quicker I think, just because. But having it focus just on the score is good enough to make it win, and make it win a lot.
python -m retro.examples.random_agent_modified --game NameOfFolderYouPutEverythingInto
It took approximately 8 minutes to finish on the easiest setting on the teeny tiny laptop I was using whilst away from home on the F-35 project. But that’s a long story filled with sighs as Scott Pilgrim would say.
I won’t lie, the program taking 8 minutes to clear the game left me a little unfulfilled. It seemed too quick for the amount of time I put into getting the env set up. That was on the easiest difficulty, but it is notable that it did in 8 minutes what i have not been able to in dozens of hours across nearly 30 years. That was a disturbing sentence to write and I received no pleasure from it.
Took about an hour on 2 star. I lost interest after this because I had accomplished my aim; I had, in a roundabout way, beaten street fighter.