Deployments and Disasters RPG
Incidents can be stressful, so one of our engineers came up with a way to gamify preparing for them: a role-playing game.
There are boring activities in life, and there are stressful ones. The truly tricky ones are those that start out boring but can turn stressful in the blink of an eye. Think of driving a car down a sparsely populated highway at night. It's boring most of the time and you might even get a little sleepy. But at the first sign of trouble up ahead you need to be alert and respond in a very short time. Operating a global-scale, high-throughput software system can be like that.
At Infobip we try to make this easy with out-of-the-box metrics, centralized monitoring, logging and more, all provided for you by teams from the Quality requirement area. But there's another side to this preparation.
Even with the best-performing service, you can still get caught off guard by an incident. Most of the time our services run smoothly and require no intervention from developers. But sometimes, for any of a plethora of reasons, one of them degrades or goes down completely. That's when you, as that service's owner, get called to resolve the incident, and the adrenaline kicks in.
Incidents are stressful situations:
You might forget what the procedure for reporting client impact is.
You might not know how to query Prometheus metrics.
You might forget who you can contact to help you reconnect the network storage.
There's room for preparation beyond making sure your service is running smoothly: you need to ensure you are too.
There are many things you can do in advance to prepare yourself for this eventuality. Just like with your car, you can make sure your service is in top condition. Luckily, you are both the driver and the mechanic in this scenario, so you can maintain your own service.
We decided to change the boring part of incident training and make something meh a bit more fun. We looked at what is fun (games, duh!) and decided to apply it to what is important to us (incident management). And thus the Deployments and Disasters tabletop RPG was born, designed by our very own Josip Antoliš.
Tabletop role-playing games, such as Dungeons & Dragons, Pathfinder or GURPS (if you want to get real nerdy), are based on group storytelling. All players sit around the table, throw dice and role-play their assigned characters. It's like video game RPGs, just with more freedom of choice and fewer computers.
In Deployments and Disasters, we use this same concept to train for resolving incidents.
Each player gets to role-play as an incident responder, though not all characters in the game are developers.
You can be an analyst, a database admin or a panicking product owner. The goal of each session is to resolve a single incident.
You get to puzzle out what went wrong, and the game rules gently introduce you to our real-world tools and incident management practices.
We have dice to determine the success or (critical?!) failure of your actions. And, just like in the real world, you are not alone: the rest of the players are here and you work together to beat the game.
It's not all (just) fun and games either.
The goal is to get better at handling incidents. The game will help you get to know the tools we use for monitoring and alerting. You'll see the bigger picture and learn what other roles are involved in incident management: who talks to clients, how management views all this, etc. And you'll get some directly applicable advice for managing your real-world services.
If you figure out what metrics you need from an in-game queuing system (arrival rate, wait time and service time, right?), you can go on and expose them from your production queue and be set up for better monitoring in the future.
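To make those three queue metrics concrete, here is a minimal Python sketch of how they could be derived from per-job timestamps. It is an illustration only, not how the game or any Infobip service does it: the `QueueMetrics` class, its method names and the choice of seconds as the unit are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QueueMetrics:
    """Hypothetical helper: derives arrival rate, wait time and service time
    from per-job timestamps given in seconds."""
    arrivals: List[float] = field(default_factory=list)
    waits: List[float] = field(default_factory=list)
    services: List[float] = field(default_factory=list)

    def record(self, arrived_at: float, started_at: float, finished_at: float) -> None:
        """Record one job: when it was enqueued, picked up, and completed."""
        if not (arrived_at <= started_at <= finished_at):
            raise ValueError("expected arrived_at <= started_at <= finished_at")
        self.arrivals.append(arrived_at)
        self.waits.append(started_at - arrived_at)       # time spent queued
        self.services.append(finished_at - started_at)   # time spent being processed

    def arrival_rate(self) -> float:
        """Jobs per second, as the reciprocal of the mean gap between arrivals."""
        if len(self.arrivals) < 2:
            return 0.0
        window = max(self.arrivals) - min(self.arrivals)
        return (len(self.arrivals) - 1) / window if window > 0 else 0.0

    def mean_wait(self) -> float:
        """Average seconds a job waited before processing started."""
        return sum(self.waits) / len(self.waits) if self.waits else 0.0

    def mean_service(self) -> float:
        """Average seconds spent processing a job."""
        return sum(self.services) / len(self.services) if self.services else 0.0


if __name__ == "__main__":
    metrics = QueueMetrics()
    # One job arrives every second, but each takes two seconds to process,
    # so every job waits longer than the one before it.
    metrics.record(arrived_at=0.0, started_at=0.0, finished_at=2.0)
    metrics.record(arrived_at=1.0, started_at=2.0, finished_at=4.0)
    metrics.record(arrived_at=2.0, started_at=4.0, finished_at=6.0)
    print(metrics.arrival_rate())  # 1.0 job per second
    print(metrics.mean_wait())     # 1.0 second
    print(metrics.mean_service())  # 2.0 seconds
```

In this toy run jobs arrive faster than they are served, so the wait grows with every job. That growing wait is the kind of signal you would want on a dashboard before a queue backs up in production.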
This is just one small part of how we achieve high availability of our system at Infobip. But it's one of the more fun ones. It's also something we encourage you to try in your own team. Or join us at Infobip Engineering. Besides Deployments and Disasters, we have a couple of "real" D&D campaigns going on. If dispatching Orcs with a cursed sword is your idea of a good time, do let me know (I'm a 6th level Dwarf barbarian).