Infobip Engineering Handbook

Deployments and Disasters RPG

Incidents can be stressful, so one of our engineers came up with a way to gamify incident preparation in the form of a role-playing game.


Last updated 3 years ago

There are boring activities in life, and there are stressful ones. The truly tricky ones start out boring but can turn stressful in the blink of an eye. Think of driving down a sparsely populated highway at night. It's boring most of the time, and you might even get a little sleepy. But at the first sign of trouble up ahead, you need to be alert and respond very quickly. Operating a world-scale, high-throughput software system can be like that.

At Infobip, we try to make this easier with out-of-the-box metrics, centralized monitoring, logging and more, all provided by teams from the Quality requirement area. But there's another side to this preparation.

Even with the best-performing service, you can still get caught off guard by an incident. Most of the time our services run smoothly and require no intervention from developers. But every so often, for any of a plethora of reasons, one of them degrades or goes down completely. That's when you, as the service's owner, get called to resolve the incident, and the adrenaline kicks in.

Incidents are stressful situations:

  • You might forget what the procedure for reporting client impact is.

  • You might not know how to query Prometheus metrics.

  • You might forget who you can contact to help you reconnect the network storage.

There's room for preparation beyond making sure your service is running smoothly: you need to make sure you are ready, too.

There are many things you can do in advance to prepare yourself for this eventuality. Just like with your car, you can make sure your service is in top condition. Luckily, you are both the driver and the mechanic in this scenario, so you can maintain your own service.

We decided to take the boring part of incident training and make it a bit more fun. We looked at what is fun (games, duh!) and applied it to what is important to us (incident management). And thus the Deployments and Disasters tabletop RPG was born, designed by our very own Josip Antoliš.

Tabletop role-playing games, such as Dungeons & Dragons, Pathfinder or GURPS (if you want to get real nerdy), are based on group storytelling. All players sit around a table, roll dice and role-play their assigned characters. It's like a video game RPG, just with more freedom of choice and fewer computers.

In Deployments and Disasters, we use the same concept to train for resolving incidents.

  • Each player role-plays an incident responder, though not all characters in the game are developers: you can be an analyst, a database admin or a panicking product owner.

  • The goal of each session is to resolve a single incident.

  • You puzzle out what went wrong, while the game rules gently introduce you to our real-world tools and incident management practices.

  • Dice determine the success or (critical?!) failure of your actions. And, just like in the real world, you are not alone: the other players are there with you, and you work together to beat the game.

It's not all (just) fun and games either.

The goal is to get better at handling incidents. The game helps you get to know the tools we use for monitoring and alerting. You'll see the bigger picture and the other roles involved in incident management: who talks to clients, how management views all this, etc. And you'll get some directly applicable advice for managing your real-world services.

If you figure out which metrics you need from an in-game queuing system (arrival rate, wait time and service time, right?), you can go on to expose them from your production queue and be better set up for monitoring in the future.
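To make those three queue metrics concrete, here is a minimal sketch of how they could be computed from per-item timestamps. The `Job` record, its field names and the `queue_metrics` function are hypothetical, chosen only for illustration; a real production queue would more likely export these continuously (for example as Prometheus counters and histograms) than compute them in batch like this.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Job:
    # Hypothetical record shape: all timestamps are in seconds.
    enqueued_at: float  # when the item arrived in the queue
    started_at: float   # when a worker picked it up
    finished_at: float  # when processing completed


def queue_metrics(jobs: list[Job], window_start: float, window_end: float) -> dict[str, float]:
    """Arrival rate, mean wait time and mean service time for the jobs
    that arrived within the window [window_start, window_end)."""
    window = window_end - window_start
    if window <= 0:
        raise ValueError("observation window must be positive")

    arrived = [j for j in jobs if window_start <= j.enqueued_at < window_end]
    if not arrived:
        return {"arrival_rate": 0.0, "mean_wait": 0.0, "mean_service": 0.0}

    return {
        # Jobs arriving per second over the observation window.
        "arrival_rate": len(arrived) / window,
        # Time spent sitting in the queue before a worker picked the job up.
        "mean_wait": mean(j.started_at - j.enqueued_at for j in arrived),
        # Time a worker spent actually processing the job.
        "mean_service": mean(j.finished_at - j.started_at for j in arrived),
    }


if __name__ == "__main__":
    jobs = [Job(0.0, 0.5, 1.5), Job(2.0, 2.0, 3.0), Job(4.0, 5.0, 5.5)]
    print(queue_metrics(jobs, 0.0, 10.0))
```

Keeping wait time and service time separate is the useful part during an incident: a growing wait time with a steady service time points at too little capacity or a traffic spike, while a growing service time points at the workers themselves (or something they depend on) slowing down. For a stable queue, Little's law also lets you estimate the average number of jobs in the system as arrival rate × (mean wait + mean service).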

This is just one small part of how we achieve high availability of our system at Infobip, but it's one of the more fun ones. It's also something we encourage you to try in your own team. Or join us at Infobip Engineering. Besides Deployments and Disasters, we have a couple of "real" D&D campaigns going on. If dispatching Orcs with a cursed sword is your idea of a good time, do let me know (I'm a 6th-level Dwarf barbarian).
