A Day In The Life - At Infobip

When somebody asks me where I work and I answer Infobip, and they proceed to dig a bit deeper, it is usually very hard for me to explain what it really looks like - that is, if I want to go beyond “I do programming” and “We send SMS”. “But what do you actually do?” Well, I develop new services and make sure the whole system stays up and running. “But what does your day look like?”

Ok, first, surprise-surprise, there actually isn’t any typical day at Infobip. But I guess that’s the case in any fast-moving tech company. If you don’t like challenges and a dynamic atmosphere, it’s probably not the place for you. Me, I almost died of boredom at the state firm I worked at before I joined the company.

So here I’ll just go with how yesterday went for me – that should let you catch a glimpse of a day in the life of one developer at Infobip. I guess I should put some disclaimer here? Nah.

Wakey-wakey

[8:27AM] I finally drag myself out of bed after the 3rd snooze – yeah, not really an early bird, and binge-watching 4 episodes of The Queen’s Gambit until 2 AM yesterday did not help. Still in my pajamas, I turn on my laptop and proceed into the kitchen to brew a coffee, a Turkish one, the only kind that can get me going. While I wait for the water to boil, I check my calendar, email and Slack on my phone – I scroll through emails, and nothing seems urgent. Two invites for meetings later this week, one for a sync with my manager and one for a project takeover – I click accept – some general company news, an internal hiring offer, a course from Security (who doesn’t love those), two pull requests waiting for approval... I’ll deal with those later. On Slack, I skim briefly through a few unread chats, but there are no urgent channels, so I decide to read them later in more comfort on my laptop.

[8:51AM] The coffee is done, I’m out of my nightly pajamas and into my daily ones, and since I have to force myself to eat a bit in the morning, I just grab a banana on my way back to my office. Ok, it’s really a laundry room, but I see the ironing board somewhere in the background in at least half of my video calls with my peers. I sit in front of the laptop and check the time – 5 minutes remaining until my team’s daily.

I fire up my tools – IDEA for Java development, Teams and Slack – and open a series of bookmarks for the various web-based tools we use to manage the platform. I open Jira to check on my tasks: the pull request I sent late yesterday is still waiting for the team’s review, so I look at the two additional tasks in our sprint I was planning to pick up next. I check my personal to-do list, which I keep in a plain old text file.

Daily (virtual) coffee

[9:00AM] I miss real people, everyone seems too virtual these days, but we in the team try to keep some level of interaction and fun (ok, after more than a year of working from home, I've lowered my expectations on what counts as fun), so our daily call starts informally with all cameras turned on. There are 7 of us in my team, with varying seniority, experience, and vastly different mindsets. I see literally everyone holding a cup in their hand. We are very open in our discussions and sometimes it takes a lot of energy to focus and align, but I like to think it keeps me sane, especially in these strange days when you’re left to yourself a bit too much. Much of the warm-up is spent mocking a teammate who bought an electric bike, which turns into a fierce discussion on going all-electric.

[9:18AM] We finally start with the formal part of the meeting, which boils down to “who will screen share today” and looking at the team’s Jira sprint board. We go around one by one, quickly commenting on what we were working on and what our plan for today is. We’ve managed to build the discipline to really keep it short, so once done, we go for one more quick round of “who needs any help”. Two people feel they could use help with their tasks, so we agree on two additional calls to discuss those items in more detail – each call with just 3 people. I should join one of those at 11 AM. I accept the invitation and walk to the kitchen for more coffee. And eat a cookie, but I won't admit it.

Work locally, test globally

[9:35AM] Aaaand, off to work. I open the code editor, put my headphones on, turn on some upbeat music and proceed to write. The task is about extending an API endpoint in one of the microservices my team handles. My team works mostly on API-related stuff, and the backend microservice I'm working on generates client traffic reports that are later exposed on Infobip’s portal. I need to add some additional filtering capabilities. The code modification is conceptually simple, but I need to keep in mind the wider environment it will fit into - so I fire up a few Docker containers locally, needed as dependencies and consumers of the service I’m working on, and set up a connection to the integration environment’s Elasticsearch cluster to get test data.
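
In practice, “a few Docker containers on my local” is nothing fancier than a setup along these lines. This is only a rough sketch using Testcontainers; the class name, the Elasticsearch image version and the test body are made up for illustration, not taken from the actual service.

```java
import org.junit.jupiter.api.Test;
import org.testcontainers.elasticsearch.ElasticsearchContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;

@Testcontainers
class ReportFilterIT {

    // Local Elasticsearch stand-in; in the real setup this could instead be
    // a connection to the integration environment's cluster.
    @Container
    static ElasticsearchContainer elastic =
            new ElasticsearchContainer("docker.elastic.co/elasticsearch/elasticsearch:7.17.9");

    @Test
    void filtersReportsByChannel() {
        String esUrl = "http://" + elastic.getHttpHostAddress();
        // ... index a few test documents, call the new endpoint with the extra
        // filter parameter, and assert on the filtered result ...
    }
}
```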

I run the code, find some problems, go back and fix them, run again, find a new set of problems – the usual cycle. Time goes quickly, and my music mix ending is how I know I've been immersed in the code for the last 45 minutes. I take a break.

[10:30AM] I'm back at my laptop, biting into another cookie that, officially, I did not eat today. A few crumbs fall between the keys and I blow the little traitors away. The code still doesn't run the way I'd like, but it seems I'm close. I dive back in. 15 minutes later it seems to do what I wanted.

A notification for the meeting with my teammates in 15 minutes pops up in the corner of my screen; I dismiss it and keep my focus on the code editor. Tests, tests, tests. I know it's important, and my pull request is going to get rejected if I don't cover things properly. I add a few more unit tests covering the new corner cases my change introduces and expand the smoke test that was already present in the code. Two tests don’t seem to pass, but a notification for the start of the call interrupts me.
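
The corner-case tests are roughly of this shape – a deliberately self-contained sketch with a stand-in filter method, since the real service code obviously isn’t reproduced here:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import java.util.List;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.CsvSource;

class ChannelFilterTest {

    // Stand-in for the real filtering logic, only here so the sketch compiles.
    static List<String> filterByChannel(List<String> reports, String channel) {
        if (channel == null || channel.isBlank()) {
            return reports; // empty filter -> return everything
        }
        return reports.stream().filter(r -> r.startsWith(channel + ":")).toList();
    }

    @ParameterizedTest
    @CsvSource({
            "'',      3",  // corner case: no filter at all
            "SMS,     2",  // regular case: matching channel
            "VOICE,   0"   // corner case: channel with no traffic
    })
    void coversFilterCornerCases(String channel, int expected) {
        List<String> reports = List.of("SMS:one", "SMS:two", "EMAIL:three");
        assertEquals(expected, filterByChannel(reports, channel).size());
    }
}
```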

With a little help from my friends

[11:00AM] I connect to the call and the three of us dive into the discussion about my colleague’s task. She is working on this service for the first time and needs a bit of guidance. She has also been working mostly on frontend stuff since she joined the team a few months back, and this is a purely backend task. In my team, we try to make sure that everyone works on all the components we handle - for redundancy, or as we call it, to be a highly available team.

Redundancy and high availability are words you hear a lot at Infobip, as the platform needs to be up 24/7, and even a few seconds of traffic interruption is something clients may notice and have problems with. Actually, a large part of the work we do is there just to make sure all the fail-safe mechanisms are in place. Often the feature itself is not that complicated to implement; ensuring stability and reliability is the trickier part.
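
To give a flavour of what a “fail-safe mechanism” can be at its simplest, here is a toy retry-with-backoff wrapper around a call to a downstream dependency. It is purely illustrative – in reality this kind of thing usually comes from a battle-tested library rather than being hand-rolled, and the numbers are arbitrary:

```java
import java.time.Duration;
import java.util.function.Supplier;

final class Retry {

    // Calls the supplier up to maxAttempts times, doubling the pause between
    // attempts; rethrows the last failure if every attempt fails.
    static <T> T withBackoff(Supplier<T> call, int maxAttempts, Duration initialDelay) {
        Duration delay = initialDelay;
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt == maxAttempts) {
                    break; // no point sleeping after the final attempt
                }
                try {
                    Thread.sleep(delay.toMillis());
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
                delay = delay.multipliedBy(2);
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        // Example: retry a flaky "remote call" up to 3 times, starting at 100 ms.
        String result = withBackoff(() -> "report payload", 3, Duration.ofMillis(100));
        System.out.println(result);
    }
}
```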

We continue brainstorming and making suggestions for the next 20 minutes or so, until our colleague feels confident enough to proceed with the task. For me, this is probably the best part of working at Infobip: being able to ask anyone for help. It took me the first few months of working here to get used to the fact that I have 900 engineers at my fingertips, and that everyone I’ve encountered so far has actually been genuinely helpful. A huge change compared to my last two jobs, where I was basically the most knowledgeable person in my field and had no one to consult.

You build it, you own it

[11:30AM] Back to my task. Not very surprisingly, after the break my code magically works after a single tweak and all tests pass. I seem to be done, so I proceed to deploy the service to our integration environment, lovingly called IO. I push the code to our git repo, check the automatically triggered build on Jenkins, watch the unit tests pass again - everything looks perfect. Maybe even too perfect? Longer streaks of straightforward work make me suspicious, as in my experience progress in development moves in a very non-linear way. The gods of IT will surely punish me for this.

I switch to the tab with our Deployment Manager, our central tool for managing the whole Infobip platform. I select one of the test instances, pick my new version, hit deploy, and watch the log scroll across the screen as I sip the rest of my coffee.

The service is up, and the smoke tests pass as well. Great. I check the Grafana dashboard for the service and everything looks as expected. I trigger a few requests and inspect the results – still all fine. I’ll let it brew on IO for a while over my lunch break.
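
“Triggering a few requests” is nothing more glamorous than something like this; the URL and the query parameter are hypothetical stand-ins for the real endpoint on IO:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SmokePoke {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // Hypothetical endpoint exercising the new filter parameter.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://reports.io.example.internal/reports?channel=SMS"))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode()); // expecting a 200
        System.out.println(response.body());       // eyeball the filtered report
    }
}
```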

[12:15PM] Lunch break. I will not describe what I have for lunch, I’ll just say it’s very healthy. Probably just veggies and something-something whole-grain, I’ll let your imagination fly. I toss it in the microwave and look through the window. It’s very peaceful outside, sunny and clear. What a great day to stare at a screen – but that’s what I’ve chosen to do for a living. For a while, I envy the city gardeners who seem to be enjoying the sun while planting flowers in the park across from my building. The microwave pings and brings me back to reality – I know the grass only looks greener from this side of the window; it’s probably super hot out there and I would hate it after the first ten minutes.

While eating, I actively suppress the urge to surf the web on my phone and focus on the food as I should. Almost perfectly synchronized with my last bite, my phone lights up and squeaks with the special tone I’ve set up for Infobip service alerts. I glance at the phone: severity=critical. Damn it.

Expect the unexpected

[12:37PM] I wipe my mouth clean and I’m back at my laptop - luckily for me, I am not my team's troubleshooter this week. Still, the whole team gets the alerts through OpsGenie on our team’s mail, and there’s already a thread on our internal Slack channel - “looking into this,” writes the troubleshooter, and a few links and screenshots of dashboards appear in the chat. In a minute I see a call started in the channel, and I decide to join, as this is one of the microservices I’ve been working on lately.

As we join the call, the troubleshooter already has a few dashboards up. Instances in our India data center seem to be maxed out, and requests are queueing faster than they can be processed. The client-side traffic still seems to be unaffected, except for some higher-than-usual latencies, still within the limits of our SLOs. Everyone on the call is trying to figure out what is actually happening, and very soon we agree it’s just a traffic spike beyond our expectations. It seems a few of our bigger clients started sending huge campaigns simultaneously, and other microservices are firing a lot of requests at our service. Java heaps are maxed out and CPU usage is very high, probably because the garbage collector is running often and slowing down processing. We need to increase our capacity, fast.
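
The “queueing faster than they can be processed” part is the classic failure mode of an unbounded in-memory queue: every waiting request keeps holding heap until the garbage collector can no longer keep up. A deliberately naive sketch of the difference a bounded queue makes – not how the actual service is written, just the general idea:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedWorkers {

    public static void main(String[] args) {
        // Bounded queue: once 1,000 requests are waiting, new work is rejected
        // immediately (and turned into a 503 upstream) instead of piling up on
        // the heap until the GC grinds everything to a halt.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                8, 8, 60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(1_000),
                new ThreadPoolExecutor.AbortPolicy());

        for (int i = 0; i < 10_000; i++) {
            try {
                pool.execute(BoundedWorkers::process);
            } catch (RejectedExecutionException e) {
                // Shed the load: count it, alert on it, move on.
            }
        }
        pool.shutdown();
    }

    static void process() {
        // stand-in for the real request handling
    }
}
```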

Another alert comes to my phone - not good, we have started returning some HTTP 500s to clients. We’ll have to report this, so a new thread is created on our global incident management channel. The usual “how many clients are affected” message appears, but luckily it’s just a few clients so far, with a small percentage of their traffic, and I see my teammates already increasing RAM in Deployment Manager, tweaking JVM options as agreed, and restarting the instances one by one.

Phew

[1:05PM] I watch the graphs finally turn their slope downwards as our services are restarted. Total affected clients: 17, with up to 4% of their traffic affected over 13 minutes. Not great, not terrible. We have also decided to spin up an extra 2 instances, just in case. I follow the thread being closed on the incident management channel.

[1:15PM] We agree to schedule a meeting later in the week to discuss this in more detail and prevent it from happening again, but for now, this episode seems to be over. Metrics are back down to normal. We will now have some extra work – an incident report to write, with a detailed analysis and post-mortem actions. The troubleshooter will keep a close eye on the service for the next couple of hours. And I need a break.

Grab a coffee and deploy

[1:30PM] I know, I drink too much coffee. Regardless, I make another one. Was my last change to the service the one that caused the incident? The service should have withstood more traffic than it saw today – I’ve tested it, but not with real traffic. It’s virtually impossible to simulate real-world conditions. I’ll have to look into this tomorrow.

[1:50PM] Aaaand... I’m back on my own stuff. My service on IO seems to be blissfully ignorant of the incident in production, which brings me some peace. I create the pull request, requesting a review to merge my change into the main branch, and notice that my pull request on another service from yesterday was approved by my teammates this morning. Time to roll that one out. Headphones and music on.

I trigger the merge, check the build - all green – and switch to the production Deployment Manager. Need to be careful now. Deploying to production is always somewhat stressful, but the feeling kind of wears off. The company deploys something to production more than 1,000 times a day, which makes us truly agile in that sense and somewhat diminishes the stress of it. But if there’s a time to mess up, it’s now.

[2:30PM] Once I’m sure I know what I'm doing, I send a message to our team’s channel that I’m deploying. A few mostly mocking reaction emoticons appear, but I know those count as approvals. I trigger the canary deployment, which deploys my version to one of the production instances and starts comparing predefined metrics against the older version of the service still running on the other instances. If it detects an anomaly, it will automatically roll back the change. I watch the dashboards and logs on both monitors as a small percentage of the traffic pours into my instance. Looks good.
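
Conceptually, the anomaly check in such a canary pipeline boils down to something like the sketch below: compare a metric from the canary instance against the baseline instances and roll back if it is disproportionately worse. The real Deployment Manager logic is internal and certainly more elaborate; this is just the idea.

```java
import java.util.List;

final class CanaryCheck {

    // Roll back if the canary's error rate exceeds the baseline average
    // by more than the given tolerance (e.g. 0.5 = 50% worse).
    static boolean shouldRollBack(double canaryErrorRate,
                                  List<Double> baselineErrorRates,
                                  double tolerance) {
        double baseline = baselineErrorRates.stream()
                .mapToDouble(Double::doubleValue)
                .average()
                .orElse(0.0);
        return canaryErrorRate > baseline * (1.0 + tolerance);
    }

    public static void main(String[] args) {
        // Canary at 2% errors while the old instances hover around 0.5%:
        // well past a 50% tolerance, so this deployment would be rolled back.
        System.out.println(shouldRollBack(0.02, List.of(0.004, 0.005, 0.006), 0.5)); // true
    }
}
```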

[2:50PM] 20 minutes later, I watch the deployment pipeline roll my version of the service out to all production data centers. It’s cool and scary at the same time to watch the states turn green in the 16 data centers around the world where this service is deployed. I imagine my code flying through optical fibres at the speed of light, with the potential to make some people very angry. Dashboards still look good as instances go out of the load balancers and come back in with the new version. I drink water from my bottle and nod my head to the rhythm of the music. Aaaaall good. The world is a good place.

Do the chores

[3:10PM] I keep the dashboard open on my secondary monitor while going through my emails. I answer some and ping a few people on Slack. Ok, let’s fill in the damn survey. I understand the company is trying to get a sense of how things are going, even though it feels tiresome. The dashboards still look fine on the other monitor.

I check Jira for what could be next on my plate. I review other people’s pull requests and check the code analysis on SonarQube. Two warnings stick out in the static analysis of the service I was working on today. One is definitely a false positive; the other might be worth looking into. Tomorrow.

I get back to the non-urgent spike I took for this sprint. Its aim is to figure out the best possible way to avoid the problems we have with Redis caching in one of our services. I love research tasks like that, as writing detailed analyses is my thing. I won’t say which nickname this has earned me in my team, but I am known as the unofficial maintainer of our documentation. It’s my fetish, and I’m not ashamed of it. I fiercely screenshot the graphs as I torture the Redis cluster.

Time to wrap up?

[4:50PM] The advantage of a laptop over the desktop I was stubborn enough to keep working on for years is that you can slam the screen down once you’re done for the day. It feels similar to slamming the handset down on an old dial telephone. I miss that; it’s no longer possible on smartphones, you’d just crack the damn screen.

People from the team are slowly signing off on the team’s Slack channel. I don’t slam my laptop screen down just yet, though, as my brain has only now started working perfectly. As I said, I am more of a late bird. I focus on testing the locally spun-up Redis, which turns into a very fun game for me: tweaking how much load, and in which format, I can push at it before it breaks down. My laptop’s cooling fans are spinning fast. I can’t hear them, as the music is still playing in my ears, but I can feel their vibration under my fingertips.
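
The “torture” itself is nothing exotic – roughly a loop like the one below, varying value sizes and key counts between runs while watching memory and latency on the Redis side. The key names, payload size and the choice of the Jedis client are just one possible setup, not a description of the actual spike:

```java
import redis.clients.jedis.Jedis;

public class RedisTorture {

    public static void main(String[] args) {
        // Assumes a locally spun-up Redis on the default port.
        try (Jedis redis = new Jedis("localhost", 6379)) {
            String payload = "x".repeat(16 * 1024); // 16 KB values; vary this per run

            long start = System.nanoTime();
            for (int i = 0; i < 100_000; i++) {
                redis.set("torture:key:" + i, payload);
            }
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("100k writes of 16 KB each in %d ms%n", elapsedMs);
        }
    }
}
```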

New day, new challenges

[5:37PM] I need to force myself to stop, as it’s way beyond my working hours. I look at the clock in the corner of my screen - ok, I’ll compensate somehow tomorrow. Last glance at my to-dos. Last glance at Slack. Last glance at Jira. Last glance at the dashboards. All good. Tomorrow is a new day. Smack the screen down.
