Development Flow

As with many other things, what our development flow looks like depends on the team. In fact, every team can decide what the process looks like and which steps and tools are required, but let's describe a typical flow for most teams:

The best way to resolve a ticket or a task is not to write any code, because any code becomes technical debt. If we can resolve it without development and still satisfy the ticket's purpose, that is how we prefer to do it. But let's be honest, not many tasks can be resolved without development.

When you open a ticket, the first thing to do is analyze the ticket's purpose and check the scope of the task. If something is not clear, we usually contact our product owner or manager and then update the ticket's description. Once everything is clear, we can start coding. Let's say that we need to add a new feature to an existing back-end service.

Pull code locally and create a branch

The first thing you do is pull the existing code locally. We use the git version control system, and all our git repositories are hosted on Bitbucket, which is deployed on our own servers (self-hosted). You create a new git branch (a feature branch) where the new code will be added.

Write the code

Now the fun part starts. You implement the feature and write tests, which are mandatory. We mostly write unit and functional tests, but some teams also write other types of tests, such as integration, end-to-end or smoke tests. We expect developers not to over-engineer the implementation and not to optimize prematurely. Clean code is also a must, because many developers work on the same project and it is important that the code is readable and easy to maintain. If you are struggling with the implementation, it is recommended to contact more experienced team members and ask for help. Once you are satisfied with the implementation and the tests, you push the commits to the remote Bitbucket repository.
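What the tests look like depends on the team and the project, but as a simple illustration, here is a minimal JUnit 5 unit test sketch for a hypothetical `MessagePriceCalculator`; the class, its behaviour and the numbers are invented for the example and are not part of any real Infobip service.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertThrows;

import java.math.BigDecimal;
import org.junit.jupiter.api.Test;

class MessagePriceCalculatorTest {

    /** Hypothetical class under test: prices a message by its number of parts. */
    static class MessagePriceCalculator {
        BigDecimal price(int parts, BigDecimal pricePerPart) {
            if (parts < 0) {
                throw new IllegalArgumentException("parts must be non-negative");
            }
            return pricePerPart.multiply(BigDecimal.valueOf(parts));
        }
    }

    private final MessagePriceCalculator calculator = new MessagePriceCalculator();

    @Test
    void calculatesPriceForMultipartMessage() {
        // Three parts at 0.01 per part should cost 0.03.
        assertEquals(new BigDecimal("0.03"),
                calculator.price(3, new BigDecimal("0.01")));
    }

    @Test
    void rejectsNegativePartCount() {
        // Invalid input should fail fast instead of producing a wrong price.
        assertThrows(IllegalArgumentException.class,
                () -> calculator.price(-1, new BigDecimal("0.01")));
    }
}
```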

Run Jenkins build

A Jenkins build is usually triggered automatically when the code is pushed to Bitbucket, but some teams still run it manually. The build compiles the project and runs the tests that are part of the project code. Additionally, Jenkins triggers code analysis on SonarQube, an automated code quality inspection tool. SonarQube reports the test coverage of the code and runs a static code analyzer, which reports potential issues related to security, code duplication, complexity, logical errors and performance, and provides language-specific suggestions and best practices that can be applied to improve code quality. Jenkins also runs a Snyk analysis, which scans the code, dependencies and containers and reports vulnerabilities. All detected vulnerabilities should be resolved immediately, if possible.

If the Jenkins build is successful, snapshot artifacts are pushed to Artifactory, a repository manager made by JFrog. Artifactory performs an automated artifact security scan using XRay.

Code review

Once the project has been successfully built on Jenkins and all tests and analyses yield satisfactory results, you create a pull request, requesting that your branch be merged into the main source branch. At least one person should be added as a pull request reviewer. It is the reviewer's job to check whether all requested functionality is properly implemented and tested. They can suggest improvements such as using an existing library instead of reinventing the wheel, better naming for components and variables, or anything else they notice. It is important to say that you should not be disappointed if you receive a lot of comments on your pull requests. You will learn from the feedback and the solution will be better. When all reviewers have approved the pull request, we can proceed to the next step.

Deploy SNAPSHOT version on testing environment

The next step is to deploy the feature to the test environment, which we call the "IO environment". The integration environment is a test data center, separated from production, that resembles the full production environment. It allows the developer to test the interaction of the new code with other services, making sure that the new implementation integrates correctly with the rest of the ecosystem. For services that are performance-critical, the developer performs stress testing in the integration environment. Stress testing means exposing the service to high load in order to determine the limits of the implementation and make sure the service will remain stable within the set requirements.
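Stress testing is normally done with dedicated load-testing tools, but as a rough sketch of the idea, the snippet below fires a configurable number of concurrent requests at a service endpoint and reports throughput and failures. The URL, request counts and thresholds are placeholders, not values used at Infobip.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SimpleLoadTest {

    public static void main(String[] args) throws Exception {
        int totalRequests = 10_000;
        int concurrency = 50;
        // Placeholder IO-environment endpoint, invented for the example.
        URI target = URI.create("http://io-env.example/api/messages");

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(target).GET().build();
        AtomicInteger failures = new AtomicInteger();

        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        long start = System.nanoTime();
        for (int i = 0; i < totalRequests; i++) {
            pool.submit(() -> {
                try {
                    HttpResponse<Void> response =
                            client.send(request, HttpResponse.BodyHandlers.discarding());
                    if (response.statusCode() >= 500) {
                        failures.incrementAndGet();
                    }
                } catch (Exception e) {
                    failures.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);

        // Report simple throughput and failure numbers for a first impression of limits.
        double seconds = (System.nanoTime() - start) / 1_000_000_000.0;
        System.out.printf("sent %d requests in %.1fs (%.0f req/s), %d failures%n",
                totalRequests, seconds, totalRequests / seconds, failures.get());
    }
}
```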

For deployment, we use Infobip's custom-made deployment automation tool called Deployment Manager, or just DM. You can deploy a snapshot version to IO with just a few clicks. DM pulls the artifacts from Artifactory and deploys them to the target virtual machines. Then you check whether everything works as expected and run smoke or integration tests, or just test it manually. If something is not working as expected, you fix the issues in the code and ask the reviewers to check the PR again.
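A smoke test against IO can be as small as checking that the freshly deployed service answers on its health endpoint. The sketch below assumes a `SERVICE_BASE_URL` environment variable and a `/health` path, both invented for the example.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import org.junit.jupiter.api.Test;

class ServiceSmokeTest {

    // Base URL of the deployed snapshot on IO; the default is a placeholder.
    private final String baseUrl =
            System.getenv().getOrDefault("SERVICE_BASE_URL", "http://io-env.example:8080");

    @Test
    void healthEndpointRespondsWithOk() throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request =
                HttpRequest.newBuilder(URI.create(baseUrl + "/health")).GET().build();

        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());

        // The deployment is considered "up" only if the health check returns 200.
        assertEquals(200, response.statusCode());
    }
}
```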

Make the RELEASE version

When everything is successfully tested on IO, the pull request can be merged into the master or release branch. Most teams merge the PR directly into the master branch and then trigger the release job on Jenkins. Artifacts with a release version are pushed to Artifactory. Now they are ready to be deployed to production.

Deploy to production

As an integral part of the production deployment cycle, Infobip uses an automated canary process that is recommended for all pushes into the production environment. The canary deployment method reduces the risk of introducing undesired changes into production by deploying the new code to a small part of the production environment for a limited time frame and then comparing the performance of the new version with the previous stable version before deciding whether to roll the new version out to all production instances. Canary deployment is handled by Deployment Manager as a fully automated process.

You select one of the instances where the canary deployment will be tested. After the new version has been running on the selected instance for a certain time, its metrics are compared either against other instances running the older version or against the same instance in a previous time frame, before the version upgrade. User-configurable metrics are gathered during this canary testing period, and if the performance and reliability of the new version are within the defined boundaries, Deployment Manager automatically deploys the new version to all instances. If the canary metrics do not meet expectations, the canary instance is rolled back to the previous stable version.
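Deployment Manager's analysis is internal to Infobip, but the kind of boundary check it describes can be sketched roughly as below; the metric names, thresholds and decision rule are hypothetical and only illustrate the promote-or-roll-back decision.

```java
import java.util.Map;

public class CanaryAnalysis {

    /** Maximum allowed relative degradation per metric, e.g. 0.05 = 5% worse (placeholder). */
    private static final double MAX_RELATIVE_DEGRADATION = 0.05;

    /**
     * Compares canary metrics against baseline metrics (instances on the older version,
     * or the same instance in a previous time frame). Returns true if the new version
     * stays within the defined boundaries and can be rolled out to all instances.
     */
    public static boolean withinBoundaries(Map<String, Double> baseline,
                                           Map<String, Double> canary) {
        for (Map.Entry<String, Double> entry : baseline.entrySet()) {
            double base = entry.getValue();
            double candidate = canary.getOrDefault(entry.getKey(), Double.MAX_VALUE);
            // For these metrics lower is better (latency, error rate); allow only a
            // small, configurable degradation before failing the canary.
            if (candidate > base * (1 + MAX_RELATIVE_DEGRADATION)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Double> baseline = Map.of("p99_latency_ms", 120.0, "error_rate", 0.002);
        Map<String, Double> canary = Map.of("p99_latency_ms", 118.0, "error_rate", 0.0021);

        // true -> roll out to all instances; false -> roll the canary instance back.
        System.out.println(withinBoundaries(baseline, canary)
                ? "promote new version" : "roll back canary");
    }
}
```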

Closing the ticket

Before closing the ticket, you check whether the completed work conforms to the team's Definition of Done (DoD). The DoD is a team's internal agreement that ensures its members agree on the completeness and quality of the work they produce. Infobip has defined minimal DoD requirements, and teams can build upon those requirements in line with their internal needs (scope of work, technologies used, etc.).

Most teams mark the ticket as closed when the new service version is deployed to production, but there are some exceptions.

Continuous monitoring and alerting

After the new code has been deployed to production, service metrics are collected in a time series database (Prometheus). Service logs are automatically shipped to a central logging service (Graylog), allowing analysis of logs coming from different services. The service metrics are visualized as graphs (Grafana) that display a real-time overview of the service's state and performance. Development, support and QA teams monitor the system as a whole, and its individual elements, using various dashboards aimed at revealing potential performance and stability issues. Custom-defined alerts based on service metrics are configured for each service in order to immediately notify the development teams if the service's performance falls outside the defined boundaries. Alerts from different sub-systems are collected in a central system (OpsGenie), where they are de-duplicated and routed to the appropriate team using custom-defined workflows.
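How a service exposes these metrics depends on the stack. As one common approach for a Java service (an illustration, not a description of Infobip's internal metrics libraries), Micrometer can register counters and timers and render them in the Prometheus text format that the Prometheus server scrapes; the metric names below are made up for the example.

```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.Timer;
import io.micrometer.prometheus.PrometheusConfig;
import io.micrometer.prometheus.PrometheusMeterRegistry;

public class ServiceMetrics {

    public static void main(String[] args) {
        PrometheusMeterRegistry registry = new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);

        // A counter for processed messages and a timer for request latency; Grafana
        // dashboards and OpsGenie alerts would typically be built on metrics like these.
        Counter processed = Counter.builder("messages_processed_total")
                .description("Number of processed messages")
                .register(registry);
        Timer latency = Timer.builder("request_latency")
                .description("Request processing time")
                .register(registry);

        latency.record(() -> processed.increment()); // simulate one handled request

        // This text is what a metrics endpoint would return and what Prometheus
        // scrapes on its polling interval.
        System.out.println(registry.scrape());
    }
}
```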
