Platform Architecture

How can we describe Infobip's platform architecture, given the size, complexity and versatility of the systems in place? One way is to look at it in terms of layers.

Infrastructure layer

This is the underlying layer that in most cases contains no business logic for Infobip's core (messaging) business, but serves as the foundation on which the Infobip platform is built.

Hardware layer

Includes networking equipment, storage and compute servers. A large part of the Infobip platform runs on on-premise hardware - hardware owned by Infobip and colocated in various datacenters across the world - but part of the platform also stretches into cloud providers (AWS, Azure). In the on-premise case, Infobip infrastructure teams manage the hardware - we purchase the servers and networking equipment, install them in datacenters, and manage hardware and software upgrades (e.g. replacing a broken disk or installing firmware updates). Infobip's Global Infrastructure teams often travel to datacenters to install, fix or upgrade physical hardware. Sometimes we also use "smart hands" - a datacenter's on-site teams that can quickly respond to our requests and perform some of the work on our hardware. In the cloud case, this layer is managed by the cloud provider, so we do not deal with the underlying physical hardware, but we do care about configuring it and connecting it with the rest of the infrastructure.

Virtualization and core infrastructure

Virtualization uses software to create an abstraction layer over computer hardware that allows the hardware elements of a single computer - processors, memory, storage and more - to be divided into multiple virtual computers, commonly called virtual machines (VMs). From the perspective of a developer, these virtual machines function just like physical machines. However, virtualization offers much more flexibility than working with physical machines - it increases development speed, as new virtual servers can be easily created, decommissioned or altered through software. It also allows better resource utilization, as more virtual machines can be placed on the same physical hardware. Using virtualization also increases the stability of the whole system, as VMs can be replicated or moved from one underlying server to another even while running. In most cases Infobip uses Microsoft Hyper-V and VMware ESXi as the virtualization software that provides this layer. On top of this layer we run various Linux and Windows VMs that are built from standardized images.
Along with hypervisors and VMs, there are a number of networking and management services like DNS, LDAP and IPAM, as well as various firewalls, VPNs and load balancers, that keep the core infrastructure running. Those are managed by the Global Infrastructure and IaaS (Infrastructure as a Service) teams using technologies like Terraform, Chef, Ansible and other automation and infrastructure-as-code tools. Core infrastructure also includes the monitoring and logging systems that operate and support the infrastructure layer. IaaS teams take care of backups and data replication, and do regular maintenance and patching when required.
On top of the VMs, in some cases we run Kubernetes clusters. Kubernetes is an orchestration platform that manages (Docker) containers running on a network of servers, which in this case act as one big computing engine, as opposed to containers being run directly on individual VMs. Kubernetes clusters are either managed by IaaS teams or run as a service in the cloud. In some cases the platform seamlessly stretches to the cloud, where part of the system is running.

Core software platform

Defines what we call the Infobip platform - a standardized way in which our micro-services are structured, controlled and monitored, how they communicate (RPC) and how load balancing works. For load balancing we use either external network load balancers (HAProxy) or client-side balancing implemented within the services. The core software platform also provides our CI/CD pipeline - a fully automated process that covers how source code is built and tested, how applications are compiled and packaged, and how they are deployed to production. Platform layer tools give us visibility and control over all our services.
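To make the idea of client-side balancing more concrete, here is a minimal sketch of a round-robin balancer that lives inside the calling service. It is an illustration only and not the actual platform library: the class name, its methods and the instance addresses are all hypothetical, and how the real platform discovers instances or tracks their health is not covered here.

```python
import itertools
import threading


class RoundRobinBalancer:
    """Hypothetical client-side balancer: the caller picks the target instance itself."""

    def __init__(self, instances):
        if not instances:
            raise ValueError("at least one instance is required")
        self._instances = list(instances)
        self._cycle = itertools.cycle(self._instances)
        self._unhealthy = set()
        self._lock = threading.Lock()

    def mark_unhealthy(self, instance):
        # Assumption: some health check elsewhere decides when to call this.
        with self._lock:
            self._unhealthy.add(instance)

    def mark_healthy(self, instance):
        with self._lock:
            self._unhealthy.discard(instance)

    def next_instance(self):
        """Return the next healthy instance in round-robin order."""
        with self._lock:
            # Inspect each instance at most once per call.
            for _ in range(len(self._instances)):
                candidate = next(self._cycle)
                if candidate not in self._unhealthy:
                    return candidate
        raise RuntimeError("no healthy instances available")


# Illustrative addresses only.
balancer = RoundRobinBalancer(["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"])
print([balancer.next_instance() for _ in range(4)])
```

Compared with an external balancer such as HAProxy, this approach removes one network hop, at the cost of every client needing an up-to-date view of the available instances.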
Platform teams also develop some core services that provide common functionalities like authentication/authorization and exposing services on Infobip's external APIs. Platform teams also develop a number of standard libraries for our services, as well as various helper proxies for functionalities like storing data to cloud storage, reaching the Internet from the internal network, executing database procedures and such. Our Core software platform layer is maintained by PaaS (Platform as a Service) teams. The goal of this layer is to make developers' lives easier by providing common tools and out-of-the-box functionalities, so that developers can focus on solving business problems instead of thinking about how to solve the usual infrastructure problems.
Our services can be built in various technologies, as long as they conform to our platform contract, but one could say that Infobip is primarily a JVM-based company. Most of the core backend services are written in Java or Kotlin, but there is also a large .NET community. Other tech stacks are in use as well, like Golang (primarily for infrastructure) and Python / Groovy / Ruby / Bash / PowerShell, used mostly for scripting and automation. We do not restrict our developers from using any technology that solves business problems.
Observability tools - monitoring (Prometheus, New Relic), logging (Graylog) and alerting (Opsgenie) - are provided by the Quality part of this layer. Those tools are in large part automatically configured for each newly developed service, so developers can easily create dashboards and configure metrics and alerts for our new products.

Data layer

The data layer is here to provide the system with data. Here we use different technologies: from classic relational databases (like SQL Server and PostgreSQL) grouped into clusters, through Kafka as our central data streaming platform, up to data warehouses and columnar storages (Clickhouse, Cassandra) and ElasticSearch clusters. These systems transfer, aggregate and prepare the huge amount of data that flows through the platform so it can be consumed by our services. Data teams develop and maintain this data infrastructure. In some cases, machine learning is used on top of the data to help with cases like fraud detection, and this is also developed by our data engineers.
Ideally, from a developer's perspective, one should be able to define just the business logic without caring much about the underlying infrastructure it runs on. In practice, developers usually need to be aware of the infrastructure to some extent, even when running code "in the cloud". The Infobip infrastructure layers serve as an "internal cloud" for the rest of development, one built for our specific business needs. We try to abstract the infrastructure for developers, making it transparent whether they are deploying to cloud or on-premise instances.

CPaaS layer

On top of the infrastructure layers, which provide required resources, we run our core messaging platform.
The basic business layer in Infobip is CPaaS - Communication Platform as a Service. You can think of this layer of Infobip as a huge communication hub between businesses and end clients. On one end, businesses connect to our APIs or use the system through our portal and software services, and on the other end we connect to various providers - from hundreds of mobile network operators for SMS/RCS and voice traffic, to various other channels like WhatsApp, Viber, Telegram or Email - that in the end reach the end clients' mobile devices.
Why would a client use Infobip instead of integrating with those channels directly? Primarily, Infobip unifies the experience of using different network operators, or even different communication channels. It simplifies life for businesses, which do not have to care about the underlying complexities - they just want to deliver the message to the end client, or have a two-way communication channel. Someone can, of course, integrate directly with various channels, but this comes with a development cost, so it is much faster, more cost-efficient and even more reliable to use a provider like Infobip.
The core of the CPaaS layer consists of message relay services, accompanied by routing logic and billing-related services. Those services implement the business logic of the messaging platform. CPaaS services are fueled by the data provided by our data layer. Various message queuing and caching mechanisms like Redis and RabbitMQ are used to ensure message delivery in the minimum possible time, usually within a fraction of a second. The service architecture allows operating the system at large scale, with high throughput and multi-level redundancy to ensure system stability. Most of the CPaaS layer is written in Java and Kotlin.
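As a rough illustration of what relay with routing and redundancy can look like, the sketch below tries an ordered list of routes and falls back to the next one when a provider is unreachable. Everything in it is an assumption made for the example: the `Route` type, the `relay` function, the route names and the stub senders are hypothetical, and the real routing logic (which also involves billing and many other factors) is not described in this text.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Route:
    name: str                          # hypothetical provider or channel name
    send: Callable[[str, str], bool]   # returns True when the provider accepts the message


def relay(routes: Iterable[Route], destination: str, text: str) -> Optional[str]:
    """Try routes in priority order; return the name of the first that accepts."""
    for route in routes:
        try:
            if route.send(destination, text):
                return route.name
        except ConnectionError:
            # Provider unreachable: fall through to the next route.
            continue
    return None  # no route accepted the message


# Stub senders standing in for real provider connections.
def failing_operator(destination: str, text: str) -> bool:
    raise ConnectionError("operator link is down")


def working_channel(destination: str, text: str) -> bool:
    return True


routes = [
    Route("sms-operator-a", failing_operator),
    Route("chat-channel-b", working_channel),
]
print(relay(routes, "+385911234567", "Your parcel has arrived"))
```

In this sketch the caller only sees which route delivered the message, which mirrors the idea that businesses do not have to care about the underlying operators and channels.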
CPaaS is exposed to our clients through our APIs, or it can be used with zero-coding through our Customer Portal, which is a part of our SaaS layer.

SaaS layer

On top of our messaging platform, we are building software for our clients that can be used in a Software-as-a-Service model. SaaS primarily consists of our Customer Portal (which we call CUP), a web interface where our clients can log in to use the system. The main functionality of CUP is sending messages. In its basic form this means sending single messages or message campaigns, but on top of CPaaS functionality we have several products that are a part of CUP:
Campaigns are built using the People module, which enables our clients to manage their end clients and build targeted communication based on demographic, behavioral, engagement, transactional and mobile data. Using the Moments module, our clients can create omni-channel campaigns with advanced automation and personalization capabilities. A visual editor allows clients to build complex communication workflows that can use multiple channels from a single unified interface.
Conversations is a digital-first cloud contact center solution. This means that our clients can purchase this product and start a contact center, where their agents can interact with their clients through multiple channels. Agents can use Conversations through a web interface, but we also provide mobile (Android/iOS) applications allowing agents to work on-the-go.
Answers is our chatbot building platform. Our clients can create automated chat workflows that range from simple questions and answers to complex virtual assistants that use AI to recognize the user's intent and provide information.
SaaS technically consists of:
  • web-based frontend layer (CUP) written in React and TypeScript, which communicates with our backend services. We use a micro-frontend architecture to allow scaling. It’s a core layer for each of our SaaS products.
  • backend layer written in Java or Kotlin that supports all the features offered in CUP
  • data layer where we have various caching systems like Redis and other data systems like Kafka, Clickhouse, MsSql, Postgres and Mongo, depending on the product use cases
  • API layer that can be internal (for CUP purposes) or public (for our clients) and is specific to each SaaS product
  • Mobile App Messaging SDKs (Android, iOS, Huawei, Flutter, React Native, Cordova, Ionic), which enable smartphone application developers, and Web SDKs, which enable web developers, to use SaaS capabilities
  • Conversations Mobile applications (Android, iOS) allowing contact center agents to work on-the-go and use SaaS capabilities directly

Telco stack

Telco products are somewhat different from the rest of the Infobip platform. This is mainly because Telco includes not just services that are part of the Infobip infrastructure, but also services that are installed on the premises of Mobile Network Operators (MNOs) or some bigger clients.
Some Telco products exist to provide MNOs with various features, like implementing APIs on their end or doing operations on their data, talking to the rest of the MNO infrastructure through telecommunication protocols like SS7. Some Telco products are installed on bigger clients' premises, in order to seamlessly connect them to our platform. This means that this part of the Telco stack uses somewhat different deployment tools and procedures, as it is not a part of our network and datacenters.
Depending on client and project requirements, we sometimes deploy services on physical servers provided by Infobip (usually on VMs virtualized on that hardware), and sometimes on VMs provided by clients. This part of Telco has a separate infrastructure and deployment team that deals with the installation and maintenance of Telco products.
The other part of the Telco stack consists of standard Infobip (micro)services that are deployed in our datacenters, same as the rest of the Infobip services.
Main Telco products include:
  • SMSC (SMS Service Center) is a standard component of a mobile network operator that enables it to deliver SMS messages to end users. All MNOs must have SMSC components that enable P2P traffic (sending SMS from one mobile phone to another). Infobip offers a specialized SMSC that enables MNOs to receive A2P (application-to-person) traffic. This means that Infobip enables an MNO to expose an API over which businesses can send SMS to end users. This API is used to connect the Infobip platform to MNOs, but it can also be used by the MNO's other clients.
  • sGate (Security Gate) is a value-added service (VAS) - a term used by the telecommunications industry to describe all non-core services used by an MNO, i.e. any service that offers additional capabilities beyond default ones like voice and SMS. In a nutshell, sGate is an SMS firewall installed on the MNO's end, and it gives the MNO the ability to detect and block unwanted SMS messages (like SMS spam). sGate connects to the MNO's mobile network through telecommunication protocols to receive and process the data required for analysis. sGate uses various mechanisms to detect unwanted messages, including machine learning for advanced detection and filtering.
  • mGate is a flexible client integration tool, deployed on the client’s infrastructure. These are mostly big clients like banks who want to use Infobip services without much integration work on their end. mGate is installed and configured by Infobip and connects to Infobip's platform so clients can quickly start using our services (anything available through the public API, e.g. SMS, Email etc.). mGate is usually connected to the client's internal data sources, sending messages on certain events on the client's end (e.g. sending an SMS notification to an end user when a value in the client's database changes). This tool helps clients quickly integrate their services with Infobip, without having to do any integration development themselves.
  • Mobile Identity (MI) is an end-user verification and protection toolset based on the user's phone number and integration with MNOs. MI services are deployed on Infobip's infrastructure and use APIs provided by MNOs to check and verify phone numbers. MI can be used to authenticate users securely and silently in the background, providing a capability similar to two-factor authentication but without the need to send an explicit message that requires interaction from the user. MI can also detect fraud like SIM swap, a growing threat, especially for the finance sector, in which an attacker attempts to take control of a person's account by taking over their SIM, mostly by means of social engineering.
  • Biometry is Infobip's services stack for verifying someone's identity, which is becoming more and more important for online digital services. It is an umbrella of services that range from document scanning (like passports and IDs, using OCR, machine readable zone (MRZ) or NFC) and face recognition / comparison, to voice recognition and liveness detection (e.g. for enrollment or verification processes), complemented by various other scanning services like barcode and QR code readers. A JavaScript / WASM client and SDKs enable Biometry on the client side, along with backend services for processing deployed in our datacenters.
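To illustrate the SMS firewall idea described for sGate above, here is a minimal rule-based sketch. It is not how sGate works internally: the class, the three rules (sender blocklist, per-sender message limit, blocked phrases) and the sample values are all hypothetical, and the machine learning based detection mentioned above is out of scope for this example.

```python
from collections import Counter


class SmsFirewall:
    """Hypothetical rule-based filter that returns a verdict and a reason."""

    def __init__(self, blocked_senders=(), blocked_phrases=(), max_per_sender=100):
        self.blocked_senders = set(blocked_senders)
        self.blocked_phrases = [phrase.lower() for phrase in blocked_phrases]
        self.max_per_sender = max_per_sender
        self._seen = Counter()  # messages observed per sender

    def check(self, sender, text):
        self._seen[sender] += 1
        # Rule 1: sender is explicitly blocklisted.
        if sender in self.blocked_senders:
            return ("block", "sender blocklisted")
        # Rule 2: sender exceeded the allowed number of messages.
        if self._seen[sender] > self.max_per_sender:
            return ("block", "rate limit exceeded")
        # Rule 3: message contains a blocked phrase (case-insensitive).
        lowered = text.lower()
        for phrase in self.blocked_phrases:
            if phrase in lowered:
                return ("block", "blocked phrase")
        return ("allow", "")


firewall = SmsFirewall(
    blocked_senders={"SPAMCO"},
    blocked_phrases=["free prize"],
    max_per_sender=2,
)
print(firewall.check("BANK", "Your one-time code is 1234"))
print(firewall.check("PROMO", "Claim your FREE PRIZE now"))
```

A real firewall would also need to expire the per-sender counters over time; the sketch keeps them forever for brevity.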
In terms of tech stack, Telco services are mostly written in JVM-based languages (Java, Kotlin, Groovy), but we also have services written in C/C++ (applications that connect directly to the MNO network). The Telco infrastructure team uses Ansible to deploy and maintain services deployed at MNOs, as each MNO uses a different configuration and setup. Scripting is mostly done in Python and Bash.
On the data side, Telco products primarily use relational databases like PostgreSQL and SQL Server, along with Redis clusters in various configurations for data sharing and caching.
Telco products put a high emphasis on scalability, redundancy, performance and throughput. With that in mind, we opted for a microservice architecture when designing and developing Telco services.