Sometimes developers have to use technologies for purposes other than what they were designed for. There can be several reasons for this: a limited budget, a lack of experience on the team, limitations of the company's infrastructure, or even a direct ban from the company's management. Below is one non-standard way to use the Apache Kafka message broker.
Many services need configuration that controls the logic of the application. This is not about server ports or database credentials - it is meta-information used to manage the system, a.k.a. "business configuration". The most common examples of such configuration are feature toggle states, variable coefficients in business formulas, A/B-test branching, and so on. In other words, something that must remain unchanged between releases but can be modified at any moment when necessary.
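As a rough illustration, such a configuration might look like this (a hypothetical Java model; the field names are mine, not from any real system):

```java
import java.util.Map;

// Hypothetical shape of a "business configuration"; names are illustrative only.
record BusinessConfig(
        Map<String, Boolean> featureToggles,   // e.g. "newCheckoutFlow" -> true
        Map<String, Double> coefficients,      // e.g. "deliveryFeeMultiplier" -> 1.15
        Map<String, Integer> abTestWeights     // e.g. "recommendationModelB" -> 50 (percent)
) {}
```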
If a service must provide high availability and low latency, this configuration cannot be something retrieved from a database on every request. And if the system consists of a large number of microservices and requires horizontal scaling, the business configuration is usually managed by a separate dedicated microservice. Such a service is typically an admin panel with authentication and authorization.
The solution described below was built for a cluster of dozens of high-throughput servers that routed messages from HTTP to Kafka or from Kafka to Kafka. The routing logic had to change without restarting the servers, so the routing rules can be considered a business configuration. Each rule contained an SQL-like predicate that was matched against the payload of the incoming message, and the rules had to be updatable online.
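A routing rule could be modeled roughly like this (a sketch with assumed field names, not the original schema):

```java
import java.util.UUID;

// Assumed shape of a routing rule; field names and types are illustrative.
public record RoutingRule(
        UUID id,
        String predicate,    // SQL-like condition matched against the message payload,
                             // e.g. "country = 'DE' AND amount > 100"
        String targetTopic,  // where matching messages are routed
        long version,        // monotonically increasing, lets consumers drop stale updates
        boolean enabled
) {}
```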
Editing the rules is implemented in a dedicated service - the admin panel - where access is regulated by roles and granted only to a limited set of users. A natural choice of storage for the admin panel is a SQL database, which gives quick access and ACID transactions - perfect for keeping the master data.
The Shared Database approach is often used for admin panels, but in our case it was ruled out for security reasons. The router services were located on different subnets, with physical security gateways between the internal networks, which made it difficult to use solutions like Hazelcast, Redis, or Consul; the entire available arsenal was limited to Postgres, Kafka, and HTTP requests.
The first implementation was based on a REST API: at startup, the router service requested the configuration from the admin-panel server and started working only after receiving it. After that, the service polled for updates at a certain interval.
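A minimal sketch of that first approach might look like this (the URL and polling interval are assumptions):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Sketch of the REST-based approach: fetch the full configuration at startup,
// then poll for updates at a fixed interval.
public class ConfigPoller {
    private static final URI CONFIG_URI = URI.create("https://admin-panel.internal/api/rules");
    private final HttpClient client = HttpClient.newHttpClient();

    public String fetchRules() throws Exception {
        HttpRequest request = HttpRequest.newBuilder(CONFIG_URI)
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public void run() throws Exception {
        applyRules(fetchRules());                     // block startup until the config is loaded
        while (true) {
            Thread.sleep(Duration.ofSeconds(30).toMillis());
            applyRules(fetchRules());                 // refresh at a fixed interval
        }
    }

    private void applyRules(String rulesJson) { /* parse and swap the in-memory rule set */ }
}
```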
Later, external consumers appeared for which HTTP access to our network was closed. So we came up with the idea of using Kafka topics to distribute the configuration, as this is one of the simplest and most reliable options available.
Each service reads the entire topic from the beginning at startup, thus receiving all updates and applying them to its state. Kafka guarantees FIFO ordering within a single partition, so changes can be applied incrementally, in the same order in which they were applied to the master data; Kafka also allows configuring message retention so that messages are kept forever.
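Here is a sketch of how a router instance could replay the configuration topic at startup (topic name and brokers are assumptions; with unlimited retention, or log compaction keyed by rule id, replaying from offset zero reconstructs the full rule set):

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

// Sketch: replay the whole configuration topic from offset zero at startup,
// then keep polling for new updates.
public class ConfigTopicReader {
    public void loadAndFollow() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // no consumer group, no commits
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            List<TopicPartition> partitions = consumer.partitionsFor("routing-rules").stream()
                    .map(p -> new TopicPartition(p.topic(), p.partition()))
                    .toList();
            consumer.assign(partitions);
            consumer.seekToBeginning(partitions);      // start from the very first message
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    applyRuleUpdate(record.key(), record.value()); // FIFO within a partition
                }
            }
        }
    }

    private void applyRuleUpdate(String ruleId, String ruleJson) { /* update in-memory rules */ }
}
```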
Since the security gateways between the networks were still a problem, it was decided to bypass them through replicas of the configuration topic.
Eventual consistency was an acceptable trade-off here. Consumer lag on the configuration topic is not critical; it is enough to choose a message schema in which messages do not depend on each other and each one carries a consistent state.
It is impossible to guarantee an atomic write to two heterogeneous stores: the database has its own transactions, and Kafka has its own. There are a few common solutions:
Transactional outbox is a common pattern: a task is created transactionally as a record in an additional SQL table (the outbox), and these tasks are then shoveled out asynchronously.
Kafka transactions allow writing records that are not visible to consumers with certain settings (read_committed isolation) until a signal to complete the transaction is sent. It is also possible to synchronize the write and commit with a relational database transaction, but this still does not guarantee consistency, since the two commits happen one after another (a minimal sketch of this option is shown after this list).
CDC (Change Data Capture) is a solution that captures updates from the database logs and publishes them to an external consumer, which could be Kafka. Its drawback is that it is cumbersome: it requires a separate component (for example, Debezium) and introduces an additional point of failure.
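For illustration, the Kafka-transactions option could look roughly like this (topic and transactional id are made up; records become visible to read_committed consumers only after commitTransaction):

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

// Sketch of a transactional Kafka write; names are illustrative.
public class TransactionalPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "rule-publisher-1");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            try {
                producer.send(new ProducerRecord<>("routing-rules", "rule-42", "{...}"));
                // A database commit could happen here, but it is still a separate commit:
                // if the process dies between the two, the sources diverge.
                producer.commitTransaction();
            } catch (Exception e) {
                producer.abortTransaction();
                throw e;
            }
        }
    }
}
```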
A solution was created that is similar to a transactional outbox and guarantees the order of sends. Each rule in the database has a sent flag: after each update, within the same transaction, the flag is set to false; then the message is sent to Kafka, and then the flag is updated to true. If something goes wrong during sending, a failover job retries it. To avoid concurrency issues, a version was also added to the row, so clients can ignore outdated versions.
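Put together, the write path could look roughly like this (a sketch under my own assumptions about table, column, and topic names; Postgres-style UPDATE ... RETURNING is assumed):

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Sketch of the outbox-like write path described above.
public class RulePublisher {
    private final Connection db;                              // JDBC connection to the admin-panel DB
    private final KafkaProducer<String, String> producer;

    public RulePublisher(Connection db, KafkaProducer<String, String> producer) {
        this.db = db;
        this.producer = producer;
    }

    public void updateRule(String ruleId, String ruleJson) throws Exception {
        db.setAutoCommit(false);

        // 1) Update the rule, bump its version, and mark it as not yet sent - all in one transaction.
        long newVersion;
        try (PreparedStatement ps = db.prepareStatement(
                "UPDATE rules SET body = ?, version = version + 1, sent = false " +
                "WHERE id = ? RETURNING version")) {
            ps.setString(1, ruleJson);
            ps.setString(2, ruleId);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                newVersion = rs.getLong(1);
            }
            db.commit();
        } catch (Exception e) {
            db.rollback();
            throw e;
        }

        // 2) Publish to Kafka; the version lets consumers drop outdated messages.
        String payload = "{\"id\":\"" + ruleId + "\",\"version\":" + newVersion + ",\"rule\":" + ruleJson + "}";
        producer.send(new ProducerRecord<>("routing-rules", ruleId, payload)).get();

        // 3) Mark the row as sent. If step 2 or 3 fails, a periodic failover job
        //    re-reads rows with sent = false and repeats steps 2 and 3.
        try (PreparedStatement ps = db.prepareStatement("UPDATE rules SET sent = true WHERE id = ?")) {
            ps.setString(1, ruleId);
            ps.executeUpdate();
            db.commit();
        }
    }
}
```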
Such a solution could be criticized by other developers, especially if they joined the team after you’d left it. But there's nothing wrong with getting things to work at the right time, with the resources that are available.