Having worked with Kafka for more than two years now, there are two configs whose interaction I've seen cause confusion time and time again: acks and min.insync.replicas, and how they interplay with each other.
This piece aims to be a handy reference that clears up that confusion with the help of some illustrations.
Replication
To best understand these configs, it's useful to remind ourselves of Kafka's replication protocol.
I'm assuming you're already familiar with Kafka. If you aren't, feel free to check out my "Thorough Introduction to Apache Kafka" article.
For each partition, there exists one leader broker and N follower brokers.
The config which controls how many such brokers (1 + N) exist is replication.factor. That's the total number of times the data inside a single partition is replicated across the cluster. The default and typical recommendation is three.
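To make that concrete, here's a minimal sketch of creating such a topic programmatically with the Java AdminClient. The topic name and broker address are placeholders, and this assumes your cluster has at least three brokers:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder address; point this at your own cluster
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            // 3 partitions, each stored on 3 brokers (1 leader + 2 followers)
            NewTopic topic = new NewTopic("payments", 3, (short) 3);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```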
Producer clients only write to the leader broker; the followers asynchronously replicate the data. Now, because of the messy world of distributed systems, we need a way to tell whether these followers are managing to keep up with the leader: do they have the latest data written to the leader?
In-sync replicas
An in-sync replica (ISR) is a broker that has the latest data for a given partition.
A leader is always an in-sync replica. A follower is an in-sync replica only if it has fully caught up to the partition it's following. In other words, it can't be behind on the latest records for a given partition.
If a follower broker falls behind the latest data for a partition, we no longer count it as an in-sync replica.
Broker 3 is behind (out of sync)
Note that the way we determine whether a replica is in sync or not is a bit more nuanced; it's not as simple as "Does the broker have the latest record?" Discussing that is outside the scope of this article. For now, trust me that red brokers with snails on them are out of sync.
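If you're curious which replicas your cluster currently considers in sync, the AdminClient can tell you per partition. A small sketch, again with a placeholder topic name:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;

public class IsrCheckExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            TopicDescription description = admin
                    .describeTopics(Collections.singleton("payments"))
                    .all().get()
                    .get("payments");

            // Print the leader and the current in-sync replica set for each partition
            description.partitions().forEach(p ->
                    System.out.printf("partition %d: leader=%s isr=%s%n",
                            p.partition(), p.leader(), p.isr()));
        }
    }
}
```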
Acknowledgements
The acks setting is a client (producer) configuration. It denotes the number of brokers that must receive the record before we consider the write successful.
It supports three values: 0, 1, and all.
"acks=0"
With a value of 0, the producer won't even wait for a response from the broker. It immediately considers the write successful the moment the record is sent out.
The producer doesn't even wait for a response. The message is acknowledged!
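As a minimal sketch of what that looks like in the Java producer (topic name, key, and value are just placeholders):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class FireAndForgetProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.ACKS_CONFIG, "0"); // fire and forget: no broker acknowledgement

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // send() hands the record off; no broker ever confirms the write,
            // so the producer never learns whether it actually landed
            producer.send(new ProducerRecord<>("payments", "order-1", "created"));
        }
    }
}
```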
"acks=1"
With a setting of 1, the producer will consider the write successful when the leader receives the record. The leader broker knows to respond immediately the moment it receives the record, without waiting any longer.
The producer waits for a response. Once it receives it, the message is acknowledged.
The broker immediately responds once it receives the record. The followers asynchronously replicate the new record.
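Reusing the setup from the acks=0 sketch above (plus an import for RecordMetadata), only the acks value changes. Blocking on the returned future now waits for the leader's response, but not for any follower:

```java
props.put(ProducerConfig.ACKS_CONFIG, "1"); // leader-only acknowledgement

try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
    // get() blocks until the leader has written the record and responded;
    // the followers copy it afterwards, in the background
    RecordMetadata metadata =
            producer.send(new ProducerRecord<>("payments", "order-1", "created")).get();
    System.out.println("leader acknowledged at offset " + metadata.offset());
}
```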
"acks=all"
When set to all, the producer will consider the write successful when all of the in-sync replicas receive the record. This is achieved by the leader broker being smart about when it responds to the request: it'll send back a response only once all the in-sync replicas have received the record themselves.
Not so fast! Broker 3 still hasn't received the record.
Like I said, the leader broker knows when to respond to a producer that uses acks=all.
Ah, there we go!
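Once more, the only change from the earlier sketches is the acks value. The leader now holds its response until every in-sync replica has the record, so a completed future gives you the strongest durability guarantee of the three settings:

```java
props.put(ProducerConfig.ACKS_CONFIG, "all"); // wait for every in-sync replica

try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
    // The future only completes once the leader and all of its
    // in-sync followers have the record
    producer.send(new ProducerRecord<>("payments", "order-1", "created")).get();
}
```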
Acks's utility
As you can tell, the acks setting is a good way to configure your preferred trade-off between durability guarantees and performance.
If you'd like to be sure your records are nice and safe, configure your acks to all.
If you value latency and throughput over sleeping well at night, set a low threshold of 0. You may have a greater chance of losing messages, but you inherently get better latency and throughput.
Minimum In-Sync Replicas
There's one thing missing with the acks=all configuration in isolation. If the leader responds when all the in-sync replicas have received the write, what happens when the leader is the only in-sync replica? Wouldn't that be equivalent to setting acks=1?
This is where min.insync.replicas comes to shine!
min.insync.replicas is a config on the broker that denotes the minimum number of in-sync replicas required to exist for a broker to allow acks=all requests.
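It can be set cluster-wide in the broker's server.properties or overridden per topic. As a sketch, here's the earlier (hypothetical) payments topic created with a minimum of two in-sync replicas:

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class MinIsrTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            // replication.factor=3, and acks=all writes require at least 2 in-sync replicas
            NewTopic topic = new NewTopic("payments", 3, (short) 3)
                    .configs(Map.of("min.insync.replicas", "2"));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```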
That is, all requests with acks=all won't be processed and will receive an error response if the number of in-sync replicas is below the configured minimum. It acts as a sort of gatekeeper, ensuring that scenarios like the one described above can't happen.
Broker 3 is out of sync here.
As shown, min.insync.replicas=X allows acks=all requests to continue to work when at least X replicas of the partition are in sync. Here, we saw an example with two replicas.
But if we go below that number of in-sync replicas, the producer will start receiving exceptions.
Brokers 2 and 3 are out of sync here.
As you can see, producers with acks=all can't write to the partition successfully in such a situation. Note, however, that producers with acks=0 or acks=1 continue to work just fine.
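In the Java client, that rejection shows up as a NotEnoughReplicasException. Reusing the acks=all producer sketch from earlier, you could watch for it in the send callback (depending on retry and delivery-timeout settings, the failure may also surface as a timeout, since the error is retriable):

```java
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.NotEnoughReplicasException;

producer.send(new ProducerRecord<>("payments", "order-1", "created"), (metadata, exception) -> {
    if (exception instanceof NotEnoughReplicasException) {
        // Fewer than min.insync.replicas replicas are in sync, so the broker
        // refused the acks=all write; the producer retries before giving up
        System.err.println("Write rejected: not enough in-sync replicas");
    } else if (exception != null) {
        exception.printStackTrace();
    }
});
```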
Caveat
A common misconception is that min.insync.replicas denotes how many replicas need to receive the record in order for the leader to respond to the producer. That's not true: the config is the minimum number of in-sync replicas required to exist in order for the request to be processed at all. With acks=all, the leader still waits for every current in-sync replica, however many there are.
That is, if there are three in-sync replicas and min.insync.replicas=2, the leader will respond only when all three replicas have the record.
Here, broker 3 is an in-sync replica. The leader can't respond yet because broker 3 hasn't received the write.
Summary
And that's all there is to it! Simple once visualized, isn't it?
To recap, the acks and min.insync.replicas settings are what let you configure the preferred durability requirements for writes in your Kafka cluster.
- acks=0: the write is considered successful the moment the request is sent out. No need to wait for a response.
- acks=1: the leader must receive the record and respond before the write is considered successful.
- acks=all: all of the online in-sync replicas must receive the write. If there are fewer than min.insync.replicas replicas online, the write won't be processed.
Further Reading
Kafka is a complex distributed system, so there's a lot more to learn about!
Here are some resources I can recommend as a follow-up:
- Kafka Consumer Data-Access Semantics: a more in-depth blog of mine that goes over how consumers achieve durability, consistency, and availability.
- Kafka Controller Internals: another in-depth post of mine where we dive into how coordination between brokers works. It explains what makes a replica out of sync (the nuance I alluded to earlier).
- "99th Percentile Latency at Scale with Apache Kafka": an amazing post going over Kafka performance, with great tips and explanations on how to configure for low latency and high throughput.
- Kafka Summit SF 2020 videos
- Confluent blog: a wealth of information regarding Apache Kafka.
- Kafka documentation: great, extensive, high-quality documentation.
Kafka is actively developed; it's only growing in features and reliability due to its healthy community. To best follow its development, I'd recommend joining the mailing lists.