This blog explains Kafka Consumer Groups, i.e. multiple consumers working together in a single group. Please read it carefully, because this is one of the most frequently asked Kafka topics in interviews.
Consumer Groups
If your producers are sending data to a topic at a moderate rate, a single consumer is enough to handle it. But if you want to scale up your system and read data from the topic in parallel, you need multiple consumers reading in parallel. Many real applications have multiple producers sending data at one end and multiple consumers receiving it at the other. Adding producers is simple: you just start another instance of the Kafka producer. Adding consumers, however, raises several questions that can cause problems if you ignore them. So let's discuss them along with their solutions.
How to read data in parallel?
When we talk about reading data in parallel, we mean a single application reading data in parallel, not multiple applications reading from the same Kafka topic at the same time. You might wonder how a single application can read data in parallel. The answer is Consumer Groups: you create multiple consumers and bind them to a single group (in practice, by giving them the same group.id), and this group is known as a Consumer Group. It looks simple, but it raises some further questions. Let's discuss them.
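To make the distinction concrete, here is a toy, in-memory model in Python. It is not the Kafka client API: `assign_by_group`, the consumer and group names, and the round-robin split are all hypothetical choices made for illustration. Consumers that share a group id split the topic's partitions among themselves, while a consumer in a different group (a different application) independently receives every partition.

```python
# Toy model only (hypothetical helper, not the Kafka client API).
from collections import defaultdict


def assign_by_group(partitions, consumers):
    """consumers: list of (consumer_id, group_id) pairs.

    Returns {group_id: {consumer_id: [partitions]}}. Inside each group the
    partitions are split round-robin (an illustrative choice); each group
    gets the full set of partitions independently of the other groups.
    """
    members = defaultdict(list)
    for consumer_id, group_id in consumers:
        members[group_id].append(consumer_id)

    result = {}
    for group_id, ids in members.items():
        ids = sorted(ids)
        assignment = {cid: [] for cid in ids}
        for i, p in enumerate(partitions):
            # Each partition goes to exactly one consumer of this group.
            assignment[ids[i % len(ids)]].append(p)
        result[group_id] = assignment
    return result


if __name__ == "__main__":
    topic_partitions = [0, 1, 2, 3]
    consumers = [
        ("billing-1", "billing-app"),
        ("billing-2", "billing-app"),
        ("audit-1", "audit-app"),
    ]
    for group, assignment in assign_by_group(topic_partitions, consumers).items():
        print(group, assignment)
```

The two `billing-app` consumers share the four partitions between them, while the single `audit-app` consumer reads all four on its own.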
How to avoid duplicate reads?
You might think that when multiple consumers read from the same topic, all consumers in the group could end up reading the same message. They don't, because Kafka has a simple rule: at any point in time, each partition is owned by only one consumer in the group. In other words, consumers in the same group do not share partitions, which prevents two consumers in the group from reading the same message. The maximum number of consumers that can actively read in a group is therefore the number of partitions in your topic. Having more consumers than partitions causes no error, but the extra consumers will sit idle.
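The "one owner per partition" rule can be sketched as a contiguous, range-style split. The function `range_assign` below is a hypothetical illustration, loosely in the spirit of Kafka's range strategy for a single topic, not the actual broker or client code. It shows that every partition lands on exactly one consumer and that surplus consumers receive nothing.

```python
def range_assign(num_partitions, consumer_ids):
    """Hypothetical helper: split partitions 0..num_partitions-1 into
    contiguous ranges, one range per consumer.

    Each partition goes to exactly one consumer. If there are more consumers
    than partitions, the surplus consumers get an empty list (they sit idle).
    """
    ids = sorted(consumer_ids)
    if not ids:
        return {}
    per_consumer, extra = divmod(num_partitions, len(ids))
    assignment = {}
    start = 0
    for i, cid in enumerate(ids):
        # The first `extra` consumers take one additional partition each.
        count = per_consumer + (1 if i < extra else 0)
        assignment[cid] = list(range(start, start + count))
        start += count
    return assignment


if __name__ == "__main__":
    consumers = ["c1", "c2", "c3", "c4", "c5"]
    plan = range_assign(3, consumers)
    print(plan)  # {'c1': [0], 'c2': [1], 'c3': [2], 'c4': [], 'c5': []}
    idle = [cid for cid, parts in plan.items() if not parts]
    print("idle:", idle)  # idle: ['c4', 'c5']
```

With 3 partitions and 5 consumers, two consumers get an empty list, which is the "sitting idle" case: adding them does no harm, but it adds no read parallelism either.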
Consumers Entering or Leaving the Group
Since this blog is about Consumer Groups, you may be wondering how a group is created and how consumers join or leave it. How are partitions assigned to a new consumer when it joins the group, and how are partitions rearranged when a consumer leaves? The following paragraphs answer all of these questions.
All of this is managed by the Group Coordinator. One of the Kafka brokers in the cluster acts as the Group Coordinator for the group. When the first consumer wants to join the group, it sends a join request to the Group Coordinator, and this first consumer becomes the leader of the group. The consumers that join after it are ordinary members of the group. So there are now two actors in the picture: the Group Coordinator and the Leader.
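A minimal sketch of the joining step, assuming a hypothetical `ToyGroupCoordinator` class that only tracks the member list and the leader. Real Kafka implements this with a request/response protocol between the consumers and the coordinator broker, including heartbeats, which this model leaves out.

```python
class ToyGroupCoordinator:
    """Hypothetical stand-in for the broker acting as Group Coordinator.

    It only records who has joined and who the leader is.
    """

    def __init__(self, group_id):
        self.group_id = group_id
        self.members = []   # member ids, in join order
        self.leader = None  # the first consumer to join becomes the leader

    def join(self, member_id):
        """Handle a join request and return the role of the joining consumer."""
        if member_id not in self.members:
            self.members.append(member_id)
        if self.leader is None:
            self.leader = member_id
        return "leader" if member_id == self.leader else "member"


if __name__ == "__main__":
    coordinator = ToyGroupCoordinator("orders-app")
    for consumer in ["consumer-a", "consumer-b", "consumer-c"]:
        print(consumer, "joined as", coordinator.join(consumer))
    print("members:", coordinator.members)
```

`consumer-a` is reported as the leader and the other two as members, while the coordinator keeps the full member list that the rebalance step works from.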
The Group Coordinator maintains the list of group members. Every time a new consumer joins the group or an existing member leaves, the coordinator updates the list. A change in membership means partitions must be reassigned: a new member needs some partitions to read, and the partitions of a departed member must be handed to someone else. So every time the list changes, the coordinator initiates a rebalance, and the leader is responsible for carrying it out. The leader takes the list of members, assigns partitions to them, and sends the assignment back to the Group Coordinator. The Group Coordinator then communicates the new assignment to each member of the group. One important thing to note: with the classic (eager) rebalance protocol, none of the consumers in the group can read data while a rebalance is in progress.
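The rebalance flow can be modelled as a small state machine. Everything here (`ToyCoordinator`, `leader_assign`, the round-robin plan, and the rule that the earliest-joined remaining member leads) is a hypothetical simplification for illustration, not Kafka's implementation. What it shows is the order of events: membership changes, fetching pauses, the leader computes a plan from the member list, and the coordinator hands out the result.

```python
def leader_assign(members, partitions):
    """Work done by the leader (hypothetical): round-robin partitions over members."""
    if not members:
        return {}
    plan = {m: [] for m in members}
    for i, p in enumerate(partitions):
        plan[members[i % len(members)]].append(p)
    return plan


class ToyCoordinator:
    """Hypothetical single-process model of the coordinator's rebalance flow."""

    def __init__(self, partitions):
        self.partitions = list(partitions)
        self.members = []         # current member list, in join order
        self.assignment = {}      # member id -> list of partitions
        self.rebalancing = False  # while True, no member may read

    def join(self, member_id):
        self.members.append(member_id)
        self.rebalancing = True   # membership changed: a rebalance is needed

    def leave(self, member_id):
        self.members.remove(member_id)
        self.rebalancing = True   # the departed member's partitions must move

    def leader(self):
        # Simplification (assumption): the earliest-joined member still present leads.
        return self.members[0] if self.members else None

    def complete_rebalance(self):
        # The leader computes a plan from the member list it is given...
        plan = leader_assign(list(self.members), self.partitions)
        # ...and the coordinator sends each member its share, then reading resumes.
        self.assignment = plan
        self.rebalancing = False

    def can_fetch(self, member_id):
        # Eager-style rule: nobody reads while a rebalance is in progress.
        return not self.rebalancing and bool(self.assignment.get(member_id))


if __name__ == "__main__":
    group = ToyCoordinator(partitions=range(4))
    group.join("a")
    group.join("b")
    group.complete_rebalance()
    print(group.leader(), group.assignment)  # a {'a': [0, 2], 'b': [1, 3]}
    group.join("c")
    print("a can fetch during rebalance?", group.can_fetch("a"))  # ... False
    group.complete_rebalance()
    print(group.assignment)  # {'a': [0, 3], 'b': [1], 'c': [2]}
    group.leave("a")
    group.complete_rebalance()
    print(group.leader(), group.assignment)  # b {'b': [0, 2], 'c': [1, 3]}
```

Splitting the rebalance into a "membership changed" step and a separate `complete_rebalance` step is what makes the pause visible: between the two, `can_fetch` returns `False` for every member.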
I hope you now have an idea of what Consumer Groups are and how Kafka creates and manages them. The next blog will cover Consumer Groups from a coding perspective. Until then, keep reading about Kafka.