Hopefully you now understand the basic need for Kafka. This post introduces the main Kafka terms, which you need to know in order to understand Kafka better.

Terminologies

The main terms used in Kafka are:

  1. Message
  2. Producer
  3. Consumer
  4. Broker
  5. Cluster
  6. Topic
  7. Partitions
  8. Offset
  9. Consumer Groups

 

  1. Message

A message is a piece of data of any size: small, medium or large. It can be anything, such as a text file or a database record. We generally call it a message, but to Kafka it is simply an array of bytes. It does not matter what type of data you send; Kafka treats it as an array of bytes. Let's understand this with an example:

Suppose you have the following table and you want to store it in Kafka. Each row of the table will be treated as a message (i.e. an array of bytes) and stored in Kafka.
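To make the "row becomes an array of bytes" idea concrete, here is a minimal sketch in Python. The table rows and column names are made up for illustration, and JSON is just one possible way to turn a row into bytes; Kafka itself does not care which encoding you choose.

```python
import json

# Hypothetical table rows; the column names are assumptions for illustration.
rows = [
    {"id": 1, "name": "Alice", "city": "Delhi"},
    {"id": 2, "name": "Bob", "city": "Mumbai"},
]


def row_to_message(row):
    """Serialize one table row into an array of bytes (JSON encoded as UTF-8)."""
    return json.dumps(row, sort_keys=True).encode("utf-8")


def message_to_row(message):
    """Turn the array of bytes back into a row on the reading side."""
    return json.loads(message.decode("utf-8"))


# One message per row, each of them plain bytes.
messages = [row_to_message(row) for row in rows]
for message in messages:
    print(type(message).__name__, message)
```

Whatever the rows contain, what gets stored is only the `bytes` value; making sense of those bytes again is up to the application that reads them.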

  2. Producer

A producer is the application responsible for sending data to Kafka. To send data to Kafka, you have to create a producer application. How do producers send data to Kafka? Don't worry about that for now; I will discuss it in detail in upcoming parts of this blog.

  3. Consumer

A consumer is an application that receives data from Kafka. Producers send data to Kafka, and consumers receive data from Kafka. Please keep in mind that producers don't send data to consumers directly. Consumers request data from Kafka, and they can read data sent by any number of producers, provided they have permission to read it.

The following image shows an outline of producer and consumer.

  4. Broker

Broker is just the name given to a Kafka server. The name makes sense because the Kafka server acts as a broker, or agent, between the producer and consumer applications. Producers and consumers interact with each other through the broker, i.e. the Kafka server.

The following image gives you an idea of a Kafka cluster and a Kafka broker (Kafka server).

  5. Cluster

A cluster is a group of computers acting together to achieve a common purpose. For Kafka, a cluster has the same meaning: a group of computers, each running one instance of the Kafka broker.

  6. Topic

So far, you have understood that producers send data to Kafka and consumers receive data from Kafka. Now the question arises: how can consumers identify which kind of data is sent by which producer? This is where the topic comes into the picture.

A topic is just a unique name given to your data stream. When producers send data to the Kafka server, they are actually sending it to a topic. Producers have to specify a topic whenever they send data to Kafka, and consumers receive data from the Kafka server based on that topic.

In other words, producers attach a topic name to their message, and the message is then stored on the Kafka server (i.e. the broker) under that topic name.
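The role of the topic name can be sketched with a tiny in-memory model. `ToyBroker` and its `send` and `read` methods are hypothetical names invented for this illustration; this is not the real Kafka API, only a picture of how messages are filed and fetched by topic name.

```python
class ToyBroker:
    """In-memory stand-in for a Kafka server (illustration only, not the Kafka API)."""

    def __init__(self):
        # topic name -> list of messages (bytes) in arrival order
        self.topics = {}

    def send(self, topic, message):
        """What a producer does: hand over a message together with a topic name."""
        self.topics.setdefault(topic, []).append(message)

    def read(self, topic):
        """What a consumer does: ask for the messages of one topic."""
        return list(self.topics.get(topic, []))


broker = ToyBroker()
broker.send("invoices", b"invoice-1")
broker.send("payments", b"payment-1")
broker.send("invoices", b"invoice-2")

# The consumer never talks to the producer; it only names the topic it wants.
invoices = broker.read("invoices")
print(invoices)
```

The consumer gets only the two invoice messages, because it asked for the `invoices` topic and nothing else.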

  7. Partitions

We know that data is stored on the Kafka server and that it can be of any volume (i.e. size). What would you do if the incoming data is much larger than the storage capacity of your computer? This creates a challenge for the broker: how to store such a large amount of data.

Kafka also provides the solution to this problem. It breaks your topic into multiple parts, called partitions, and distributes them across multiple computers, with each partition stored whole on one computer.

You may be wondering how Kafka decides how many partitions a topic should be divided into. This decision is not made by Kafka; it is made by you when you create the topic. You specify the number of partitions you need for your data, and the Kafka broker creates that number of partitions. An overview of the partitions of a topic is shown in the image below.

Note: Please keep in mind that every partition is stored on a single machine and cannot be broken up further.
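As a rough sketch of the idea, the snippet below spreads the partitions of a topic over a few machines. The function `create_topic`, the broker names and the round-robin placement are all assumptions made for this illustration; how a real Kafka cluster chooses where each partition goes is more involved.

```python
def create_topic(name, num_partitions, brokers):
    """Toy placement: map each partition of a topic to the one broker that stores it.

    The number of partitions is chosen by the caller, not by the broker.
    """
    return {
        f"{name}-{partition}": brokers[partition % len(brokers)]
        for partition in range(num_partitions)
    }


brokers = ["broker-0", "broker-1", "broker-2"]   # hypothetical machine names
placement = create_topic("invoices", 5, brokers)

for partition, broker in placement.items():
    print(f"{partition} is stored on {broker}")
```

Each partition appears exactly once in `placement`, which mirrors the note above: a partition lives on a single machine and is never split further, although one machine may hold several partitions.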

  8. Offset

An offset is the sequence number given to a message within a partition. Messages are stored in a partition in order of arrival: the first message to arrive is stored at offset 0, the next message at offset 1, and so on.

You can think of offsets like the indexes of an array: an array has indexes 0, 1, 2, etc., and a partition has offsets 0, 1, 2, etc. Please keep in mind that these offset numbers are immutable, which means they can't be changed. The following picture gives you a glimpse of offset numbers.

Here, M1 and M2 are your messages; you can see that M1 is placed at offset 0, M2 at offset 1, and so on.
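The array analogy translates almost directly into code. `ToyPartition` below is a hypothetical class written for this post, not part of Kafka: it models a partition as an append-only list whose index plays the role of the offset.

```python
class ToyPartition:
    """Append-only list standing in for one partition (illustration only)."""

    def __init__(self):
        self._log = []

    def append(self, message):
        """Store the message at the end and return the offset it was given."""
        self._log.append(message)
        return len(self._log) - 1

    def read(self, offset):
        """Fetch the message stored at a given offset."""
        return self._log[offset]


partition = ToyPartition()
offset_m1 = partition.append(b"M1")
offset_m2 = partition.append(b"M2")

print(offset_m1, offset_m2)   # prints: 0 1
print(partition.read(0))      # prints: b'M1'
```

There is deliberately no method to move or renumber a message: once M1 sits at offset 0, it stays there.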

  9. Consumer Groups

A consumer group is a group of consumers sharing the work, just as you might work in a team of 3-4 members to complete a project.

Suppose you have a huge amount of data coming from multiple producers to Kafka, but only a single consumer to handle all of it. That would create a problem. Consumer groups solve this problem. A consumer group is a single unit in which multiple consumers run and read data from Kafka in parallel.

Now the question arises: how many consumers can you create inside a consumer group? The answer: the maximum number of consumers that can actively read is the number of partitions you have created for the topic. Any consumers beyond that number sit idle.

Note: Kafka does not allow two or more consumers in the same group to read from the same partition at the same time; otherwise the same data would be read twice.
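A small sketch shows how partitions might be shared out inside one consumer group. The function `assign_partitions`, the consumer names and the round-robin order are assumptions for illustration, not Kafka's actual assignment logic. The point is that every partition goes to exactly one consumer in the group, and consumers beyond the number of partitions get nothing to read.

```python
def assign_partitions(partitions, consumers):
    """Toy assignment: give each partition to exactly one consumer in the group."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2]   # a topic with three partitions

# Fewer consumers than partitions: one consumer handles two partitions.
small_group = assign_partitions(partitions, ["c1", "c2"])
print(small_group)   # {'c1': [0, 2], 'c2': [1]}

# More consumers than partitions: the extra consumer has nothing to read.
big_group = assign_partitions(partitions, ["c1", "c2", "c3", "c4"])
print(big_group)     # {'c1': [0], 'c2': [1], 'c3': [2], 'c4': []}
```

In both groups no partition shows up under two consumers, so no message is read twice within the group.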

The following picture gives you an overview of consumer groups, using the example of a retail organisation. There are multiple stores, and each store has a billing counter. We want to bring the invoices from all billing counters to one common place, i.e. our data centre. We start one producer application at each store, which sends invoice data to Kafka. The picture also shows multiple consumers forming a consumer group in order to read the huge amount of data coming from the multiple producers. Finally, the consumers send the data to the data centre.

Hopefully you are now familiar with the basic terminology of Kafka. I will discuss the installation of Kafka, along with some implementation, in the next part. Till then, keep reading about Kafka.

About the author

Dixit Khurana