This post walks through the internals of the Kafka producer, i.e. what actually happens in the background when a producer sends data to Kafka.
Producer Workflow
1) First, create a java.util.Properties object that holds all the configuration the producer needs. Three settings are mandatory and must be present in it: bootstrap.servers, key.serializer and value.serializer.
2) Next, create a ProducerRecord object. Its constructor accepts the topic name, partition number, timestamp, key and value as parameters. The partition number, timestamp and key are optional and depend on your requirements. The ProducerRecord is simply the message you want to send to Kafka.
3) Now instantiate the producer using the Properties object and hand the ProducerRecord to it by calling its send() method.
4) When the record reaches the producer, the producer applies the serializers to the key and the value. Serialization is the process of converting an object into an array of bytes, and the producer uses the serializer classes specified in the Properties object.
5) After serialization, the producer passes the record to the partitioner, which decides the partition for the message. Kafka's default partitioner plays a crucial role here. If the message has a key, the default partitioner hashes the key to obtain the partition number. So messages that share the same key all go to the same partition, as long as the number of partitions in the topic does not change.
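6) If no key is specified, the default partitioner spreads the messages evenly across the available partitions using a round-robin algorithm. (From Kafka 2.4 onward, the default instead fills a batch for one partition before moving on to the next, which is known as sticky partitioning.)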
7) Once the partition number is known, the producer is ready to send the message. Instead of sending each message to Kafka directly, however, the producer places it in a per-partition buffer and sends the records to Kafka in batches. The buffer and batch sizes can be configured (buffer.memory and batch.size) in the same Properties object that was used to instantiate the producer.
8) Finally, the producer sends the batch to the broker, and the broker sends an acknowledgement back to the client. If anything goes wrong along the way, Kafka returns an error. Some errors are recoverable, and for those the producer retries and sends the message to Kafka again.
9) The number of retries and the wait time between two retries can also be configured in the Properties object (retries and retry.backoff.ms). Note that the producer does not retry when the error is not recoverable.
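The steps above can be sketched in plain Java. This is a toy model under stated assumptions, not Kafka's actual implementation: it uses only the JDK, the class ProducerWorkflowSketch and all of its methods are hypothetical names, Arrays.hashCode stands in for the hash Kafka really applies to the key bytes, and the broker address and config values are placeholders.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of the producer steps; hypothetical names, no Kafka client involved.
class ProducerWorkflowSketch {

    // Hypothetical stand-in for an error that is worth retrying.
    static class RecoverableException extends RuntimeException {
        RecoverableException(String message) { super(message); }
    }

    // Steps 1, 7 and 9: everything is configured through one Properties object.
    static Properties producerConfig() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder address
        props.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("batch.size", "16384");      // placeholder values below
        props.setProperty("retries", "3");
        props.setProperty("retry.backoff.ms", "100");
        return props;
    }

    // Steps 4 and 5: serialize the key, hash the bytes, map the hash to a partition.
    static int partitionForKey(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = Arrays.hashCode(keyBytes);          // assumption: not Kafka's real hash
        return (hash & 0x7fffffff) % numPartitions;    // clear the sign bit, then modulo
    }

    // Step 6: without a key, walk through the partitions in round-robin order.
    static int partitionWithoutKey(AtomicInteger counter, int numPartitions) {
        return (counter.getAndIncrement() & 0x7fffffff) % numPartitions;
    }

    // Steps 8 and 9: retry only recoverable errors; returns the attempts used.
    static int sendWithRetries(Runnable send, int retries, long backoffMs) {
        for (int attempt = 1; ; attempt++) {
            try {
                send.run();
                return attempt;
            } catch (RecoverableException e) {
                if (attempt > retries) {
                    throw e;                           // retries exhausted
                }
                pause(backoffMs);
            }
            // Any other exception is treated as non-recoverable and propagates at once.
        }
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while backing off", e);
        }
    }

    public static void main(String[] args) {
        Properties props = producerConfig();
        int retries = Integer.parseInt(props.getProperty("retries"));
        long backoffMs = Long.parseLong(props.getProperty("retry.backoff.ms"));
        int partition = partitionForKey("user-42", 6);
        int attempts = sendWithRetries(
                () -> System.out.println("sending to partition " + partition),
                retries, backoffMs);
        System.out.println("delivered after " + attempts + " attempt(s)");
    }
}
```

Calling partitionForKey twice with the same key and partition count always yields the same partition, which is the property step 5 relies on. With a real producer, the hashing, batching and retrying happen inside the client library, and only the Properties entries are written by hand.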
Hopefully you now have an idea of what happens in the background when a producer sends data to Kafka. In upcoming parts I will cover the consumer workflow. Until then, keep reading about Kafka.