Before you proceed, please read about the High Availability feature of Hadoop. This post will help you understand another beautiful part of a Hadoop Cluster, i.e. the Secondary Name Node.

Secondary Name Node

Generally, people treat the Stand-By Name Node and the Secondary Name Node as the same thing, but they are actually different altogether. The Stand-By Name Node is a node which comes into the picture when the Active Name Node fails (the High Availability post covers this in more detail). The Secondary Name Node, on the other hand, has a totally different responsibility from the Stand-By Name Node.

In the earlier part, you have seen two things, i.e. the Fs-image and the edit-log file. The Fs-image is the latest in-memory image of the whole HDFS namespace (the file system metadata), and the edit-log is the file in which the Name Node records every change it makes to a file. Before going into the Secondary Name Node, let me discuss a problem with you.

What will happen if you restart your Name Node (Active Name Node)?

You already know that the Name Node maintains an updated image of the whole file system in memory. If we restart our Name Node for some maintenance activity, the updated Fs-image is lost because it is held in memory, not on disk. Now the question arises: how do we get the lost Fs-image back? You already have the answer with you, i.e. the edit-log. As discussed in part-4, you can reconstruct the Fs-image with the help of the edit-log.

This recreation works well if your edit-log file contains an hour or two of data, but imagine the situation when your edit-log file contains the data of 5 days or more. If you are required to restart your Name Node, it will take many hours to read an edit-log file which contains the data of the last 5 to 6 days, and most of your time will go into reconstructing your Fs-image. This extra time taken by the Name Node leads to several other problems, because a Hadoop Cluster does not work without a Name Node: clients perform all their operations in the Hadoop Cluster with the help of the Name Node.
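To make the restart cost concrete, here is a minimal sketch of rebuilding an image by replaying an edit-log. It is only a toy model: the dictionary image, the tuple records and the names `apply_edit` and `replay` are assumptions made for illustration and are not Hadoop's real file formats or APIs. The point it shows is that every record has to be applied in order, so the work grows with the length of the log.

```python
# Toy model: the namespace image is a dict of path -> file size,
# and the edit log is an ordered list of (operation, path, size) records.
# These structures are illustrative assumptions, not Hadoop's real formats.

def apply_edit(image, edit):
    """Apply one edit-log record to the in-memory image."""
    op, path, size = edit
    if op == "create":
        image[path] = size
    elif op == "delete":
        image.pop(path, None)
    else:
        raise ValueError(f"unknown operation: {op}")


def replay(edit_log, base_image=None):
    """Rebuild the image by replaying every record in order."""
    image = dict(base_image or {})
    for edit in edit_log:
        apply_edit(image, edit)
    return image


edit_log = [
    ("create", "/data/a.txt", 128),
    ("create", "/data/b.txt", 64),
    ("delete", "/data/a.txt", 0),
    ("create", "/data/c.txt", 256),
]

rebuilt = replay(edit_log)
print(rebuilt)  # {'/data/b.txt': 64, '/data/c.txt': 256}
```

With four records the replay is instant, but the loop in `replay` visits every record, which is why a log covering several days makes a restart slow.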

Working of Secondary Name Node

In order to solve the above problem, the Secondary Name Node is deployed in the Cluster. The Secondary Name Node performs the checkpoint activity. Every hour it reads the edit-log file, reconstructs the Fs-image and stores it on its local disk. This Fs-image stored on disk is known as the On-Disk-Fs-Image. Once the Fs-image is stored on disk, the edit-log file is truncated.

After an hour, when the Secondary Name Node reads the edit-log file again, it merges the Fs-image saved on disk with the current edit-log, reconstructs an updated Fs-image and replaces the earlier saved image on disk with the updated one. The edit-log file is truncated again, and the process of reading, merging and saving continues in the same way. A copy of the merged image is also sent back to the Name Node, so that after a restart the Name Node only has to replay the edits made since the last checkpoint.

The main motive of the checkpoint activity is to merge the On-Disk-Fs-image with the currently read edit-log file, recreate an updated Fs-image and store it on disk. The checkpoint activity does not take much time because we are reading an edit-log file of just one hour. Reading the Fs-image stored on disk also takes less time because the Fs-image is usually much more compact than a long edit-log file. The edit-log is a file which contains all the transactions, i.e. records with updated information, and the On-Disk-Fs-image is like a balance sheet.

The following picture will give you a glimpse of how things work in the background. The abbreviation SNN used in the following image stands for Secondary Name Node.
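The checkpoint cycle described above can also be sketched in code. This is a toy model under stated assumptions: the image is a plain dictionary, the edit-log is a list of tuples, and the function name `checkpoint` is hypothetical. It does not reflect how Hadoop stores or transfers these files; it only shows the merge-then-truncate idea.

```python
# Toy checkpoint: merge the on-disk image with the current edit log,
# save the result as the new on-disk image, then empty the edit log.
# Names and data layout are illustrative assumptions, not Hadoop's formats.

def checkpoint(on_disk_image, edit_log):
    """Return the merged image and clear the edit log in place."""
    merged = dict(on_disk_image)
    for op, path, size in edit_log:
        if op == "create":
            merged[path] = size
        elif op == "delete":
            merged.pop(path, None)
    edit_log.clear()  # the log restarts empty after each checkpoint
    return merged


on_disk_image = {}
edit_log = []

# Hour 1: two files are created, then a checkpoint runs.
edit_log += [("create", "/logs/1.log", 10), ("create", "/logs/2.log", 20)]
on_disk_image = checkpoint(on_disk_image, edit_log)

# Hour 2: one file is deleted and one is created, then another checkpoint.
edit_log += [("delete", "/logs/1.log", 0), ("create", "/logs/3.log", 30)]
on_disk_image = checkpoint(on_disk_image, edit_log)

print(on_disk_image)  # {'/logs/2.log': 20, '/logs/3.log': 30}
print(len(edit_log))  # 0
```

Each call only processes the records written since the previous checkpoint, so the amount of work per cycle stays small no matter how long the cluster has been running.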

The Name Node can perform the same merge itself when it starts up, so the main purpose of the Secondary Name Node is to minimize the restart time of the Name Node. There is no need for a Secondary Name Node in a High Availability setup, because there the Stand-By Name Node performs the checkpoint activity.

I hope you now have an idea of the Secondary Name Node. Zookeeper and the Single Node Hadoop Cluster installation will be discussed in the next part. Till then, keep reading and keep learning. Please feel free to share your views and doubts.

For Basic Knowledge of Big Data and Hadoop please Click Here

Thank you for Reading.



About the author

Dixit Khurana