Hopefully you now have a better understanding of a Hadoop cluster, how a file is created in HDFS, and how its data is stored, as discussed in HDFS Architecture. This blog will help you understand one of the beautiful concepts of HDFS: Fault Tolerance.

Fault Tolerance

You know that the data in an HDFS file is stored in the form of blocks (size: 128 MB by default) on different Data Nodes, as discussed earlier. If someone asked you whether you could still read a file stored in HDFS after one of the Data Nodes in the cluster failed, you would simply say no: if that particular node fails, some portion of your file is lost and you will not be able to read the file.
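To make the block idea concrete, here is a tiny Python sketch (an illustration only, not HDFS's actual code) of how a file's bytes map onto 128 MB blocks; the 300 MB file size is just an example:

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the default HDFS block size

def split_into_blocks(file_size_bytes):
    """Return the sizes of the blocks a file of the given size occupies."""
    full_blocks, remainder = divmod(file_size_bytes, BLOCK_SIZE)
    sizes = [BLOCK_SIZE] * full_blocks
    if remainder:
        sizes.append(remainder)  # the final block holds only the leftover bytes
    return sizes

# A 300 MB file occupies two full 128 MB blocks plus one 44 MB block.
blocks = split_into_blocks(300 * 1024 * 1024)
print(len(blocks))  # → 3
```

Note that unlike a fixed-size disk partition, HDFS's last block only consumes as much space as the remaining data actually needs.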

But HDFS, through Fault Tolerance, gives you the ability to read your file in spite of one of your Data Nodes failing. The following paragraphs explain how Fault Tolerance is achieved.

Let's assume that your file's data is stored in 4 blocks on different Data Nodes. Hadoop creates copies of each block and stores them on separate Data Nodes. In Hadoop terminology, the number of copies kept of each block is known as the Replication Factor. By default the Replication Factor is 3, i.e. 3 copies of each block, but if your data is critical you can increase it to any desired value, and you even have the flexibility to change the Replication Factor after your file has been created in HDFS. Now if any of your Data Nodes fails, you can read the data from another Data Node that holds a copy.
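The placement idea can be sketched in a few lines of Python. This is a toy round-robin policy for illustration only; the real HDFS placement policy also weighs racks, free space, and network distance. The block and node names are made up:

```python
import itertools

REPLICATION_FACTOR = 3  # the HDFS default

def place_replicas(blocks, datanodes, rf=REPLICATION_FACTOR):
    """Assign rf distinct DataNodes to each block, round-robin style."""
    assert len(datanodes) >= rf, "need at least rf DataNodes"
    ring = itertools.cycle(datanodes)
    placement = {}
    for block in blocks:
        chosen = []
        while len(chosen) < rf:
            node = next(ring)
            if node not in chosen:  # never put two replicas on the same node
                chosen.append(node)
        placement[block] = chosen
    return placement

placement = place_replicas(["blk_1", "blk_2", "blk_3", "blk_4"],
                           ["dn1", "dn2", "dn3", "dn4", "dn5"])
# Each block now lives on 3 different DataNodes, so losing any single
# node still leaves 2 readable copies of every block.
```

On a real cluster you would change the replication factor of an existing file with the `hdfs dfs -setrep` shell command rather than write code like this.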

The following picture gives you a glimpse of Fault Tolerance and the Replication Factor. You can see clearly that 2 copies of each block are maintained on different Data Nodes. The red cross on one of the Data Nodes indicates that that Data Node has failed.

Fault Tolerance


  • Rack Awareness

Someone may ask: what happens if all three copies of a data block are stored under one rack, i.e. on nodes belonging to the same rack, and the whole rack fails? You would simply say the whole data is lost because of the rack failure. Hadoop solves this problem with Rack Awareness. Once you configure your Hadoop cluster to be rack-aware, Hadoop ensures that at least one copy of each data block is stored on a different rack.
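The rack-aware guarantee boils down to a simple invariant: a block's replicas must not all share one rack. A minimal Python sketch of that check, with a hypothetical two-rack topology:

```python
def spans_multiple_racks(replica_nodes, rack_of):
    """True if the block's replicas are not all confined to a single rack."""
    return len({rack_of[node] for node in replica_nodes}) >= 2

# Hypothetical topology: dn1/dn2 sit in rack1, dn3/dn4 in rack2.
rack_of = {"dn1": "/rack1", "dn2": "/rack1",
           "dn3": "/rack2", "dn4": "/rack2"}

print(spans_multiple_racks(["dn1", "dn2", "dn3"], rack_of))  # → True
print(spans_multiple_racks(["dn1", "dn2"], rack_of))         # → False: one rack failure loses every copy
```

On a real cluster, rack awareness is enabled by pointing the `net.topology.script.file.name` property in core-site.xml at a script that maps Data Node addresses to rack IDs.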

You can clearly observe from the following image that when you configure your Hadoop cluster with Rack Awareness, it creates one copy of your block on a Data Node placed under a different rack, as shown below.




  • Role of Name Node in Replication

Suppose you have three copies of a data block stored on different Data Nodes. For some reason two of the Data Nodes fail and you are left with only one copy of your data. You may think: I set the Replication Factor to 3, and now only one copy remains, so where has the Replication Factor concept gone? This is where the Name Node comes into the picture. The Name Node continuously tracks the replication value of each data block. As you know, each Data Node sends a heartbeat to the Name Node at regular intervals as an indication that it is alive. If the Name Node does not receive a heartbeat from a Data Node, it assumes that that Data Node has failed, and it copies the lost replicas to other Data Nodes to bring the replication back to 3. In this way, with the help of the Name Node, Hadoop continuously maintains the Replication Factor value you set.
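The Name Node's re-replication step can be sketched as a toy simulation. This is not how the Name Node is implemented, just the bookkeeping idea: when a node stops heartbeating, every block it held gets a fresh replica on some other live node. All names here are invented:

```python
def handle_datanode_failure(block_map, nodes, failed_node):
    """Re-replicate blocks from a failed DataNode onto remaining live nodes.

    block_map maps each block id to the list of DataNodes holding a replica.
    A toy version of what the NameNode does when heartbeats stop arriving.
    """
    live_nodes = [n for n in nodes if n != failed_node]
    for block, holders in block_map.items():
        if failed_node in holders:
            holders.remove(failed_node)  # that replica is gone
            for node in live_nodes:
                if node not in holders:
                    holders.append(node)  # schedule a re-copy here
                    break
    return block_map

blocks = {"blk_1": ["dn1", "dn2", "dn3"],
          "blk_2": ["dn2", "dn3", "dn4"]}
handle_datanode_failure(blocks, ["dn1", "dn2", "dn3", "dn4"], "dn2")
# Both blocks are back to 3 replicas, now without dn2.
```

Notice the prerequisite this makes visible: re-replication only works while enough live Data Nodes remain to hold the required number of copies.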

The subsequent image gives you an idea of how, whenever a Data Node fails, the Name Node creates a copy of its data on a separate Data Node in order to keep the replication factor constant at 3, as shown below.

Rack Aware


The whole motive behind replication is to protect you against failures, but it costs a lot of space. Setting the replication factor to 3 reduces the usable storage capacity of the cluster to one-third of its raw capacity. Newer versions of Hadoop offer alternatives to replication (Hadoop 3 introduced erasure coding for HDFS), but replication remains the traditional method for tolerating faults, and the cost concern is usually modest because the disks used are reasonably cheap.
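The one-third figure is simple arithmetic: every logical byte is stored rf times, so usable capacity is raw capacity divided by the replication factor. The 300 TB cluster size below is just an example:

```python
# Usable capacity under replication: every logical byte is stored rf times.
raw_capacity_tb = 300       # example cluster: 300 TB of raw disk
replication_factor = 3      # the HDFS default

usable_tb = raw_capacity_tb / replication_factor
print(usable_tb)  # → 100.0, i.e. one-third of the raw capacity
```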

Hope you have got a good idea of Fault Tolerance. I will cover some other beautiful concepts of Hadoop in upcoming parts; till then, keep reading and keep learning. Please feel free to share your views and doubts.



About the author

Dixit Khurana