YARN (Yet Another Resource Negotiator)

The YARN framework is responsible for providing resources (CPU, memory) for the execution of applications. The main motive of YARN is to split resource management and job scheduling into separate daemons (programs that run as background processes and are not under the direct control of a user). YARN acts as a layer that enables other data processing frameworks to run on Hadoop.

Components of YARN

The main components of the YARN framework are:

  • Client (Job Submitter)

  • Resource Manager (Master)

  • Resource Scheduler

  • Application Manager

  • Node Manager

  • Container

  • Application Master
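To make two of these components concrete, here is a minimal Python sketch of a Node Manager handing out Containers. It is a toy model, not the real YARN API: the class names mirror the component names above, but every field and method (`free_vcores`, `can_fit`, `launch`, `heartbeat`) is a hypothetical name chosen for illustration, and a Container is assumed to be nothing more than a slice of one node's CPU and memory.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Container:
    """A slice of one node's resources: some vcores and some memory (toy model)."""
    node: str
    vcores: int
    memory_mb: int


@dataclass
class NodeManager:
    """Per-node worker that tracks what is still free on its own node (toy model)."""
    name: str
    free_vcores: int
    free_memory_mb: int
    running: list = field(default_factory=list)

    def can_fit(self, vcores, memory_mb):
        """Return True if a container of this size fits in the free resources."""
        return vcores <= self.free_vcores and memory_mb <= self.free_memory_mb

    def launch(self, vcores, memory_mb):
        """Carve a container out of the free resources, or raise if it does not fit."""
        if not self.can_fit(vcores, memory_mb):
            raise ValueError(f"{self.name} cannot fit {vcores} vcores / {memory_mb} MB")
        self.free_vcores -= vcores
        self.free_memory_mb -= memory_mb
        container = Container(self.name, vcores, memory_mb)
        self.running.append(container)
        return container

    def heartbeat(self):
        """Build the status report a node would periodically send to the master."""
        return {
            "node": self.name,
            "free_vcores": self.free_vcores,
            "free_memory_mb": self.free_memory_mb,
        }


nm = NodeManager("node-1", free_vcores=4, free_memory_mb=8192)
am_container = nm.launch(vcores=1, memory_mb=1024)
print(am_container)
print(nm.heartbeat())
```

The point of the sketch is only that a Node Manager knows its own free CPU and memory, and that each Container it launches reduces that free pool.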




The following steps explain how the execution of an application flows through the YARN architecture:

  1. The client (Job Submitter) submits an application/job to the Resource Manager. The application is a distributed application with multiple processes that process data in parallel.

  2. The Resource Manager is the master, and there is one per cluster. Its duty is to manage resources among all the applications in the system. The Resource Manager itself has several components that perform different actions; the two main ones are the Resource Scheduler and the Application Manager.

  3. The Resource Scheduler decides how to assign resources to an application. Suppose all the CPUs in the cluster are in use and there are still multiple applications waiting in the queue. The Resource Scheduler then plays a crucial role by deciding which application gets resources first.

  4. The application submitted by the client is handed over to the Application Manager with the help of the Resource Scheduler. The Application Manager then finds a free Container on a slave node. (A Container is a run-time environment used to run a process; it is simply an allocation of CPU and memory.)

  5. The Node Manager plays a crucial role in finding that free container because it knows how much CPU and memory is available on its particular node. There can be many Node Managers per cluster, and each acts as a slave in the YARN infrastructure. Periodically, each Node Manager sends a heartbeat to the Resource Manager.

  6. When the Application Manager finally gets a free container (free resources) on a node, it launches the Application Master in that container.

  7. There is one Application Master per application, and it is responsible for the complete life cycle of that application.
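The steps above can be sketched as a small Python simulation. This is a toy model under loud assumptions, not YARN's actual implementation: resources are reduced to a single vcore count per node, the scheduler is assumed to be strictly first-in, first-out (real YARN schedulers are pluggable), and all names (`ResourceManager`, `submit`, `heartbeat`, `schedule`, the application and node names) are hypothetical.

```python
from collections import deque


class ResourceManager:
    """Toy master: a FIFO scheduler queue plus a table of free vcores per node."""

    def __init__(self, free_vcores_by_node):
        self.free = dict(free_vcores_by_node)  # latest state reported by the nodes
        self.queue = deque()                   # applications waiting for resources
        self.app_masters = {}                  # app name -> node hosting its Application Master

    def submit(self, app, vcores):
        """Step 1: a client hands an application to the Resource Manager."""
        self.queue.append((app, vcores))
        self.schedule()

    def heartbeat(self, node, free_vcores):
        """Step 5: a Node Manager reports its currently free resources."""
        self.free[node] = free_vcores
        self.schedule()

    def schedule(self):
        """Steps 3-7: serve the queue in order; stop at the first app that does not fit."""
        while self.queue:
            app, vcores = self.queue[0]
            node = next((n for n, free in self.free.items() if free >= vcores), None)
            if node is None:
                return  # cluster is full, so the application keeps waiting
            self.queue.popleft()
            self.free[node] -= vcores
            self.app_masters[app] = node  # one Application Master per application


rm = ResourceManager({"node-1": 2, "node-2": 1})
rm.submit("wordcount", 2)
rm.submit("sort", 1)
rm.submit("join", 2)           # no node has 2 free vcores left, so it waits
print(rm.app_masters)          # {'wordcount': 'node-1', 'sort': 'node-2'}
print(list(rm.queue))          # [('join', 2)]
rm.heartbeat("node-1", 2)      # node-1 reports that its vcores are free again
print(rm.app_masters["join"])  # node-1
```

The third submission illustrates the situation from step 3: when every CPU is busy, the application stays in the queue until a later heartbeat shows free resources.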

The above seven steps show the execution flow of an application in the YARN infrastructure. I hope you now have a basic idea of the YARN architecture. Feel free to ask questions, and for a basic understanding of Hadoop, please visit Big Data and Hadoop.




About the author

Dixit Khurana