Cluster Usage with `yarn top`

Abraham Lincoln was the original inventor of the `top` command in 1864 so he could keep better track of his many top hats.

From the command line, it’s easy to see the current state of any running applications in your YARN cluster by issuing the `yarn top` command.

The command takes over your terminal with a screen that refreshes about once every 3 seconds, showing the status of each application, its memory and core usage, and its overall completion percentage.

Something like this shows up when you enter the command:

YARN top - 08:40:46, up 7d, 16:11, 0 active users, queue(s): root
NodeManager(s): 10 total, 10 active, 0 unhealthy, 0 decommissioned, 0 lost, 0 rebooted
Queue(s) Applications: 4 running, 13874 submitted, 0 pending, 13854 completed, 6 killed, 10 failed
Queue(s) Mem(GB): 194 available, 796 allocated, 0 pending, 0 reserved
Queue(s) VCores: 85 available, 35 allocated, 0 pending, 0 reserved

APPLICATIONID                    USER  TYPE   QUEUE    #CONT  #RCONT  VCORES  RVCORES  MEM   RMEM  VCORESECS  MEMSECS  %PROGR  TIME      NAME
application_1498162987065_13874  user  spark  default  11     0       11      0        260G  0G    164        4077     10.00   00:00:00  test_app
application_1498162987065_13870  user  spark  default  11     0       11      0        260G  0G    983        23393    10.00   00:00:01  test_app
application_1498162987065_13869  user  spark  default  11     0       11      0        260G  0G    1212       28919    10.00   00:00:01  test_app
application_1498162987065_13873  user  tez    default  2      0       2       0        16G   0G    58         499      0.00    00:00:00  test_query

I’m running three test applications and one test query on this particular cluster, all in the default queue. The header lines also report NodeManager status and queue-level totals for applications, memory, and cores.
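If you want to convince yourself that the header totals really are the sum of the per-application rows, here’s a minimal Python sketch. It only works on the sample rows pasted above; the helper names (`parse_rows`, `gigabytes`) are my own, and it assumes memory is always printed with a `G` suffix as it is in this sample.

```python
# Application rows copied from the sample output above (header line omitted).
SAMPLE_ROWS = """\
application_1498162987065_13874 user spark default 11 0 11 0 260G 0G 164 4077 10.00 00:00:00 test_app
application_1498162987065_13870 user spark default 11 0 11 0 260G 0G 983 23393 10.00 00:00:01 test_app
application_1498162987065_13869 user spark default 11 0 11 0 260G 0G 1212 28919 10.00 00:00:01 test_app
application_1498162987065_13873 user tez default 2 0 2 0 16G 0G 58 499 0.00 00:00:00 test_query
"""

# Column names as they appear in the sample header line.
COLUMNS = [
    "APPLICATIONID", "USER", "TYPE", "QUEUE", "#CONT", "#RCONT", "VCORES",
    "RVCORES", "MEM", "RMEM", "VCORESECS", "MEMSECS", "%PROGR", "TIME", "NAME",
]


def parse_rows(text):
    """Turn whitespace-separated application rows into a list of dicts."""
    apps = []
    for line in text.splitlines():
        if not line.startswith("application_"):
            continue  # skip blanks and anything that is not an application row
        # NAME is the last column and might contain spaces, so cap the splits.
        fields = line.split(None, len(COLUMNS) - 1)
        apps.append(dict(zip(COLUMNS, fields)))
    return apps


def gigabytes(value):
    """Convert a value such as '260G' to an int (assumes a 'G' suffix)."""
    return int(value.rstrip("G"))


apps = parse_rows(SAMPLE_ROWS)
total_mem_gb = sum(gigabytes(app["MEM"]) for app in apps)
total_vcores = sum(int(app["VCORES"]) for app in apps)

print(f"{len(apps)} apps, {total_mem_gb} GB, {total_vcores} vcores")
# prints: 4 apps, 796 GB, 35 vcores
```

Those numbers line up with the `796 allocated` and `35 allocated` figures in the `Queue(s) Mem(GB)` and `Queue(s) VCores` header lines.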

Of course, you can get all of this information from the ResourceManager’s homepage on port 8088, but that page:

  1. Isn’t a live view into the status of your applications,
  2. Isn’t as quick to access,
  3. Isn’t as simple as a straightforward CLI view,
  4. Looks like a webpage fresh out of the early 1990s; to complete the look, add some <marquee> tags.

The `yarn top` command bears a striking resemblance to the normal Linux `top` command for an obvious reason: both are about knowing what is running in your environment. Applications in YARN are a little different from processes on a single Linux server, so the two commands differ in small ways and offer different options.
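As an illustration of those options, here’s a small Python sketch that assembles a `yarn top` command line. It assumes the `-delay`, `-queues`, `-users`, and `-types` flags, which is how I remember the tool’s help output; run `yarn top -help` on your own version to confirm before relying on them. The `yarn_top_command` helper is hypothetical, and the sketch only builds and prints the command rather than running it.

```python
import shlex


def yarn_top_command(delay=None, queues=None, users=None, types=None):
    """Build an argument list for `yarn top` (flag names are assumptions)."""
    cmd = ["yarn", "top"]
    if delay is not None:
        cmd += ["-delay", str(delay)]       # refresh interval in seconds
    if queues:
        cmd += ["-queues", ",".join(queues)]  # comma-separated queue names
    if users:
        cmd += ["-users", ",".join(users)]    # comma-separated user names
    if types:
        cmd += ["-types", ",".join(types)]    # comma-separated application types
    return cmd


cmd = yarn_top_command(delay=5, queues=["default"], users=["user"])
print(shlex.join(cmd))
# prints: yarn top -delay 5 -queues default -users user
```

To actually launch it from Python you could hand `cmd` to `subprocess.run`, keeping in mind that `yarn top` is interactive and will take over the terminal until you quit.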

Here’s the original JIRA issue for the command: https://issues.apache.org/jira/browse/YARN-3348

 
