We run HBase as part of our Hadoop cluster. HBase sits on top of HDFS and is split into two kinds of daemons: the HBase Master and the HBase Regionservers. The master coordinates which Regionserver is in control of each specific region.
Automating Recovery Responses
We periodically have to do some minor maintenance and upkeep, including restarting daemons that have died. The first pass of restarts is handled by Nagios checks with event handlers. We have Nagios periodically connecting to the daemon status URL provided by each HBase Master and HBase Regionserver. If that connection times out or is considered slow, the event handler attempts to restart the daemon exactly once. If the daemon doesn’t recover, Nagios treats it as down and notifies us. More often than not, this resolves the issue.
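The restart logic inside such an event handler can be tiny. Here’s a minimal sketch, assuming the Nagios command definition passes the usual $SERVICESTATE$, $SERVICESTATETYPE$, and $SERVICEATTEMPT$ macros as arguments and max_check_attempts is 3; the service name matches the init scripts used later in this post, but treat the whole thing as illustrative rather than our exact handler:

#!/bin/sh
# Sketch of a Nagios event handler for a Regionserver check.
# Arguments (by convention in the command definition):
#   $1 = service state, $2 = state type, $3 = check attempt number
state=$1
state_type=$2
attempt=$3

# Fire the restart exactly once: on the last soft failure, so the next
# scheduled check decides whether the daemon actually came back.
if [ "$state" = "CRITICAL" ] && [ "$state_type" = "SOFT" ] && [ "$attempt" = "3" ]; then
    sudo service hbase-regionserver restart
fi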
But, for cases where the HBase Regionserver daemon repeatedly dies in a short period of time (or just never restarts when the event handler triggers it), we can end up with a number of dead Regionservers that need to be manually restarted. So, how do we do that?
The naive approach is to log in to each server, check whether the HBase Regionserver is running, and restart it if it isn’t. This can take a long time, depending on the number of servers in your cluster.
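Something like the following, assuming a file listing every Regionserver host (HRegionServer is the Java class the daemon runs as, and the init script name matches the ones used later in this post):

$ for h in $(cat regionserver-hosts.txt); do
>     ssh "$h" 'pgrep -f HRegionServer >/dev/null || sudo service hbase-regionserver restart'
> done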
But, there’s a shortcut we could utilize. The HBase Master keeps track of all Regionservers that have ever contacted it. If one of those Regionservers dies and never connects back up, the Master places that Regionserver on a dead list. The dead list is published in the HBase Master status page. Unfortunately, it’s wrapped in a bunch of HTML and no one really wants to code something to unwrap that.
Further Cleanup using HBase JMX Metric Data
Alternatively, the HBase Master publishes that same information in its JMX data. Let’s figure out how to access it and use it for more automated ways to clean up dead Regionservers.
First, let’s access the JMX URL. It always lives at /jmx on the HBase Master status port, and it provides a lot of information. Go check out yours, then come back.
$ curl -s http://hbase-master:60010/jmx | wc
  178869 2000518 12915506
In our cluster, we’ve got a lot of tables, a lot of servers, and just a lot of metric information about what’s going on. You can see that the JSON returned from our master comprises 178 thousand lines: a 12 megabyte JSON data structure. We want to parse that from the command line so we can iterate through each section, but we don’t want to write a custom script just for this (say, Python, Ruby, or Perl with a JSON module).
Instead, we’re going to use a tool called jq. Think of it as sed for JSON data: you can use it to parse, query, and output specific parts of your JSON for easier use on the command line. jq, like sed, can be as easy or as difficult as you want to make it. You should take a few moments to read through the jq tutorial and manual if you’d like to learn more.
Let’s begin with understanding how jq works.
All the HBase data lives in a top-level key called beans. Under it is an array of data structures, one per major area of metrics, and each data structure has a name key that identifies which area it covers. There are sections covering Java memory statistics, logging information, buffer pool information, and HBase statistics.
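You can verify that top level quickly, since jq’s keys builtin lists an object’s keys:

$ curl -s http://hbase-master:60010/jmx | jq 'keys'
[
  "beans"
]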
Let’s figure out what every chunk is.
$ curl -s http://hbase-master:60010/jmx | jq '.beans[].name' | sort
"JMImplementation:type=MBeanServerDelegate"
"com.sun.management:type=HotSpotDiagnostic"
"hadoop:service=HBase,name=Info"
"hadoop:service=HBase,name=RPCStatistics-60000"
"hadoop:service=Master,name=Master"
"hadoop:service=Master,name=MasterStatistics"
"java.lang:type=ClassLoading"
"java.lang:type=Compilation"
"java.lang:type=GarbageCollector,name=ConcurrentMarkSweep"
"java.lang:type=GarbageCollector,name=ParNew"
"java.lang:type=Memory"
"java.lang:type=MemoryManager,name=CodeCacheManager"
"java.lang:type=MemoryPool,name=CMS Old Gen"
"java.lang:type=MemoryPool,name=CMS Perm Gen"
"java.lang:type=MemoryPool,name=Code Cache"
"java.lang:type=MemoryPool,name=Par Eden Space"
"java.lang:type=MemoryPool,name=Par Survivor Space"
"java.lang:type=OperatingSystem"
"java.lang:type=Runtime"
"java.lang:type=Threading"
"java.nio:type=BufferPool,name=direct"
"java.nio:type=BufferPool,name=mapped"
"java.util.logging:type=Logging"
Since we’re looking for dead Regionserver information, we probably want to look in the hadoop:service area. I know that this data lives in hadoop:service=Master,name=Master because I’ve previously looked through the full JSON output. You should spend some time looking at each section to figure out what useful information may be in each.
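A quick way to peek at what a given section contains, without dumping everything, is that same keys builtin; for example, against the Java memory section (any name from the list above works the same way):

$ curl -s http://hbase-master:60010/jmx | \
    jq '.beans[] | select(.name == "java.lang:type=Memory") | keys'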
Let’s focus on this Master section.
One thing you’ll note is that the pipe character inside the jq command acts just like the pipe character would on a normal command line: it filters the output of the left side of the pipe into the right side of the pipe.
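A toy example with made-up data shows the idea in isolation:

$ echo '{"a": {"b": 42}}' | jq '.a | .b'
42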
Here, we want to look into the beans hierarchy and extract out the data structure that contains the hadoop:service=Master,name=Master data.
$ curl -s http://hbase-master:60010/jmx | \
    jq '.beans[] | select(.name == "hadoop:service=Master,name=Master")'
This is going to print out information about the current Regionservers, which regions are assigned to each, and the load each region is seeing. That’s all great and useful information if you want per-Regionserver statistics from the Master’s viewpoint, but we only need the subset of the Master data that doesn’t deal with the individual Regionservers.
Let’s have jq remove the Regionservers data so we can focus on the Master info.
$ curl -s http://hbase-master:60010/jmx | \
    jq '.beans[] | select(.name == "hadoop:service=Master,name=Master") | del(.RegionServers[])'
{
  "IsActiveMaster": true,
  "DeadRegionServers": [
    "hdn01.example.net:60020",
    "hdn13.example.net:60020",
    "hdn01.example.net:60020",
    "hdn20.example.net:60020",
    "hdn12.example.net:60020"
  ],
  "ZookeeperQuorum": "hdn01.example.net:2181,hdn01.example.net:2181,hdn01.example.net:2181",
  "RegionServers": [],
  "RegionsInTransition": [],
  "name": "hadoop:service=Master,name=Master",
  "modelerType": "org.apache.hadoop.hbase.master.MXBeanImpl",
  "ClusterId": "1777204b-2fba-49ab-ae93-e9f9a8bbe10b",
  "MasterStartTime": 1420585763679,
  "MasterActiveTime": 1420585914045,
  "Coprocessors": [
    "AccessController"
  ],
  "ServerName": "hbase-master.example.net,60000,1420585762914",
  "AverageLoad": 85.88349514563107
}
Immediately, you can see that the Master we’re talking to is the active one, that we have a handful of dead Regionservers, and that we have some load on the cluster. The RegionServers section is empty because we’ve purposefully deleted it out of the output with jq.
We’ve whittled down the data and discovered that what we want is the DeadRegionServers list. Let’s extract that further so we can use it.
$ curl -s http://hbase-master:60010/jmx | \
    jq '.beans[] | select(.name == "hadoop:service=Master,name=Master") | .DeadRegionServers[]'
"hdn01.example.net:60020"
"hdn13.example.net:60020"
"hdn01.example.net:60020"
"hdn20.example.net:60020"
"hdn12.example.net:60020"
We’ve extracted the list of dead Regionservers, but there’s still some extraneous stuff here: the JSON quoting and the port numbers. We want to clean those off and feed the host list into pdsh so we can distribute the ssh commands in parallel.
$ curl -s http://hbase-master:60010/jmx | \
    jq '.beans[] | select(.name == "hadoop:service=Master,name=Master") | .DeadRegionServers[]' | \
    sed -e 's/:60020//' | \
    tr -d '"' | \
    tr '\n' ,
hdn13.example.net,hdn01.example.net,hdn20.example.net,hdn12.example.net,
Finally, we can put it all together and initiate the actual restarts.
$ host_list=$(curl -s http://hbase-master:60010/jmx | \
    jq '.beans[] | select(.name == "hadoop:service=Master,name=Master") | .DeadRegionServers[]' | \
    sed -e 's/:60020//' | tr -d '"' | tr '\n' ,)
$ echo $host_list
hdn13.example.net,hdn01.example.net,hdn20.example.net,hdn12.example.net,
$ sudo pdsh -w $host_list service hbase-regionserver restart
hdn12: Restarting Hadoop HBase regionserver daemon: Stopping Hadoop HBase regionserver daemon: no regionserver to stop because kill -0 of pid 17917 failed with status 1
hdn01: Restarting Hadoop HBase regionserver daemon: Stopping Hadoop HBase regionserver daemon: no regionserver to stop because kill -0 of pid 29969 failed with status 1
hdn13: Restarting Hadoop HBase regionserver daemon: Stopping Hadoop HBase regionserver daemon: no regionserver to stop because kill -0 of pid 18833 failed with status 1
hdn12: hbase-regionserver.
hdn01: hbase-regionserver.
hdn20: Restarting Hadoop HBase regionserver daemon: Stopping Hadoop HBase regionserver daemon: no regionserver to stop because kill -0 of pid 10849 failed with status 1
hdn20: hbase-regionserver.
hdn13: hbase-regionserver.
hdn12: Starting Hadoop HBase regionserver daemon: starting regionserver, logging to /var/log/hbase/hbase-hbase-regionserver-hdn12.out
hdn01: Starting Hadoop HBase regionserver daemon: starting regionserver, logging to /var/log/hbase/hbase-hbase-regionserver-hdn01.out
hdn13: Starting Hadoop HBase regionserver daemon: starting regionserver, logging to /var/log/hbase/hbase-hbase-regionserver-hdn13.out
hdn20: Starting Hadoop HBase regionserver daemon: starting regionserver, logging to /var/log/hbase/hbase-hbase-regionserver-hdn20.out
hdn12: hbase-regionserver.
hdn13: hbase-regionserver.
hdn01: hbase-regionserver.
hdn20: hbase-regionserver.
Finally … PROFIT!
We now have the basic outline of what we need to automate this recovery. For example, we could build a Nagios check around this that automatically restarts dead Regionservers whenever any show up in the dead list, or one that only restarts them if more than X percent of the cluster is in a dead state. The point is, there’s a lot of information in the JMX JSON output that you can use to automate responses, and jq makes that easy for tools and responses that don’t require a full-blown scripting environment.
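As a concrete starting point, here’s a rough sketch of that percentage-guarded variant. It assumes the length of the RegionServers array gives the live server count, and it reuses the URL and :60020 port-stripping from above; the threshold value is a placeholder, and this is a sketch rather than a finished check:

#!/bin/bash
# Sketch: restart dead Regionservers only when more than MAX_PCT percent
# of the cluster shows up dead. Threshold and names are assumptions.
MASTER_JMX="http://hbase-master:60010/jmx"
MAX_PCT=5
QUERY='.beans[] | select(.name == "hadoop:service=Master,name=Master")'

json=$(curl -s "$MASTER_JMX")
dead=$(echo "$json" | jq "$QUERY | .DeadRegionServers | length")
live=$(echo "$json" | jq "$QUERY | .RegionServers | length")
total=$((dead + live))

if [ "$total" -gt 0 ] && [ $((dead * 100 / total)) -gt "$MAX_PCT" ]; then
    # jq -r drops the JSON quoting; sort -u collapses duplicate dead-list
    # entries (like hdn01 appearing twice in the output above).
    host_list=$(echo "$json" | jq -r "$QUERY | .DeadRegionServers[]" | \
        sed -e 's/:60020//' | sort -u | tr '\n' ,)
    sudo pdsh -w "$host_list" service hbase-regionserver restart
fi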
The JMX data is not limited to HBase daemons. We also use this within HDFS for tracking things in the Namenode and within the JobTracker to extract some data about jobs. If you’ve got other Hadoop daemons running, you should see if they export the JMX data for further interesting uses!
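For instance, the Namenode serves the same /jmx endpoint on its own status port (50070 by default on these releases), so the identical jq tricks apply:

$ curl -s http://namenode:50070/jmx | jq '.beans[].name'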