Checking that HBase regions are online.

At $work we have a client with a Hadoop cluster. They wanted a Nagios check that would verify that all of the regions for a specified table were online/queryable. After some research, it seemed like the best way was to use Stargate, which is the REST API for HBase.

The check first fetches the list of all the regions in the table, along with the start/end key, the name, and the location of each region. It then queries the API with '$table . "/" . $key' for each region to see whether it returns a result; if it doesn't, the region is added to an array for Nagios. Once it has finished checking all the regions, the script looks at that array: if it's not empty, it echoes out all the missing regions and exits with 2 (Critical). Otherwise it says everything is OK and exits with 0.
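To make the flow concrete, here is a minimal Python sketch of the same logic. It is not the script itself: the names check_regions and row_exists are made up for illustration, and the HTTP call is passed in as a function so the decision logic can be read (and tested) on its own.

```python
from urllib.parse import quote

# Nagios plugin exit codes used by the check described above.
OK, CRITICAL = 0, 2


def check_regions(table, regions, row_exists):
    """Probe one key per region and build a Nagios-style result.

    table      -- table name
    regions    -- iterable of (region_name, key) pairs (hypothetical shape)
    row_exists -- hypothetical callable taking "table/key" and returning
                  True if the REST API gave back a result for that path
    """
    regions = list(regions)
    missing = []
    for name, key in regions:
        # Same idea as '$table . "/" . $key', with the key URL-encoded.
        path = f"{table}/{quote(key, safe='')}"
        if not row_exists(path):
            missing.append(name)

    if missing:
        msg = f"CRITICAL: {len(missing)} region(s) not answering: " + ", ".join(missing)
        return CRITICAL, msg
    return OK, f"OK: all {len(regions)} regions of {table} answered"
```

In a real check, row_exists would presumably do an HTTP GET against the REST gateway and treat anything other than a successful response as missing; the caller would then print the message and exit with the returned code.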

As the check takes a while to run for the table being checked, the script uses file locking to ensure that only one instance runs at a time. If Nagios tries to run the check while a previous run is still going, it returns a message saying it's already running and exits with 0.
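The locking idea can be sketched in Python like this. Again, this is an illustration under assumptions rather than the script's own code: try_lock and run_exclusive are hypothetical names, the lock file path is whatever you choose, and fcntl.flock is Unix-only.

```python
import fcntl


def try_lock(path):
    """Take an exclusive, non-blocking lock on path.

    Returns the open file handle (keep it open for as long as the lock
    is needed) or None if another run already holds the lock.
    """
    fh = open(path, "a")  # "a" so an existing lock file is not truncated
    try:
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        fh.close()
        return None
    return fh


def run_exclusive(lock_path, check):
    """Run check() only if no other run holds the lock.

    check is a hypothetical callable returning (exit_code, message).
    """
    lock = try_lock(lock_path)
    if lock is None:
        # Matches the behaviour described above: report and exit 0.
        return 0, "Check is already running"
    try:
        return check()
    finally:
        lock.close()  # closing the handle releases the lock
```

Exiting 0 when the lock is held means an overlapping run never raises an alert by itself; the trade-off is that a hung run would keep reporting OK until it is noticed some other way.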

The full script is: