Kerberos kt_renewer failures with HUE on CDH4

First off, I’m not sure whether this is a Hadoop User Experience (HUE) issue or a broken setup in my Kerberos environment. I have a thread open on the HUE users list, but haven’t had any follow-up. I’ve just fired up HUE for the first time to talk with a Kerberos-enabled […]

Mass-gzip files inside HDFS using the power of Hadoop

I have a bunch of text files sitting in HDFS that I need to compress: several hundred files comprising several hundred gigabytes of data. There are several ways to do this. I could copy each file down individually, compress it, and re-upload it to HDFS. This takes an excessively long […]

Using cobbler with a fast file system creation snippet for Kickstart %post install of Hadoop nodes

I run Hadoop servers with twelve 2 TB hard drives in them. One of the bottlenecks with this setup occurs during Kickstart, when Anaconda creates the filesystems. Previously, I just had a specific partition configuration that was brought in during %pre, but this caused the filesystem formatting section of Kickstart to take several hours […]

Seeing RabbitMQ memory usage

While working in our RabbitMQ environment this week, we noticed that RabbitMQ was using a large amount of memory that we couldn’t account for by normal queue and message use. One of the first things we do when poking around Erlang and RabbitMQ is a memory dump. $ sudo […]

Hadoop DataNode logs filling with clienttrace messages

So, you’re probably like me. You have a shiny, new Cloudera Hadoop cluster. Everything is zooming along smoothly. Until you find that your /var/log/hadoop datanode logs are growing at a rate of a bazillion gigabytes per day. What do you do, hot shot? WHAT DO YOU DO? Actually, it’s pretty simple. We were getting alerts […]

Hadoop, facter, and the puppet marionette

I’ve been working with Puppet a lot lately. A lot. It’s part of my job. We’ve been setting up a new Hadoop cluster in our Xen environment. Nothing big. It started out with 4 nodes, all configured the same way (3 drives each). We added an additional 2 nodes with 2 drives each. This, of […]

Playing with the Perl RTM client

I began playing with a Perl-based RememberTheMilk command line tool today from http://www.rutschle.net/rtm/. There aren’t any RPMs of it that I found, so I ended up building some. This is what I had to do to get it to the stage of at least working. :; sudo cpan2rpm WebService::RTMAgent -- cpan2rpm - Ver: 2.028 -- […]

The “Enterprise” …

From a discussion with a few peers in the industry. I was entertained.
peer> Now when I hear someone use the word “enterprise” as an adjective, I have to ask them which of the four meanings they intend:
peer> 1. defunct and destroyed (the Enterprise aircraft carrier from WW2)
peer> 2. ancient and nearly dead […]

Lack of backup foils Va.’s new IT system | Richmond Times-Dispatch

“Every time we’re down for an hour, that’s about 2,500 people inconvenienced,” Smit said. “They’re blaming my people for it and [state IT officials] have an obligation to fix it.” Lack of backup foils Va.’s new IT system | Richmond Times-Dispatch. One of the things we’ve been grappling with lately is some unfortunate unplanned outages […]