17 January 2012


For the next 24 hours this blog is censored.  This is what blogs look like on SOPA (and PIPA).  Read up and make your own decision:  http://projects.propublica.org/sopa/ .  And if you agree, join Wikipedia, Reddit, and others by "going dark" for 24 hours.  You can support the protest via Facebook, Twitter, and Google+ by changing your profile picture:  http://www.blackoutsopa.org/

10 February 2010

Hadoop HDFS: Deceived by Xciever

It's undocumented. It's misspelled. And without it your (insert euphemism for "really big" here) Hadoop grid won't be able to crunch those multi-hundred gigabyte input files. It's the "xciever" setting for HDFS DataNode daemons, and the absence of it in your configuration is waiting to bite you in the butt.

The software components that make up HDFS include the NameNode, the SecondaryNameNode, and DataNode daemons. Typical installations place NameNode and SecondaryNameNode on a single "master" node. Installations following best practices go the extra mile and place the NameNode and SecondaryNameNode daemons on separate nodes. DataNode daemons are placed on participating slaves nodes. At the highest level the NameNode/SecondaryNameNode daemons manage HDFS metadata (the data used to map file names to server/block lists, etc.), while the DataNode daemons handle the mechanics of read/writing to disk, and serving up blocks of data to processes on the node or to requesting processes on other nodes.

There are many settings that can be use to configure and tune HDFS (as well as the MapReduce engine and other Hadoop components). The HDFS documentation lists 45 of them (along with their default values) last time I checked. These configuration settings are a somewhat disorganized mix of elements used to configure the NameNode and DataNodes.

The DataNode daemons have an upper limit on the number of threads they can support, which is set to the absurdly small (IMHO) value of 256. If you have a large job (or even a moderately-sized job) that has many files open you can exhaust this limit. When you do exceed this limit there are a variety of failure modes that can arise. Some of the exception traces point to this value being exhausted, while others do not. In either case, understanding what has gone wrong is difficult. For example, one failure mode observed
java.io.EOFException, with a traceback flowing down into DFSClient.

The solution to all this is to configure the Xcievers setting to raise the maximum limit on the number of threads the DataNode daemons are willing to manage. For modern Hadoop releases this should be done in conf/hdfs-site.xml. Here is an example:

< property > 
    < name >dfs.datanode.max.xcievers< /name > 
    < value >4096< /value > 
< /property >

[Notez-bien: I apologize for the lame formatting above. I really need to improve my template]

Interestingly, HBase users seem to have more conversations about this setting and related issues than do predominantly MapReduce users.

The central reason for this posting is to point out this setting as it might be helpful to others. We encountered this problem while porting from an older version of Hadoop to 0.20 (via the Cloudera distribution, which I highly recommend). The failure mode was not immediately evident and we wasted a lot of time debugging and chasing assorted theories before we recognized what was happening and changed the configuration.  Ironically the bread crumbs which led us to change this setting came from a JIRA I opened regarding a scalability issue in 2008.  As I recall the setting was not supported by the version of Hadoop we were using at that time.

A secondary reason for this posting is to point out the crazy name associated with this setting. The term "xceiver" ("i" before "e" except after "c" blah blah blah...) is short-hand for transceiver, which means a device which transmits and receives. But does the phrase "transceiver" really describe "maximum threads a DataNode can support"? At the very least the spelling of this setting should be corrected and the underlying code should recognize both spellings. What would be even better would be add a setting called "transceivers" or even more explicitly "concurrent_threads" like this:


Finally, why is the default value for this setting so low?  Why not have it default to a more reasonable value like 2048 or 4096 out of the box?  Memory and CPU is cheap, chasing infrastructure issues is expensive.

So summing it all up, lack of documentation and poor spelling of this setting caused us to lose hours of productivity.  If you're deploying a big grid, be sure to configure this properly.  If you're seeing strange failures related to I/O within your Hadoop infrastructure be sure to check for this value, and potentially increase it.

31 December 2008

Down Time == Productive Time

Back in my college days - long enough ago that mainframe computing was still the rage - I discovered the pleasure of The Holiday Break.  Classes were over, everybody went home, the university was largely empty, and as a result you could grab all the computing time you wanted, and you could work for hours undisturbed.   The Christmas holiday was always the best. 

Fast forward to present day, and I continue to find the time from just before Christmas until just after the New Year a sort of magic time to get things done.  Many offices are shut down for the holiday, so my unpredictable 45-minutes-or-maybe-3-hours commute becomes predictable.   The shock jocks and on-air "personalities" on drive time FM radio are all on vacation, and so the radio actually plays music.  Things in the office are quiet - folks don't normally schedule releases, death marches, etc., around the holidays.  The time of year puts people in a mellow mood.  All-in-all, it's a great time of year to grab some quiet time.

Quiet time becomes productive time in unexpected ways.  Since I'm not in the throes of a major release crunch I actually have time to catch up on some reading.  Today, for example, I read about  20-30 pages from some of the books shown on my current reading list at the top of the blog.  I also read a few great blog posts about a number of things technical.  So great, I'm reading and surfing...but is it productive?

Oh yeah, it sure is.   I concocted a new way to visualize how well or poorly a very complicated n-tier application is performing, in part based on some blog reading.  Very cool stuff, for which I cannot take all the credit, but about which I will blog more in the near future.

I also got a couple hours to write up some much needed documentation, and spent time catching up with colleagues who actually have time to talk about what they are working on and what sorts of challenges they are facing.  This naturally leads to more ideas, and the ball starts rolling.

People often talk about the December holidays as down-time for business, and I'm sure that it is.  But the flip-side of the coin is that it can be a great time to open up the mind, solve some problems, and come up with good stuff to tackle in the upcoming year.   Down time offers the chance to productively daydream without (usually) getting off schedule.  So for engineers, down time can be super productive time.  And maybe that's why when the weather gets cold, and the Christmas bell-ringers appear on every street corner, my mind turns to thoughts of  software design, architecture, and coding.  

03 June 2008

Visible Measures wins 2008 MITX Technology Award!

Visible Measures (my current gig) won an award tonight at the Massachusetts Innovation and Technology Exchange (MITX) 2008 What's Next Forum and Technology Awards. We were recognized in the Analytics and Business Intelligence category - the same category that Compete (my old company) was entered in last year.

A lot of great companies were finalists in our category, including Progress Software, salary.com, SiteSpect, and Lexalytics. This was tough competition, which made winning this award all the more sweet. A big shout out to Version 2 Communications as well - we were their guests at the awards.

Visible Measures is an awesome company, with an extremely hard-working highly motvated team. I am extremely proud and humbled to be part of this company.

The MITX event was very nice. There were plenty of opportunities to network with a lot of interesting people doing a lot of cool stuff. It was great to listen to Larry Weber (Chairman of the Board for MITX and founder of W2 Group) host the awards and dispense free advice ("...with 37 offices worldwide - that's too much overhead..."). MITX honored Amar Bose, who gave a very interesting talk. Bose is legenday - at least in the New England high tech community and particularly within MIT, so hearing him speak live is a privilege.

The only downside to the evening was the fire alarm going off mid-way through the ceremony. This lead to a rather awkward pause in the action while the fire department made sure nothing was wrong.

22 April 2008

Hadoop Summit Slides

A few weeks ago I went to California for the Hadoop Summit. I posted a bunch of notes in real-time during the summit until the network connection became too flakey to continue.

The Yahoo folks have come to the rescue however. The slides from the presentations, which are tons better than my notes, are freely available on-line here. There are also slides from the Data Intensive Computing Symposium which was held the next day.

I wish I had know about the Data Intensive symposium as it looks really really interesting (not to mention an excuse to stay in Califorinia one more day...).

10 April 2008

Infrant/NetGear ReadyNAS NV+

Just picked up one of these last week. The plan is to use the box as a shared storage resource to back up family data (pictures, etc.), and to back up other systems, and the grid machines in the rack.

I was originally going to build a box to handle the task, but a friend of mine recommended the ReadyNAS server as a cost effective (and less labor intensive) alternative. This box is basically plug-and-play...the operating system is delivered in firmware, and you configure and operate the box via a web interface and with a program called RAIDar. The box speaks a variety of protocols and can talk to Windows, Linux, Macs, and streaming media players so it should get along well with all the servers, workstations, etc.

I bought a diskless version, and populated it with 2x500G Western Digital drives. Initially nothing worked and for a brief time I thought the server was DOA. After a bunch of trial and error I concluded that one of the WD drives was DOA. I brought the box up on 1 drive, configured things, and it just worked. NewEgg RMA'd the bad drive (and even gave me freebie shipping label to send the bad device back...good stuff).

I've got 2 more 500G drives arriving tomorrow - the box is hot-pluggable so in theory installation is simple. It should be interesting to get the box up to 2T with X-RAID and do some performance testing.

Product reviews of the ReadyNAS have been widely varied, but so far all things look positive. I'll post more about the box once I get my bad drive issues sorted out...