25 March 2008

Hadoop Summit: Andy Konwinski & X-trace

Monitoring Hadoop using X-trace: Andy Konwinski is from UC Berkeley RAD Lab

Motivation: Hadoop style processing masks failures and performance problems

Objectives: Help Hadoop developers debug and profile Hadoop; help operators monitor and optimize Map/Reduce jobs

[CCG: This is cool, we REALLY need this stuf NOW]

RAD Lab: Reliable Adaptive Distributed Systems

Approach: Instrument Hadoop using X-Trace framework. Trace Analysis: virtualizatioin via web-based UI; statistical analysis and anomaly detection

So what's X-Trace: Path-based tracing framework; generate event graph to capture causality of events across network (RPC, HTTP, etc.). Annotate message with trace metadata carried along execution path (instrument protocol APIs and RPC libraries). Within Hadop the RCPC library has been instrumented.

DFS Write Message Flow Example: client -> DataNode1 -> DataNode2 -> DataNode3
Report Graph Node: Report Label, Trace ID#, Report ID3, Hostname, timestamp

builds a graph which can then be walked.

[CCG: again you need to see the picture, but you get the idea right?]

Andy showed some cool graphs representing map/reduce operations in flight...

Properties of Path Tracing: Deterministic causality and concurrence; control over which events get traced; cross-layer; low-overhead; modest modification of the app source code ( <>

Architecture: Xtrace front ends on each Hadoop node, communicates with Xtrace backend via TCP/IP, backend stores data using BDB, trace analysis web UI communicats with Xtrace backend. Also cool fault detection programs can be run - interact with backend via HTTP.

Trace Analysis UI: "we have pretty pictures which can tell you a lot": perf stats, graphcs of utilization, critical path analysis...

Applications of Trace Analysis: Examined perf of Apache nutch web indexing engine oin a Wikipedia cral. Time to create an inverted link index of a 50G crawl - with default configuration, ran in 2 hours.

Then by using trace analysis, was able to make changes to run same workload in 7 minutes. Used workload analysis to determine that one single reduce task which actually fails several times at the beginning: 3 ten minute timeouts. Bumped max reducers to 2 per node, and dropped execution time to 7 minutes.

Behold the power of pretty pictures!

Off-line machine learning: faulty machine detection, buggy software detection. Current work oin graph processing and analysis.

Future work: Tracing more production map/reduce applications. More advanced trace processing tools, migrating code into Hadoop codebase.

No comments: