25 March 2008

Hadoop Summit: Ben Reed and Zookeeper

Zookeeper: General, robust coordination service...

Observations: Distributed systems always need some form of coordination. Programmers cannot use locks correctly (note that Google uses the "Chubby" lock service): you can learn locks in 5 minutes, but spend a lifetime learning how to use them properly. Message-based coordination can be hard to use in some apps.

Wants: Simple, robust, good performance; tuned for read-dominant workloads; familiar models and interfaces; wait-free: a failed client will not interfere with the requests of a fast client; need to be able to wait efficiently.

Design starting point: start with a file API and strip out what we don't need: partial writes/reads, name. Add what is needed: ordered updates and strong persistence guarantees, conditional updates, watches for data changes, ephemeral nodes (a client creates files, and if the client goes away the files go away), generated file names (i.e. mktemp()).
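To make "conditional updates" and "watches" a bit more concrete, here is a rough sketch in Python. It is a toy in-memory model, not the real Zookeeper client: the class name `VersionedNode`, the per-node version counter, and the one-shot watcher callback are all my own assumptions for illustration.

```python
class VersionedNode:
    """Toy stand-in for a single znode (hypothetical, not the Zookeeper API)."""

    def __init__(self, data=b""):
        self.data = data
        self.version = 0       # assumed: bumped on every successful write
        self._watchers = []    # assumed: one-shot callbacks

    def get_data(self, watcher=None):
        # Optionally register a callback for the next change to this node.
        if watcher is not None:
            self._watchers.append(watcher)
        return self.data, self.version

    def set_data(self, data, expected_version):
        # Conditional update: apply only if nobody has written in between.
        if expected_version != self.version:
            return False
        self.data = data
        self.version += 1
        # Fire each registered watcher once, then forget it.
        watchers, self._watchers = self._watchers, []
        for notify in watchers:
            notify(self.version)
        return True


node = VersionedNode(b"v0")
seen = []
_, version = node.get_data(watcher=seen.append)
first = node.set_data(b"from-client-a", version)   # version matches, applied
second = node.set_data(b"from-client-b", version)  # stale version, rejected
```

The second writer loses without ever taking a lock, which seems to be the point of the wait-free wish above: it simply re-reads and retries.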

Data Model: Hierarchical namespace; each znode has data and children; data is read and written in its entirety.
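A minimal sketch of that data model, as I understood it. Everything here (the `ZNodeTree` class, its method names, the dictionary storage) is hypothetical and only meant to show that every node in the hierarchy can hold both data and children, and that data is replaced wholesale.

```python
class ZNodeTree:
    """Toy in-memory namespace (hypothetical, not the Zookeeper client)."""

    def __init__(self):
        self._data = {"/": b""}   # path -> full data payload

    @staticmethod
    def _parent(path):
        head = path.rsplit("/", 1)[0]
        return head or "/"

    def create(self, path, data=b""):
        if path in self._data:
            raise KeyError("node exists: " + path)
        if self._parent(path) not in self._data:
            raise KeyError("no parent for: " + path)
        self._data[path] = data

    def set_data(self, path, data):
        if path not in self._data:
            raise KeyError("no such node: " + path)
        self._data[path] = data   # whole payload replaced, no partial write

    def get_data(self, path):
        return self._data[path]   # whole payload returned, no partial read

    def get_children(self, path):
        # Direct children only, a bit like "ls" on a directory.
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):]
            for p in self._data
            if p.startswith(prefix)
            and len(p) > len(prefix)
            and "/" not in p[len(prefix):]
        )


tree = ZNodeTree()
tree.create("/app", b"root of app")   # this node has data AND children
tree.create("/app/config", b"a=1")
tree.create("/app/workers")
tree.set_data("/app/config", b"a=2")
```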

Zookeeper API: create(), delete(), setData(), getData(), exists(), getChildren(), sync()...also some security APIs ("but nobody cares about security"...). getChildren() is like "ls".

Create flags: ephemeral: the znode gets deleted when the session that created it times out; sequence: the path name will have a monotonically increasing counter, relative to the parent, appended.
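Here is a small sketch of how I picture the two create flags. It is a toy model with made-up names (`FlagTree`, `expire_session`), and the 10-digit zero-padded counter format is an assumption of this sketch rather than something stated in the talk.

```python
class FlagTree:
    """Toy model of the ephemeral and sequence flags (hypothetical names)."""

    def __init__(self):
        self._owner = {}      # path -> owning session id, or None if persistent
        self._counters = {}   # parent path -> next sequence number

    def create(self, path, session=None, ephemeral=False, sequence=False):
        if sequence:
            # The counter is kept per parent and appended to the requested name.
            parent = path.rsplit("/", 1)[0] or "/"
            n = self._counters.get(parent, 0)
            self._counters[parent] = n + 1
            path = "%s%010d" % (path, n)   # assumed padding format
        self._owner[path] = session if ephemeral else None
        return path

    def expire_session(self, session):
        # Ephemeral nodes disappear together with the session that made them.
        for path in [p for p, s in self._owner.items() if s == session]:
            del self._owner[path]

    def paths(self):
        return sorted(self._owner)


tree = FlagTree()
tree.create("/locks")
a = tree.create("/locks/lock-", session="s1", ephemeral=True, sequence=True)
b = tree.create("/locks/lock-", session="s2", ephemeral=True, sequence=True)
tree.expire_session("s1")   # client s1 went away
```

Combining both flags gives each client a uniquely named node that cleans itself up when the client dies.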

How Zookeeper works: Service made up of a set of machines. Each server stores a full copy of the data in memory, which gives very low latency and high throughput. A leader is elected at startup, and the rest of the servers become followers. Followers service clients; all updates go through the leader. Update responses are sent when a majority of servers agree.
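The "majority of servers agree" rule is easy to sketch as arithmetic. The function names below are mine, and I am assuming the leader counts its own acknowledgement toward the majority.

```python
def majority(ensemble_size):
    """Smallest number of servers that is more than half of the ensemble."""
    return ensemble_size // 2 + 1


def update_committed(acks, ensemble_size):
    """True once enough servers (leader included, by assumption) have agreed."""
    return acks >= majority(ensemble_size)


def tolerated_failures(ensemble_size):
    """How many servers can be down while a majority is still reachable."""
    return ensemble_size - majority(ensemble_size)
```

With five servers an update needs three acknowledgements and the service survives two failures. Four servers still survive only one, which is why odd ensemble sizes are the natural choice.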

Example of Zookeeper service: Hadoop on Demand. I got distracted by something during this part of the presentation and didn't get all the basic pieces here, but this uses Zookeeper to track interaction with Torque, and handles graceful shutdown, etc.

Status: Project started in 2006, prototyped in fall 2006, initial implementation March 2007, code moved to zookeeper.sourceforge.net under the Apache License in November 2007. Everything is pure Java: quorum version and standalone version. Clients are available in Java and C.







