25 March 2008

Hadoop Summit: Kevin Beyer and JAQL...

Kevin Beyer is from IBM Almaden Research Center

JAQL (pronounced "jackal") - Query Language for JSON

JSON: JavaScript Object Notation: simple, self-describing and designed for data
Why JSON: want the complete entity in one place; supports schemas that vary or evolve over time; it is a standard (used in web 2.0 applications, bindings available for many languages, etc.); and it is not XML. XML was designed for doc markup, not for data (hooray, thanks for saying THAT).
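To make the "self-describing, schema can vary" point concrete, here is a minimal Python sketch (not JAQL, and not from the talk). The records and field names are made up for illustration.

```python
import json

# Hypothetical records: each one carries its own field names, and the two
# records do not share the same set of fields.
text = """
[
  {"id": 1, "name": "Ann", "emails": ["ann@example.com"]},
  {"id": 2, "name": "Bob", "address": {"city": "San Jose"}}
]
"""

records = json.loads(text)

# Collect every field seen across the records; no schema was declared up front.
fields = sorted({key for record in records for key in record})

# A missing field is simply absent, so the reader supplies a default.
cities = [record.get("address", {}).get("city") for record in records]

print(fields)
print(cities)
```

Each entity, including its nested address or list of emails, lives in one place, and adding a new field to later records does not require touching the earlier ones.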

JAQL processes any data that can be interpreted as JSON (JSON text files, binary, CSV, etc.). Internally, JAQL processes binary data structures.
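As a rough idea of what "interpreted as JSON" can mean for a CSV file, here is a small Python sketch. This is my own illustration, not how JAQL does it; the helper name `csv_to_records` and the sample data are hypothetical.

```python
import csv
import io
import json

# Hypothetical CSV input with a header row.
csv_text = "id,name\n1,Ann\n2,Bob\n"

def csv_to_records(text):
    """Read CSV text and return a list of JSON-style records (dicts)."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]

records = csv_to_records(csv_text)

# Once the rows are records, they serialize as an ordinary JSON array.
as_json = json.dumps(records)
print(as_json)
```

Note that the values stay strings here; a real adapter would have to decide how to map CSV cells to JSON numbers, booleans, and nulls.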

JAQL is similar to Pig Latin. The goal is to get JAQL accepted anywhere JSON might be used (document analytics, doc management [CouchDB], ETL, mashups, ...).

Immediate Goal: Hide the grungy details of writing map-reduce jobs for ETL and analysis workloads; compile JAQL queries into map/reduce jobs.
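For a sense of the "grungy details" a query language would hide, here is a toy, in-memory Python sketch of a group-and-count written map/reduce style. It is only an illustration under my own assumptions: it does not use Hadoop, it is not the output of the JAQL compiler, and all function names and data are hypothetical.

```python
from collections import defaultdict

def map_phase(records, map_fn):
    """Apply map_fn to every record and yield the (key, value) pairs it emits."""
    for record in records:
        yield from map_fn(record)

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reduce_fn):
    """Call reduce_fn once per key, in sorted key order."""
    return [reduce_fn(key, values) for key, values in sorted(groups.items())]

# The query itself: count employees per department.
def emit_dept(record):
    yield record["dept"], 1

def count(key, values):
    return {"dept": key, "count": sum(values)}

employees = [
    {"name": "Ann", "dept": "eng"},
    {"name": "Bob", "dept": "ops"},
    {"name": "Cy", "dept": "eng"},
]

result = reduce_phase(shuffle(map_phase(employees, emit_dept)), count)
print(result)
```

Only the two small functions `emit_dept` and `count` express the actual question; everything else is plumbing, which is the part a higher-level query is meant to generate for you.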

JAQL Goals: designed for JSON data, functional query language (few side effects, not a scripting language - set-oriented, highly transformed), composable expressions, draws on other languages, operator plan -> query (rewrites are transforms within the language, any plan is representable in the language itself).

Core operations: iterations, grouping, joining, combining, sorting, projection, constructors for arrays, records, atomic values, "unnesting", function definition/evaluation.
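Since I could not capture the JAQL examples, here is a Python sketch of a few of these operations (iteration, unnesting, projection, record construction, sorting, grouping) over nested JSON-style data. This is plain Python, not JAQL syntax, and the order data is invented.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical nested input: each order holds an array of item records.
orders = [
    {"customer": "ann", "items": [{"sku": "a", "qty": 2}, {"sku": "b", "qty": 1}]},
    {"customer": "bob", "items": [{"sku": "a", "qty": 5}]},
]

# Iteration + unnesting + projection: one flat record per (order, item) pair,
# built with a record constructor that keeps only the fields we need.
flat = [
    {"customer": order["customer"], "sku": item["sku"], "qty": item["qty"]}
    for order in orders
    for item in order["items"]
]

# Sorting, then grouping by sku (groupby needs its input sorted on the key).
flat_sorted = sorted(flat, key=itemgetter("sku"))
totals = [
    {"sku": sku, "total_qty": sum(row["qty"] for row in rows)}
    for sku, rows in groupby(flat_sorted, key=itemgetter("sku"))
]

print(totals)
```

Each step takes a collection and returns a collection, which is roughly what "composable expressions" buys you: the output of one operation feeds straight into the next.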

Some good examples were presented that are too long to type in...wait for the presentations to appear on-line I guess...sorry. Good stuff though, I am liking the language presented more than Pig Latin.

Implementation: JAQL input/output is designed for extensibility; it basically reads/writes JSON values. Examples: Hadoop InputFormat and OutputFormat.

Roadmap: Another release is imminent, next release this summer (open the source, indexing support, native callout, server implementation with a REST interface).

