Merced Systems, Inc., the company I work at, hosted a great meetup with Scott Carey as a guest speaker.
http://www.meetup.com/MRDesignPattern/
We went through the general concepts and some tips and tricks about Avro.
Here is the general outline of the meeting:
What is Avro?
• Avro is a serialization framework developed within Apache's Hadoop project. It uses JSON for defining data types and protocols, and serializes data in a compact binary format. Its primary use is in Apache Hadoop, where it can provide both a serialization format for persistent data and a wire format for communication between Hadoop nodes.
• Avro provides a good way to convert unstructured and semi-structured data into structured data using schemas.
The Avro schema used to write data must be available when reading it.
● Fields are not tagged
  ○ More compact
  ○ Potentially faster
● Code generation is optional
  ○ Simple implementations can read and write data
  ○ Dynamic, discoverable RPC is also possible (but not implemented)
● The schema must be stored with the data, either explicitly or by reference.
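To make the "fields are not tagged" point concrete, here is a minimal sketch in plain Java (no Avro library) of why the reader needs the schema. It assumes a record made only of int fields written with Avro's variable-length zigzag encoding; the class name UntaggedRecordSketch and the field names are hypothetical, chosen for illustration. The bytes carry values only, so the field list passed in is the only thing that gives them meaning.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not the Avro library API.
final class UntaggedRecordSketch {

    // Reads one zigzag-encoded variable-length int starting at pos[0]
    // and advances pos[0] past the bytes it consumed.
    static int readInt(byte[] buf, int[] pos) {
        int raw = 0;
        int shift = 0;
        int b;
        do {
            b = buf[pos[0]++] & 0xFF;
            raw |= (b & 0x7F) << shift; // low 7 bits carry data
            shift += 7;
        } while ((b & 0x80) != 0);      // high bit set means more bytes follow
        return (raw >>> 1) ^ -(raw & 1); // undo the zigzag mapping
    }

    // The byte stream has no field names or tags; the caller-supplied
    // field list (standing in for the schema) decides what each value is.
    static Map<String, Integer> readRecord(byte[] buf, List<String> fieldNames) {
        Map<String, Integer> record = new LinkedHashMap<>();
        int[] pos = {0};
        for (String name : fieldNames) {
            record.put(name, readInt(buf, pos));
        }
        return record;
    }

    public static void main(String[] args) {
        byte[] data = {0x02, 0x00};
        // prints {number=1, suit=0}
        System.out.println(readRecord(data, Arrays.asList("number", "suit")));
    }
}
```

Handing the same two bytes to a reader with a different field list would produce a different record without any error, which is why the writer's schema has to travel with the data or be resolvable by reference.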
The binary encoding is awesomely compact!!!
class Card {
    int number; // ace = 1, king = 13
    Suit suit;
}

enum Suit {
    SPADE, HEART, DIAMOND, CLUB;
}

Java heap: 24 bytes (32-bit JVM) to 32 bytes
Avro binary: 2 bytes

Card card = new Card();
card.number = 1;
card.suit = Suit.SPADE;
Avro binary: 0x02 0x00
First byte: the integer 1, written as a zigzag-encoded variable-length int (1 becomes 0x02)
Second byte: the ordinal of the Suit enum (SPADE = 0), written the same way (0 becomes 0x00)
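The two bytes above can be reproduced by hand. The following is a rough sketch in plain Java, not the Avro library itself: it assumes ints and enum ordinals are both written as zigzag-encoded variable-length integers, as the Avro specification describes. The class name AvroIntSketch and the encodeCard helper are made up for this example.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration, not the Avro library API.
final class AvroIntSketch {

    // Zigzag maps signed ints to unsigned ones so that small magnitudes
    // stay small: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    static int zigZag(int n) {
        return (n << 1) ^ (n >> 31);
    }

    // Writes the zigzag value 7 bits at a time, lowest bits first.
    // The high bit of each byte is set when more bytes follow.
    static void writeInt(int n, ByteArrayOutputStream out) {
        int v = zigZag(n);
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
    }

    // A Card is just its two fields back to back: the number,
    // then the zero-based position of the suit in the enum.
    static byte[] encodeCard(int number, int suitOrdinal) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeInt(number, out);
        writeInt(suitOrdinal, out);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Ace of spades: number = 1, SPADE ordinal = 0
        for (byte b : encodeCard(1, 0)) {
            System.out.printf("0x%02X ", b & 0xFF); // 0x02 0x00
        }
        System.out.println();
    }
}
```

Every card in the deck fits in two bytes under this scheme, since both the number (1 to 13) and the suit ordinal (0 to 3) zigzag to values below 128.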
Go Avro!!!!