AGENDA
Hive and Cassandra: Open Source Tools for Managing Data at Web Scale
Jeff Hammerbacher
Abstract: Jeff Hammerbacher will discuss two projects produced by members of the Data team at Facebook and released to the open source community under the Apache 2.0 software license. Hive is a data warehousing framework built on top of Hadoop that manages hundreds of terabytes of data, provides a SQL-like interface to the end user, and is meant for offline, analytical processing. Cassandra is an incrementally scalable, highly available system for managing structured data that stores tens of terabytes of data across multiple data centers and is meant for online, transactional processing. Both systems have been running in production at Facebook for several months.