AGENDA

Chukwa: A large-scale monitoring system

Jerome Boulon, Andy Konwinski, Runping Qi, Ariel Rabkin, Eric Yang

Abstract: We describe the design and initial implementation of Chukwa, a data collection, monitoring, and analysis system for large clusters. Chukwa is built on top of Hadoop, an open-source distributed filesystem and MapReduce implementation. Chukwa accepts a few minutes of latency between data collection and data availability in exchange for scaling to thousands of nodes and beyond. Chukwa also includes a powerful toolkit for querying and processing collected data. These tools support a flexible interface for displaying monitoring and analysis results, enabling human decision makers to operate and optimize the clusters being monitored.