AGENDA
Sector and Sphere: Towards Large Scale Data Storage, Sharing, and Simplified Processing
Robert Grossman, Yunhong Gu
Abstract: Cloud computing has demonstrated that very large datasets can be processed simply on commodity clusters, given the right programming structure. Work to date, including Google’s GFS/MapReduce and Hadoop, has been used to store and process very large datasets, especially from Web-related applications. In this paper, we present a new cloud computing software system consisting of the Sector storage cloud and the Sphere compute cloud. In contrast to existing data clouds, Sector supports not only data storage within a data center, but also data distribution across wide area networks. Sphere, in turn, implements a stream processing paradigm to support data-intensive applications. Sphere supports all applications that can be implemented with MapReduce, but it is more straightforward and simpler to use, and is about twice as fast as Hadoop according to our experimental studies. We have released Sector/Sphere as open source software and have used it in various real-world applications.