Functionally homogeneous clustering: a framework for building scalable data-intensive internet services

dc.contributor.author: Saito, Yasushi, 1967- [en_US]
dc.date.accessioned: 2009-10-06T16:53:57Z
dc.date.available: 2009-10-06T16:53:57Z
dc.date.issued: 2001 [en_US]
dc.description: Thesis (Ph. D.)--University of Washington, 2001 [en_US]
dc.description.abstract: This dissertation proposes functionally homogeneous clustering (FHC), a new software architecture for building data-intensive Internet services that are manageable, available, fast, and inexpensive. FHC lets any node in the cluster manage any function and any piece of data, freeing humans from making specific decisions about workload distribution. Its dynamic, self-regulating nature is the key to its scalability.

This dissertation also presents three mechanisms that together realize this architecture: automatic reconfiguration, high-throughput replication, and fine-grain load balancing. FHC offers an efficient and scalable automatic reconfiguration mechanism that redistributes functions and data after a configuration change. It ensures that users can access all the data on live nodes after any number of failures. The replication mechanism stores important data on multiple disks with low overhead while keeping their contents consistent. The load balancing mechanism distributes incoming data evenly among nodes and masks non-uniformity in both the workload and the cluster configuration.

FHC scales without sacrificing service quality by taking advantage of the semantics of data-intensive Internet services. For example, the name database used to locate on-disk data is stored only in memory and is recomputed after a failure by scanning disks. Although this design makes the contents and operations of the name database application-specific, it makes the system fast and robust. Our replication algorithm also takes advantage of application semantics and guarantees only eventual data consistency. In return, this strategy makes the system extremely resilient to failures.

We developed the Porcupine email server as a proof of concept for functionally homogeneous clustering. Porcupine distributes user management and email message storage dynamically to maximize system throughput and ensure continuous service to all users. It replicates user profiles and email messages to ensure their availability. We evaluate the manageability, availability, and performance of Porcupine on a 30-node PC cluster. The evaluation shows that Porcupine's performance scales well and that it reacts to configuration changes gracefully and quickly. We also show that Porcupine's load balancing service uses heterogeneous hardware resources efficiently and handles non-uniform workloads by automatically discovering idle resources in the cluster. [en_US]
dc.format.extent: xiv, 191 p. [en_US]
dc.identifier.other: b46248298 [en_US]
dc.identifier.other: 48445735 [en_US]
dc.identifier.other: Thesis 50558 [en_US]
dc.identifier.uri: http://hdl.handle.net/1773/6936
dc.language.iso: en_US [en_US]
dc.rights: Copyright is held by the individual authors. [en_US]
dc.rights.uri: (no value) [en_US]
dc.subject.other: Theses--Computer science and engineering [en_US]
dc.title: Functionally homogeneous clustering: a framework for building scalable data-intensive internet services [en_US]
dc.type: Thesis [en_US]
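The abstract notes that the name database used to locate on-disk data is kept only in memory and is recomputed after a failure by scanning disks. The short sketch below shows that soft-state idea under simplifying assumptions. It is not Porcupine's code: the `Node` class, its `store` and `scan_disk` methods, the `rebuild_name_db` function, and the per-user message layout are hypothetical, and replication is left out entirely.

```python
from collections import defaultdict


class Node:
    """A hypothetical cluster node with a local disk of per-user messages."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        # Local disk contents (hypothetical layout): user -> messages on this node.
        self.disk = defaultdict(list)

    def store(self, user, message):
        self.disk[user].append(message)

    def scan_disk(self):
        """Return the users that have at least one message on this disk."""
        return [user for user, msgs in self.disk.items() if msgs]


def rebuild_name_db(nodes):
    """Recompute the in-memory name database from the disks of live nodes.

    The result maps each user to the sorted names of the live nodes that hold
    some of that user's messages. Nothing here is persisted; after a failure
    the table is simply rebuilt by calling this function again.
    """
    name_db = defaultdict(set)
    for node in nodes:
        if not node.alive:
            continue  # a failed node's disk cannot be scanned
        for user in node.scan_disk():
            name_db[user].add(node.name)
    return {user: sorted(holders) for user, holders in name_db.items()}


nodes = [Node("A"), Node("B"), Node("C")]
nodes[0].store("alice", "msg1")
nodes[1].store("alice", "msg2")
nodes[2].store("bob", "msg3")
print(rebuild_name_db(nodes))  # {'alice': ['A', 'B'], 'bob': ['C']}

nodes[1].alive = False         # node B fails
print(rebuild_name_db(nodes))  # {'alice': ['A'], 'bob': ['C']}
```

In this unreplicated sketch, msg2 becomes unreachable once node B fails. Keeping such data available on live nodes is the job of the replication mechanism described in the abstract, which is not modeled here.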

Files

Original bundle

Name: 3014023.pdf
Size: 7.99 MB
Format: Adobe Portable Document Format