Scalable clustering algorithms and optimization methods for parallel architectures
Clustering algorithms provide a way to analyze and understand the huge amounts of data that are present and evolving today in areas such as the sciences, engineering, marketing, and finance. Although numerous serial clustering approaches have been developed, only a few of them remain viable given their algorithmic complexity and the sizes of today's problems. The focus of this dissertation is on parallel optimization methods and tuning techniques that will enable the computing community to cluster massive data on shared memory parallel architectures. The first part of the dissertation investigates clustering methods, and their parallel optimization, for data represented by coordinates on a two-dimensional Cartesian plane. These clustering methods are based on the well-known plane sweep algorithm, which scans a coordinate plane to find intersections of rectangles. Applications of these algorithms include earth surface processing, very large scale integration (VLSI) design, and military surveillance, to cite just a few. The contribution presented in this dissertation is a simple and highly scalable plane sweep algorithm named Scan-List (SL). Although this method has a higher order of sequential operation complexity than other well-known serial algorithms, its scalability allows it to surpass those algorithms when run in parallel. The algorithm is profiled against the best-known plane sweep method and applied to tests generated using industrial electronic physical design automation (EDA) tools. The second part addresses clustering methods for data represented as network systems or graphs with node and edge sets. Practical applications of graph clustering algorithms include today's popular social networks, biological research, marketing, and financial products.
The methodology presented in this thesis investigates clustering algorithms and selects only those that produce results of acceptable quality, have low complexity, and are capable of working with massive data. The well-known label propagation algorithm (LPA) is shown to be unique in meeting these requirements and, in addition, to scale efficiently on shared memory parallel architectures. The present deficiencies of LPA are addressed to achieve linear scalability on a conventional shared memory Intel® Xeon architecture. An Intel® Xeon Phi (Phi) label propagation algorithm (PLPA) is then presented. PLPA is the first community detection algorithm implemented on Phi, a novel many integrated core (MIC) architecture. PLPA is fully scalable up to 32 cores and achieves a speedup above 100x on Phi, with a maximum of 56 cores and 224 hardware threads, while maintaining the quality of the detected communities. An analysis is provided of why the speedup is limited by the Phi hardware and how this will be resolved in the next generation of MIC-based products. Existing options for using Phi on massive networks that cannot fully fit in its limited-capacity memory are illustrated. An extension of PLPA, the modified Phi-based LP algorithm PLPA-M, is presented for using Phi on networks with billions of edges, together with a scalability analysis. Future opportunities for massively parallel processing of networks using LPA on Phi (multiple cards) and on other architectures are analyzed. We provide heuristics for hybrid LPA implementations that would enable LPA on the next, more advanced generation of heterogeneous Phi platforms and on distributed architectures. Finally, we provide initial empirical work and set a base for future research.
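For readers unfamiliar with label propagation, the basic serial idea behind LPA can be sketched as follows. This is a minimal illustration only, not the PLPA or PLPA-M implementation described in the dissertation. The function names, the adjacency-dictionary input, the fixed node visiting order, and the deterministic tie-breaking rule (keep the current label if it is tied for most frequent, otherwise take the largest tied label) are all assumptions made here so that the example is reproducible. LPA implementations commonly randomize both the visiting order and the tie-breaking.

```python
from collections import Counter


def label_propagation(adjacency, max_iterations=100):
    """Serial label propagation on an undirected graph.

    `adjacency` maps each node to a list of its neighbors (hypothetical
    input format chosen for this sketch). Returns a dict node -> label.
    """
    # Every node starts in its own community, labeled by its own id.
    labels = {node: node for node in adjacency}

    for _ in range(max_iterations):
        changed = False
        # Fixed visiting order (assumption made for reproducibility).
        for node in sorted(adjacency):
            neighbors = adjacency[node]
            if not neighbors:
                continue
            # Count how often each label occurs among the neighbors.
            counts = Counter(labels[n] for n in neighbors)
            best = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == best]
            # Keep the current label if it is already among the most frequent.
            if labels[node] in candidates:
                continue
            # Deterministic tie-break (assumption): take the largest label.
            labels[node] = max(candidates)
            changed = True
        if not changed:
            break  # No label changed in a full pass: converged.
    return labels


def communities(labels):
    """Group nodes by label and return sorted lists of community members."""
    groups = {}
    for node, label in labels.items():
        groups.setdefault(label, set()).add(node)
    return sorted(sorted(members) for members in groups.values())


# Two triangles joined by the single edge 2-3.
two_triangles = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
print(communities(label_propagation(two_triangles)))
# prints [[0, 1, 2], [3, 4, 5]]
```

Each pass only inspects a node's immediate neighbors, so the work per pass grows with the number of edges. That low per-pass cost is consistent with the abstract's description of LPA as a low-complexity method.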
- Electrical engineering