Fuzzy Rough Set Approximations in Large Scale Information Systems

Asfoor, Hasan M.

dc.contributor.advisor	De Cock, Martine	en_US
dc.contributor.author	Asfoor, Hasan M.	en_US
dc.date.accessioned	2015-05-11T20:27:19Z
dc.date.available	2015-05-11T20:27:19Z
dc.date.issued	2015-05-11
dc.date.submitted	2015	en_US
dc.description	Thesis (Master's)--University of Washington, 2015	en_US
dc.description.abstract	Rough set theory is a popular and powerful machine learning tool. It is especially suitable for dealing with information systems that exhibit inconsistencies, i.e. objects that have the same values for the conditional attributes but a different value for the decision attribute. In line with the emerging granular computing paradigm, rough set theory groups objects together based on the indiscernibility of their attribute values. Fuzzy rough set theory extends rough set theory to data with continuous attributes, and detects degrees of inconsistency in the data. Key to this is turning the indiscernibility relation into a gradual relation, acknowledging that objects can be similar to a certain extent. In very large datasets with millions of objects, computing the gradual indiscernibility relation (in other words, the soft granules) is very demanding in terms of both runtime and memory. It is, however, required for the computation of the lower and upper approximations of concepts in the fuzzy rough set analysis pipeline. In this thesis, we present a parallel and distributed solution, implemented on both Apache Spark and the Message Passing Interface (MPI), to compute fuzzy rough approximations in very large information systems. Our results show that our parallel approach scales with problem size to information systems with millions of objects. To the best of our knowledge, no other parallel and distributed solutions have been proposed so far in the literature for this problem. We also present two distributed prototype selection approaches that are based on fuzzy rough set theory and couple them with our distributed implementation of the well-known weighted k-nearest neighbors machine learning prediction technique to solve regression problems. In addition, we show how our distributed approaches can be used on the State Inpatient Data Set (SID) and the Medical Expenditure Panel Survey (MEPS) to predict the total healthcare expenses of patients.	en_US
dc.embargo.terms	Open Access	en_US
dc.format.mimetype	application/pdf	en_US
dc.identifier.other	Asfoor_washington_0250O_14189.pdf	en_US
dc.identifier.uri	http://hdl.handle.net/1773/33133
dc.language.iso	en_US	en_US
dc.rights	Copyright is held by the individual authors.	en_US
dc.subject	approximations; big data; fuzzy rough set; machine learning; MPI; Spark	en_US
dc.subject.other	Computer science	en_US
dc.subject.other	computer science and engineering	en_US
dc.title	Fuzzy Rough Set Approximations in Large Scale Information Systems	en_US
dc.type	Thesis	en_US
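
The abstract refers to a gradual indiscernibility relation and to the lower and upper approximations built on it. As an illustration only, the following is a minimal sequential sketch of those quantities in Python. It is not the thesis's Spark or MPI implementation. The choice of similarity measure (minimum over attributes of 1 - |difference| / range) and of the Lukasiewicz implicator and t-norm are assumptions made for this example, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def similarity(x: Sequence[float], y: Sequence[float],
               ranges: Sequence[float]) -> float:
    """Degree to which x and y are indiscernible (assumed measure):
    the minimum over attributes of 1 - |x_a - y_a| / range_a."""
    return min(max(0.0, 1.0 - abs(xa - ya) / r)
               for xa, ya, r in zip(x, y, ranges))


def implicator(a: float, b: float) -> float:
    """Lukasiewicz implicator (assumed choice)."""
    return min(1.0, 1.0 - a + b)


def tnorm(a: float, b: float) -> float:
    """Lukasiewicz t-norm (assumed choice)."""
    return max(0.0, a + b - 1.0)


def fuzzy_rough_approximations(
    objects: List[Sequence[float]],
    membership: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Return (lower, upper) approximation degrees of a concept.

    objects:    one tuple of conditional attribute values per object
    membership: degree to which each object belongs to the concept
    """
    n_attrs = len(objects[0])
    ranges = []
    for a in range(n_attrs):
        column = [obj[a] for obj in objects]
        spread = max(column) - min(column)
        ranges.append(spread if spread > 0 else 1.0)  # avoid division by zero

    lower, upper = [], []
    for x in objects:
        # One row of the gradual relation: n similarity values per object,
        # hence n * n values overall. This is the costly step at scale.
        sims = [similarity(x, y, ranges) for y in objects]
        lower.append(min(implicator(s, m) for s, m in zip(sims, membership)))
        upper.append(max(tnorm(s, m) for s, m in zip(sims, membership)))
    return lower, upper


if __name__ == "__main__":
    data = [(0.0,), (0.5,), (1.0,)]
    concept = [1.0, 1.0, 0.0]
    low, up = fuzzy_rough_approximations(data, concept)
    print(low, up)
```

The nested loop over all pairs of objects is what makes the computation quadratic in the number of objects, which is the runtime and memory cost the abstract describes for datasets with millions of objects.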

Files

Original bundle

Name: Asfoor_washington_0250O_14189.pdf
Size: 1.76 MB
Format: Adobe Portable Document Format

Collections

Computer science and engineering