Machine learning as massive search
Machine learning is the inference of general patterns from data. Machine-learning algorithms search large spaces of potential hypotheses for the hypothesis that best fits the data. Since the search space for most induction problems grows exponentially in the number of features used to describe the data, most induction algorithms use greedy search to minimize search cost. Greedy search is a polynomial-time algorithm that achieves its efficiency by exploring only a tiny fraction of all hypotheses. Although greedy search generally performs well, it often misses the best hypotheses. This thesis proposes massive search as an alternative to greedy search. Massive search aggressively searches as many hypotheses as possible in the time available. Since massive search explores a larger portion of the hypothesis space, it is less likely to miss good hypotheses. This thesis develops a massive-search algorithm for rule learning called Brute. Experiments with Brute show that massive search is both practical and effective. Brute can completely search the hypothesis spaces of most benchmark problems in only a few minutes. Brute learns better rules than greedy search on 13 of 18 databases and performs equally well on the remaining five. We demonstrate the wide applicability of massive search by extending Brute to data-mining and classification problems, with comparable results.
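The contrast between greedy and exhaustive search can be illustrated with a minimal sketch. It is not Brute's algorithm: the toy data set, the depth bound, the Laplace-accuracy score, and the function names (greedy_search, exhaustive_search) are all assumptions chosen for illustration. Rules are conjunctions of feature=value tests over binary features; with d such features there are 3^d possible conjunctions, since each feature is either absent, tested for 0, or tested for 1.

```python
from itertools import combinations, product

# Hypothetical toy data: rows are (a, b, c); the label is a XOR b.
# Feature c is a weakly correlated distractor. Values are copy counts.
COUNTS = {
    (0, 0, 0): 2, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): 1, (1, 0, 1): 2, (1, 1, 0): 2, (1, 1, 1): 1,
}
DATA = [(row, row[0] ^ row[1]) for row, n in COUNTS.items() for _ in range(n)]


def covers(rule, row):
    """A rule is a tuple of (feature_index, value) tests, all of which must hold."""
    return all(row[i] == v for i, v in rule)


def laplace(rule, data):
    """Laplace accuracy of 'if rule then positive' (an assumed scoring function)."""
    tp = sum(1 for row, y in data if y == 1 and covers(rule, row))
    fp = sum(1 for row, y in data if y == 0 and covers(rule, row))
    return (tp + 1) / (tp + fp + 2)


def greedy_search(data, n_features, max_depth):
    """Add the single best test at each step; stop when the score stops improving."""
    rule, best = (), laplace((), data)
    for _ in range(max_depth):
        used = {i for i, _ in rule}
        candidates = [rule + ((i, v),)
                      for i in range(n_features) if i not in used
                      for v in (0, 1)]
        if not candidates:
            break
        top = max(candidates, key=lambda r: laplace(r, data))
        if laplace(top, data) <= best:
            break
        rule, best = top, laplace(top, data)
    return rule, best


def exhaustive_search(data, n_features, max_depth):
    """Score every conjunction of up to max_depth tests and keep the best."""
    best_rule, best = (), laplace((), data)
    for depth in range(1, max_depth + 1):
        for feats in combinations(range(n_features), depth):
            for values in product((0, 1), repeat=depth):
                rule = tuple(zip(feats, values))
                score = laplace(rule, data)
                if score > best:
                    best_rule, best = rule, score
    return best_rule, best


print(greedy_search(DATA, 3, 2))      # (((2, 1),), 0.625)
print(exhaustive_search(DATA, 3, 2))  # (((0, 0), (1, 1)), 0.8)
```

On this toy data, greedy search commits to the distractor test c=1 because no single test on a or b looks useful in isolation, and it then stops at a score of 0.625. Exhaustive search to depth 2 finds the rule a=0 and b=1, which covers only positive examples and scores 0.8.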