Finding Features Via Hull Approximation
Abstract
With the recent explosion in data collection, big datasets are becoming increasingly difficult
to handle. Their size can make algorithms intractable in time or space, while noise and
outliers can cause machine learning models to overfit the data, yielding poor performance on
unseen data. Applications that require users or experts to interpret or inspect data also
suffer, since datasets of this size are too large for a human to grasp. These considerations
motivate reducing the size and complexity of a dataset by selecting key features that allow
the data to be represented compactly.
This dissertation approaches the basic task of simplifying a dataset from a geometric angle.
First, we observe that datasets from several types of applications are naturally modeled by
three geometric hulls: the staircase hull, the convex hull, and the conic hull. The
dissertation then studies geometric properties of, and algorithms for, simplifying each of
these hulls; simplifying a hull in turn yields a simplified dataset. Finally, the dissertation
provides experimental evidence showing that these methods are useful in practice.