Finding Features Via Hull Approximation
With the recent explosion in data collection, large datasets are becoming more difficult to handle. Their size can make algorithms intractable in time or space, and noise and outliers can cause machine learning models to overfit, yielding poor generalization performance. Applications that require users or experts to interpret or inspect data also suffer, as the sheer size of the data makes it impossible for a human to grasp. These considerations motivate reducing the size and complexity of a dataset by selecting key features that allow one to represent the data compactly. This dissertation approaches the basic task of simplifying a dataset from a geometric angle. First, we view datasets from several types of applications as being naturally modeled by three geometric hulls: the staircase hull, the convex hull, and the conic hull. The goal of this dissertation is then to study geometric properties and algorithms related to simplifying each of these hulls, which in turn yields a simplified dataset. Further, the dissertation provides experimental evidence that shows these methods are useful in practice.