Sample Efficient Cost-Aware Active Learning







The grand vision of Artificial Intelligence is to build agents that can continuously learn from experience and reason about how to act in the world. Traditional machine learning models generally assume access to large amounts of historical data from which to learn effectively. However, certain domains, such as healthcare, are inherently data-sparse. It is essential to build machine learning models that can learn and reason effectively in such domains. This dissertation addresses the problem of efficient learning with a human in the loop in sparse, noisy, and structured domains. To this end, the dissertation pursues two different yet related directions. The first direction focuses on identifying the most informative training instances and features for learning in data-scarce domains. For example, in a medical domain, certain features such as MRI scans, lab tests, and genetic sequencing are expensive. In such cases, it becomes crucial to identify the appropriate set of instances (clinical subjects) for whom these expensive features need to be solicited. It is also challenging to identify the appropriate set of features for different subjects, since collecting all the expensive features might not be feasible under budgetary restrictions (both time and cost budgets). The aim is to develop strong predictive models at a reasonable cost. The second direction focuses on learning to act in noisy, structured domains. Deciding how to act in complex domains like healthcare should take into account the rich relational structure that exists among the interrelated entities in the domain. For example, a patient’s current medical condition depends on the medical history of their family members. By capturing the existing structure in such domains, an agent can learn how to act and generalize well to unseen conditions.
In this dissertation, to handle the challenge of data scarcity, a unified active learning framework is developed that identifies the most informative samples for whom the expensive feature subset needs to be elicited. Further, the challenge of feature elicitation is addressed by identifying important feature subsets for different clusters of similar instances. Feature acquisition cost is taken into account in an optimization framework to handle the trade-off between acquisition cost and model performance. To address the second direction, an efficient symbolic reinforcement learning algorithm is developed to learn utility functions in complex structured domains. This approach is capable of capturing the various relations that exist among entities, resulting in effective and generalizable policies. Addressing these challenges in predictive modelling and decision making can help in building smart, resource-efficient AI agents that can reason well in data-scarce and structured domains.



Machine learning, Active learning, Reinforcement learning