Medical Question Answering and Patient Cohort Retrieval
With the advent of the electronic health record (EHR), there has been an explosion of rich medical information available for automatic and manual analyses. While most current medical informatics research focuses on the easily accessible structured information stored in medical databases, it is widely believed that the majority of the information in EHRs remains locked within unstructured text. This dissertation presents research aimed at unlocking the knowledge encoded in clinical texts by automatically (1) identifying clinical texts relevant to a specific information need and (2) reasoning about the information encoded in clinical text to answer medical questions posed in natural language. Specifically, we address two tasks: medical question answering, which analyzes the knowledge encoded in EHRs (documenting medical practice and experience) as well as in medical research articles to automatically produce answers to medical questions posed by a physician; and patient cohort retrieval, which identifies patients who satisfy a given natural language description of specific inclusion and exclusion criteria. Novel systems addressing both of these tasks are presented and discussed. Moreover, this dissertation presents a number of approaches for overcoming some of the most significant complexities of processing electronic health records. We present new approaches for (1) modeling the temporal aspects of electronic health records, that is, the fact that a patient's clinical picture varies throughout his or her medical care, and show how these approaches can be used to infer, represent, and predict temporal interactions of clinical findings and observations; (2) inferring underspecified information and recovering missing sections of records; and (3) applying machine learning to learn an optimal set of relevance criteria for a specific set of information needs and collection of clinical texts.
Combined, this work demonstrates the importance of harnessing the natural language content of electronic health records and highlights the promise of medical question answering and patient cohort retrieval for enabling more informed patient care and improved patient outcomes.