Framework for Deep Learning on Healthcare Time Series Data
With recent advances in artificial intelligence, there is growing demand for adopting deep learning in decision support systems for consumer applications. Despite this success, widespread acceptance of such systems, especially in the critical domain of healthcare, has met with resistance due to challenges in the efficient development of deployable systems. In healthcare, deployed systems must not only be high performing but also be unbiased and provide a functional understanding of their outcomes in critical decisions. Furthermore, owing to the distributed nature of healthcare data and the requirement of privacy, there is an acute shortage of the good-quality data required to train data-hungry black-box deep learning models, which leads to model drift and a lack of generalization on deployment.
Research addressing these challenges individually has focused primarily on 2D data modalities such as X-rays, with only nascent interest in the 1D time series data modality. A prime example of a challenging 1D physiological time series signal is the electrocardiogram (ECG), used to diagnose various health conditions such as arrhythmia. The multi-channel, structured nature of ECG signals, with the P, Q, R, S, and T peaks as spatial features and the distances between the peaks of corresponding beats as temporal features, motivates their use in the empirical evaluation of our framework. This kind of structured spatial and temporal information makes time series data more challenging to learn from.
This dissertation presents a modular framework that addresses the predominant challenges, enabling the development of unbiased, explainable, data-efficient, and high-performing healthcare systems for physiological signal classification. Each module of our framework addresses one of five challenges in the development of deep learning decision support systems: model explanations, data availability, data quality, data and model bias, and performance.
To the best of our knowledge, this is the first approach that uses explanations not only to quantify and benchmark the capacity and quality of 1D convolutional neural network models, but also to assess the quality of data samples from the generated performance explanations, maximizing performance and reducing any detrimental effects. This allows for efficient model development and gives developers a functional understanding of the learned spatial, temporal, frequency, and clinical features.
Additionally, to address data limitations in the iterative development of decision support systems, we present a method to generate synthetic ECG signals of various lengths for multiple classes and demonstrate significant improvement in model performance when using our synthetic data for augmentation. Our framework also provides a novel tool that lets developers interpret and mitigate bias in time series datasets and its amplification when training deep learning models.
While the above modules handle various challenges, the models also need to be high performing in classification outcomes for consumer applications. We achieve state-of-the-art accuracy on the task of arrhythmia classification by leveraging the knowledge from both single-channel and multi-channel models through meta-learning.
Through its various modules, our framework provides developers with a thorough evaluation of model performance capacity and equips them with tools and methods to address the predominant challenges in the development of deployable healthcare decision support systems.