Exploration of Podcast Corpora, Summarization, and Search
Abstract
Podcasts have emerged as an increasingly ubiquitous form of media. This new medium carries
several idiosyncrasies, such as multiple speakers, varying audio quality, and shifting
topics. As podcast consumption grows, so too does the need for knowledge and algorithms
to apply to this burgeoning data space. We focus on two useful data tasks, summarization
and search: we develop methods to tackle both problems and discuss how existing methods
in both areas can be tailored to podcast data. Specifically, we use Spotify’s podcast dataset,
comprising episodes from its ever-growing database of podcasts, as a case study in the
data space. We also explore this novel dataset, identifying several patterns and drawing
conclusions about the nature of podcast data. We conclude by considering future work and
improvements as podcast data continues to grow and its analysis matures.