Exploration of Podcast Corpora, Summarization, and Search

dc.contributor.advisor: Khan, Latifur
dc.creator: Perez, Mathew
dc.date.accessioned: 2021-02-05T20:30:54Z
dc.date.available: 2021-02-05T20:30:54Z
dc.date.created: 2020-12
dc.date.issued: 2020-11-17
dc.date.submitted: December 2020
dc.date.updated: 2021-02-05T20:30:55Z
dc.description.abstract: Podcasts have emerged as an increasingly ubiquitous form of media. This new medium carries several idiosyncrasies, such as multiple speakers, varying audio quality, and oscillating topics. As podcast consumption grows, so too does the need for knowledge and algorithms to apply to this burgeoning data space. We focus on two useful data tasks, summarization and search: we develop methods to tackle both problems and discuss how existing methods in both areas can be tailored to podcast data. Specifically, we use Spotify’s podcast dataset, comprising episodes from its ever-growing database of podcasts, as a case study in the data space. We also explore this novel dataset, drawing several judgements and identifying patterns regarding the nature of podcast data. We conclude by considering future work and improvements as podcast data continues to grow and its analysis matures.
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/10735.1/9176
dc.language.iso: en
dc.subject: Podcasts
dc.subject: Search engines
dc.subject: Corpora (Linguistics)
dc.title: Exploration of Podcast Corpora, Summarization, and Search
dc.type: Thesis
dc.type.material: text
thesis.degree.department: Computer Science
thesis.degree.grantor: The University of Texas at Dallas
thesis.degree.level: Masters
thesis.degree.name: MSCS

Files

Original bundle

Name: PEREZ-THESIS-2020.pdf
Size: 588.28 KB
Format: Adobe Portable Document Format

License bundle

Name: PROQUEST_LICENSE.txt
Size: 5.84 KB
Format: Plain Text

Name: LICENSE.txt
Size: 1.84 KB
Format: Plain Text