Knowledge Based Speaker Analysis Using a Massive Naturalistic Corpus: Fearless Steps Apollo-11




Apollo-11 was the first manned space mission to successfully bring astronauts to the moon and return them safely. As part of NASA’s goal of assessing team and mission success, all voice communications among Mission Control, astronauts, and support staff were captured using a multi-channel analog system, producing recordings that until recently had never been made available. Such time- and mission-critical naturalistic data exhibit extensive and diverse speaker variability, which impacts the performance of speaker recognition and diarization technologies. Hence, analyzing and assessing speaker recognition for this dataset has the potential to contribute to improved speaker models for such corpora and to address multi-party speaker situations. In this study, a small subset of 100 hours derived from a collective 10,000 hours of the Fearless Steps Apollo-11 audio data was investigated, corresponding to three challenging phases of the mission: Lift-Off, Lunar-Landing and Lunar-Walking. A speaker recognition assessment is performed on 140 speakers from a collective set of 183 NASA mission specialists who participated, based on the availability of sufficient training data obtained from 5 (out of 30) mission channels. Since speaker turn durations in the Apollo data vary from speaker to speaker, an analysis is carried out of how limited versus sufficient training duration per speaker model affects alternate baseline systems. Furthermore, the effect of test duration is examined by evaluating the trained speaker models on test segments ranging from very short to long. Speaker models trained on specific phases are also compared with each other to determine how factors such as stress, g-force, and atmospheric pressure can impact the robustness of the models. This represents one of the first investigations of speaker recognition for massively large team-based communications involving naturalistic data.



Apollo 11 (Spacecraft), Speech processing systems, Automatic speech recognition, Speech synthesis


