Manuscript received September 15, 2023; revised November 26, 2023; accepted January 19, 2024; published April 26, 2024
Abstract—This study employs OpenAI’s Whisper to explore how variance manifests in an Automatic Speech Recognition (ASR) system. Audio in three languages covered by Whisper’s current offerings (English, French, and Haitian Kreyòl) and in one untrained language (Saint Lucian Kwéyòl) was transcribed in thirty consecutive runs each, across five model sizes. Languages that are etymologically complex yet orthographically simple, and mutually intelligible, may challenge ASR system capabilities; however, a model trained on a phonetically similar language generated approximate phonetic transcripts for the untrained one. Despite implicit variance hurdles such as non-determinism and data deficiencies, ASR systems may aid in documenting high-orality, low-resource languages.
Keywords—automatic speech recognition, creole, low-resource languages, Whisper
Cite: Laurel Lord and Mark Newman, "Automatic Speech Recognition Variance: Consecutive Runs of Low-Resource Languages in Whisper," International Journal of Machine Learning vol. 14, no. 2, pp. 43-47, 2024.
Copyright © 2024 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
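The repeated-run protocol summarized in the abstract (thirty consecutive transcriptions per language, for each of five model sizes) could be scripted roughly as follows. This is a minimal illustrative sketch, not the authors’ code: it assumes the open-source openai-whisper Python package, a hypothetical audio file sample.wav, and a simple distinct-transcript count as a crude variance signal.

```python
# Illustrative sketch (assumptions noted above): repeat Whisper transcription
# of the same clip across model sizes to observe run-to-run variance.
import whisper

MODEL_SIZES = ["tiny", "base", "small", "medium", "large"]
N_RUNS = 30  # thirty consecutive runs, as described in the abstract

results = {}
for size in MODEL_SIZES:
    model = whisper.load_model(size)        # load the checkpoint for this size
    transcripts = []
    for _ in range(N_RUNS):
        out = model.transcribe("sample.wav")  # hypothetical audio file
        transcripts.append(out["text"])
    results[size] = transcripts

# Crude variance signal: how many distinct transcripts each size produced
for size, texts in results.items():
    print(f"{size}: {len(set(texts))} distinct transcripts out of {len(texts)}")
```

In practice the number of distinct transcripts is only a coarse proxy; per-run word error rates or pairwise edit distances would give a finer-grained picture of the variance the study examines.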