ai Priority 4/5 6/10/2026, 11:05:15 AM

ServiceNow AI Releases AU-Harness Benchmark to Evaluate Code-Switching in Automatic Speech Recognition


ServiceNow AI has released AU-Harness, a benchmark dataset and evaluation toolkit designed to assess Automatic Speech Recognition (ASR) systems on code-switching speech. Code-switching occurs when bilingual speakers alternate between languages within a single utterance, a common behavior in international customer service centers and multilingual workplaces. Although code-switching is common in practice, conventional ASR benchmarks frequently assume a single primary language, so models that score well on them can degrade in practical deployments.

The benchmark evaluates model performance using four distinct language pairs that mix English with Spanish, French, and German. The datasets represent typical IT support and human resources dialogues, containing spoken utterances ranging from 12 to 40 words. Tests of seven modern speech models, including OpenAI's Whisper and several Large Audio-Language Models, showed that recognition accuracy varies significantly depending on the language pair and the length of word embeddings.

Traditional commercial ASR models often struggle when forced to parse multiple languages dynamically, resulting in high Word Error Rates (WER). AU-Harness provides a standardized framework to quantify these error rates under realistic, mixed-language conditions. The benchmark gives developers and enterprise architects concrete metrics to guide the selection and fine-tuning of speech-to-text models for global applications.
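To make the metric concrete, the following is a minimal sketch of how Word Error Rate could be computed and broken down by language pair. It is not the AU-Harness API: the function names (`word_edit_distance`, `wer_by_group`), the pair labels, and the sample transcripts are all invented for illustration, and the text normalization (lowercasing and whitespace splitting) is a simplifying assumption. WER here is the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, pooled within each group.

```python
from collections import defaultdict


def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance between two lists of word tokens."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer_by_group(samples):
    """Pooled WER per group.

    samples: iterable of (group_label, reference_text, hypothesis_text).
    Normalization (lowercase + whitespace split) is an assumption of
    this sketch, not something specified by the benchmark.
    """
    errors = defaultdict(int)
    ref_lengths = defaultdict(int)
    for group, reference, hypothesis in samples:
        ref_words = reference.lower().split()
        hyp_words = hypothesis.lower().split()
        errors[group] += word_edit_distance(ref_words, hyp_words)
        ref_lengths[group] += len(ref_words)
    return {g: errors[g] / ref_lengths[g] for g in errors if ref_lengths[g] > 0}


# Invented example transcripts (hypothetical, not benchmark data).
samples = [
    ("en-es", "please reset my contraseña before the meeting",
              "please reset my contrasena before the meeting"),
    ("en-es", "the laptop no funciona since yesterday",
              "the laptop no funciona since yesterday"),
    ("en-fr", "i need a nouveau badge for the office",
              "i need a new badge for office"),
]

results = wer_by_group(samples)
for pair, wer in sorted(results.items()):
    print(f"{pair}: WER = {wer:.3f}")
```

Pooling errors and reference lengths per group, rather than averaging per-utterance WER, keeps short utterances from dominating the score. The same grouping key could be swapped for an utterance-length bucket to examine variation along that axis.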


#asr #servicenow #huggingface #benchmark #multilingual

Comparison

Aspect | Before / Alternative | After / This
Language Assumption | Assumes a single, pre-declared primary language for the audio stream | Accommodates dynamic language switching mid-utterance (code-switching)
Evaluation Context | General-purpose read speech or monolingual conversational datasets | Domain-specific enterprise dialogues (IT support and Human Resources)
Performance Metric Focus | Standard global Word Error Rate (WER) | WER variations analyzed across specific language pairs and embedding lengths

Source: Hugging Face Blog

This page summarizes the original source. Check the source for full details.
