Word Error Rate (WER)
WER measures speech-recognition accuracy as the ratio of word-level errors to words actually spoken: the substitutions, deletions, and insertions needed to turn the system’s transcript into the reference, divided by the number of words in the reference. Lower is better, and because the number of inserted words has no upper limit, it can exceed 100%.
Also known as: WER
Word Error Rate is the standard metric for speech-to-text and other transcription tasks. You align the system’s output with the reference transcript using the alignment that needs the fewest edits, count three kinds of mistakes (substitutions, insertions, and deletions), then divide by the number of words in the reference: WER = (S + D + I) / N. A WER of 0 is perfect. Because the denominator is the length of the reference rather than the length of the output, a transcript with many spurious insertions can contain more errors than the reference has words, which pushes WER above 100%.
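As a minimal sketch of that calculation, the Python below computes WER with a standard Levenshtein (minimum-edit) alignment over words. It assumes plain whitespace tokenization and no text normalization, whereas real evaluations usually lowercase and strip punctuation first. The names `edit_distance` and `wer` are illustrative and not taken from any particular library.

```python
def edit_distance(ref, hyp):
    """Minimum number of substitutions, insertions, and deletions
    needed to turn the sequence `hyp` into the sequence `ref`."""
    # prev[j] holds the distance between the first i-1 items of ref
    # and the first j items of hyp; only one previous row is kept.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # i deletions: hyp is empty, ref has i items
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (word missing from hyp)
                curr[j - 1] + 1,     # insertion (extra word in hyp)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate as a fraction (1.0 == 100%)."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


print(wer("the cat sat", "the cat sat"))               # no errors
print(wer("call doctor smith", "call doctor smyth"))   # 1 substitution / 3 words
print(wer("hello", "hello there big world"))           # 3 insertions / 1 word
```

In the last call the reference has one word and the output adds three, so the rate is 3.0, that is, 300%.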
Its blind spot is that it treats every word equally: getting a critical name wrong counts the same as a dropped “the,” even though one ruins the transcript and the other doesn’t. The character-level version, CER (Character Error Rate), applies the same calculation to characters instead of words and is used for languages written without spaces between words, such as Chinese or Japanese, and for tasks where word boundaries are otherwise ambiguous.
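The same edit-distance idea at the character level might look like the sketch below. It assumes spaces count as characters and applies no normalization; both are convention choices that vary between evaluations. The names `levenshtein` and `cer` are illustrative.

```python
def levenshtein(ref, hyp):
    """Minimum edits (substitute, insert, delete) between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,              # deletion
                curr[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),   # match (0) or substitution (1)
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate as a fraction of reference characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    # Strings are compared character by character, spaces included.
    return levenshtein(reference, hypothesis) / len(reference)


# One wrong character in a five-character name.
print(cer("smith", "smyth"))  # 1 substitution / 5 characters
```

Scored by words, "smith" versus "smyth" is one fully wrong word; scored by characters, it is one error in five, which is why the two metrics can rank the same transcripts differently.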