AI Glossary

METEOR

METEOR is a text-generation metric that scores overlap with a reference more flexibly than BLEU — it credits synonyms and word-stem matches, not just exact words, and accounts for word order. It was designed to correlate better with human judgment on translation.

· Chain of Thought

AI Evaluation & Reliability

METEOR (Metric for Evaluation of Translation with Explicit ORdering) was built to fix BLEU’s bluntness. Where BLEU only counts exact n-gram matches, METEOR also credits matches on word stems and synonyms, and adds a penalty based on how fragmented the matching is, so it rewards correct content even when the wording differs and the order shifts. The aim was tighter agreement with human ratings, especially at the sentence level.

It still needs reference text and still measures surface similarity rather than true meaning, so it sits between the pure n-gram metrics (BLEU, ROUGE) and embedding-based ones (BERTScore). It’s most useful for translation and similar reference-based tasks; for open-ended generation, semantic metrics or LLM-as-a-judge tell you more.