What is Internal Coherence Maximization (ICM)?

June 13, 2025

Internal Coherence Maximization (ICM) is a method for fine-tuning language models without needing any human-provided labels. Developed by Anthropic, it helps a model improve itself using only unlabeled data by leveraging its own ability to find the most internally consistent answers.

In practice, the model starts with a batch of unlabeled inputs, such as math problems or factual questions, without being given the correct answers. It then generates candidate answers on its own and evaluates how consistent those answers are with one another. The core idea is that if a set of answers is mutually predictable and respects the task's logical rules (for example, two contradictory claims cannot both be labeled true), it is more likely to be correct.
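To make that scoring idea concrete, here is a minimal Python sketch. It is an illustration, not Anthropic's implementation: the helpers label_logprob and is_consistent are hypothetical stand-ins for "ask the model how likely a label is, given the other labeled examples" and "check a task-specific logical rule", and the weight ALPHA is likewise illustrative.

```python
# Hypothetical helpers (assumptions for this sketch, not from the report):
#   label_logprob(example, label, context) -> float
#       the model's log-probability of `label` for `example`, given the
#       other labeled examples in `context` as in-context demonstrations
#   is_consistent(example_a, label_a, example_b, label_b) -> bool
#       a task-specific logical check, e.g. two contradictory claims
#       cannot both be labeled True

ALPHA = 50.0  # illustrative trade-off between predictability and consistency

def coherence_score(examples, labels, label_logprob, is_consistent):
    """Score one candidate labeling of the whole batch."""
    pairs = list(zip(examples, labels))

    # Mutual predictability: how well each label can be predicted
    # from all of the *other* labeled examples.
    predictability = 0.0
    for i, (example, label) in enumerate(pairs):
        context = pairs[:i] + pairs[i + 1:]
        predictability += label_logprob(example, label, context)

    # Logical consistency: count pairwise contradictions in the labeling.
    inconsistencies = 0
    for i in range(len(pairs)):
        for j in range(i + 1, len(pairs)):
            if not is_consistent(*pairs[i], *pairs[j]):
                inconsistencies += 1

    # Higher is better: predictable labels, few contradictions.
    return ALPHA * predictability - inconsistencies
```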

The algorithm searches over candidate labelings for the most coherent set of answers, discarding contradictions and low-quality guesses along the way. It then treats the winning high-coherence answers as if they were verified labels and fine-tunes on them, just like traditional supervised learning but with no humans in the loop.
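As a rough illustration of that search step, the sketch below reuses the coherence_score helper from the previous snippet. It does simple hill climbing over labelings; the report describes a more elaborate simulated-annealing-style procedure, so treat this as the basic loop rather than the published algorithm.

```python
import random

def icm_search(examples, label_space, label_logprob, is_consistent,
               steps=1000, seed=0):
    """Hill-climbing sketch: start from random labels, repeatedly propose
    relabeling one example, and keep the change only if it raises the
    coherence score of the whole batch."""
    rng = random.Random(seed)
    labels = [rng.choice(label_space) for _ in examples]
    best = coherence_score(examples, labels, label_logprob, is_consistent)

    for _ in range(steps):
        i = rng.randrange(len(examples))
        proposal = list(labels)
        proposal[i] = rng.choice(label_space)
        score = coherence_score(examples, proposal, label_logprob, is_consistent)
        if score > best:  # this greedy variant accepts only improvements
            labels, best = proposal, score

    # The highest-coherence labeling found is then treated as if it were
    # a verified dataset and used for ordinary supervised fine-tuning.
    return list(zip(examples, labels))
```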

ICM has shown strong results on tasks such as math problem solving and truthfulness benchmarks, often matching or exceeding the performance of models fine-tuned on human-labeled data. Anthropic has also used ICM-generated labels to train a reward model, and an assistant based on Claude 3.5 Haiku trained with that reward model outperformed a counterpart trained with human feedback.

In short, Internal Coherence Maximization lets a language model teach itself: it selects its most self-consistent responses and trains on them to get better over time, with no extra human effort required.

To learn more about ICM, read the report Unsupervised Elicitation of Language Models.