
What is Nested Learning?

A Clear Guide to Google's New Continual Learning Paradigm

November 16, 2025

The limits of continual learning have held back even the strongest AI systems. Models learn new skills, but risk losing older ones. This tradeoff is known as catastrophic forgetting. Google Research is proposing a new approach that aims to fix this problem at the root: Nested Learning.

Below is a clear explanation of what Nested Learning is, why it matters, and how it could reshape the future of adaptive AI.

The core idea

Nested Learning reframes an AI model as a collection of smaller learning systems nested inside one another. Each of these systems has its own objective, its own information flow, and its own update speed.

Instead of treating “architecture” and “optimizer” as separate concepts, Nested Learning treats them as levels of the same structure. Each level learns at a different frequency. Each level carries a different kind of memory.

This creates a model that can adjust to new information without overwriting what it already knows.
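To make that concrete, here is a minimal Python sketch of the idea (the dimensions, learning rates, and update period are illustrative assumptions, not values from the paper): an inner fast memory that rewrites itself on every example, nested inside an outer set of slow weights that only update every few steps. Each level has its own objective and its own clock.

```python
import numpy as np

rng = np.random.default_rng(0)

# Outer (slow) level: linear weights trained by occasional gradient steps.
W_slow = rng.normal(scale=0.1, size=(4, 8))      # maps an 8-d input to 4-d features
# Inner (fast) level: an associative memory updated on every example.
M_fast = np.zeros((4, 4))                        # maps features to predictions

def inner_update(M, key, value, lr=0.5):
    """Fast level: one delta-rule step toward associating key -> value."""
    pred = M @ key
    return M + lr * np.outer(value - pred, key)

def outer_grad(W, x, target):
    """Slow level: gradient of a squared-error objective with respect to W."""
    feat = W @ x
    err = M_fast @ feat - target
    return np.outer(M_fast.T @ err, x)

slow_period = 8                                   # the slow level updates 8x less often
for step in range(64):
    x = rng.normal(size=8)
    target = rng.normal(size=4)                   # stand-in for a supervision signal
    feat = W_slow @ x
    M_fast = inner_update(M_fast, feat, target)   # fast memory adapts immediately
    if step % slow_period == 0:                   # slow weights adapt rarely
        W_slow -= 0.01 * outer_grad(W_slow, x, target)
```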

Why this matters

Current models rely on a single global training process. When you teach them something new, every parameter updates at roughly the same pace, so new gradients can overwrite the weights that encode older knowledge. That’s what triggers catastrophic forgetting.

Nested Learning introduces multi-time-scale updates. Some parts of the model update quickly. Others update slowly. And some may not update at all unless necessary.

This mirrors how the brain handles learning. Short-term adjustments happen quickly. More permanent changes happen gradually.
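In code, the simplest version of this is a training loop where each parameter group has its own update interval. The grouping and schedule below are illustrative assumptions, not Google’s recipe:

```python
import numpy as np

rng = np.random.default_rng(1)

# One parameter group per "level", each with its own update interval (in steps).
groups = {
    "fast":   {"params": rng.normal(size=16), "every": 1,   "lr": 0.1},
    "medium": {"params": rng.normal(size=16), "every": 10,  "lr": 0.03},
    "slow":   {"params": rng.normal(size=16), "every": 100, "lr": 0.01},
}

def grad(params, batch):
    """Toy gradient: pull parameters toward the batch mean (stand-in for backprop)."""
    return params - batch.mean()

for step in range(1, 301):
    batch = rng.normal(loc=0.5, size=32)
    for name, g in groups.items():
        if step % g["every"] == 0:               # each level updates on its own clock
            g["params"] -= g["lr"] * grad(g["params"], batch)
```

Only the fast group reacts to every batch; the slow group barely moves, which is what protects long-term knowledge from being overwritten by the latest data.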

The result is a path toward self-improving systems that remember long-term knowledge while gaining new skills.

Key concepts inside Nested Learning

1. Associative memory everywhere

Nested Learning shows that many components of modern AI can be described as simple associative memory modules. Transformers. Optimizers. Even backpropagation.

Each one is effectively learning a mapping between an input and an associated signal, such as an error or a token relationship.
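That mapping is easy to state in code. The toy module below is an illustration, not code from the paper, and the `store`/`retrieve` names are mine: it learns to return a stored value whenever it is shown the matching key. Attention, feedforward layers, and optimizer state can all be read as variations on this pattern.

```python
import numpy as np

class AssociativeMemory:
    """A linear map W, nudged online to associate keys with values."""

    def __init__(self, key_dim, value_dim, lr=0.2):
        self.W = np.zeros((value_dim, key_dim))
        self.lr = lr

    def store(self, key, value):
        # One gradient step on the squared error between W @ key and value.
        error = value - self.W @ key
        self.W += self.lr * np.outer(error, key)

    def retrieve(self, key):
        return self.W @ key

# Usage: repeatedly associate one key with the signal it should recall.
rng = np.random.default_rng(0)
key = rng.normal(size=8)
key /= np.linalg.norm(key)                 # a unit-norm key keeps the updates stable
value = rng.normal(size=4)

memory = AssociativeMemory(key_dim=8, value_dim=4)
for _ in range(50):
    memory.store(key, value)
print(np.allclose(memory.retrieve(key), value, atol=1e-3))  # True: mapping learned
```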

Seeing these pieces through the same lens makes it easier to design systems with greater computational depth.

Upskill your understanding of how modern AI systems are built and trained with the Introduction to AI and Machine Learning on Google Cloud course on Coursera*. It’s a solid fit if you want to go deeper into concepts like memory systems, model updates, and adaptive architectures, all of which relate directly to how Nested Learning pushes the field forward.

2. A spectrum of memories

Instead of two types of memory (short-term attention and long-term feedforward weights), Nested Learning defines a spectrum of memories. Each has its own update frequency.

This leads to richer, more stable memory behavior over long contexts.
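One way to picture that spectrum is a chain of memory buffers where each level folds in a summary of the faster level below it, but only on its own, slower schedule. The consolidation rule and frequencies below are my own illustrative assumptions:

```python
import numpy as np

# Three memory levels over a long token stream, from fast to slow.
# Each level is a running summary vector updated at its own frequency.
levels = [
    {"state": np.zeros(16), "every": 1},    # updates on every token
    {"state": np.zeros(16), "every": 16},   # consolidates the fast level
    {"state": np.zeros(16), "every": 256},  # consolidates the medium level
]

def consolidate(slow_state, fast_state, rate=0.1):
    """Fold a faster memory into a slower one with a small interpolation step."""
    return (1 - rate) * slow_state + rate * fast_state

rng = np.random.default_rng(3)
for t, token_embedding in enumerate(rng.normal(size=(1024, 16)), start=1):
    levels[0]["state"] = consolidate(levels[0]["state"], token_embedding, rate=0.5)
    for lower, upper in zip(levels, levels[1:]):
        if t % upper["every"] == 0:          # slower levels update less often,
            upper["state"] = consolidate(upper["state"], lower["state"])
            # ...so late tokens cannot overwrite them in a single step.
```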

3. Deep optimizers

If optimizers are also memory modules, their update rules can be redesigned. Google shows that replacing dot-product similarity with more robust loss functions, such as L2 regression, makes optimizers more reliable when data is noisy or surprising.
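To see why the loss function matters, compare two update rules on a toy memory (my own illustration, not the paper’s derivation): a dot-product objective produces a Hebbian-style update that keeps piling on the same association, while an L2 regression objective produces a delta-rule update that only corrects what the memory has not yet stored.

```python
import numpy as np

rng = np.random.default_rng(4)
key = rng.normal(size=8)
key /= np.linalg.norm(key)
value = rng.normal(size=4)

M_dot = np.zeros((4, 8))   # trained to maximize the dot-product similarity <M k, v>
M_l2 = np.zeros((4, 8))    # trained to minimize the L2 regression loss ||M k - v||^2

lr = 0.3
for _ in range(100):                                # the same key/value, shown repeatedly
    M_dot += lr * np.outer(value, key)              # Hebbian update: grows without bound
    M_l2 += lr * np.outer(value - M_l2 @ key, key)  # delta rule: self-correcting

print(np.linalg.norm(M_dot @ key - value))  # large: the stored value overshoots
print(np.linalg.norm(M_l2 @ key - value))   # near zero: the association stays stable
```

Repeated or noisy signals saturate the dot-product memory, while the regression-based memory settles on the right association, which is the kind of robustness the redesigned optimizers are after.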

4. Self-modifying models

Using these ideas, Google built a prototype architecture called Hope. It applies Nested Learning to create a model that can modify its own memory levels through a self-referential loop. Early experiments show improved long-context reasoning and lower perplexity than comparable models.

What this means for developers

Nested Learning isn’t a model. It’s a framework for designing models. It suggests several practical directions for the industry:

  • Build architectures where different components learn at different speeds.
  • Treat optimizers as first-class learning modules instead of simple update rules.
  • Create more stable long-context systems by spreading memory across many layers of update frequencies.
  • Move toward models that can update themselves in real time without retraining from scratch.

This could reduce training costs, improve adaptability, and enable models that evolve more organically.

Why this matters for the future of AI

The field has been searching for a way to bridge the gap between static pretraining and real continual learning. Nested Learning is one of the clearest attempts so far to unify all parts of the learning process.

If these ideas continue to show strong results, they could influence the next generation of LLMs, agents, and self-updating systems. Models may become better at long-term memory, more resilient to new data, and more capable of autonomous improvement.

Nested Learning might not be the final answer to continual learning. But it’s an important step in the right direction.