1. Introduction
Analogical reasoning, the ability to recognize and complete structural relationships between concepts, is a foundational cognitive capacity underlying scientific discovery, language understanding, and abstract problem solving. The classic analogy task, "Paris is to France as Berlin is to ____," tests whether a model can identify the capital-city relationship abstractly and apply it to a new country. Large language models (LLMs) exhibit striking competence on such tasks [1], yet the internal computational mechanisms behind that competence remain poorly understood.
Mechanistic interpretability research has made significant progress in understanding factual recall circuits [2], indirect object identification [3], and syntactic processing [4]. Sparse autoencoders (SAEs) have emerged as a central tool in this effort, learning sparse, interpretable decompositions of model activations [5, 6] that can be applied at scale across all layers and sublayers of large models [7]. The Neuronpedia platform [8] operationalizes this infrastructure, providing public APIs for attribution graph generation and feature steering that democratize circuit-level analysis beyond institutions with direct model access.
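To illustrate what this public access pattern looks like in practice, the sketch below queries a single feature record from Neuronpedia. The endpoint path and the `gemmascope-transcoder-16k` source-set slug are assumptions about Neuronpedia's URL scheme, not details specified here; consult the live API documentation before relying on them.

```python
# Hedged sketch: fetch one SAE feature's metadata from Neuronpedia's public API.
# The endpoint shape and the "gemmascope-transcoder-16k" slug are assumptions.
import requests

def fetch_feature(model_id: str, layer: int, index: int) -> dict:
    """GET a single feature record (auto-interp labels, top activating texts, etc.)."""
    url = (f"https://www.neuronpedia.org/api/feature/"
           f"{model_id}/{layer}-gemmascope-transcoder-16k/{index}")
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return resp.json()

# Hypothetical (layer, index) pair; any labeled feature would do.
record = fetch_feature("gemma-2-2b", layer=12, index=4321)
print(sorted(record.keys()))  # inspect available fields rather than assuming them
```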
However, analogical reasoning presents a distinct challenge beyond prior circuit analyses: it requires not merely retrieving a stored fact, but recognizing a relational structure and applying it compositionally to novel inputs. The relation type is never named in the prompt; the model must infer capital-of from the example alone, hold it as a variable, and transfer it to a new argument pair. Prior work has documented apparently emergent analogical reasoning in LLMs [1] and identified internal attention-head mechanisms supporting abstract reasoning [9], yet a feature-level, causally validated circuit account has been absent.
We address this gap using attribution graphs generated from the gemmascope-transcoder-16k SAE suite [7], which provides transcoder features at every layer of Gemma-2-2B. Our analysis identifies a three-phase circuit with explicitly labeled analogy-concept features, provides causal validation through 159 steering experiments, and constitutes, to our knowledge, the first SAE-level mechanistic account of analogical reasoning in a large language model.
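To make the setup concrete, the following sketch loads one GemmaScope transcoder and encodes an activation vector into its 16k-dimensional feature space. The `google/gemma-scope-2b-pt-transcoders` repository id, the `params.npz` file layout, and the JumpReLU parameterization are assumptions based on the public GemmaScope release, not details fixed by this paper.

```python
# Hedged sketch: load one GemmaScope transcoder (width 16k) and encode an
# activation into sparse feature space. Repo id and file layout are assumed
# from the public GemmaScope release.
import numpy as np
from huggingface_hub import hf_hub_download, list_repo_files

REPO = "google/gemma-scope-2b-pt-transcoders"  # assumed public repo id
# Discover the released sparsity level for layer 12 instead of hard-coding it.
candidates = [f for f in list_repo_files(REPO)
              if f.startswith("layer_12/width_16k/") and f.endswith("params.npz")]
assert candidates, "no transcoder params found; check repo id / file layout"
params = np.load(hf_hub_download(REPO, filename=sorted(candidates)[0]))

def encode(x: np.ndarray) -> np.ndarray:
    """JumpReLU encoder: keep pre-activations only above the learned threshold."""
    pre = x @ params["W_enc"] + params["b_enc"]
    return np.where(pre > params["threshold"], pre, 0.0)

x = np.random.randn(params["W_enc"].shape[0]).astype(np.float32)  # stand-in activation
print("active feature indices at layer 12:", np.flatnonzero(encode(x)))
```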
1.1 Research Questions
- Does Gemma-2-2B employ a shared circuit for analogical reasoning, or does it use different mechanisms for different analogy types?
- Which SAE features, identified by stable (layer, feature index) pairs, are most consistently activated across diverse analogical prompts? (A sketch of this consistency criterion follows the list.)
- Are there interpretable, semantically meaningful features that encode the abstract relational structure of analogies, and how can they be discovered?
- How is the analogical computation distributed across transformer layers, and can phase boundaries be causally validated?
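One concrete reading of the second question treats a feature as shared when its (layer, feature index) pair is active on every prompt in a diverse analogy set. A minimal sketch, with hypothetical activation data:

```python
# Hedged sketch of the consistency criterion in the second research question:
# a feature counts as "shared" if its (layer, index) pair is active on every
# analogical prompt. The activation data below is hypothetical.
from typing import List, Set, Tuple

FeatureKey = Tuple[int, int]  # (layer, feature index)

def shared_features(per_prompt_active: List[Set[FeatureKey]]) -> Set[FeatureKey]:
    """Intersect the active-feature sets across all prompts."""
    return set.intersection(*per_prompt_active) if per_prompt_active else set()

# Hypothetical: features active on three analogy prompts of different types.
prompts = [
    {(12, 4321), (18, 77), (5, 900)},   # capital-of
    {(12, 4321), (18, 77), (9, 1501)},  # plural-of
    {(12, 4321), (18, 77), (3, 42)},    # antonym
]
print(shared_features(prompts))  # -> {(12, 4321), (18, 77)}
```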