These notes cover a Q&A session explaining the mathematical foundations behind section 3.3 of the paper: Compositional Feature Directions with Hierarchical Constraints. The core idea is that child feature directions are constructed to be geometrically dependent on their parent directions, creating a controlled cosine similarity equal to α.
The child direction formula is:
$$\mathbf{d}_{child} = \alpha \cdot \mathbf{d}_{parent} + \beta \cdot \mathbf{d}_\perp$$
L2 normalization scales a vector so that its length (Euclidean norm) equals exactly 1:
$$\|\mathbf{d}\|_2 = \sqrt{\sum_i d_i^2} = 1$$
This makes it a unit vector — it encodes only direction, not magnitude.
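A minimal NumPy sketch of L2 normalization, for illustration only. The helper name `l2_normalize` and the example vector are assumptions, not something taken from the paper.

```python
import numpy as np

def l2_normalize(d: np.ndarray) -> np.ndarray:
    """Scale d to unit Euclidean length (assumes d is not the zero vector)."""
    return d / np.linalg.norm(d)

d = np.array([3.0, 4.0])       # Euclidean length is 5 before normalization
d_unit = l2_normalize(d)       # same direction, length 1
unit_norm = np.linalg.norm(d_unit)
print(d_unit, unit_norm)
```

Dividing by the norm changes only the magnitude; the ratio between components, and therefore the direction, is unchanged.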
The dot product $\mathbf{d}_{child}^T \mathbf{d}_{parent}$ requires multiplying a row vector by a column vector. The transpose turns $\mathbf{d}_{child}$ from a column into a row so the dimensions align for matrix multiplication, yielding a scalar.
The general formula for cosine similarity is:
$$\cos(\theta) = \frac{\mathbf{a}^T \mathbf{b}}{\|\mathbf{a}\| \cdot \|\mathbf{b}\|}$$
When both vectors are L2-normalized (unit-length), each denominator term equals 1, so the formula collapses to just the dot product:
$$\cos(\theta) = \mathbf{a}^T \mathbf{b}$$
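The collapse of the cosine formula to a plain dot product can be checked numerically. This is a small sketch with randomly drawn vectors; the function name `cosine_similarity` and the dimension 8 are arbitrary choices made for the example.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """General cosine similarity: dot product divided by both norms."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
a = rng.normal(size=8)
b = rng.normal(size=8)

# Normalize both vectors to unit length first.
a_hat = a / np.linalg.norm(a)
b_hat = b / np.linalg.norm(b)

general = cosine_similarity(a, b)   # full formula on the raw vectors
shortcut = float(a_hat @ b_hat)     # plain dot product on the unit vectors
print(general, shortcut)            # the two values agree up to rounding
```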
After normalizing $\mathbf{d}_{child}$, its dot product with $\mathbf{d}_{parent}$ equals α, assuming $\mathbf{d}_\perp$ is unit-length and $\alpha^2 + \beta^2 = 1$. Here’s why:
Before normalization, the dot product of the unnormalized child with the parent is:
$$(\alpha \cdot \mathbf{d}_{parent} + \beta \cdot \mathbf{d}_\perp)^T \mathbf{d}_{parent} = \alpha \underbrace{(\mathbf{d}_{parent}^T \mathbf{d}_{parent})}_{=1} + \beta \underbrace{(\mathbf{d}_\perp^T \mathbf{d}_{parent})}_{=0} = \alpha$$
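The β term vanishes because $\mathbf{d}_\perp$ is orthogonal to $\mathbf{d}_{parent}$.
Normalization divides the child by its norm, which is $\sqrt{\alpha^2 + \beta^2}$ when $\mathbf{d}_\perp$ is unit-length, so the cosine similarity is $\alpha / \sqrt{\alpha^2 + \beta^2}$. This equals exactly α when $\alpha^2 + \beta^2 = 1$, for example with $\beta = \sqrt{1 - \alpha^2}$; in that case the child is already unit-length and normalization changes nothing.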
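The construction can be checked with a short NumPy sketch. Several things here are assumptions made for illustration rather than details confirmed by the notes: the helper name `make_child`, the default choice $\beta = \sqrt{1 - \alpha^2}$, and normalizing $\mathbf{d}_\perp$ to unit length before mixing. The orthogonal component is obtained by subtracting the projection of a random vector `v` onto the parent.

```python
import numpy as np

def make_child(d_parent, v, alpha, beta=None):
    """Hypothetical helper: build a unit-length child direction.

    d_parent must be unit-length; v is any vector not parallel to it.
    """
    # Remove the part of v that lies along the parent, then normalize.
    d_perp = v - (v @ d_parent) * d_parent
    d_perp = d_perp / np.linalg.norm(d_perp)
    if beta is None:
        # Assumption: choose beta so that alpha^2 + beta^2 = 1.
        beta = np.sqrt(1.0 - alpha**2)
    child = alpha * d_parent + beta * d_perp
    return child / np.linalg.norm(child)

rng = np.random.default_rng(0)
d_parent = rng.normal(size=16)
d_parent = d_parent / np.linalg.norm(d_parent)
v = rng.normal(size=16)
alpha = 0.6

d_child = make_child(d_parent, v, alpha)
cos_matched = float(d_child @ d_parent)        # equals alpha

# With beta = 1 the constraint alpha^2 + beta^2 = 1 is violated,
# and the cosine becomes alpha / sqrt(alpha^2 + beta^2) instead.
d_child_loose = make_child(d_parent, v, alpha, beta=1.0)
cos_loose = float(d_child_loose @ d_parent)
expected_loose = alpha / np.sqrt(alpha**2 + 1.0)
print(cos_matched, cos_loose, expected_loose)
```

The second case shows why the choice of β matters: the cosine is still controlled by α, but it only equals α when the two coefficients lie on the unit circle.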
Interpretation of α:
| α value | Geometric meaning | Semantic interpretation |
|---|---|---|
| α = 0 | Child ⊥ Parent | No inherited meaning; fully independent |
| α → 1 | Child ≈ Parent | Almost the same concept |
| 0 < α < 1 | Partial alignment | Child inherits part of parent’s meaning, plus its own unique component |
$$\mathbf{d}_\perp = \mathbf{v} - (\mathbf{v} \cdot \mathbf{d}_{parent})\,\mathbf{d}_{parent}$$
The formula does one thing: strip away whatever part of v points in the parent’s direction, keeping only the leftover part that is purely sideways to it.
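Step by step:
1. Compute the scalar projection $s = \mathbf{v} \cdot \mathbf{d}_{parent}$, which measures how much of $\mathbf{v}$ lies along the (unit-length) parent.
2. Scale the parent by it: $s\,\mathbf{d}_{parent}$ is the component of $\mathbf{v}$ parallel to the parent.
3. Subtract that component from $\mathbf{v}$; what remains is $\mathbf{d}_\perp$.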
Proof that $\mathbf{d}_\perp$ is orthogonal to $\mathbf{d}_{parent}$:
$$\mathbf{d}_\perp \cdot \mathbf{d}_{parent} = [\mathbf{v} - (\mathbf{v} \cdot \mathbf{d}_{parent})\mathbf{d}_{parent}] \cdot \mathbf{d}_{parent} = \underbrace{\mathbf{v} \cdot \mathbf{d}_{parent}}_{s} - s \underbrace{\mathbf{d}_{parent} \cdot \mathbf{d}_{parent}}_{=1} = s - s = 0$$
The result d⊥ represents the pure specialization direction for the child — the component that captures what the child concept adds beyond the parent.
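The projection-removal step is easy to verify on a concrete example. In this sketch the function name `orthogonal_component` and the three-dimensional vectors are illustrative assumptions; the parent is taken to be unit-length, as the proof above requires.

```python
import numpy as np

def orthogonal_component(v: np.ndarray, d_parent: np.ndarray) -> np.ndarray:
    """Return the part of v orthogonal to d_parent (d_parent must be unit-length)."""
    s = v @ d_parent          # scalar projection of v onto the parent
    return v - s * d_parent   # subtract the parallel component

d_parent = np.array([1.0, 0.0, 0.0])
v = np.array([2.0, 3.0, -1.0])

d_perp = orthogonal_component(v, d_parent)   # the first coordinate is removed
residual = float(d_perp @ d_parent)          # zero: nothing left along the parent
print(d_perp, residual)
```

If the parent were not unit-length, the subtraction would need an extra division by $\mathbf{d}_{parent} \cdot \mathbf{d}_{parent}$ to remain a true projection.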
The identity $\mathbf{d} \cdot \mathbf{d} = 1$ holds only if the vector is unit-length (L2-normalized). For a unit vector:
$$\mathbf{d} \cdot \mathbf{d} = \sum_i d_i^2 = \|\mathbf{d}\|_2^2 = 1$$
For a general (non-normalized) vector, $\mathbf{d} \cdot \mathbf{d} = \|\mathbf{d}\|_2^2$, which is the squared length and not necessarily 1.
The key insight of section 3.3 is that α acts as a direct geometric dial for semantic relatedness. By construction, the cosine similarity between a child direction and its parent is set by α.
This creates testable predictions for SAE evaluation: well-functioning SAEs should recover decoder directions where child features have cosine similarity ≈ α with parent features, and ablating parent latents should impair child feature reconstruction more than unrelated features.
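The first prediction could be checked along the following lines. This is only a sketch under stated assumptions: `W_dec` is a hypothetical decoder matrix with one latent per row, the latent indices are assumed to be known, and the "recovered" decoder is a synthetic stand-in built from the true directions rather than the output of a trained SAE. The ablation prediction is not covered here.

```python
import numpy as np

def unit(x: np.ndarray) -> np.ndarray:
    """Scale a vector to unit Euclidean length."""
    return x / np.linalg.norm(x)

def parent_child_cosine(W_dec: np.ndarray, parent_idx: int, child_idx: int) -> float:
    """Cosine similarity between two decoder rows (assumed layout: rows are latents)."""
    W = W_dec / np.linalg.norm(W_dec, axis=1, keepdims=True)
    return float(W[child_idx] @ W[parent_idx])

rng = np.random.default_rng(0)
alpha = 0.6

# Ground-truth directions, built as in the child direction formula,
# assuming beta = sqrt(1 - alpha^2) and a unit-length orthogonal part.
d_parent = unit(rng.normal(size=32))
v = rng.normal(size=32)
d_perp = unit(v - (v @ d_parent) * d_parent)
d_child = alpha * d_parent + np.sqrt(1.0 - alpha**2) * d_perp

# Synthetic stand-in for a decoder: the true directions with arbitrary
# row scales, since decoder rows need not be unit-length.
W_dec = np.stack([2.0 * d_parent, 0.5 * d_child])

recovered = parent_child_cosine(W_dec, parent_idx=0, child_idx=1)
gap = abs(recovered - alpha)   # near zero for a perfect recovery
print(recovered, gap)
```

With a real SAE, the gap between the recovered cosine and α would serve as the evaluation signal; how large a gap is acceptable is a judgment the notes do not specify.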