
Session notes: geometric feature invariance in SAEs

Based on Krampis (March 2026) — Section 3.3


Context

These notes cover a Q&A session explaining the mathematical foundations behind section 3.3 of the paper: Compositional Feature Directions with Hierarchical Constraints. The core idea is that child feature directions are constructed to be geometrically dependent on their parent directions, creating a controlled cosine similarity equal to α.

The child direction formula is:

$$\mathbf{d}_{child} = \alpha \cdot \mathbf{d}_{parent} + \beta \cdot \mathbf{d}_\perp$$


Q1 — What does it mean for a vector to be L2-normalized?

A vector is scaled so that its length (Euclidean norm) equals exactly 1:

$$\|\mathbf{d}\|_2 = \sqrt{\sum_i d_i^2} = 1$$

This makes it a unit vector — it encodes only direction, not magnitude.
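
A minimal sketch of L2 normalization, using plain Python lists to stay dependency-free. The helper name `l2_normalize` and the example vector are illustrative assumptions, not taken from the paper.

```python
import math

def l2_normalize(vec):
    """Return vec scaled to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in vec]

# Illustrative vector with Euclidean length 5.
d = l2_normalize([3.0, 4.0])

# Recompute the length of the result; it should be 1 up to rounding.
length = math.sqrt(sum(x * x for x in d))
print(d, length)
```

The direction of the vector is unchanged; only its scale is divided out.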


Q2 — Why is the child vector transposed in the dot product?

The dot product d_child^T d_parent requires multiplying a row vector by a column vector. The transpose turns d_child from a column into a row so the dimensions align for matrix multiplication, yielding a scalar.
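
As a small worked example (the two-dimensional unit vectors here are illustrative, not from the paper), a 1×2 row times a 2×1 column gives a 1×1 result, that is, a scalar:

$$\mathbf{d}_{child}^T \mathbf{d}_{parent} = \begin{bmatrix} 0.6 & 0.8 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = 0.6 \cdot 1 + 0.8 \cdot 0 = 0.6$$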


Q3 — Why does the dot product of two unit vectors equal their cosine similarity?

The general formula for cosine similarity is:

$$\cos(\theta) = \frac{\mathbf{a}^T \mathbf{b}}{\|\mathbf{a}\| \cdot \|\mathbf{b}\|}$$

When both vectors are L2-normalized (unit-length), each denominator term equals 1, so the formula collapses to just the dot product:

$$\cos(\theta) = \mathbf{a}^T \mathbf{b}$$
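
The collapse of the general formula to a plain dot product can be checked numerically. This is a sketch with made-up vectors; the helper names `dot`, `norm`, and `cosine_similarity` are assumptions introduced for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def cosine_similarity(a, b):
    """General formula: dot product divided by the product of the norms."""
    return dot(a, b) / (norm(a) * norm(b))

# Illustrative non-unit vectors, each of length 5.
a = [3.0, 4.0]
b = [4.0, 3.0]

# Unit-length versions of the same directions.
a_unit = [x / norm(a) for x in a]
b_unit = [x / norm(b) for x in b]

cos_general = cosine_similarity(a, b)  # uses the full formula
cos_unit = dot(a_unit, b_unit)         # plain dot product of unit vectors
print(cos_general, cos_unit)
```

Both values agree up to floating-point rounding, which is why unit-normalized directions let the dot product stand in for cosine similarity.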


Q4 — Why does setting α > 0 create cosine similarity equal to α?

When d_child is unit length, its dot product with d_parent equals α. Here’s why:

The dot product of the unnormalized child with the parent is:

$$(\alpha \cdot \mathbf{d}_{parent} + \beta \cdot \mathbf{d}_\perp)^T \mathbf{d}_{parent} = \alpha \underbrace{(\mathbf{d}_{parent}^T \mathbf{d}_{parent})}_{=1} + \beta \underbrace{(\mathbf{d}_\perp^T \mathbf{d}_{parent})}_{=0} = \alpha$$

The β term vanishes because d_perp is orthogonal to d_parent.
If d_perp is also unit length, the unnormalized child has norm √(α² + β²), so after normalization the cosine similarity is α / √(α² + β²). This equals exactly α when β = √(1 − α²), in which case the child is already unit length and normalization changes nothing.
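
A quick numerical check of this derivation, assuming d_perp is unit length and orthogonal to d_parent. The function `make_child`, the axis-aligned vectors, and the β values are illustrative assumptions rather than the paper's setup. With β = √(1 − α²) the cosine comes out as α; with any other β the normalized child has cosine α / √(α² + β²).

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def make_child(d_parent, d_perp, alpha, beta):
    """Form alpha * d_parent + beta * d_perp, then L2-normalize the result."""
    raw = [alpha * p + beta * q for p, q in zip(d_parent, d_perp)]
    length = math.sqrt(dot(raw, raw))
    return [x / length for x in raw]

d_parent = [1.0, 0.0, 0.0]
d_perp = [0.0, 1.0, 0.0]  # unit length and orthogonal to d_parent
alpha = 0.6

# Case 1: beta = sqrt(1 - alpha^2), so the raw child is already unit length.
child_a = make_child(d_parent, d_perp, alpha, math.sqrt(1 - alpha ** 2))
cos_a = dot(child_a, d_parent)

# Case 2: an arbitrary beta; normalization rescales the parent component.
beta = 0.6
child_b = make_child(d_parent, d_perp, alpha, beta)
cos_b = dot(child_b, d_parent)
expected_b = alpha / math.sqrt(alpha ** 2 + beta ** 2)
print(cos_a, cos_b, expected_b)
```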

Interpretation of α:

| α value | Geometric meaning | Semantic interpretation |
| --- | --- | --- |
| α = 0 | Child ⊥ Parent | No inherited meaning; fully independent |
| α → 1 | Child ≈ Parent | Almost the same concept |
| 0 < α < 1 | Partial alignment | Child inherits part of parent’s meaning, plus its own unique component |

Q5 — How does Gram-Schmidt orthogonalization work to produce d_perp?

$$\mathbf{d}_\perp = \mathbf{v} - (\mathbf{v} \cdot \mathbf{d}_{parent})\,\mathbf{d}_{parent}$$

The formula does one thing: strip away whatever part of v points in the parent’s direction, keeping only the leftover part that is purely sideways to it.

Step by step:

  1. Start with any random vector v (which generally points partly toward the parent and partly sideways).
  2. Compute the projection of v onto the parent direction: (v ⋅ d_parent) d_parent. The scalar v ⋅ d_parent measures how much of v points toward the parent; multiplying back by d_parent turns it into a vector, the “shadow” of v cast onto the parent axis.
  3. Subtract that projection from v. What remains is the component of v that was never pointing toward the parent — purely perpendicular.

Proof that d_perp is orthogonal to d_parent:

$$\mathbf{d}_\perp \cdot \mathbf{d}_{parent} = [\mathbf{v} - (\mathbf{v} \cdot \mathbf{d}_{parent})\mathbf{d}_{parent}] \cdot \mathbf{d}_{parent} = \underbrace{\mathbf{v} \cdot \mathbf{d}_{parent}}_{s} - s \underbrace{\mathbf{d}_{parent} \cdot \mathbf{d}_{parent}}_{=1} = s - s = 0$$

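The result d_perp represents the pure specialization direction for the child: the component that captures what the child concept adds beyond the parent. It is generally not unit length, so it needs to be L2-normalized wherever a unit-length d_perp is assumed.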
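
The projection-and-subtract step can be sketched as follows. The dimension, the random seed, and the helper names `orthogonal_component` and `normalize` are assumptions chosen for illustration; the paper may sample v differently.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(a):
    length = math.sqrt(dot(a, a))
    return [x / length for x in a]

def orthogonal_component(v, d_parent):
    """Remove from v its projection onto the unit vector d_parent."""
    s = dot(v, d_parent)  # how much of v points along the parent
    return [x - s * p for x, p in zip(v, d_parent)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
dim = 8
d_parent = normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
v = [rng.gauss(0.0, 1.0) for _ in range(dim)]

# Strip the parent component, then rescale the remainder to unit length.
d_perp = normalize(orthogonal_component(v, d_parent))

# The dot product with the parent should be zero up to rounding.
residual = dot(d_perp, d_parent)
print(residual)
```

The projection formula relies on d_parent being unit length; for a non-unit parent the scalar s would have to be divided by d_parent ⋅ d_parent.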


Q6 — Is the dot product of a vector with itself equal to 1?

Only if the vector is unit-length (L2-normalized). For a unit vector:

$$\mathbf{d} \cdot \mathbf{d} = \sum_i d_i^2 = \|\mathbf{d}\|_2^2 = 1$$

For a general (non-normalized) vector, d ⋅ d = ‖d‖², which is the squared length and not necessarily 1.
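
A short numerical illustration of the difference, using an arbitrary example vector that is not from the paper:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Illustrative non-unit vector: 1 + 4 + 4 = 9, so its length is 3.
d = [1.0, 2.0, 2.0]
squared_length = dot(d, d)

# After dividing by the length, the self dot product is 1 up to rounding.
d_unit = [x / math.sqrt(squared_length) for x in d]
unit_self_dot = dot(d_unit, d_unit)
print(squared_length, unit_self_dot)
```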


Summary

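The key insight of section 3.3 is that α acts as a direct geometric dial for semantic relatedness. By construction:

$$\cos(\mathbf{d}_{child}, \mathbf{d}_{parent}) = \mathbf{d}_{child}^T \mathbf{d}_{parent} = \alpha$$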

This creates testable predictions for SAE evaluation: well-functioning SAEs should recover decoder directions where child features have cosine similarity ≈ α with parent features, and ablating parent latents should impair reconstruction of child features more than reconstruction of unrelated features.
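
The first prediction could be checked along these lines. This is only a sketch: `cosine_gap` is a hypothetical helper, and the two decoder vectors are made-up stand-ins for directions recovered by a trained SAE, not results from the paper. The ablation prediction needs a trained model and is not covered here.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def cosine_gap(decoder_parent, decoder_child, alpha):
    """Absolute gap between the recovered parent-child cosine and the target alpha."""
    return abs(cosine(decoder_parent, decoder_child) - alpha)

# Hypothetical recovered decoder directions (not necessarily unit length).
decoder_parent = [2.0, 0.0, 0.0]
decoder_child = [1.2, 1.6, 0.0]

# A small gap means the recovered geometry matches the constructed alpha.
gap = cosine_gap(decoder_parent, decoder_child, alpha=0.6)
print(gap)
```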