Implementing Manifold Structures in SynthSAEBench: Technical Explanation

Core Data Structure Modifications

To implement manifold features in SynthSAEBench, the data generation pipeline must support features that are not scalar coefficients times unit vectors but points sampled from geometric structures embedded in the activation space. The key is to extend the existing feature dictionary $\mathbf{D}$ and the coefficient sampling mechanism while preserving compatibility with correlation, hierarchy, and superposition.

First, extend the synthetic model to maintain two types of features: discrete features (as currently implemented) and manifold features. For each feature $i$, add metadata specifying feature_type (discrete or manifold). For manifold features, store three additional parameters: intrinsic_dim (the manifold’s true dimensionality), embedding_basis (a $D \times d$ matrix whose orthonormal columns define the subspace), and manifold_type (circle, sphere, torus, etc.). Here $D$ is the activation dimension (not to be confused with the dictionary $\mathbf{D}$) and $d$ is the dimension of the subspace, which can exceed intrinsic_dim: a circle has intrinsic dimension 1 but needs $d = 2$. Instead of storing a single direction vector $d_i \in \mathbb{R}^{D}$ for manifold feature $i$, store the embedding basis $U_i \in \mathbb{R}^{D \times d}$ that spans the subspace where the manifold lives.

The existing firing probability $p_i$, hierarchy relationships, and correlation structure remain unchanged; they still determine whether the manifold feature is active on a given sample. The crucial modification is in what happens when a manifold feature fires.
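As a rough illustration of the metadata described above, the per-feature record could look like the following sketch. The class name FeatureSpec, the field names direction and firing_prob, and the validation rules are assumptions made for this example; they are not taken from the SynthSAEBench codebase.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class FeatureSpec:
    """Hypothetical per-feature metadata (illustrative, not SynthSAEBench's API)."""

    feature_type: str                             # "discrete" or "manifold"
    firing_prob: float                            # firing probability p_i
    direction: Optional[np.ndarray] = None        # d_i, shape (D,), discrete only
    manifold_type: Optional[str] = None           # e.g. "circle", "sphere", "torus"
    intrinsic_dim: Optional[int] = None           # true dimensionality of the manifold
    embedding_basis: Optional[np.ndarray] = None  # U_i, shape (D, d), manifold only

    def __post_init__(self):
        if self.feature_type == "discrete":
            if self.direction is None:
                raise ValueError("discrete features need a direction vector")
        elif self.feature_type == "manifold":
            U = self.embedding_basis
            if U is None or self.manifold_type is None or self.intrinsic_dim is None:
                raise ValueError("manifold features need type, intrinsic_dim and basis")
            # The columns of U_i must be orthonormal: U^T U = I_d.
            if not np.allclose(U.T @ U, np.eye(U.shape[1]), atol=1e-6):
                raise ValueError("embedding_basis columns must be orthonormal")
        else:
            raise ValueError(f"unknown feature_type: {self.feature_type!r}")


# Usage: a circle living in the first two coordinates of a 6-dimensional space.
circle = FeatureSpec(
    feature_type="manifold",
    firing_prob=0.05,
    manifold_type="circle",
    intrinsic_dim=1,                   # a circle is one-dimensional...
    embedding_basis=np.eye(6)[:, :2],  # ...but needs a 2-column basis
)
```

Keeping the firing probability on the same record for both feature types reflects the point that activation logic is shared; only the payload differs between a direction vector and an embedding basis.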

Sampling and Embedding Process

When generating a batch of activations, sampling proceeds in phases just as in current SynthSAEBench, with manifold-specific logic inserted.

Phase 1: Use the existing Gaussian copula mechanism to determine which features fire (binary indicators $z_i$), respecting correlations.

Phase 2: For discrete features that fire ($z_i = 1$), sample scalar coefficients exactly as before: $c_i = \mathrm{ReLU}(\mu_i + \sigma_i \epsilon_i)$. For manifold features that fire, sample in two steps: (a) sample a point $m_i$ on the intrinsic manifold according to the manifold’s parameterization, and (b) sample a radial magnitude $r_i = \mathrm{ReLU}(\mu_i + \sigma_i \epsilon_i)$ using the same rectified Gaussian as for discrete features.

Phase 3: Apply hierarchy constraints as before. When a manifold child is deactivated, zero out both its manifold position and its radial component.

Phase 4: Compute the final activation by summing contributions. Discrete features contribute $c_i d_i$ as usual, while manifold features contribute $r_i \phi_i(m_i)$, where $\phi_i$ is the embedding function. The total activation is

$$a = \sum_{i \in \text{discrete}} c_i d_i + \sum_{i \in \text{manifold}} r_i \phi_i(m_i) + b.$$
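The four phases can be sketched in NumPy as follows. This is a minimal sketch under several assumptions: the function name sample_batch and all of its arguments are hypothetical, only circle-type manifold features are handled, corr is a correlation matrix with unit diagonal, every firing probability lies strictly between 0 and 1, the term b is passed in as a vector called bias, and parents are indexed before their children. The actual SynthSAEBench copula and hierarchy code may differ.

```python
from statistics import NormalDist

import numpy as np


def sample_batch(n, corr, p, mu, sigma, parent, directions, circle_bases, bias, rng):
    """Hypothetical four-phase sampler (not SynthSAEBench's API).

    Features 0..n_disc-1 are discrete; the remaining ones are circles.
    parent[i] is the parent index of feature i, or -1 for a root.
    """
    k, n_disc = len(p), len(directions)

    # Phase 1: Gaussian copula. Correlated standard normals are thresholded so
    # that feature i fires with marginal probability p[i].
    g = rng.multivariate_normal(np.zeros(k), corr, size=n)
    thresholds = np.array([NormalDist().inv_cdf(pi) for pi in p])
    z = g < thresholds                                    # (n, k) booleans

    # Phase 2: rectified-Gaussian magnitudes (c_i or r_i) for every feature,
    # plus a uniform angle for each circle feature.
    mag = np.maximum(mu + sigma * rng.standard_normal((n, k)), 0.0)
    theta = rng.uniform(0.0, 2.0 * np.pi, size=(n, k - n_disc))

    # Phase 3: hierarchy. A child can only be active if its parent is active.
    for child, par in enumerate(parent):
        if par >= 0:
            z[:, child] &= z[:, par]
    mag = mag * z                      # zero magnitudes of inactive features
    theta = theta * z[:, n_disc:]      # zero positions of inactive manifolds

    # Phase 4: bias + discrete contributions + manifold contributions.
    a = np.tile(bias, (n, 1)) + mag[:, :n_disc] @ directions
    for j, U in enumerate(circle_bases):
        unit = np.stack([np.cos(theta[:, j]), np.sin(theta[:, j])], axis=1)
        a += mag[:, n_disc + j, None] * (unit @ U.T)
    return a, z, mag


# Usage: two discrete features and one circle; features 1 and 2 are children of 0.
rng = np.random.default_rng(0)
D = 16
a, z, mag = sample_batch(
    n=256, corr=np.eye(3), p=[0.5, 0.3, 0.3],
    mu=np.ones(3), sigma=0.2 * np.ones(3), parent=[-1, 0, 0],
    directions=np.eye(D)[:2], circle_bases=[np.eye(D)[:, 2:4]],
    bias=np.zeros(D), rng=rng,
)
```

Sampling magnitudes and angles for every feature and masking afterwards keeps the code vectorized; sampling only for active features would be equivalent in distribution.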

Concrete Manifold Implementations

For a circular feature ($S^1$), store a 2-column embedding basis $U_i \in \mathbb{R}^{D \times 2}$ whose columns are orthonormal. When the feature fires, sample an angle $\theta \sim \mathrm{Uniform}(0, 2\pi)$ (or from a von Mises distribution if you want concentration at certain points, such as specific days of the week). The embedding function is $\phi_{\text{circle}}(\theta) = U_i [\cos\theta, \sin\theta]^T$, giving a point in $\mathbb{R}^{D}$ that lies on the unit circle in the 2D subspace spanned by $U_i$. For radial variation, multiply by $r_i$, so the contribution is $r_i \, \phi_{\text{circle}}(\theta)$.

For a sphere $S^{d-1}$, store a $d$-column embedding basis $U_i \in \mathbb{R}^{D \times d}$. Sample $d$ independent Gaussians $g \sim \mathcal{N}(0, I_d)$, normalize to get a point on the unit sphere, $v = g / \lVert g \rVert$, then embed via $\phi_{\text{sphere}}(v) = U_i v$ and scale by $r_i$.

For a torus $T^2 = S^1 \times S^1$, store a 4-column basis and sample two independent angles $\theta_1, \theta_2$, then embed as $\phi_{\text{torus}}(\theta_1, \theta_2) = U_i [\cos\theta_1, \sin\theta_1, \cos\theta_2, \sin\theta_2]^T$. Note that this vector has norm $\sqrt{2}$ rather than 1, so divide by $\sqrt{2}$ if the torus should share the scale of the circle and sphere.

The embedding bases $U_i$ should be initialized as random orthonormal matrices (via QR decomposition of random Gaussian matrices) and can optionally be included in the orthogonalization procedure that reduces superposition between features. Critically, to match observed LLM behavior, set $\sigma_i > 0$ for realistic radial variation: Michaud et al. found that hollow manifolds ($\sigma_i = 0$) lead to pathological tiling, while radial variation encourages SAEs to learn basis-like representations.

In practice, the implementation would add a ManifoldFeature class with methods sample_intrinsic_point() and embed_to_activation_space(intrinsic_point, radius). This keeps it modular to add new manifold types while retaining the existing SynthSAEBench infrastructure for correlation matrices, hierarchy trees, and evaluation metrics.
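One possible shape for such a class hierarchy is sketched below. Only the names ManifoldFeature, sample_intrinsic_point and embed_to_activation_space come from the text above; the subclasses, the local_coords helper, the constructor arguments and random_orthonormal_basis are assumptions made for illustration. The torus is left unnormalized, so its embedded points have norm equal to the radius times the square root of 2.

```python
import numpy as np


def random_orthonormal_basis(D, d, rng):
    """Return a (D, d) matrix with orthonormal columns via reduced QR (needs D >= d)."""
    q, _ = np.linalg.qr(rng.standard_normal((D, d)))
    return q


class ManifoldFeature:
    """Base class: subclasses define how to sample and how to map to local coordinates."""

    def __init__(self, D, n_cols, rng):
        self.rng = rng
        self.basis = random_orthonormal_basis(D, n_cols, rng)  # U_i

    def sample_intrinsic_point(self):
        raise NotImplementedError

    def local_coords(self, intrinsic_point):
        """Coordinates of the point inside the subspace spanned by the basis."""
        raise NotImplementedError

    def embed_to_activation_space(self, intrinsic_point, radius):
        return radius * (self.basis @ self.local_coords(intrinsic_point))


class CircleFeature(ManifoldFeature):
    def __init__(self, D, rng):
        super().__init__(D, 2, rng)

    def sample_intrinsic_point(self):
        return self.rng.uniform(0.0, 2.0 * np.pi)  # angle theta

    def local_coords(self, theta):
        return np.array([np.cos(theta), np.sin(theta)])


class SphereFeature(ManifoldFeature):
    def __init__(self, D, rng, d):
        super().__init__(D, d, rng)

    def sample_intrinsic_point(self):
        g = self.rng.standard_normal(self.basis.shape[1])
        return g / np.linalg.norm(g)  # uniform on the unit sphere S^(d-1)

    def local_coords(self, v):
        return v


class TorusFeature(ManifoldFeature):
    def __init__(self, D, rng):
        super().__init__(D, 4, rng)

    def sample_intrinsic_point(self):
        return self.rng.uniform(0.0, 2.0 * np.pi, size=2)  # (theta1, theta2)

    def local_coords(self, angles):
        t1, t2 = angles
        return np.array([np.cos(t1), np.sin(t1), np.cos(t2), np.sin(t2)])


# Usage: embed one sample from a circle with radial magnitude 2.0.
rng = np.random.default_rng(0)
circle = CircleFeature(D=32, rng=rng)
x = circle.embed_to_activation_space(circle.sample_intrinsic_point(), radius=2.0)
```

Adding a new manifold type then only requires a subclass that supplies a sampler and its local coordinates; the shared embedding step stays in the base class.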