
Implementing Manifold Structures in SynthSAEBench: Technical Explanation

Core Data Structure Modifications

To implement manifold features in SynthSAEBench, the data generation pipeline must support features that are not just scalar coefficients times unit vectors, but points sampled from geometric structures embedded in the activation space. The key is to extend the existing feature dictionary and coefficient sampling mechanism while preserving compatibility with correlation, hierarchy, and superposition.

First, extend the synthetic model to maintain two types of features: discrete features (as currently implemented) and manifold features. For each feature i, add metadata specifying feature_type (discrete or manifold). For manifold features, store three additional parameters: intrinsic_dim (the manifold's true dimensionality), embedding_basis (a D × d matrix with orthonormal columns, where D is the activation dimension and d is the dimension of the subspace containing the manifold), and manifold_type (circle, sphere, torus, etc.). Note that d is generally larger than intrinsic_dim: a circle has intrinsic dimension 1 but needs d = 2 embedding coordinates.

Instead of storing a single direction vector d_i ∈ ℝ^D for manifold feature i, store the embedding basis U_i that spans the subspace where the manifold lives. The existing firing probability p_i, hierarchy relationships, and correlation structure remain unchanged: they still determine whether the manifold feature is active on a given sample. The crucial modification is in what happens when a manifold feature fires.
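The metadata described above could be sketched roughly as follows. This is a minimal illustration, not SynthSAEBench's actual schema: the `FeatureSpec` record, the `random_orthonormal_basis` helper, and the example values (D = 64, the firing probabilities) are all hypothetical names and numbers chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class FeatureSpec:
    """Hypothetical per-feature metadata record (an assumption, not the real schema)."""

    feature_type: str                             # "discrete" or "manifold"
    firing_prob: float                            # p_i, same meaning for both types
    direction: Optional[np.ndarray] = None        # d_i, shape (D,), discrete only
    manifold_type: Optional[str] = None           # "circle", "sphere", "torus", ...
    intrinsic_dim: Optional[int] = None           # e.g. 1 for a circle
    embedding_basis: Optional[np.ndarray] = None  # U_i, shape (D, d), orthonormal columns


def random_orthonormal_basis(D: int, d: int, rng: np.random.Generator) -> np.ndarray:
    """Return a D x d matrix with orthonormal columns via reduced QR."""
    q, _ = np.linalg.qr(rng.standard_normal((D, d)))
    return q


rng = np.random.default_rng(0)
D = 64  # illustrative activation dimension

# A discrete feature keeps its single direction vector.
discrete = FeatureSpec("discrete", firing_prob=0.05, direction=np.eye(D)[0])

# A circular manifold feature stores a 2-column basis instead.
circle = FeatureSpec(
    feature_type="manifold",
    firing_prob=0.02,
    manifold_type="circle",
    intrinsic_dim=1,
    embedding_basis=random_orthonormal_basis(D, 2, rng),
)
```

Keeping both feature types in one record means code that only reads `firing_prob` (correlation and hierarchy logic) does not need to know which type it is handling.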

Sampling and Embedding Process

When generating a batch of activations, sampling proceeds in phases, as in current SynthSAEBench, with manifold-specific logic inserted.

Phase 1: Use the existing Gaussian copula mechanism to determine which features fire (binary indicators z_i), respecting correlations.

Phase 2: For discrete features that fire (z_i = 1), sample scalar coefficients exactly as before: c_i = ReLU(μ_i + σ_i ϵ_i). For manifold features that fire, sample in two steps: (a) sample a point m_i on the intrinsic manifold according to the manifold's parameterization, and (b) sample a radial magnitude r_i = ReLU(μ_i + σ_i ϵ_i) using the same rectified Gaussian as discrete features.

Phase 3: Apply hierarchy constraints as before. When a manifold child is deactivated, zero out both its manifold position and its radial component.

Phase 4: Compute the final activation by summing contributions. Discrete features contribute c_i d_i as usual, while manifold features contribute r_i ϕ_i(m_i), where ϕ_i is the embedding function. Inactive features contribute zero. The total activation is

a = ∑_{i ∈ discrete} c_i d_i + ∑_{i ∈ manifold} r_i ϕ_i(m_i) + b.
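The four phases can be sketched end to end on a toy model with one discrete feature and one circular manifold feature. Everything specific here is an assumption made for illustration: the two-feature setup, the parameter values, thresholding correlated normals at Φ⁻¹(p_i) as the copula step, and treating b as a zero offset. The real SynthSAEBench pipeline may differ in each of these details.

```python
from statistics import NormalDist

import numpy as np

rng = np.random.default_rng(0)
D, n_samples = 16, 1000

# Toy setup: feature 0 is discrete, feature 1 is a circle and a child of feature 0.
p = np.array([0.3, 0.5])                    # firing probabilities p_i
mu = np.array([1.0, 1.0])                   # mu_i
sigma = np.array([0.2, 0.2])                # sigma_i
corr = np.array([[1.0, 0.4], [0.4, 1.0]])   # copula correlation matrix
parent = {1: 0}                             # child index -> parent index

basis, _ = np.linalg.qr(rng.standard_normal((D, 3)))
d0 = basis[:, 0]     # discrete direction d_0
U1 = basis[:, 1:3]   # circle embedding basis U_1 (orthogonal to d_0 here)
b = np.zeros(D)      # offset term b from the formula (assumed zero)

# Phase 1: Gaussian copula. Feature i fires when its correlated normal
# falls below Phi^{-1}(p_i), so the marginal firing rate is p_i.
g = rng.multivariate_normal(np.zeros(2), corr, size=n_samples)
thresholds = np.array([NormalDist().inv_cdf(pi) for pi in p])
z = g < thresholds                           # (n_samples, 2) booleans

# Phase 2: rectified-Gaussian magnitudes (c_0 and r_1) and intrinsic angles.
eps = rng.standard_normal((n_samples, 2))
mag = np.maximum(mu + sigma * eps, 0.0) * z
theta = rng.uniform(0.0, 2.0 * np.pi, size=n_samples)

# Phase 3: hierarchy. A child cannot be active without its parent;
# zero both the radial component and the manifold position.
for child, par in parent.items():
    off = ~z[:, par]
    z[off, child] = False
    mag[off, child] = 0.0
theta = np.where(z[:, 1], theta, 0.0)

# Phase 4: a = c_0 d_0 + r_1 phi_1(theta) + b
phi = np.stack([np.cos(theta), np.sin(theta)], axis=1) @ U1.T   # (n_samples, D)
a = mag[:, [0]] * d0 + mag[:, [1]] * phi + b
```

Because d_0 and U_1 are drawn from one orthonormal basis in this toy, projecting `a` onto `d0` recovers c_0 exactly and the norm of `a @ U1` recovers r_1, which makes the sketch easy to sanity-check.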

Concrete Manifold Implementations

For a circular feature (S¹), the implementation would store a 2-column embedding basis U_i ∈ ℝ^{D×2} whose columns are orthonormal. When the feature fires, sample an angle θ ∼ Uniform(0, 2π), or sample from a von Mises distribution if you want concentration at certain points, such as specific days of the week. The embedding function is ϕ_circle(θ) = U_i [cos θ, sin θ]^T, giving a point in ℝ^D that lies on a unit circle in the 2D subspace spanned by U_i. For radial variation, multiply by r_i: contribution = r_i ϕ_circle(θ).

For a sphere S^{d−1}, store a d-column embedding basis U_i ∈ ℝ^{D×d}. Sample d independent Gaussians g ∼ 𝒩(0, I_d), normalize to get a point on the unit sphere v = g/∥g∥, then embed via ϕ_sphere(v) = U_i v and scale by r_i.

For a torus T² = S¹ × S¹, store a 4-column basis and sample two independent angles θ₁, θ₂, then embed as ϕ_torus(θ₁, θ₂) = U_i [cos θ₁, sin θ₁, cos θ₂, sin θ₂]^T. Note that this vector has norm √2 rather than 1, so divide by √2 if every manifold type should have unit norm before scaling by r_i.

The embedding bases U_i should be initialized as random orthonormal matrices (via QR decomposition of random Gaussian matrices) and can optionally be included in the orthogonalization procedure that reduces superposition between features.

Critically, to match observed LLM behavior, set σ_i > 0 for realistic radial variation. Michaud et al. found that hollow manifolds (σ_i = 0) lead to pathological tiling, while radial variation encourages SAEs to learn basis-like representations.

The practical implementation would add a ManifoldFeature class with methods sample_intrinsic_point() and embed_to_activation_space(intrinsic_point, radius). This keeps it modular to add new manifold types while maintaining the existing SynthSAEBench infrastructure for correlation matrices, hierarchy trees, and evaluation metrics.
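A ManifoldFeature class along these lines might look like the sketch below. Only the class name and the two method names come from the text; the constructor signature, the `sphere_dim` parameter, the string-keyed manifold types, and the uniform angle sampling are assumptions made for illustration. The torus embedding is left unnormalized, so its output has norm √2 · r.

```python
import numpy as np


class ManifoldFeature:
    """Illustrative manifold feature; constructor and defaults are assumptions."""

    def __init__(self, manifold_type, D, rng, sphere_dim=3):
        self.manifold_type = manifold_type
        self.rng = rng
        # Number of embedding coordinates d for each supported manifold.
        d = {"circle": 2, "torus": 4, "sphere": sphere_dim}[manifold_type]
        # Random orthonormal basis U_i via reduced QR of a Gaussian matrix.
        self.embedding_basis, _ = np.linalg.qr(rng.standard_normal((D, d)))

    def sample_intrinsic_point(self):
        """Sample angles (circle, torus) or a unit vector (sphere)."""
        if self.manifold_type == "circle":
            return self.rng.uniform(0.0, 2.0 * np.pi, size=1)
        if self.manifold_type == "torus":
            return self.rng.uniform(0.0, 2.0 * np.pi, size=2)
        g = self.rng.standard_normal(self.embedding_basis.shape[1])
        return g / np.linalg.norm(g)

    def embed_to_activation_space(self, intrinsic_point, radius):
        """Map an intrinsic point to R^D and scale by the radial magnitude."""
        if self.manifold_type == "sphere":
            coords = intrinsic_point
        else:
            # Each angle t contributes the coordinate pair (cos t, sin t).
            coords = np.concatenate(
                [[np.cos(t), np.sin(t)] for t in intrinsic_point]
            )
        return radius * (self.embedding_basis @ coords)


rng = np.random.default_rng(0)
mu, sigma = 1.0, 0.2  # illustrative values; sigma > 0 gives radial variation

for kind in ("circle", "sphere", "torus"):
    feature = ManifoldFeature(kind, D=32, rng=rng)
    radius = max(mu + sigma * rng.standard_normal(), 0.0)  # rectified Gaussian
    point = feature.sample_intrinsic_point()
    contribution = feature.embed_to_activation_space(point, radius)
```

Adding a new manifold type in this design would mean extending the dimension lookup and the two methods; a subclass per manifold type would be an equally reasonable alternative.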