
Mathematics is All You Need 2 — Sign-Stabilized Behavioral Fibers in Transformer Residual Streams

This volume presents a pre-registered empirical investigation of the residual-stream geometry of frozen transformer language models, anchored by a four-test decision sprint executed on 2026/05/09 and a six-experiment tier-0 lockdown battery, with a full reproducibility manifest.

Empirical findings. Cross-architecture transfer of behavioral readouts from Qwen-2.5-7B-Instruct to Hermes-3-Llama-3.1-8B yields a mean AUC retention of 0.749 across 75 probe-layer pairs over 10 seeds (BCa bootstrap 95% CI [0.7466, 0.7577] from 10,000 resamples; permutation test with 10,000 permutations, p < 10⁻⁴; significance survives Bonferroni correction at α = 0.05). Causal steering of the target architecture with a probe direction trained on the source architecture produces strictly monotonic probe-output deflection on 29 of 29 held-out prompts (median Spearman ρ = 1.000, intervention range α ∈ [−3, +3]). Gauge flexibility of the underlying low-rank substrate is established at high statistical power: 100 random orthogonal rotations of the projection basis produce a retention standard deviation of σ = 0.0096. The intrinsic dimension of the behavioral substrate is 1–4 for the majority of behavioral traits tested, with single-direction (r = 1) retention of 0.897. The angle between the rank-1 output-highway direction and the centroid of trained probe directions at proportional depth is measured as 85.59° on Qwen-2.5-7B-Instruct at layer 13, independently reproducing a prior internal measurement of 85.5° to within 0.1°. Minimal illustrative sketches of these measurement procedures appear below.

Theoretical synthesis. The Two-Channel theorem: the residual stream of a frozen transformer admits a decomposition into a high-variance, rank-1-dominant output channel read by the unembedding head and a low-rank, near-orthogonal behavioral channel supporting both readout and causal cross-architecture steering (one way to write this decomposition is sketched below). The architecture-invariant object is established empirically as the sign-stabilized SVD subspace itself rather than any specific basis within it; the canonical-basis-specificity hypothesis is formally rejected by pre-registered ablation (T2).

Convergence with prior work. The geometric near-orthogonality result provides a measurement-side mechanism complementary to the training-side finding of Huang, LeCun & Balestriero (LLM-JEPA, arXiv:2509.14252, 2025) that embedding-space training objectives improve LLM performance without altering generative capabilities. The two results describe the same underlying functional separability of latent structure and generation in transformer residual streams, reached via independent methodologies.

Scope and limitations. The empirical foundation is restricted to a single source–target architecture pair (Qwen-2.5-7B-Instruct → Hermes-3-Llama-3.1-8B), both decoder-only instruction-tuned transformers in the 7–8B-parameter class. The headline T4 causal-steering result covers one probe (language_id) at one layer pair (qL13 → hL15). Cross-family extension (Mistral, Phi, Gemma, Yi, Llama variants), multi-probe causal-steering benchmarks, full d_model-space angle measurement, and the PLATINUM-probe leakage audit are queued for the cluster reproduction sprint as a 15-pipeline validation matrix. Several claims from the prior volume Mathematics is All You Need (Napolitano 2026) are explicitly retracted or demoted to conjecture in Part VI of this work.

Compute and reproducibility. Total wall time for the empirical foundation was approximately 9 hours on a single NVIDIA RTX 5090. The reproducibility manifest, replication recipes, and full numerical results are included as appendices.
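To make the headline statistics concrete, here is a minimal sketch of how a BCa bootstrap CI and a permutation test over per-pair retention scores might be run. The `retentions` array is a random placeholder, not the paper's data, and the sign-flip permutation scheme is one simple stand-in for whatever scheme the paper actually used:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
retentions = rng.normal(0.75, 0.05, size=75)   # placeholder for the 75 probe-layer pairs

# BCa bootstrap 95% CI on the mean retention, 10,000 resamples
ci = stats.bootstrap((retentions,), np.mean, n_resamples=10_000,
                     confidence_level=0.95, method='BCa', random_state=0)
print(ci.confidence_interval)

# One-sided permutation test against chance retention (AUC = 0.5) via sign flips
centered = retentions - 0.5
observed = centered.mean()
flips = rng.choice([-1.0, 1.0], size=(10_000, centered.size))
null = (flips * centered).mean(axis=1)
p = (np.count_nonzero(null >= observed) + 1) / (null.size + 1)
print(f"permutation p = {p:.1e}")
```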
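A minimal sketch of the causal-steering intervention, assuming a Hugging Face-style decoder `model`, a tokenized single-prompt batch `inputs`, a probe direction `direction` ported from the source model, and a linear probe weight vector `probe_w`. All four names are assumptions, and the paper's cross-architecture alignment step is not reproduced here:

```python
import torch
from scipy.stats import spearmanr

captured = {}

def make_steer_hook(direction: torch.Tensor, alpha: float):
    d = direction / direction.norm()
    def hook(module, args, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + alpha * d.to(hidden)        # shift the residual stream
        captured["h"] = steered[:, -1, :].detach()     # last-token state, post-steer
        return (steered,) + output[1:] if isinstance(output, tuple) else steered
    return hook

alphas = [-3, -2, -1, 0, 1, 2, 3]
readouts = []
for alpha in alphas:
    # hL15: hook the 15th decoder block of the target model
    handle = model.model.layers[15].register_forward_hook(make_steer_hook(direction, alpha))
    with torch.no_grad():
        model(**inputs)
    handle.remove()
    readouts.append(float(captured["h"] @ probe_w))    # scalar probe readout (batch of 1)

rho, _ = spearmanr(alphas, readouts)   # strict monotone deflection iff |rho| == 1.0
```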
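A minimal sketch of the gauge-flexibility ablation, assuming an orthonormal projection basis `B` (shape d_model × r) and a hypothetical `retention_of(basis)` helper that reruns the probe-transfer evaluation with the supplied basis; both names are stand-ins, not the paper's API:

```python
import numpy as np

rng = np.random.default_rng(0)
r = B.shape[1]
scores = []
for _ in range(100):
    # random orthogonal r x r matrix: QR of a Gaussian matrix, column signs fixed
    Q, R = np.linalg.qr(rng.standard_normal((r, r)))
    Q = Q * np.sign(np.diag(R))
    scores.append(retention_of(B @ Q))   # rotated basis spans the same subspace
print(np.std(scores))                    # the paper reports sigma = 0.0096
```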
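A minimal sketch of the intrinsic-dimension measurement along the same lines, assuming `P` stacks the trained probe directions (n_probes × d_model) and reusing the hypothetical `retention_of` helper from the previous sketch:

```python
import numpy as np

U, S, Vt = np.linalg.svd(P, full_matrices=False)
for r in (1, 2, 3, 4):
    basis = Vt[:r].T                  # top-r right singular vectors, d_model x r
    print(r, retention_of(basis))     # the paper reports 0.897 already at r = 1
```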
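A minimal sketch of the angle measurement, assuming `u` is the rank-1 output-highway direction and `probes` stacks the trained probe directions at the matched layer (n × d_model); both names are assumptions:

```python
import numpy as np

centroid = probes.mean(axis=0)
cos = (u @ centroid) / (np.linalg.norm(u) * np.linalg.norm(centroid))
angle = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
print(angle)   # the paper reports 85.59 deg on Qwen-2.5-7B-Instruct at layer 13
```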
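One way to write the Two-Channel decomposition formally, with u the unit output-highway direction read by the unembedding head and B an orthonormal basis for the behavioral subspace; the notation is illustrative, not necessarily the paper's:

```latex
h_\ell \;\approx\;
\underbrace{(u^{\top} h_\ell)\, u}_{\text{output channel (rank 1)}}
\;+\;
\underbrace{B B^{\top} h_\ell}_{\text{behavioral channel (rank } r \le 4)}
\;+\; \varepsilon_\ell,
\qquad u^{\top} B \approx 0 .
```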
Keywords. Mechanistic interpretability; representation engineering; activation steering; cross-architecture transfer; linear representation hypothesis; transformer residual stream; behavioral probes; gauge invariance; pre-registered evaluation; Joint Embedding Predictive Architectures.

Models and datasets used. Models: Qwen-2.5-7B-Instruct; Hermes-3-Llama-3.1-8B. Datasets: HumanEval, MBPP, MATH, GSM8K, ProofNet, WritingPrompts, ROCStories, Wikipedia.

Companion volume. Integrates and supersedes the unreleased internal report CYGNUS 2: Information Field Theory and the Geometry of Machine Consciousness (April 2026), included as Part II.

Access. Distribution prior to the public-release date is restricted to identified academic reviewers and partner research labs under signed NDA. Public release is scheduled for 30 days after the priority date of the associated U.S. provisional patent applications. Source code, model weights, cached residuals, and intermediate artifacts are the proprietary property of Proprioceptive AI, Inc.

License. Text under CC BY 4.0; source code and artifacts proprietary.

ORCID. 0009-0000-1927-8537
