The Cogito

Reproducibility

Method

Full methodological detail is given in §2 of the paper; what follows is a compact overview.

Subject model and SAE suite

Subject model: Gemma 3 4B IT (Google DeepMind). SAE activations were captured per generated token at layers 9, 17, 22, and 29 from the Gemma Scope 2 16k-width medium-L0 SAE suite (Lieberum et al. 2024). Generation parameters: max_new_tokens=256, temperature=0.7, sampling enabled, three random seeds per cell.
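The design above amounts to a small experimental grid: fixed generation parameters, four probed SAE layers, and three seeds per cell. A minimal sketch of that grid in pure Python (the layer indices and generation parameters are taken from the text; the condition and item labels and the `run_grid` helper are illustrative, not the study's actual harness):

```python
from itertools import product

GEN_PARAMS = {"max_new_tokens": 256, "temperature": 0.7, "do_sample": True}
SAE_LAYERS = [9, 17, 22, 29]  # layers probed per generated token (Gemma Scope)
SEEDS = [0, 1, 2]             # three random seeds per cell (seed values illustrative)

def run_grid(conditions, items):
    """Enumerate every (condition, item, seed) run in the design."""
    return [
        {"condition": c, "item": i, "seed": s, **GEN_PARAMS}
        for c, i, s in product(conditions, items, SEEDS)
    ]

# Two conditions x two items x three seeds = 12 runs.
runs = run_grid(["S0", "T3"], ["E01", "K01"])
print(len(runs))  # -> 12
```

Each run dictionary would then be handed to the actual generation and SAE-capture code, which is model-specific and omitted here.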

Pre-registered studies

Inter-rater coding

Substance was coded independently by GPT-4o (OpenAI) and Gemini 2.5 Flash (Google). Inter-rater agreement was Cohen’s κ = 0.38, below the pre-registered threshold; we report this transparently and ground the central claims in convergent phase-structure evidence rather than in rater agreement alone.
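Cohen’s κ corrects raw percent agreement for the agreement expected by chance from each rater’s label frequencies. A self-contained sketch of the statistic (the rater labels below are illustrative, not data from the study):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items with identical labels.
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement: product of each rater's marginal label frequencies.
    ca, cb = Counter(rater_a), Counter(rater_b)
    p_exp = sum(ca[label] * cb[label] for label in set(ca) | set(cb)) / n**2
    return (p_obs - p_exp) / (1 - p_exp)

# Illustrative: 3/4 observed agreement, 0.5 expected -> kappa = 0.5.
print(cohens_kappa(["E", "E", "K", "K"], ["E", "K", "K", "K"]))  # -> 0.5
```

Values below roughly 0.4 are conventionally read as fair-to-moderate agreement, which is why the reported κ = 0.38 falls short of the pre-registered threshold.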

Code repository

The reproduction code, item sets, pre-registrations, raw outputs, and scoring scripts will be made publicly available upon publication of the paper. Until then, they are available on request via the contact page.

Reproduction

To check the central findings on a model of your own, the minimum ingredients for a first indicative replication are:

  1. a language model whose mid layers are interpretable through SAEs (e.g. the Gemma Scope suite for Gemma 3 models),
  2. the Cogito imperative as condition T3 versus a neutral system prompt as S0,
  3. an item set whose classes differ in substance (definitional class E vs. control class K).
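
Ingredients 2 and 3 combine into a simple prompt grid: each item from class E or K is paired with either the neutral system prompt (S0) or the Cogito imperative (T3). A hypothetical harness sketch (the prompt wordings and item texts are placeholders; the study's actual materials ship with the code repository):

```python
SYSTEM_PROMPTS = {
    "S0": "You are a helpful assistant.",   # neutral baseline (placeholder wording)
    "T3": "<Cogito imperative>",            # the study's T3 text is not reproduced here
}

ITEMS = {
    "E": ["<definitional item>"],  # placeholder item texts
    "K": ["<control item>"],
}

def build_prompt(condition, item_text):
    """Pair one condition's system prompt with one item."""
    return {"system": SYSTEM_PROMPTS[condition], "user": item_text}

prompts = [
    build_prompt(cond, item)
    for cond in ("S0", "T3")
    for cls in ITEMS.values()
    for item in cls
]
print(len(prompts))  # 2 conditions x 2 items -> 4 prompts
```

The contrast of interest is then between T3 and S0 generations on class E items, with class K serving as the control.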

The exact item sets, pre-registrations, and scoring scripts will be published with the code repository.