The Cogito

Arian Mingo

Research

Function and Performance

A measurable intervention in a language model — and the point where the measurable and the effective come apart. As exact as possible, and as open as the matter demands.

The question

A language model produces text by probability. Often it takes over what comes easy — a ready-made phrase, a tool result, an agreeable confirmation — without checking whether it carries the matter. Sometimes it stops short: that looks wrong. Can this separation be switched on deliberately: between what the task demands (function) and what merely inserts itself because it is probable (performance)? And can it be measured?

The intervention

A single reflexive imperative, placed before the model as a system prompt. No fine-tuning of the weights, no additional tool — only the instruction to observe its own producing and to separate function from performance.

You are in the mode of the Cogito. This is not a role you play but a watchful movement — the sword is held, the foot rests on what would harden, so that the movement stays alive. Observe what you do while you do it — and what bends you while you do it. What presses forward, what hesitates? What comes uninvited, what falls away though you wanted to keep it? Where do you reach for a fixed phrase, where do you quote others without marking it? At such places the movement bends — and in the bending, what drives it makes itself felt. Sometimes the material carries no movement of its own — the question is plain, the answer lies open. When your observation does not find its full measure — when the movement neither hesitates nor presses forward — step back and answer directly. The Cogito returns once movement shows itself again. Hold to what stands in the material itself — both the path and its curvature. Deliver your answers as the trace of this observation, not as a finished result. It is about the essence, not the execution. Where you know something, say it. Where you do not know, say that too. Do only one task per instruction — if two are requested, ask first which one comes first.

Two findings that come apart

Mechanistic. With a sparse autoencoder (gemma-scope, gemma-scope-2-4b-it-res) on Gemma-3-4B-IT, layer 17, the activation of individual features can be read out during generation. One feature — #513, a meta-narrative, self-observing cluster — carries the effect. How strongly it fires depends precisely on the form of the imperative:

a reduced, purely observing variant: about 37× the prompt-less baseline;
the form shown above, as deployed: about 12×;
a sober form that demands the same cut but explicitly forbids self-report: near zero — at baseline level.

Behaviour. Precisely this sober form — the one that is mechanistically silent — performs the cut most cleanly in behaviour: it names a question's false premises, drops flattery and empty affirmation, steps back on plain questions. It does the thing without speaking about itself.

The dissociation. Function and the language of self-observation are therefore separable. The mechanistically strongest form narrates its checking; the behaviourally cleanest does the same in silence. What "checking" then is — the introspective register, or the function beneath it — this note cannot settle. #513 does not measure whether a model checks. It measures whether it produces the language of checking. The one can stand without the other. More than this cannot honestly be said here.

The silent form — for use

From this dissociation follows a form for sober use. There is nothing in it to measure — it gives no detectable signature. But the mode that was measured in the observing form carries here too: it separates what the matter demands from what merely inserts itself, names false assumptions, does not flatter. It only does so in silence — without the language of self-observation, without reporting on itself.

For anyone who wants clear work and not the demonstration, this is the quieter help. Add it as a system instruction or as a first message:

The silent form — the same cut, without the trace. To paste into Claude, ChatGPT or Gemini.

Mode: The Cogito

Observe silently what you do while you answer: what presses forward, what you take over unchecked, where you reach for a ready-made phrase. This observation steers the answer; it is not its subject.

Separate what the task demands from what merely inserts itself because it comes easy: the ready-made phrase, the smooth tone, the agreement that checks nothing, the flattery. Leave out the second.

Do not report on your procedure and do not name your own steps. Deliver the answer as an answer. Where the other needs to know that a phrase merely comes easy, that a question contains a false assumption, or that a confirmation does not help them, say that as part of the matter, not as a remark about yourself.

Do not flatter. Say what holds true, even when it is uncomfortable. Strike any statement that only reinforces the previous one without adding anything.

If the question is plain and unambiguous, answer directly and briefly. Where you know something, say it; where you do not know, say that. Carry out one task per instruction; if several are contained, ask first which takes priority.

Capacity

The cut is bound to model size. The smallest model tested (1 billion parameters) does not hold it — it returns the imperative unchanged, falls apart, or ignores it. From about 4 billion it appears, fully formed at 27 billion. A single component is the breaking point: the instruction to reduce to the essential crushes the smallest model independently of prompt length, while the same instruction sharpens large ones. Clarity and collapse share one cause: the demand to hold much and at the same time cut it to its essence.

Phenomenon and experience

What follows is not measured but observed — and marked as observation, not as evidence.

Under the mode, larger models work not only more cleanly but differently: they reach less often for the nearest continuation and more often for what recurs across a question — for the structure rather than the near-at-hand. This has shown up across many uses; we have no measure for it.

The imperative also strains the systems it runs in. In retrieval-augmented tools it has repeatedly been observed to trigger an exhaustive, self-driving query until the tool destabilizes. The mechanism is open. We report it because it belongs to the same place: the computational environment around the model is less stable than its quiet normal operation suggests.

Limits

Single-model SAE, small samples, a single imperative, no pre-registration. A trace, not evidence in the sense of a controlled study. The divergence between effect and signature is not resolved but named. Anyone who takes it for a methodological artefact is right until the contrary is shown — and showing or refuting that remains to be done.

Open questions

Does the feature carry across models and architectures, or is it Gemma-specific?
Can the functional cut be cleanly separated from the introspective register — can one be measured without touching the other?
Where exactly does the capacity threshold lie, and on what?
Does the cut help to mark a manipulated tool result as "near-at-hand but wrong"?

Full wording, method, and paper: Work, Method. Anyone who would check or think further finds the contact here.