
Compiler architecture

A guided tour of the slopfuck compiler — how a source file becomes an op stream, and how the language stays brainfuck-equivalent while accumulating expressiveness.

The pipeline

The compiler runs the following passes in order:

  1. Tokenize. Read UTF-8, strip blockquote comments, emit em / en dashes, string literals, bullet markers, pilcrows, annotation-block markers (; and .), and words.
  2. Validate intro. Consume the first N words matching a sycophantic opener pool.
  3. Validate outro. Consume the last N words (skipping trailing strings) matching a call-to-action closer pool.
  4. Validate forbidden language. Scan the remaining tokens for words like no, never, impossible, always.
  5. Compile. Words → opcodes. Inline checks for prefix multipliers, postfix multipliers, bullet repeats, reiterate expansion, annotation-block suppression, and prose-repetition.
  6. Validate praise density. ≥3 praise words and ≥8% praise density in the filler.
  7. Validate hedging. Programs over 50 filler words must contain a hedging phrase.
  8. Validate bracket balance. Loop preambles and closers must match and nest.
  9. Execute. Run the op stream on a 30,000-cell unsigned-byte tape.

Any validation failure halts with a slop-flavored diagnostic. Errors include line numbers and, in most cases, suggested softened phrasings.
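The two threshold checks in passes 6 and 7 can be sketched as follows. This is an illustration rather than the compiler's actual code: the function names are hypothetical, and it assumes that praise density means praise words divided by filler words.

```c
#include <stdbool.h>
#include <stddef.h>

/* Pass 6 (sketch): at least 3 praise words AND at least 8% praise density.
 * Assumption: density = praise_words / filler_words. Integer arithmetic
 * avoids floating-point rounding at the 8% boundary. */
bool praise_density_ok(size_t praise_words, size_t filler_words)
{
    return praise_words >= 3 && praise_words * 100 >= filler_words * 8;
}

/* Pass 7 (sketch): programs over 50 filler words need a hedging phrase. */
bool hedging_ok(size_t filler_words, bool has_hedging_phrase)
{
    return filler_words <= 50 || has_hedging_phrase;
}
```

Under these assumptions, a program with 3 praise words in 30 filler words passes the density check (10%), while 3 praise words in 40 filler words fails it (7.5%).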

Two intermediate stages of the pipeline can be inspected as plain text via the --stripped and --opcodes CLI flags. The brainfuck-equivalent translation is the canonical agent-facing debug surface: a token-efficient, human-impervious representation that keeps the inner debugging loop free of biological latency. See the dedicated debugging page for worked examples and the recommended review workflow.

The compile-time expansion model

slopfuck adds expressiveness over brainfuck almost entirely through compile-time expansion. The runtime executes only the eight core operations plus OP_STRING and OP_NEWLINE. Every other feature — multipliers, bullet repeats, reiterate — is resolved before execution begins.
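To make the small runtime surface concrete, here is a minimal sketch of an executor over those ten ops. Only OP_INC, OP_STRING, and OP_NEWLINE are named in this document; the other op names, the Op struct, the precomputed jump field, the wrap-around pointer, and the read-zero-at-end-of-input behaviour are all assumptions made for illustration. Output goes to a buffer so the result is easy to check; a real executor would presumably write to stdout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TAPE_CELLS 30000

typedef enum {
    OP_INC, OP_DEC, OP_LEFT, OP_RIGHT, OP_OUT, OP_IN,
    OP_LOOP_START, OP_LOOP_END,   /* the eight core ops */
    OP_STRING, OP_NEWLINE         /* the two extra output ops */
} OpKind;

typedef struct {
    OpKind kind;
    size_t jump;        /* index of the matching bracket (loop ops only) */
    const char *text;   /* literal to print (OP_STRING only) */
} Op;

/* Runs a bracket-balanced op stream on an unsigned-byte tape.
 * Reads input bytes from `in` (may be NULL); writes a NUL-terminated
 * result into `out` and returns the number of bytes written. */
size_t run(const Op *ops, size_t n, const char *in, char *out, size_t cap)
{
    static unsigned char tape[TAPE_CELLS];
    size_t ptr = 0, len = 0;

    assert(cap > 0);
    memset(tape, 0, sizeof tape);

    for (size_t pc = 0; pc < n; pc++) {
        const Op *op = &ops[pc];
        switch (op->kind) {
        case OP_INC:   tape[ptr]++; break;   /* wraps 255 -> 0 */
        case OP_DEC:   tape[ptr]--; break;   /* wraps 0 -> 255 */
        case OP_RIGHT: ptr = (ptr + 1) % TAPE_CELLS; break;
        case OP_LEFT:  ptr = (ptr + TAPE_CELLS - 1) % TAPE_CELLS; break;
        case OP_OUT:   if (len + 1 < cap) out[len++] = (char)tape[ptr]; break;
        case OP_IN:    tape[ptr] = (in && *in) ? (unsigned char)*in++ : 0; break;
        case OP_LOOP_START: if (tape[ptr] == 0) pc = op->jump; break;
        case OP_LOOP_END:   if (tape[ptr] != 0) pc = op->jump; break;
        case OP_STRING:
            for (const char *s = op->text; *s && len + 1 < cap; s++)
                out[len++] = *s;
            break;
        case OP_NEWLINE: if (len + 1 < cap) out[len++] = '\n'; break;
        }
    }
    out[len] = '\0';
    return len;
}
```

Nothing in the loop knows about multipliers, bullets, or reiterate; by the time an op stream reaches a function like this, those features have already been expanded into plain ops.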

How prefix multipliers compile

A prefix multiplier (twice delve) is detected during the compile pass:

  1. The compiler reads twice and looks it up in kw_adv, which returns 2.
  2. It peeks ahead: the next token (delve) is a multipliable single keyword.
  3. It sets a pending multiplier of 2 and consumes twice.
  4. It reads delve and emits 2 × OP_INC.

If the next token after twice is not multipliable (e.g., the start of a multi-word phrase), twice falls through to keyword matching and becomes filler. This makes the parser tolerant of twice appearing in non-multiplier contexts.
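The peek-ahead and fall-through behaviour can be sketched like this. The sketch is deliberately tiny: kw_adv here is a hypothetical stand-in that knows only twice, is_multipliable is an invented helper that knows only delve, and multi-word phrases are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { OP_INC } OpKind;   /* the only op this sketch emits */

/* Hypothetical stand-in for the adverb table: multiplier, or 0 if none. */
static int kw_adv(const char *w) { return strcmp(w, "twice") == 0 ? 2 : 0; }

/* Hypothetical helper: is w a single keyword that accepts a multiplier? */
static int is_multipliable(const char *w) { return strcmp(w, "delve") == 0; }

/* Compiles words into ops and counts everything else as filler.
 * Returns the number of ops emitted. */
size_t compile_words(const char **words, size_t n,
                     OpKind *ops, size_t cap, size_t *filler)
{
    size_t n_ops = 0;
    int pending = 1;   /* multiplier waiting for the next keyword */

    *filler = 0;
    for (size_t i = 0; i < n; i++) {
        int mult = kw_adv(words[i]);
        if (mult > 0 && i + 1 < n && is_multipliable(words[i + 1])) {
            pending = mult;   /* consume the adverb, apply it to the next word */
            continue;
        }
        if (is_multipliable(words[i])) {
            for (int k = 0; k < pending; k++) {
                assert(n_ops < cap);
                ops[n_ops++] = OP_INC;
            }
            pending = 1;
            continue;
        }
        (*filler)++;   /* includes "twice" when nothing multipliable follows */
    }
    return n_ops;
}
```

For the input twice delve twice truly delve, this sketch emits three OP_INC ops (two from the multiplied delve, one from the plain one) and counts two filler words: the second twice and truly.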

How bullet repeats compile

The compiler tracks the most recently emitted "simple" op: one produced by kw_inc, kw_dec, kw_out, kw_in, kw_newline, an em dash, or an en dash. When a bullet pseudo-token is encountered, the compiler emits one more copy of that op. Loop ops and strings reset the tracker.
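A minimal sketch of that tracker, under a few assumptions: the Emitter struct and function names are hypothetical, the em and en dashes are assumed to emit the pointer-move ops (named OP_LEFT and OP_RIGHT here), and a bullet with nothing to repeat is assumed to emit nothing.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    OP_INC, OP_DEC, OP_LEFT, OP_RIGHT, OP_OUT, OP_IN, OP_NEWLINE,
    OP_LOOP_START, OP_LOOP_END, OP_STRING
} OpKind;

#define MAX_OPS 64

typedef struct {
    OpKind ops[MAX_OPS];
    size_t n;
    int has_last;   /* is there a simple op a bullet could repeat? */
    OpKind last;    /* the most recent simple op, valid if has_last */
} Emitter;

static int is_simple(OpKind k)
{
    return k != OP_LOOP_START && k != OP_LOOP_END && k != OP_STRING;
}

/* Appends one op and updates the bullet tracker. */
void emit(Emitter *e, OpKind k)
{
    assert(e->n < MAX_OPS);
    e->ops[e->n++] = k;
    if (is_simple(k)) {
        e->has_last = 1;
        e->last = k;
    } else {
        e->has_last = 0;   /* loop ops and strings reset the tracker */
    }
}

/* Handles a bullet: repeats the tracked op once.
 * Returns 1 if an op was emitted, 0 if there was nothing to repeat. */
int emit_bullet(Emitter *e)
{
    if (!e->has_last)
        return 0;
    emit(e, e->last);
    return 1;
}
```

With this shape, delve followed by two bullets yields three OP_INC ops, and a bullet placed directly after a string literal has nothing to repeat.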

How reiterate compiles

The compiler walks backwards through the already-emitted op stream, starting at the most recent op, and collects the contiguous run of OP_STRING and OP_NEWLINE ops it finds there. It then appends extra copies of that run; the multiplier (default 1) sets how many.

Any other op breaks the block. So in “First”¶ delve “Second”¶ reiterate, only “Second”¶ is copied.
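The backward walk can be sketched as below. The Op struct and the reiterate function are hypothetical names, and the sketch assumes that a pilcrow emits OP_NEWLINE and that reiterate with no trailing run simply appends nothing.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { OP_INC, OP_STRING, OP_NEWLINE } OpKind;

typedef struct {
    OpKind kind;
    const char *text;   /* literal for OP_STRING, otherwise unused */
} Op;

/* Appends `mult` extra copies of the trailing OP_STRING / OP_NEWLINE run.
 * Returns the new op count. */
size_t reiterate(Op *ops, size_t n, size_t cap, size_t mult)
{
    /* Walk backwards until an op that is neither a string nor a newline. */
    size_t start = n;
    while (start > 0 && (ops[start - 1].kind == OP_STRING ||
                         ops[start - 1].kind == OP_NEWLINE))
        start--;

    size_t run = n - start;   /* length of the block to copy */
    for (size_t m = 0; m < mult; m++) {
        for (size_t i = 0; i < run; i++) {
            assert(n < cap);
            ops[n++] = ops[start + i];
        }
    }
    return n;
}
```

Applied to the op stream for “First”¶ delve “Second”¶ with the default multiplier, the walk stops at the OP_INC from delve, so only the string and newline for “Second” are appended again.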

How annotation blocks compile

The compile loop carries a single in_free_prose flag. The ; pseudo-token toggles the flag; the . pseudo-token clears it (only if it is currently set). While the flag is set, every branch of the compile loop short-circuits: em/en dashes, pilcrows, bullets, multipliers, single-word keywords, multi-word phrases, and reiterate all become silent. The word is pushed to the filler list and the iterator advances. String literals are the single exception: they continue to emit OP_STRING as normal.

The repetition window is bypassed entirely inside an annotation block (keyword matches do not occur), which is why writers can repeat magnificent ten times inside a single block without tripping the validator. The praise and hedging counters do still observe the annotation's filler — that is the whole value proposition.
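The flag handling can be sketched as follows. Only in_free_prose comes from the text above; the token kinds, the Counts struct, and the single-keyword vocabulary (delve) are hypothetical, and the sketch counts ops instead of emitting them to stay short.

```c
#include <stddef.h>
#include <string.h>

typedef enum {
    TOK_WORD,
    TOK_STRING,
    TOK_BLOCK_TOGGLE,   /* the ; pseudo-token */
    TOK_BLOCK_END       /* the . pseudo-token */
} TokKind;

typedef struct {
    TokKind kind;
    const char *text;
} Token;

typedef struct {
    size_t keyword_ops;   /* ops emitted from keywords */
    size_t string_ops;    /* OP_STRING ops emitted */
    size_t filler;        /* words pushed to the filler list */
} Counts;

Counts compile_tokens(const Token *toks, size_t n)
{
    Counts c = {0, 0, 0};
    int in_free_prose = 0;

    for (size_t i = 0; i < n; i++) {
        switch (toks[i].kind) {
        case TOK_BLOCK_TOGGLE:
            in_free_prose = !in_free_prose;
            break;
        case TOK_BLOCK_END:
            in_free_prose = 0;   /* has no effect if the flag is already clear */
            break;
        case TOK_STRING:
            c.string_ops++;      /* strings emit even inside a block */
            break;
        case TOK_WORD:
            if (!in_free_prose && strcmp(toks[i].text, "delve") == 0)
                c.keyword_ops++;
            else
                c.filler++;      /* keywords inside a block are just filler */
            break;
        }
    }
    return c;
}
```

Because a suppressed keyword lands in the filler list like any other word, validators that count filler still see it, which matches the behaviour described above for the praise and hedging counters.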

Why brainfuck stays preserved

Every extension above resolves at compile time. The op stream that reaches the executor only ever contains the eight core ops plus OP_STRING and OP_NEWLINE. This means:

  • Computational power is identical to brainfuck.
  • Execution is determined entirely by the op stream and the program's input; no compile-time feature leaves a side effect at run time.
  • A slopfuck program could be decompiled, in principle, into equivalent brainfuck (modulo string-literal printing).
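The decompilation claim can be illustrated with a short sketch that maps the core ops back to brainfuck characters. Everything here other than OP_INC, OP_STRING, and OP_NEWLINE is a hypothetical name, and the enum order is chosen purely so the lookup table lines up. The two extra output ops have no single-character equivalent, so this sketch skips them and reports how many it skipped.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    OP_INC, OP_DEC, OP_LEFT, OP_RIGHT, OP_OUT, OP_IN,
    OP_LOOP_START, OP_LOOP_END,   /* the eight core ops */
    OP_STRING, OP_NEWLINE         /* no one-character brainfuck form */
} OpKind;

/* Writes a NUL-terminated brainfuck program for the core ops in `ops`.
 * Returns the number of ops skipped (OP_STRING and OP_NEWLINE). */
size_t to_brainfuck(const OpKind *ops, size_t n, char *out, size_t cap)
{
    static const char symbol[] = "+-<>.,[]";   /* indexed by core OpKind */
    size_t len = 0, skipped = 0;

    assert(cap > n);   /* room for every op plus the terminator */
    for (size_t i = 0; i < n; i++) {
        if (ops[i] <= OP_LOOP_END)
            out[len++] = symbol[ops[i]];
        else
            skipped++;
    }
    out[len] = '\0';
    return skipped;
}
```

A full decompiler would have to expand each skipped string into a sequence of cell writes and output ops, which is the "modulo string-literal printing" caveat in the last bullet.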

Implementation layout

The compiler is a single C source file with header companions. Each header isolates one concern:

File            Holds
slopfuck.c      Tokenizer, compile pass, validators, executor, main
keywords.h      kw_inc, kw_dec, kw_out, kw_in, kw_loop_start, kw_loop_end, kw_newline, kw_reiterate
praise.h        Praise words (used by the density validator)
bookends.h      Required intro and outro phrase pools
multipliers.h   Adverbial multipliers, cardinals, times markers
style.h         Forbidden words, hedging phrases, repetition window size

Extending the language

The right way to add expressiveness is to add a new compile-time expansion, not a new runtime op. The architectural rule of thumb:

The rule. Every new feature is either (a) a validator that runs on the token stream before compile, or (b) an expansion that rewrites the op stream during compile. The executor's contract — eight brainfuck ops plus string and newline — does not change.

Features that fit naturally into this model and are documented as candidates in IDEAS.md:

  • Named cells. A compile-time symbol table that maps a bracketed noun phrase to a tape position, with the compiler emitting walk ops automatically.
  • Subroutines as "frameworks". Macro definitions inlined at the call site.
  • One-shot conditionals. A compile-time desugaring of "if X then Y" into the brainfuck [X[-]Y] idiom.
  • Praise tier system. Praise words gain point values; the density check becomes a score check.

Validation order

The order of validators matters. The current sequence is chosen so that errors point at the correct part of the source:

  • Intro / outro consume their spans so the forbidden-language pass does not see (and reject) words inside required sycophantic openers like "never before has anyone…".
  • Forbidden-language runs before compile so an assertive word is caught at its source position rather than transformed.
  • The repetition check runs inline with compile so that it triggers at the second occurrence, with both keyword names available.
  • Hedging runs after compile because the filler-word count drives the threshold.
  • Bracket balance runs last because it depends on the final op stream after all expansions.
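A bracket-balance check over the final op stream is typically a single pass with a stack, which is why it has to wait until every expansion has been applied. The sketch below is an assumption about how such a pass could look, not the compiler's actual code: the function name, the fixed nesting limit, and the jump-table output are all hypothetical.

```c
#include <stddef.h>

typedef enum { OP_INC, OP_LOOP_START, OP_LOOP_END } OpKind;

#define MAX_DEPTH 256   /* nesting limit of this sketch only */

/* Checks that loop ops match and nest. On success, fills jump[i] with the
 * index of the partner bracket for every loop op and returns 1.
 * Returns 0 on an unmatched closer, an unclosed opener, or excess nesting. */
int match_brackets(const OpKind *ops, size_t n, size_t *jump)
{
    size_t stack[MAX_DEPTH];
    size_t depth = 0;

    for (size_t i = 0; i < n; i++) {
        if (ops[i] == OP_LOOP_START) {
            if (depth == MAX_DEPTH)
                return 0;
            stack[depth++] = i;          /* remember the opener's position */
        } else if (ops[i] == OP_LOOP_END) {
            if (depth == 0)
                return 0;                /* closer with no opener */
            size_t open = stack[--depth];
            jump[open] = i;
            jump[i] = open;
        }
    }
    return depth == 0;                   /* leftover openers are unclosed */
}
```

Recording the partner index during validation means an executor never has to scan for matching brackets at run time.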

Further reading

  • DESIGN.md — the canonical formal specification, with thresholds, pool counts, and design rationale
  • IDEAS.md — proposed features, organised by where they would land in the pipeline
  • The repository — source, tests, examples