The pipeline
The compiler runs the following passes in order:
- Tokenize. Read UTF-8, strip blockquote comments, emit em / en dashes, string literals, bullet markers, pilcrows, annotation-block markers (; and .), and words.
- Validate intro. Consume the first N words matching a sycophantic opener pool.
- Validate outro. Consume the last N words (skipping trailing strings) matching a call-to-action closer pool.
- Validate forbidden language. Scan the remaining tokens for words like no, never, impossible, always.
- Compile. Words → opcodes. Inline checks for prefix multipliers, postfix multipliers, bullet repeats, reiterate expansion, annotation-block suppression, and prose-repetition.
- Validate praise density. ≥3 praise words and ≥8% praise density in the filler.
- Validate hedging. Programs over 50 filler words must contain a hedging phrase.
- Validate bracket balance. Loop preambles and closers must match and nest.
- Execute. Run the op stream on a 30,000-cell unsigned-byte tape.
Any validation failure halts with a slop-flavored diagnostic. Errors include line numbers and, in most cases, suggested softened phrasings.
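As one illustration of a validator from the list above, the praise-density rule (at least 3 praise words and at least 8% density) could be sketched as follows. This is a minimal sketch, not the project's code: the pool contents, the names is_praise and praise_density_ok, and the reading of "density" as praise words divided by filler words are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical praise pool; the real list is described as living in praise.h. */
static const char *const PRAISE_POOL[] = { "magnificent", "brilliant", "insightful" };

static bool is_praise(const char *word) {
    for (size_t i = 0; i < sizeof PRAISE_POOL / sizeof PRAISE_POOL[0]; i++)
        if (strcmp(word, PRAISE_POOL[i]) == 0) return true;
    return false;
}

/* True when the filler holds >= 3 praise words and >= 8% praise density.
   Assumption: density = praise words / total filler words. */
static bool praise_density_ok(const char *const *filler, size_t n) {
    size_t praise = 0;
    assert(filler != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (is_praise(filler[i])) praise++;
    /* Integer form of praise / n >= 0.08, which avoids floating point. */
    return praise >= 3 && praise * 100 >= n * 8;
}
```

Under these assumptions, three praise words in a 40-word filler fail the check (7.5%), while the same three words in a 37-word filler pass.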
Two intermediary stages of the pipeline can be inspected as
plain text via the --stripped and
--opcodes CLI flags. The brainfuck-equivalent
translation is the canonical agent-facing debug surface — a
token-efficient, human-impervious representation that keeps
the inner debugging loop free of biological latency. See the
dedicated debugging page for
worked examples and the recommended review workflow.
The compile-time expansion model
slopfuck adds expressiveness over brainfuck almost entirely
through compile-time expansion. The runtime
executes only the eight core operations plus OP_STRING
and OP_NEWLINE. Every other feature — multipliers,
bullet repeats, reiterate — is resolved before execution begins.
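To make the runtime's small surface concrete, here is a hedged sketch of an executor that understands only the eight core operations plus OP_STRING and OP_NEWLINE. Only OP_INC, OP_STRING, and OP_NEWLINE are named in this page; the other op names, the Op struct, and run_ops are hypothetical. Output goes to a caller-supplied buffer rather than stdout so the sketch is easy to test, and running off either end of the tape is treated as an error, which is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TAPE_CELLS 30000

typedef enum {
    OP_INC, OP_DEC, OP_LEFT, OP_RIGHT, OP_OUT, OP_IN,
    OP_LOOP_START, OP_LOOP_END,   /* the eight core operations */
    OP_STRING, OP_NEWLINE         /* the two extra output ops */
} OpKind;

typedef struct {
    OpKind kind;
    const char *text;             /* payload for OP_STRING, otherwise NULL */
} Op;

/* Runs the op stream; returns 0 on success, -1 on a tape or loop error. */
static int run_ops(const Op *ops, size_t n, const char *in, char *out, size_t cap) {
    static uint8_t tape[TAPE_CELLS];
    size_t ptr = 0, len = 0;
    assert(out != NULL && cap > 0);
    memset(tape, 0, sizeof tape);
    for (size_t pc = 0; pc < n; pc++) {
        int depth = 1;
        switch (ops[pc].kind) {
        case OP_INC:     tape[ptr]++; break;            /* wraps at 255 */
        case OP_DEC:     tape[ptr]--; break;            /* wraps at 0 */
        case OP_RIGHT:   if (++ptr >= TAPE_CELLS) return -1; break;
        case OP_LEFT:    if (ptr == 0) return -1; ptr--; break;
        case OP_OUT:     if (len + 1 < cap) out[len++] = (char)tape[ptr]; break;
        case OP_IN:      tape[ptr] = (in && *in) ? (uint8_t)*in++ : 0; break;
        case OP_NEWLINE: if (len + 1 < cap) out[len++] = '\n'; break;
        case OP_STRING:
            for (const char *s = ops[pc].text; s && *s; s++)
                if (len + 1 < cap) out[len++] = *s;
            break;
        case OP_LOOP_START:                             /* skip body when cell is 0 */
            while (tape[ptr] == 0 && depth > 0) {
                if (++pc >= n) return -1;
                if (ops[pc].kind == OP_LOOP_START) depth++;
                else if (ops[pc].kind == OP_LOOP_END) depth--;
            }
            break;
        case OP_LOOP_END:                               /* jump back when cell is not 0 */
            while (tape[ptr] != 0 && depth > 0) {
                if (pc == 0) return -1;
                pc--;
                if (ops[pc].kind == OP_LOOP_END) depth++;
                else if (ops[pc].kind == OP_LOOP_START) depth--;
            }
            break;
        }
    }
    out[len] = '\0';
    return 0;
}
```

Nothing in the switch knows about multipliers, bullets, or reiterate, which is the point of the model: by the time ops reach a loop like this, those features have already been expanded away.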
How prefix multipliers compile
A prefix multiplier (twice delve) is detected during
the compile pass:
- Compiler reads twice, looks up kw_adv: returns 2.
- Peeks ahead. The next token (delve) is a multipliable single keyword.
- Sets a pending multiplier of 2 and consumes twice.
- Reads delve, emits 2 × OP_INC.
If the next token after twice is not multipliable
(e.g., the start of a multi-word phrase), twice falls
through to keyword matching and becomes filler. This makes the
parser tolerant of twice appearing in non-multiplier
contexts.
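The lookahead described above might be sketched like this. It is a simplified illustration under stated assumptions: adverb_value stands in for the kw_adv lookup, keyword_op for the single-keyword table, and compile_words is a hypothetical name. Only the twice → 2 and delve → OP_INC mappings come from this page; thrice is an invented extra entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { OP_INC, OP_DEC } OpKind;

/* Hypothetical stand-in for the kw_adv lookup: 0 means "not a multiplier". */
static int adverb_value(const char *w) {
    if (strcmp(w, "twice") == 0) return 2;
    if (strcmp(w, "thrice") == 0) return 3;   /* assumed entry, not from the docs */
    return 0;
}

/* Hypothetical single-keyword table; only delve -> OP_INC is documented. */
static bool keyword_op(const char *w, OpKind *op) {
    if (strcmp(w, "delve") == 0) { *op = OP_INC; return true; }
    return false;
}

/* Compiles words into ops; returns ops emitted and counts filler words. */
static size_t compile_words(const char *const *tok, size_t n,
                            OpKind *ops, size_t cap, size_t *filler) {
    size_t emitted = 0;
    int pending = 1;                          /* pending multiplier */
    assert(tok != NULL && ops != NULL && filler != NULL);
    *filler = 0;
    for (size_t i = 0; i < n; i++) {
        OpKind op;
        int mult = adverb_value(tok[i]);
        /* Peek ahead: only a multipliable keyword activates the multiplier. */
        if (mult > 0 && i + 1 < n && keyword_op(tok[i + 1], &op)) {
            pending = mult;                   /* consume the adverb */
            continue;
        }
        if (keyword_op(tok[i], &op)) {
            for (int k = 0; k < pending && emitted < cap; k++)
                ops[emitted++] = op;
            pending = 1;
            continue;
        }
        (*filler)++;                          /* everything else is filler */
    }
    return emitted;
}
```

With this sketch, the tokens twice delve yield two OP_INC ops, while twice as delve yields one OP_INC and two filler words, mirroring the fall-through behaviour described above.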
How bullet repeats compile
The compiler tracks the most recently emitted "simple" op (kw_inc / dec / out / in / newline / em dash / en dash). When a bullet pseudo-token is encountered, the compiler emits one copy of the last simple op. Loop ops and strings reset the tracker.
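A minimal sketch of that tracker, assuming a pre-classified token stream: the Tok struct, compile_bullets, and the OP_LEFT / OP_RIGHT names for the dash ops are hypothetical. Ignoring a bullet when no simple op is being tracked is also an assumption; the real compiler may report an error instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    OP_INC, OP_DEC, OP_OUT, OP_IN, OP_NEWLINE, OP_LEFT, OP_RIGHT, /* "simple" ops */
    OP_LOOP_START, OP_LOOP_END, OP_STRING                          /* reset the tracker */
} OpKind;

typedef struct {
    bool is_bullet;   /* true for a bullet pseudo-token */
    OpKind op;        /* op the token compiles to when is_bullet is false */
} Tok;

static bool is_simple(OpKind op) {
    return op != OP_LOOP_START && op != OP_LOOP_END && op != OP_STRING;
}

/* Emits ops for toks, repeating the last simple op once per bullet. */
static size_t compile_bullets(const Tok *toks, size_t n, OpKind *out, size_t cap) {
    size_t emitted = 0;
    bool have_last = false;
    OpKind last = OP_INC;
    assert(toks != NULL && out != NULL);
    for (size_t i = 0; i < n && emitted < cap; i++) {
        if (toks[i].is_bullet) {
            if (have_last) out[emitted++] = last;   /* one copy; tracker unchanged */
            continue;                               /* assumption: otherwise ignored */
        }
        out[emitted++] = toks[i].op;
        have_last = is_simple(toks[i].op);          /* loops and strings reset */
        if (have_last) last = toks[i].op;
    }
    return emitted;
}
```

In this sketch a simple op followed by two bullets produces three copies of that op, and a bullet placed directly after a string produces nothing.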
How reiterate compiles
The compiler walks backwards through the already-emitted op stream
starting at the most recent op, collecting a contiguous run of
OP_STRING and OP_NEWLINE ops. The
multiplier (default 1) determines how many extra copies to append.
Any other op breaks the block. So in “First”¶
delve “Second”¶ reiterate, only
“Second”¶ is copied.
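The backward walk can be sketched as below. The Op struct and expand_reiterate are hypothetical names, and the sketch assumes a pilcrow compiles to OP_NEWLINE and that the op buffer has spare capacity for the copies.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { OP_INC, OP_STRING, OP_NEWLINE } OpKind;

typedef struct {
    OpKind kind;
    const char *text;   /* payload for OP_STRING, otherwise NULL */
} Op;

/* Appends mult extra copies of the trailing OP_STRING / OP_NEWLINE run.
   Returns the new op count (copies are dropped once cap is reached). */
static size_t expand_reiterate(Op *ops, size_t n, size_t cap, int mult) {
    size_t start = n;
    assert(ops != NULL && n <= cap);
    /* Walk backwards from the most recent op; any other op ends the block. */
    while (start > 0 &&
           (ops[start - 1].kind == OP_STRING || ops[start - 1].kind == OP_NEWLINE))
        start--;
    size_t run = n - start;
    for (int m = 0; m < mult; m++)
        for (size_t i = 0; i < run && n < cap; i++)
            ops[n++] = ops[start + i];
    return n;
}
```

For the example in the text, the stream before reiterate would be STRING "First", NEWLINE, INC, STRING "Second", NEWLINE under these assumptions. The walk stops at INC, so with the default multiplier of 1 only STRING "Second" and NEWLINE are appended, giving seven ops.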
How annotation blocks compile
The compile loop carries a single in_free_prose
flag. The ; pseudo-token toggles the flag; the
. pseudo-token clears it (only if currently
set). While the flag is high, every branch of the compile
loop short-circuits: em/en dashes, pilcrows, bullets,
multipliers, single-word keywords, multi-word phrases, and
reiterate all become silent — the word is pushed to the
filler list and the iterator advances. String literals are
the single exception: they continue to emit
OP_STRING as normal.
The repetition window is bypassed entirely inside an
annotation block (keyword matches do not occur), which is
why writers can repeat magnificent ten times
inside a single block without tripping the validator. The
praise and hedging counters do still observe the
annotation's filler — that is the whole value proposition.
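The flag logic might look like the following sketch, which counts emitted ops and filler words instead of building real lists. The token kinds, the Counts struct, and compile_annotations are hypothetical, and the sketch assumes the ; and . pseudo-tokens are not themselves counted as filler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { TOK_KEYWORD, TOK_WORD, TOK_STRING, TOK_SEMI, TOK_DOT } TokKind;

typedef struct {
    size_t ops;      /* ops emitted */
    size_t filler;   /* words pushed to the filler list */
} Counts;

static Counts compile_annotations(const TokKind *toks, size_t n) {
    Counts c = {0, 0};
    bool in_free_prose = false;
    assert(toks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        switch (toks[i]) {
        case TOK_SEMI:   in_free_prose = !in_free_prose; break;  /* toggle */
        case TOK_DOT:    in_free_prose = false; break;  /* no effect unless set */
        case TOK_STRING: c.ops++; break;                /* always emits OP_STRING */
        case TOK_KEYWORD:
            if (in_free_prose) c.filler++;              /* silenced: becomes filler */
            else c.ops++;
            break;
        case TOK_WORD:   c.filler++; break;
        }
    }
    return c;
}
```

Because silenced keywords land in the filler count, the praise and hedging validators would still see them, which matches the behaviour described above.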
Why brainfuck stays preserved
Every extension above resolves at compile time. The op stream that reaches the executor only ever contains the canonical core plus the two output ops. This means:
- Computational power is identical to brainfuck.
- Execution is fully predictable: no extension has any runtime effect beyond the ops it expanded into.
- A slopfuck program could be decompiled, in principle, into equivalent brainfuck (modulo string-literal printing).
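The decompilation point can be illustrated with a short sketch that maps a finished op stream to brainfuck text. The op names other than OP_INC, OP_STRING, and OP_NEWLINE are hypothetical, as is to_brainfuck. The two output ops have no single-character equivalent, so this sketch skips them and reports how many were skipped.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    OP_INC, OP_DEC, OP_LEFT, OP_RIGHT, OP_OUT, OP_IN,
    OP_LOOP_START, OP_LOOP_END,   /* core ops, in the same order as core[] below */
    OP_STRING, OP_NEWLINE         /* output-only ops, not translated here */
} OpKind;

/* Writes brainfuck text into buf (always NUL-terminated).
   Returns the number of ops skipped because they have no direct equivalent. */
static size_t to_brainfuck(const OpKind *ops, size_t n, char *buf, size_t cap) {
    static const char core[] = "+-<>.,[]";   /* indexed by the eight core OpKinds */
    size_t len = 0, skipped = 0;
    assert(ops != NULL && buf != NULL && cap > 0);
    for (size_t i = 0; i < n; i++) {
        if (ops[i] <= OP_LOOP_END) {
            if (len + 1 < cap) buf[len++] = core[ops[i]];
        } else {
            skipped++;                        /* OP_STRING or OP_NEWLINE */
        }
    }
    buf[len] = '\0';
    return skipped;
}
```

Under these assumptions the stream INC, LOOP_START, DEC, LOOP_END, STRING, OUT becomes `+[-].` with one skipped op.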
Implementation layout
The compiler is a single C source file with header companions. Each header isolates one concern:
| File | Holds |
|---|---|
| slopfuck.c | Tokenizer, compile pass, validators, executor, main |
| keywords.h | kw_inc, kw_dec, kw_out, kw_in, kw_loop_start, kw_loop_end, kw_newline, kw_reiterate |
| praise.h | Praise words (used by the density validator) |
| bookends.h | Required intro and outro phrase pools |
| multipliers.h | Adverbial multipliers, cardinals, times markers |
| style.h | Forbidden words, hedging phrases, repetition window size |
Extending the language
The right way to add expressiveness is to add a new compile-time expansion, not a new runtime op. That is the architectural rule of thumb for the language.
Features that fit naturally into this model and are documented as candidates in IDEAS.md:
- Named cells. A compile-time symbol table that maps a bracketed noun phrase to a tape position, with the compiler emitting walk ops automatically.
- Subroutines as "frameworks". Macro definitions inlined at the call site.
- One-shot conditionals. A compile-time desugaring of "if X then Y" into the brainfuck [X[-]Y] idiom.
- Praise tier system. Praise words gain point values; the density check becomes a score check.
Validation order
The order of validators matters. The current sequence is chosen so that errors point at the correct part of the source:
- Intro / outro consume their spans so the forbidden-language pass does not see (and reject) words inside required sycophantic openers like "never before has anyone…".
- Forbidden-language runs before compile so an assertive word is caught at its source position rather than transformed.
- Repetition is inline with compile so it triggers at the second occurrence with both keyword names available.
- Hedging runs after compile because the filler-word count drives the threshold.
- Bracket balance runs last because it depends on the final op stream after all expansions.
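Since the bracket-balance check operates on the final op stream, it can be a single pass with a stack of opener positions. The sketch below is one possible shape, not the project's implementation: first_unbalanced, the reduced OpKind enum, and the MAX_DEPTH limit are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 256   /* hypothetical nesting limit for this sketch */

typedef enum { OP_OTHER, OP_LOOP_START, OP_LOOP_END } OpKind;

/* Returns -1 when loops match and nest; otherwise the index of the first
   stray closer, or of the innermost opener that is never closed. */
static long first_unbalanced(const OpKind *ops, size_t n) {
    size_t stack[MAX_DEPTH];
    size_t depth = 0;
    assert(ops != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (ops[i] == OP_LOOP_START) {
            assert(depth < MAX_DEPTH);
            stack[depth++] = i;               /* remember where the loop opened */
        } else if (ops[i] == OP_LOOP_END) {
            if (depth == 0) return (long)i;   /* closer without an opener */
            depth--;
        }
    }
    return depth == 0 ? -1 : (long)stack[depth - 1];
}
```

Returning an op index rather than a boolean gives a diagnostic something to point at; mapping that index back to a source line is left out of the sketch.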
Further reading
- DESIGN.md — the canonical formal specification, with thresholds, pool counts, and design rationale
- IDEAS.md — proposed features, organised by where they would land in the pipeline
- The repository — source, tests, examples