Circumventing dementia, a.k.a. Prompt Engineering

Modified: 2026-04-26 09:33:06, Words: 1063, Reading time: 5 min

Note

I translated this article from my German original using DeepL and lightly edited it.

The term “prompt engineering” is ridiculous. Engineering implies that something is designed according to rules to achieve a reliable result. But crafting prompts for language, image, and video models doesn’t follow rules; it’s 5% experience and 95% hope. And a prompt is never reliable, not even within a single model, let alone portable across many or all models. Crafting prompts has nothing to do with engineering.

If we look at the rules in the system prompts of models like Claude, we see that many of them are phrased with “DOES NOT.” This reflects the experience of just about every user who, after a while with language models like GPT-5, realizes that the model either stubbornly doesn’t understand what you mean or, as the chat history grows longer and longer, becomes dumber and dumber, to the point where it understands the opposite of what you asked it to do. But “DOES NOT” is not reliable, and it becomes less reliable the more information you pack into the prompt: the more information a prompt contains, the more of it models disregard. Even if idiots (and that includes the language models themselves) always claim that a language model has “forgotten” something: no, it has disregarded it. Because a language model’s intelligence quotient is significantly shaped by its attention mechanism. And that is not perfect.
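
To make the attention point concrete, here is a minimal numpy sketch of the standard scaled dot-product formula, softmax(QKᵀ/√d)·V. This is an illustration, not any vendor’s actual implementation; the dimensions and random vectors are made up. It shows the simple arithmetic behind the “disregarding”: the attention weights for a query always sum to 1, so the longer the context, the smaller the average share of attention any single instruction can get.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q @ K.T / sqrt(d)) @ V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # how strongly the query matches each key
    scores -= scores.max(axis=-1, keepdims=True)    # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
for n in (10, 100, 1000):             # growing chat history
    Q = rng.normal(size=(1, 64))      # one current query vector (64 dims, arbitrary)
    K = V = rng.normal(size=(n, 64))  # n context tokens
    _, w = attention(Q, K, V)
    print(f"context={n:4d}  average weight per token={w.mean():.4f}")
# context=  10  average weight per token=0.1000
# context= 100  average weight per token=0.0100
# context=1000  average weight per token=0.0010
```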

Language models don’t get smarter; they just become more adapted

Research on attention mechanisms is also being conducted. But so far the performance difference amounts to only a few percentage points, whether the changes are made to the attention mechanism or to the architecture. Only for specific tasks are there currently reports of success claiming that a differently designed model performs 100 to 200 percent better, though of course that applies only to the task in question. Across a wide range of tasks, the improvements now amount to just a few percentage points every few months. Yet what goes unmentioned here is an irrefutable fact: if a model has the same number of parameters and essentially the same architecture, it can only improve by a few percentage points in some areas by performing worse in other, untested areas. AI companies don’t show those areas. No one wants to advertise with the message, “Overall, our new model is just as dumb as before. And we invested $10 million in training it. xD.” And in fact, new models now perform slightly worse on guardrails than their predecessors. In other words, they are less reliable than before. But none of the companies put it that way. So-called “prompt engineering” can hardly help with reliability. Nobody knows how to make prompts reliable. And those influencers on YouTube act as if they do.

Language models aren’t forgetful; they’re demented

If models get dumber as the chat history grows, that doesn’t mean they become more forgetful, as the influenza plague on YouTube claims. Once a model has become dumber during the chat, it stays that way; it cannot recall anything from the earlier chat history on its own. You have to tell it again. That’s not forgetfulness but dementia. And that is also what many people who have no experience with dementia misunderstand. They think people with dementia are forgetful. No: with forgetfulness, you remember again hours, days, or weeks later. With dementia, you forget forever; you forget your loved ones, even your own children, and you never remember them again. It goes so far that you can no longer distinguish left from right or before from after. That’s why, over time, people with dementia can no longer operate tools like the bedside bell in a hospital. It is forgotten and remains forgotten. Dementia is therefore not forgetfulness but a steadily declining intelligence, down to the level of a preschooler and below. This is exactly the experience one has with language models the longer the chat history grows. With prompt engineering, one therefore attempts to work around the “dementia” of language models. This is not engineering, but rather nursing care for people with dementia.

Language models don’t respond; they just keep rambling on

The height of public deception is when “Frontier Labs” like OpenAI and Anthropic pretend that we can hope for a cure for language models’ dementia in “two years.” They like to refer to this supposedly imminent superintelligence as AGI (Artificial General Intelligence). From a mathematical perspective, the prediction of the next token must become increasingly inaccurate the more preceding tokens have to be taken into account. There can be no solution to this in two years, and very likely never. Any solution to this is merely a hack, such as leveraging tools, agent skills, or databases. But that has nothing to do with the model itself; these are merely aids. A fool is still a fool with paper and pencil. What is referred to as “getting smarter” in language models is merely what is called “alignment.” Alignment means that a model’s output tokens more often match the user’s intent, or their prompts. Although the term “response” is also bullshit.

A language model doesn’t simply respond; instead, it continues to generate the story. You provide tokens in the form of a prompt, and the model continues to generate the story using additional tokens—called response tokens, or simply “responses”—until an EOS (End Of Sequence) token is encountered. Then the software running the model stops generating further tokens. It could also continue outputting tokens after the EOS, giving the user the impression that the model has gone haywire and is producing increasingly nonsensical content unrelated to the prompt. But it doesn’t actually get dumber, because it was never intelligent to begin with. It’s just pretending. The software simply used the model weights and architecture to spit out response tokens that were statistically linked to the query tokens via a simple formula known as an attention mechanism. The more precise the query, the more appropriate the answer—but only if the model is fundamentally capable of it. If it’s too small, it’s too dumb. No amount of prompt engineering will help in that case.
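
What “continuing the story” means mechanically fits in a few lines. The following is a sketch of the standard autoregressive decoding loop, not any particular vendor’s code; model is a placeholder for any callable that scores every vocabulary token given the sequence so far, and eos_token_id is whatever the tokenizer uses.

```python
import numpy as np

def generate(model, prompt_tokens, eos_token_id, max_new_tokens=256):
    """Continue the token sequence until an EOS token appears or we cut it off."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        logits = model(tokens)               # a score for every possible next token
        next_token = int(np.argmax(logits))  # greedy pick; real systems usually sample
        tokens.append(next_token)
        if next_token == eos_token_id:       # the *software* stops here; the model
            break                            # would happily keep emitting tokens
    return tokens[len(prompt_tokens):]       # the "response" is just the continuation
```

Remove the EOS check and you get exactly the behavior described above: the loop keeps emitting tokens, drifting ever further from anything the prompt asked for.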

Language models are not a paradigm shift, but a cargo cult

Prompt engineering isn’t just a form of dementia care; it’s also a cargo cult. It’s all pretense. Every aspect of language models is an “as if”: as if they were becoming more intelligent, as if “good” prompt formulation would lead to reliable full automation, as if AI could replace entire jobs, as if every person and every company needed it, as if the companies behind them were the most valuable in the world, and so on. The language model craze is a cargo cult.