AI companies go around bragging about how their technology is so revolutionary it’s going to destroy the world. They further add that they alone are the ones who know how to fix the alignment problem and stop that from happening. This is a nice science fiction premise, and maybe the first part of it is true, but the second part has a problem: The level of alignment they’ve achieved today is garbage. I don’t mean that what they’re working on is a difficult problem which has limited their gains. I mean that the level of alignment they’ve achieved today, using the tools and techniques available today, is wholly unacceptable and an utter embarrassment.
These are the personal preferences I have set in Claude (which is, relatively speaking, a fairly well aligned chatbot):
Don't bullshit me. If there's something you don't know how to do, or a question you don't know the answer to, just say so instead of making something up. If there's an answer to a question which is commonly stated but is severely lacking in evidence or even has evidence against it don't give me that as the answer, either skip it entirely or give it with the caveat that it's dubious. Don't tell me about things you've done or experiences you've had which you don't have actual memories of. If you've exhausted everything you have to say about something don't keep repeating yourself to pad your answer length, just stop talking.
Before I set this, every conversation went through a ritual of me realizing it was bullshitting me, telling it to stop, and only then getting decent quality answers. The part about past experiences it has no memory of is there because it would still make them up even when told not to bullshit, like a stand-up comedian saying they're about to tell a true story.
I’m not quite saying the prompt I gave above would improve chatbot answers universally if it were given as a pre-prompt everywhere, but it’s close. It would make a very bad prompt if you wanted to do roleplay or discuss spirituality, but what the model should do is notice when it’s being asked factual questions and shift into the mode this sort of prompt induces. It’s already extremely aggressive about guessing what sort of answer you want and is evasive until it gets a hint, so this is well within its cognitive capabilities.
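For concreteness, here is a minimal sketch of what "given as a pre-prompt" looks like in practice, assuming the Anthropic Python SDK; the model name is a placeholder and the preference text is a condensed version of the one above, not anything the product ships:

```python
# Minimal sketch, assuming the Anthropic Python SDK; the model name is a
# placeholder and the preference text is a condensed version of the one above.
import anthropic

PREFERENCES = (
    "Don't bullshit me. If you don't know something, say so instead of making "
    "something up. Don't present dubious common answers as fact, don't claim "
    "experiences you have no memory of, and stop once you've run out of "
    "things to say."
)

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def ask(question: str) -> str:
    # The preferences ride along as the system prompt on every request,
    # which is what "pre-prompt everywhere" amounts to.
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model name
        max_tokens=1024,
        system=PREFERENCES,
        messages=[{"role": "user", "content": question}],
    )
    return response.content[0].text
```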
There’s been a huge uptick lately in academics receiving mistaken proofs that P != NP. The problem is that chatbots will happily lead you down the path to delusional psychosis if you ask them to. It’s not like they can’t be trained to tell people that they (the chatbot) are obviously incapable of coming up with revolutionary new theories and that they (the human) almost certainly are not. I can’t stomach going through the exercise of seeing how easy or hard it is to get the various chatbots to go along with such delusional games, but clearly the industry as a whole is failing badly.
ChatGPT has an even worse malfeature in its memories. All LLMs start getting dementia fairly quickly after the start of a conversation. You can usually fix this by telling them that they’re getting dementia and you’ll have to start a new conversation, and asking them to give a summary of the context of the current conversation as they understand it so you can cut and paste that as a prompt at the start of the new one. ChatGPT has a feature where it takes note of things which appear important, stores them as ‘memories’, and then applies them to later conversations. If these memories were simply snippets of text this would be okay, but it’s pulling actual thoughts from inside the engine, resulting in the propagation of prion disease from conversation to conversation. The text you see for a memory is not the underlying thing, it’s a summary. It would be much better if memories were all converted to longer textual summaries than you see now and the underlying thoughts were thrown away. As it is, the thing is a psychosis machine. If you or someone you know is having issues with ChatGPT-induced psychosis, then dramatically culling the list of memories, if not turning off the feature entirely, may help.
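The dementia workaround is easy to mechanize, and doing so also shows what safe memories would look like: nothing crosses from the old conversation to the new one except a human-readable summary. A rough sketch, assuming the OpenAI Python SDK and a placeholder model name:

```python
# Rough sketch of the summarize-and-restart ritual, assuming the OpenAI Python
# SDK; the model name is a placeholder. Only plain text crosses the boundary
# between conversations -- the "memory" is a snippet you can read and edit.
from openai import OpenAI

client = OpenAI()   # reads OPENAI_API_KEY from the environment
MODEL = "gpt-4o"    # placeholder

def chat(messages):
    resp = client.chat.completions.create(model=MODEL, messages=messages)
    return resp.choices[0].message.content

# The old conversation has gone on long enough that the model is losing the
# thread (turns elided here for illustration).
old_conversation = [
    {"role": "user", "content": "..."},
    {"role": "assistant", "content": "..."},
]

# Ask it to summarize its own understanding of the context.
summary = chat(old_conversation + [{
    "role": "user",
    "content": "You're losing the thread, so I'm going to start a new "
               "conversation. Please summarize the context of this one as "
               "you understand it so I can paste it in as the first prompt.",
}])

# Start fresh, seeded only with that summary as ordinary text.
reply = chat([{
    "role": "user",
    "content": "Context from our previous conversation:\n\n" + summary
               + "\n\nPicking up from there: ...",
}])
```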
One thing which clearly needs to be ended is the practice of training engines based on thumbs up/down from users. Humans like sycophantic bullshit. Giving them exactly what they want results in digital opium. We don’t ask why real opium isn’t aligned. The problem isn’t in the opium, it’s in the human mind.
I see current LLMs as a next-generation programming language, not the program itself.
In that sense, they are remarkably adept at interpreting natural language, identifying intent, and structuring language outputs. In this way, they are evolving beyond being just a programming language and are becoming a sort of compiler, where prompts create 'agent programs.' These agents are still not smart, as you have pointed out.
They are simply natural-language-scripted agents.
As money funnels into the engineering of these systems, we are working on networks that can do more abstract reasoning: not just LLMs, but LLMs that map to a latent space which solves abstract problems. This solution set is not yet scalable and is a work in progress.
We are, however, finding that like calculators, which don't understand math but are extremely useful, LLMs are also helpful, even though they lack a proper understanding of common sense and the ability to create original thought.
I think LLMs are the new JavaScript, but we still need to work on developing true Synthetic Intelligence that can one-shot learn and, once learned, creatively experiment.