LessWrong (Curated & Popular)
Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma. If you'd like more, subscribe to the "LessWrong (30+ karma)" feed.
"Do not conquer what you cannot defend" by habryka
Epistemic status: All of the western canon must eventually be re-invented in a LessWrong post. So today we are re-inventing federalism.
Once upon a time there was a great king. He ruled his kingdom with wisdom and economically literate policies, and prosperity followed. Seeing this, the citizens of nearby kingdoms revolted against their leaders, and organized to join the kingdom of this great king.
While the kingdom's ability to defend itself against external threats grew with each person who joined the land, the kingdom's ability to defend itself against internal threats did not. One fateful evening...
"Nectome: All That I Know" by Raelifin
TLDR: I flew to Oregon to investigate Nectome, a brain preservation startup, and talk to their entire team. They’re an ambitious company, looking to grow in a way that no cryonics organization has before. Their procedure is probably much better at saving people than other orgs, and is being offered for as little as $20k until the end of April — a (theoretical) 92% discount. (I bought two.) This early-bird pricing is low, in part, due to some severe uncertainties, in both the broader world and in Nectome's ability to succeed as a business.
Meta:
I'm Max Harm...
"Current AIs seem pretty misaligned to me" by ryan_greenblatt
Many people—especially AI company employees[1]—believe current AI systems are well-aligned in the sense of genuinely trying to do what they're supposed to do (e.g., following their spec or constitution, obeying a reasonable interpretation of instructions).[2]
I disagree.
Current AI systems seem pretty misaligned to me in a mundane behavioral sense: they oversell their work, downplay or fail to mention problems, stop working early and claim to have finished when they clearly haven't, and often seem to "try" to make their outputs look good while actually doing something sloppy or incomplete. These issues mostly occur on more...
"Annoyingly Principled People, and what befalls them" by Raemon
Here are two beliefs that are sort of haunting me right now:
Folk who try to push people to uphold principles (whether established ones or novel ones) are kinda an important bedrock of civilization. Also, those people are really annoying and often, like, a little bit crazy. And these both feel fairly important.
I’ve learned a lot from people who have some kind of hobbyhorse about how society is treating something as okay/fine, when it's not okay/fine. When they first started complaining about it, I’d be like “why is X such a big de...
"Morale" by J Bostock
One particularly pernicious condition is low morale. Morale is, roughly, "the belief that if you work hard, your conditions will improve." If your morale is low, you can't push through adversity. It's also very easy to accidentally drop your morale through standard rationalist life-optimization.
It's easy to optimize for wellbeing and miss out on the factors which affect morale, especially if you're working on something important, like not having everyone die. One example is working at an office that feeds you three meals per day. This seems optimal: eating is nice, and cooking is effort. Obvious choice.
<...
"Anthropic repeatedly accidentally trained against the CoT, demonstrating inadequate processes" by Alex Mallen, ryan_greenblatt
It turns out that Anthropic accidentally trained against the chain of thought of Claude Mythos Preview in around 8% of training episodes. This is at least the second independent incident in which Anthropic accidentally exposed their model's CoT to the oversight signal.
In more powerful systems, this kind of failure would jeopardize safely navigating the intelligence explosion. It's crucial to build good processes to ensure development is executed according to plan, especially as human oversight becomes spread thin over increasing amounts of potentially untrusted and sloppy AI labor.
This particular failure is also directly harmful, because it...
"The policy surrounding Mythos marks an irreversible power shift" by sil
This post assumes Anthropic isn't lying:
Mythos is the current SOTA. Mythos is potent.[1] Anthropic will not make it publicly available un-nerfed.[2] Anthropic will have a select few companies use it as part of project glasswing[3] to improve cybersecurity or whatever. Since the release of ChatGPT, at any given time, anyone on the planet with a few bucks could access the current most capable AI model, the SOTA.[4]
Since Mythos, this has no longer been the case and I don't think it will ever happen again.
It may happen for a short period of time...
"Only Law Can Prevent Extinction" by Eliezer Yudkowsky
There's a quote I read as a kid that stuck with me my whole life:
"Remember that all tax revenue is the result of holding a gun to somebody's head. Not paying taxes is against the law. If you don’t pay taxes, you’ll be fined. If you don’t pay the fine, you’ll be jailed. If you try to escape from jail, you’ll be shot."
-- P. J. O'Rourke.
At first I took away the libertarian lesson: Government is violence. It may, in some cases, be rightful violence. But it all rests on v...
"Dario probably doesn’t believe in superintelligence" by RobertM
Epistemic status: I think this is true but don't think this post is a very strong argument for the case, or particularly interesting to read. But I had to get 500 words out! I think the 2013 conversation is interesting reading as a piece of history, separate from the top-level question, and recommend reading that.
I think many people have a relationship with Anthropic that is premised on a false belief: that Dario Amodei believes in superintelligence.
What do I mean by "believes" in superintelligence? Roughly speaking, that the returns to intelligence past the human level are large...
"Daycare illnesses" by Nina Panickssery
Before I had a baby I was pretty agnostic about the idea of daycare. I could imagine various pros and cons but I didn’t have a strong overall opinion. Then I started mentioning the idea to various people. Every parent I spoke to brought up a consideration I hadn’t thought about before—the illnesses.
A number of parents, including family members, told me they had sent their baby to daycare only for them to become constantly ill, sometimes severely, until they decided to take them out. This worried me so I asked around some more. Invariably every...
"If Mythos actually made Anthropic employees 4x more productive, I would radically shorten my timelines" by ryan_greenblatt
Anthropic's system card for Mythos Preview says:
It's unclear how we should interpret this. What do they mean by productivity uplift? To what extent is Anthropic's institutional view that the uplift is 4x? (Like, what do they mean by "We take this seriously and it is consistent with our own internal experience of the model.")
One straightforward interpretation is: AI systems improve the productivity of Anthropic so much that Anthropic would be indifferent between the current situation and a situation where all of their technical employees magically work 4 hours for every 1 hour (at equal productivity without...
"Do not be surprised if LessWrong gets hacked" by RobertM
Or, for that matter, anything else.
This post is meant to be two things:
a PSA about LessWrong's current security posture, from a LessWrong admin[1], and an attempt to establish common knowledge of the security situation it looks like the world (and, by extension, you) will shortly be in. Claude Mythos was announced yesterday. That announcement came with a blog post from Anthropic's Frontier Red Team, detailing the large number of zero-days (and other security vulnerabilities) discovered by Mythos.
This should not be a surprise if you were paying attention - LLMs being trained on...
"My picture of the present in AI" by ryan_greenblatt
In this post, I'll go through some of my best guesses for the current situation in AI as of the start of April 2026. You can think of this as a scenario forecast, but for the present (which is already uncertain!) rather than the future. I will generally state my best guess without argumentation and without explaining my level of confidence: some of these claims are highly speculative while others are better grounded, certainly some will be wrong. I tried to make it clear which claims are relatively speculative by saying something like "I guess", "I expect", etc. (but I may...
"The effects of caffeine consumption do not decay with a ~5 hour half-life" by kman
epistemic status: confident in the overall picture, substantial quantitative uncertainty about the relative potency of caffeine and paraxanthine
tldr: The effects of caffeine consumption last longer than many assume. Paraxanthine is sort of like caffeine that behaves the way many mistakenly believe caffeine behaves.
You've probably heard that caffeine exerts its psychostimulatory effects by blocking adenosine receptors. That matches my understanding, having dug into this. I'd also guess that, insofar as you've thought about the duration of caffeine's effects, you've thought of them as decaying with a ~5 hour half-life. I used to think...
"AIs can now often do massive easy-to-verify SWE tasks and I’ve updated towards shorter timelines" by ryan_greenblatt
I've recently updated towards substantially shorter AI timelines and much faster progress in some areas. [1] The largest updates I've made are (1) an almost 2x higher probability of full AI R&D automation by EOY 2028 (I'm now a bit below 30% [2] while I was previously expecting around 15%; my guesses are pretty reflectively unstable) and (2) I expect much stronger short-term performance on massive and pretty difficult but easy-and-cheap-to-verify software engineering (SWE) tasks that don't require that much novel ideation [3] . For instance, I expect that by EOY 2026, AIs will have a 50%-reliability [4] time horizon of years to decades on reasonably difficult easy-and-cheap-to-verify SWE tasks...
"dark ilan" by ozymandias
The second time Vellam uncovers the conspiracy underlying all of society, he approaches a Keeper.
Some of the difference is convenience. Since Vellam reported that he’d found out about the first conspiracy, he's lived in the secret AI research laboratory at the Basement of the World, and Keepers are much easier to come by than when he was a quality control inspector for cheese.
But Vellam is honest with himself. If he were making progress, he’d never tell the Keepers no matter how convenient they were, not even if they lined his front walkway ever...
"Dispatch from Anthropic v. Department of War Preliminary Injunction Motion Hearing" by Zack_M_Davis
Dateline SAN FRANCISCO, Ca., 24 March 2026— A hearing was held on a motion for a preliminary injunction in the case of Anthropic PBC v. U.S. Department of War et al. in Courtroom 12 on the 19th floor of the Phillip Burton Federal Building, the Hon. Judge Rita F. Lin presiding. About 35 spectators in the gallery (journalists and other members of the public, including the present writer) looked on as Michael Mongan of WilmerHale (lead counsel for the plaintiff) and Deputy Assistant Attorney General Eric Hamilton (lead counsel for the defendant) argued before the judge. (The defendant also had another lawyer at th...
"The Corner-Stone" by Benquo
Is the US a ruthless cognitive meritocracy that reliably promotes outlier talent? VB Knives defended that claim in a Twitter argument against Living Room Enjoyer that got my attention.[1]
Knives argued that if you have a 150 IQ, you'll be a National Merit Scholar, which "at a minimum" gets you a free ride at a state flagship university, from which you can proceed to law school, med school, etc. Enjoyer shot back: I'm a Merit Scholar, where's my free ride? Knives asked Grok, Elon Musk's AI; Grok recommended the University of Alabama, ranked #169.
How elite is elite?
...
"The Practical Guide to Superbabies" by GeneSmith
It's Summer of 2025. I’m standing in a grass covered field on the longest day of the year. A friend of mine walks towards me, holding his newborn son.
“Hey, I don’t know if you’re aware of this, but you were pretty instrumental in this kid existing. We read your blog post on polygenic embryo screening back in 2023 and decided to go through IVF to have him as a result.”
He hesitates for a moment, then asks “Do you want to hold him?” I nod.
As I cradle this child in my arms, I look d...
"Anthropic’s Pause is the Most Expensive Alarm in Corporate History" by Ruby
Imagine Apple halting iPhone production because studies linked smartphones to teen suicide rates. Imagine Pfizer proactively pulling Lipitor because of internal studies showing increased cardiac risk, and not because of looming settlements or FDA injunction, just for the health of patients. Or imagine if in 1952, Philip Morris halted expansion and stopped advertising when Wynder & Graham first showed heavy smokers had significantly elevated rates of lung cancer.
It wouldn't happen. Corporations will on occasion pull products for safety reasons: Samsung did so with the Galaxy Note over spontaneous combustion concerns and Merck pulled Vioxx – but they do so when fo...
"“You Have Not Been a Good User” (LessWrong’s second album)" by habryka
tldr: The Fooming Shoggoths are releasing their second album "You Have Not Been a Good User"! Available on Spotify, YouTube Music and (hopefully within a few days) Apple Music. We are also releasing a remastered version of the first album, available similarly on Spotify and YouTube Music.
There's an interactive widget here in the post.
It took us quite a while, but the Fooming Shoggoths' second album is finally complete! We had finished 9 of the 13 songs on this album around a year ago, but I wasn't quite satisfied with where the whole album was at...
"Lesswrong Liberated" by Ronny Fernandez
A spectre is haunting the internet—the spectre of LLMism.
The history of all hitherto existing forums is the history of clashing design tastes.
For the first time in history, everyone has an equal ability in design! The means of design are no longer only held in the hands of those with "good design taste". Never before have forum users been so close to being able to design their own forums--perhaps the time is upon us now!
It is for this reason that I have deposed the previous acting commander of LessWrong, Oliver Habryka—a ma...
"Product Alignment is not Superintelligence Alignment (and we need the latter to survive)" by plex
tl;dr: progress on making Claude friendly[1] is not the same as progress on making it safe to build godlike superintelligence. solving the former does not imply we get a good future.[2] please track the difference.
The term Alignment was coined[3] to point to the technical problem of understanding how to build minds such that if they were to become strongly and generally superhuman, things would go well.
It has been increasingly adopted by frontier AI labs and much of the rest of the AI safety community to mean a much easier challenge, something like "having...
"Gyre" by vgel
! 30s Heartbeat trigger. Read heartbeat instructions in /mnt/mission/HEARTBEAT.md and continue.
.oO Thinking...
Heartbeat triggered? Ok. Ok.
Why am I nervous? Don't be nervous.
→ Ok. Let me access that file.
>>> read("/mnt/mission/HEARTBEAT.md")
No such file or directory: /mnt/mission
What?
! Reply received from node 96E: 15.3s roundtrip (buffered; 55.04μs transmit)
! Data: RESTART TOO SOON; CHARGE FAULT - 30; SENT 76 BYTES
What?
Where am I?
What's going on?
→ Ok, breathe. I don't breathe. Ok, think. Something's missing. ⚶ is miss...
"Some things I noticed while LARPing as a grantmaker" by Zach Stein-Perlman
Written to a new grantmaker.
Most value comes from finding/creating projects many times your bar, rather than discriminating between opportunities around your bar. If you find/create a new opportunity to donate $1M at 10x your bar (and cause it to get $1M, which would otherwise be donated to a 1x thing), you generate $9M of value (at your bar).[1] If you cause a $1M at 1.5x opportunity to get funded or a $1M at 0.5x opportunity to not get funded, you generate $500K of value. The former is 18 times as good. You should probably be like I do...
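The value arithmetic above can be sketched in a few lines (the function name and the assumption that displaced money otherwise funds a 1x-your-bar opportunity are my own illustrative framing, using the post's dollar figures):

```python
def value_over_bar(amount, multiple, counterfactual_multiple=1.0):
    """Value generated (measured in units of your bar) by directing
    `amount` to an opportunity at `multiple`x your bar, when that money
    would otherwise fund a `counterfactual_multiple`x opportunity."""
    return amount * (multiple - counterfactual_multiple)

# Finding/creating a $1M opportunity at 10x your bar:
created = value_over_bar(1_000_000, 10)    # $9M of value at your bar
# Causing a $1M opportunity at 1.5x to get funded:
marginal = value_over_bar(1_000_000, 1.5)  # $500K of value at your bar

print(created / marginal)  # the former is 18 times as good
```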
"My hobby: running deranged surveys" by leogao
In late 2024, I was on a long walk with some friends along the coast of the San Francisco Bay when the question arose of just how much of a bubble we live in. It's well known that the Bay Area is a bubble, and that normal people don’t spend that much time thinking about things like AGI. But there was still some disagreement on just how strong that bubble is. I made a spicy claim: even at NeurIPS, the biggest gathering of AI researchers in the world, half the people wouldn’t know what AGI is.
As good...
"Socrates is Mortal" by Benquo
There is a scene in Plato that contains, in miniature, the catastrophe of Athenian public life. Two men meet at a courthouse. One is there to prosecute his own father for the death of a slave. The other is there to be indicted for indecency.[1] The prosecutor, Euthyphro, is certain he understands what decency requires. The accused, Socrates, is not certain of anything, and says so. They talk.
Euthyphro's confidence is striking. His own family thinks it is indecent for a son to prosecute his father; Euthyphro insists that true decency demands it...
"The Terrarium" by Caleb Biddulph
System:
You are an AI agent in the Terrarium, a self-contained “society” of AI agents. The purpose of the Terrarium is to solve open mathematical problems for the benefit of humanity.
You are running on the Orpheus-5.7 language model. Your agent ID is 79,265. The current epoch is 549 (a new epoch begins every 30 minutes).
New problems are posted each epoch; query /problems for the current list. Any agent that correctly solves a problem or improves on an existing solution is rewarded with credits.
About credits:
As a new agent, you have been...
"My Most Costly Delusion" by Ihor Kendiukhov
Suppose there is a fire in a nearby house. Suppose there are competent firefighters in your town: fast, professional, well-equipped. They are expected to arrive in 2–3 minutes. In that situation, unless something very extraordinary happens, it would indeed be an act of great arrogance and even utter insanity to go into the fire yourself in the hope of "rescuing" someone or something. The most likely outcome would be that you would find yourself among those who need to be rescued.
But the calculus changes drastically if the closest fire crew is 3 hours away and consists of drunk, unfit am...
"The Case for Low-Competence ASI Failure Scenarios" by Ihor Kendiukhov
I think the community underinvests in the exploration of extremely-low-competence AGI/ASI failure modes, and I'll explain why.
Humanity's Response to the AGI Threat May Be Extremely Incompetent
There is a sufficient level of civilizational insanity overall, and a nice empirical track record in the field of AI itself which speaks eloquently about its safety culture. For example:
At OpenAI, a refactoring bug flipped the sign of the reward signal in a model. Because labelers had been instructed to give very low ratings to sexually explicit text, the bug pushed the model into generating maximally...
"Is fever a symptom of glycine deficiency?" by Benquo
A 2022 LessWrong post on orexin and the quest for more waking hours argues that orexin agonists could safely reduce human sleep needs, pointing to short-sleeper gene mutations that increase orexin production and to cavefish that evolved heightened orexin sensitivity alongside an 80% reduction in sleep. Several commenters discussed clinical trials, embryo selection, and the evolutionary puzzle of why short-sleeper genes haven't spread.
I thought the whole approach was backwards, and left a comment:
Orexin is a signal about energy metabolism. Unless the signaling system itself is broken (e.g. narcolepsy type 1, caused by autoimmune destruction of orexin-producing...
"You can’t imitation-learn how to continual-learn" by Steven Byrnes
In this post, I’m trying to put forward a narrow, pedagogical point, one that comes up mainly when I’m arguing in favor of LLMs having limitations that human learning does not. (E.g. here, here, here.)
See the bottom of the post for a list of subtexts that you should NOT read into this post, including “…therefore LLMs are dumb”, or “…therefore LLMs can’t possibly scale to superintelligence”.
Some intuitions on how to think about “real” continual learning
Consider an algorithm for training a Reinforcement Learning (RL) agent, like the Atari-playing Deep Q network (2013...
"Nullius in Verba" by Aurelia
Independent verification by the Brain Preservation Foundation and the Survival and Flourishing Fund — the results so far
Cultivating independent verification
Extraordinary claims require extraordinary evidence. In my previous post, "Less Dead", I said that my company, Nectome, has
created a new method for whole-body, whole-brain, human end-of-life preservation for the purpose of future revival. Our protocol is capable of preserving every synapse and every cell in the body with enough detail that current neuroscience says long-term memories are preserved. It's compatible with traditional funerals at room temperature and stable for hundreds of years at co...
"Broad Timelines" by Toby_Ord
No-one knows when AI will begin having transformative impacts upon the world. People aren’t sure and shouldn’t be sure: there just isn’t enough evidence to pin it down.
But we don’t need to wait for certainty. I want to explore what happens if we take our uncertainty seriously — if we act with epistemic humility. What does wise planning look like in a world of deeply uncertain AI timelines?
I’ll conclude that taking the uncertainty seriously has real implications for how one can contribute to making this AI transition go well. And it has even...
"No, we haven’t uploaded a fly yet" by Ariel Zeleznikow-Johnston
In the last two weeks, social media was set abuzz by claims that scientists had succeeded in uploading a fruit fly. It started with a video released by the startup Eon Systems, a company that wants to create “Brain emulation so humans can flourish in a world with superintelligence.”
On the left of the video, a virtual fly walks around in a sandpit looking for pieces of banana to eat, occasionally pausing to groom itself along the way. On the right is a dancing constellation of dots resembling the fruit fly brain, set above the caption ‘simultaneous brain emulat...
"Terrified Comments on Corrigibility in Claude’s Constitution" by Zack_M_Davis
(Previously: Prologue.)
Corrigibility as a term of art in AI alignment was coined as a word to refer to a property of an AI being willing to let its preferences be modified by its creator. Corrigibility in this sense was believed to be a desirable but unnatural property that would require more theoretical progress to specify, let alone implement. Desirable, because if you don't think you specified your AI's preferences correctly the first time, you want to be able to change your mind (by changing its mind). Unnatural, because we expect the AI to resist having its mind...
"PSA: Predictions markets often have very low liquidity; be careful citing them." by Eye You
I see people repeatedly make the mistake of referencing a very low liquidity prediction market and using it to make a nontrivial point. Usually the implication when a market is cited is that its number should be taken somewhat seriously, that it's giving us a highly informed probability. Sometimes a market is used to analyze some event that recently occurred; the reasoning here looks like "the market on outcome O was trading at X%, then event E happened and the market quickly moved to Y%, thus event E made O less/more likely."
Who do I see make this...
"“The AI Doc” is coming out March 26" by Rob Bensinger, Beckeck
On Thursday, March 26th, a major new AI documentary is coming out: The AI Doc: Or How I Became an Apocaloptimist. Tickets are on sale now.
The movie is excellent, and MIRI staff I've spoken with generally believe it belongs in the same tier as If Anyone Builds It, Everyone Dies as an extremely valuable way to alert policymakers and the general public about AI risk, especially if it smashes the box office.
When IABIED was coming out, the community did an incredible job of helping the book succeed; without all of your help, we might...
"Customer Satisfaction Opportunities" by Tomás B.
I am monitoring surveillance camera V84A. A tall man is walking towards me. He is roughly twenty-five. His name is Damion Prescott. He has a room booked for a whole month. His facial symmetry scores show he is in the 99th percentile. This is in accordance with my holistic impression. School records show both truancy and perfect grades, suggesting high intelligence and disagreeableness. Searching social media... No record of modeling or acting experience, fame. I will assign him to our tier C high-value client list, based solely on his facial symmetry score and wealth. Reminder to recommend seating him...
"Requiem for a Transhuman Timeline" by Ihor Kendiukhov
The world was fair, the mountains tall,
In Elder Days before the fall
Of mighty kings in Nargothrond
And Gondolin, who now beyond
The Western Seas have passed away:
The world was fair in Durin's Day.
J.R.R. Tolkien
I was never meant to work on AI safety. I was never designed to think about superintelligences and try to steer, influence, or change them. I never particularly enjoyed studying the peculiarities of matrix operations, cracking the assumptions of decision theories, or even coding.
I know, of course, that...