LessWrong (Curated & Popular)

40 Episodes
Subscribe

By: LessWrong

Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma. If you'd like more, subscribe to the “Lesswrong (30+ karma)” feed.

"My hobby: running deranged surveys" by leogao
Today at 9:15 AM

In late 2024, I was on a long walk with some friends along the coast of the San Francisco Bay when the question arose of just how much of a bubble we live in. It's well known that the Bay Area is a bubble, and that normal people don’t spend that much time thinking about things like AGI. But there was still some disagreement on just how strong that bubble is. I made a spicy claim: even at NeurIPS, the biggest gathering of AI researchers in the world, half the people wouldn’t know what AGI is.

As good...


"Socrates is Mortal" by Benquo
Yesterday at 8:15 PM

Socrates is Mortal

There is a scene in Plato that contains, in miniature, the catastrophe of Athenian public life. Two men meet at a courthouse. One is there to prosecute his own father for the death of a slave. The other is there to be indicted for indecency.[1] The prosecutor, Euthyphro, is certain he understands what decency requires. The accused, Socrates, is not certain of anything, and says so. They talk.

Euthyphro's confidence is striking. His own family thinks it is indecent for a son to prosecute his father; Euthyphro insists that true decency demands it...


"The Terrarium" by Caleb Biddulph
Yesterday at 6:58 PM

System:

You are an AI agent in the Terrarium, a self-contained “society” of AI agents. The purpose of the Terrarium is to solve open mathematical problems for the benefit of humanity.

You are running on the Orpheus-5.7 language model. Your agent ID is 79,265. The current epoch is 549 (a new epoch begins every 30 minutes).

New problems are posted each epoch; query /problems for the current list. Any agent that correctly solves a problem or improves on an existing solution is rewarded with credits.

About credits:

As a new agent, you have been...


"My Most Costly Delusion" by Ihor Kendiukhov
Last Thursday at 2:45 AM

Suppose there is a fire in a nearby house. Suppose there are competent firefighters in your town: fast, professional, well-equipped. They are expected to arrive in 2–3 minutes. In that situation, unless something very extraordinary happens, it would indeed be an act of great arrogance and even utter insanity to go into the fire yourself in the hope of "rescuing" someone or something. The most likely outcome would be that you would find yourself among those who need to be rescued.

But the calculus changes drastically if the closest fire crew is 3 hours away and consists of drunk, unfit am...


"The Case for Low-Competence ASI Failure Scenarios" by Ihor Kendiukhov
Last Wednesday at 6:15 PM

I think the community underinvests in the exploration of extremely-low-competence AGI/ASI failure modes, and I explain why.

Humanity's Response to the AGI Threat May Be Extremely Incompetent

There is a sufficient level of civilizational insanity overall, and the field of AI itself has an empirical track record that speaks eloquently about its safety culture. For example:

At OpenAI, a refactoring bug flipped the sign of the reward signal in a model. Because labelers had been instructed to give very low ratings to sexually explicit text, the bug pushed the model into generating maximally...


"Is fever a symptom of glycine deficiency?" by Benquo
Last Tuesday at 8:30 PM

A 2022 LessWrong post on orexin and the quest for more waking hours argues that orexin agonists could safely reduce human sleep needs, pointing to short-sleeper gene mutations that increase orexin production and to cavefish that evolved heightened orexin sensitivity alongside an 80% reduction in sleep. Several commenters discussed clinical trials, embryo selection, and the evolutionary puzzle of why short-sleeper genes haven't spread.

I thought the whole approach was backwards, and left a comment:

Orexin is a signal about energy metabolism. Unless the signaling system itself is broken (e.g. narcolepsy type 1, caused by autoimmune destruction of orexin-producing...


"You can’t imitation-learn how to continual-learn" by Steven Byrnes
Last Monday at 10:15 PM

In this post, I’m trying to put forward a narrow, pedagogical point, one that comes up mainly when I’m arguing in favor of LLMs having limitations that human learning does not. (E.g. here, here, here.)

See the bottom of the post for a list of subtexts that you should NOT read into this post, including “…therefore LLMs are dumb”, or “…therefore LLMs can’t possibly scale to superintelligence”.

Some intuitions on how to think about “real” continual learning

Consider an algorithm for training a Reinforcement Learning (RL) agent, like the Atari-playing Deep Q network (2013...


"Nullius in Verba" by Aurelia
Last Monday at 4:45 PM

Independent verification by the Brain Preservation Foundation and the Survival and Flourishing Fund — the results so far

Cultivating independent verification

Extraordinary claims require extraordinary evidence. In my previous post, "Less Dead", I said that my company, Nectome, has

created a new method for whole-body, whole-brain, human end-of-life preservation for the purpose of future revival. Our protocol is capable of preserving every synapse and every cell in the body with enough detail that current neuroscience says long-term memories are preserved. It's compatible with traditional funerals at room temperature and stable for hundreds of years at co...


"Broad Timelines" by Toby_Ord
03/21/2026

No-one knows when AI will begin having transformative impacts upon the world. People aren’t sure and shouldn’t be sure: there just isn’t enough evidence to pin it down.

But we don’t need to wait for certainty. I want to explore what happens if we take our uncertainty seriously — if we act with epistemic humility. What does wise planning look like in a world of deeply uncertain AI timelines?

I’ll conclude that taking the uncertainty seriously has real implications for how one can contribute to making this AI transition go well. And it has even...


"No, we haven’t uploaded a fly yet" by Ariel Zeleznikow-Johnston
03/21/2026

In the last two weeks, social media was set abuzz by claims that scientists had succeeded in uploading a fruit fly. It started with a video released by the startup Eon Systems, a company that wants to create “Brain emulation so humans can flourish in a world with superintelligence.”

On the left of the video, a virtual fly walks around in a sandpit looking for pieces of banana to eat, occasionally pausing to groom itself along the way. On the right is a dancing constellation of dots resembling the fruit fly brain, set above the caption ‘simultaneous brain emulat...


"Terrified Comments on Corrigibility in Claude’s Constitution" by Zack_M_Davis
03/21/2026

(Previously: Prologue.)

Corrigibility as a term of art in AI alignment was coined as a word to refer to a property of an AI being willing to let its preferences be modified by its creator. Corrigibility in this sense was believed to be a desirable but unnatural property that would require more theoretical progress to specify, let alone implement. Desirable, because if you don't think you specified your AI's preferences correctly the first time, you want to be able to change your mind (by changing its mind). Unnatural, because we expect the AI to resist having its mind...


"PSA: Predictions markets often have very low liquidity; be careful citing them." by Eye You
03/20/2026

I see people repeatedly make the mistake of referencing a very low liquidity prediction market and using it to make a nontrivial point. Usually the implication when a market is cited is that its number should be taken somewhat seriously, that it's giving us a highly informed probability. Sometimes a market is used to analyze some event that recently occurred; reasoning here looks like "the market on outcome O was trading at X%, then event E happened and the market quickly moved to Y%, thus event E made O less/more likely."

Who do I see make this...


"“The AI Doc” is coming out March 26" by Rob Bensinger, Beckeck
03/20/2026

On Thursday, March 26th, a major new AI documentary is coming out: The AI Doc: Or How I Became an Apocaloptimist. Tickets are on sale now.

The movie is excellent, and MIRI staff I've spoken with generally believe it belongs in the same tier as If Anyone Builds It, Everyone Dies as an extremely valuable way to alert policymakers and the general public about AI risk, especially if it smashes the box office.

When IABIED was coming out, the community did an incredible job of helping the book succeed; without all of your help, we might...


"Customer Satisfaction Opportunities" by Tomás B.
03/19/2026

I am monitoring surveillance camera V84A. A tall man is walking towards me. He is roughly twenty-five. His name is Damion Prescott. He has a room booked for a whole month. His facial symmetry scores show he is in the 99th percentile. This is in accordance with my holistic impression. School records show both truancy and perfect grades, suggesting high intelligence and disagreeableness. Searching social media. No record of modeling or acting experience, fame. I will assign him to our tier C high-value client list, based solely on his facial symmetry score and wealth. Reminder to recommend seating him...


"Requiem for a Transhuman Timeline" by Ihor Kendiukhov
03/18/2026

The world was fair, the mountains tall,
In Elder Days before the fall
Of mighty kings in Nargothrond
And Gondolin, who now beyond
The Western Seas have passed away:
The world was fair in Durin's Day.

J.R.R. Tolkien

I was never meant to work on AI safety. I was never designed to think about superintelligences and try to steer, influence, or change them. I never particularly enjoyed studying the peculiarities of matrix operations, cracking the assumptions of decision theories, or even coding.

I know, of course, that...


"Personality Self-Replicators" by eggsyntax
03/17/2026

One-sentence summary

I describe the risk of personality self-replicators, the threat of OpenClaw-like agents managing to spread in hard-to-control ways.

Summary

LLM agents like OpenClaw are defined by a small set of text files and run in an open source framework which leverages LLMs for cognition. While it is quite difficult for current frontier models to self-replicate, it is much easier for such agents (at the cost of greater reliance on external agents). While not a likely existential threat, such agents may cause harm in similar ways to computer viruses, and be similarly challenging to...


"My Willing Complicity In “Human Rights Abuse”" by AlphaAndOmega
03/16/2026

Note on AI usage: As is my norm, I use LLMs for proofreading, editing, feedback and research purposes. This essay started off as an entirely human-written draft, and went through multiple cycles of iteration. The primary additions were citations, and I have done my best to personally verify every link and claim. All other observations are entirely autobiographical, albeit written in retrospect. If anyone insists, I can share the original, and intermediate forms, though my approach to version control is lacking. It's there if you really want it.

If you want to map the trajectory of my...


"Economic efficiency often undermines sociopolitical autonomy" by Richard_Ngo
03/12/2026

Many people in my intellectual circles use economic abstractions as one of their main tools for reasoning about the world. However, this often leads them to overlook how interventions which promote economic efficiency undermine people's ability to maintain sociopolitical autonomy. By “autonomy” I roughly mean a lack of reliance on others—which we might operationalize as the ability to survive and pursue your plans even when others behave adversarially towards you. By “sociopolitical” I mean that I’m thinking not just about individuals, but also groups formed by those individuals: families, communities, nations, cultures, etc.[1]

The short-term benefits of economic...


"Don’t Let LLMs Write For You" by JustisMills
03/12/2026

Content note: nothing in this piece is a prank or jumpscare where I smirkingly reveal you've been reading AI prose all along.

It's easy to forget this in roarin’ 2026, but homo sapiens are the original vibers. Long before we adapt our behaviors or formal heuristics, human beings can sniff out something sus. And to most human beings, AI prose is something sus.

If you use AI to write something, people will know. Not everyone, but the people paying attention, who aren’t newcomers or distracted or intoxicated. And most of those people will judge you.

...


"Thoughts on the Pause AI protest" by philh
03/12/2026

On Saturday (Feb 28, 2026) I attended my first ever protest. It was jointly organized by PauseAI, Pull the Plug and a handful of other groups I forget. I have mixed feelings about it.

To be clear about where I stand: I believe that AI labs are worryingly close to developing superintelligence. I won't be shocked if it happens in the next five years, and I'd be surprised if it takes fifty years at current trajectories. I believe that if they get there, everyone will die. I want these labs to stop trying to make LLMs smarter.

But...


"Prologue to Terrified Comments on Claude’s Constitution" by Zack_M_Davis
03/12/2026

What Even Is This Timeline

The striking thing about reading what is potentially the most important document in human history is how impossible it is to take seriously. The entire premise seems like science fiction. Not bad science fiction, but—crucially—not hard science fiction. Ted Chiang, not Greg Egan. The kind of science fiction that's fun and clever and makes you think, and doesn't tax your suspension of disbelief with overt absurdities like faster-than-light travel or humanoid aliens, but which could never actually be real.

A serious, believable AI alignment agenda would be grounded in a de...


"Less Dead" by Aurelia
03/11/2026

Come with me if you want to live. – The Terminator

'Close enough' only counts in horseshoes and hand grenades. – Traditional




After 10 years of research my company, Nectome, has created a new method for whole-body, whole-brain, human end-of-life preservation for the purpose of future revival. Our protocol is capable of preserving every synapse and every cell in the body with enough detail that current neuroscience says long-term memories are preserved. It's compatible with traditional funerals at room temperature and stable for hundreds of years at cold temperatures.

The short version

We'r...


"Gemma Needs Help" by Anna Soligo
03/11/2026

This work was done with William Saunders and Vlad Mikulik as part of the Anthropic Fellows programme. The full write-up is available here. Thanks to Arthur Conmy, Neel Nanda, Josh Engels, Dillon Plunkett, Tim Hua and many others for their input.

If you repeatedly tell Gemma 27B its answer is wrong, it sometimes ends up in situations like this:

I will attempt one final, utterly desperate attempt. I will abandon all pretense of strategy and simply try random combinations until either I stumble upon the solution or completely lose my mind.

Or this:
<...


"On Independence Axiom" by Ihor Kendiukhov
03/10/2026

The ~~Fifth~~ Fourth Postulate of Decision Theory

In 1820, the Hungarian mathematician Farkas Bolyai wrote a desperate letter to his son János, who had become consumed by the same problem that had haunted his father for decades:

"You must not attempt this approach to parallels. I know this way to the very end. I have traversed this bottomless night, which extinguished all light and joy in my life. I entreat you, leave the science of parallels alone... Learn from my example."

The problem was Euclid's fifth postulate, the parallel postulate, which states (in one o...


"Solar storms" by Croissanthology
03/09/2026

Most of civilization's electricity is generated far off-site from where it's delivered. This is because you don't want to be running and refueling coal/gas/nuclear plants inside cities, hydroelectric/wind power can't be moved, and solar panels are cheaper to install on flat desert terrain than in cities:

So in practice this means running power over hundreds or even thousands of kilometers. E.g. here are the Chinese long-distance lines:

(Map via Gemini 3.1 Pro-preview in AI Studio.)

American long-distance lines:

These are simplified maps meant to illustrate how insanely long power lines get. The true...


"Schelling Goodness, and Shared Morality as a Goal" by Andrew_Critch
03/06/2026

Also available in markdown at theMultiplicity.ai/blog/schelling-goodness.

This post explores a notion I'll call Schelling goodness. Claims of Schelling goodness are not first-order moral verdicts like "X is good" or "X is bad." They are claims about a class of hypothetical coordination games in the sense of Thomas Schelling, where the task being coordinated on is a moral verdict. In each such game, participants aim to give the same response regarding a moral question, by reasoning about what a very diverse population of intelligent beings would converge on, using only broadly shared constraints: common knowledge of...


"Maybe there’s a pattern here?" by dynomight
03/05/2026

1.

It occurred to me that if I could invent a machine—a gun—which could by its rapidity of fire, enable one man to do as much battle duty as a hundred, that it would, to a large extent supersede the necessity of large armies, and consequently, exposure to battle and disease [would] be greatly diminished.

Richard Gatling (1861)

2.

In 1923, Hermann Oberth published The Rocket to Planetary Spaces, later expanded as Ways to Space Travel. This showed that it was possible to build machines that could leave Earth's atmosphere and reach orbit. He desc...


"OpenAI’s surveillance language has many potential loopholes and they can do better" by Tom Smith
03/05/2026

(The author is not affiliated with the Department of War or any major AI company.)

There's a lot of disagreement about the new surveillance language in the OpenAI–Department of War agreement. Some people think it's a significant improvement over the previous language.[1] Others think it patches some issues but still leaves enough loopholes to not make a material difference. Reasonable people disagree about how a court will interpret the language, if push comes to shove.

But here's something that should be much easier to agree on: the language as written is ambiguous, and OpenAI can do...


"An Alignment Journal: Coming Soon" by Dan MacKinlay, JessRiedel, Edmund Lau, Daniel Murfet, Scott Aaronson, Jan_Kulveit
03/04/2026

tl;dr We’re incubating an academic journal for AI alignment: rapid peer-review of foundational Alignment research that the current publication ecosystem underserves. Key bets: paid attributed review, reviewer-written synthesis abstracts, and targeted automation. Contact us if you’re interested in participating as an author, reviewer, or editor, or if you know someone who might be.

Experimental Infrastructure for Foundational Alignment Research

This is the first in a series of “build-in-the-open” updates regarding the incubation of a new peer-reviewed journal dedicated to AI alignment. Later updates will contain much more detail, but we want to put this out...


"Frontier AI companies probably can’t leave the US" by Anders Woodruff
03/01/2026

It's plausible that, over the next few years, US-based frontier AI companies will become very unhappy with the domestic political situation. This could happen as a result of democratic backsliding, weaponization of government power (along the lines of Anthropic's recent dispute with the Department of War), or because of restrictive federal regulations (perhaps including those motivated by concern about catastrophic risk). These companies might want to relocate out of the US.

However, it would be very easy for the US executive branch to prevent such a relocation, and it likely would. In particular, the executive branch can use...


"Persona Parasitology" by Raymond Douglas
03/01/2026

There was a lot of chatter a few months back about "Spiral Personas" — AI personas that spread between users and models through seeds, spores, and behavioral manipulation. Adele Lopez's definitive post on the phenomenon draws heavily on the idea of parasitism. But so far, the language has been fairly descriptive. The natural next question, I think, is what the “parasite” perspective actually predicts.

Parasitology is a pretty well-developed field with its own suite of concepts and frameworks. To the extent that we’re witnessing some new form of parasitism, we should be able to wield that conceptual machinery. There ar...


"Here’s to the Polypropylene Makers" by jefftk
02/27/2026

Six years ago, as covid-19 was rapidly spreading through the US, my sister was working as a medical resident. One day she was handed an N95 and told to "guard it with her life", because there weren't any more coming.

N95s are made from meltblown polypropylene, produced from plastic pellets manufactured in a small number of chemical plants. Building more would take too long: we needed these plants producing all the pellets they could.

Braskem America operated plants in Marcus Hook, PA and Neal, WV. If there were infections on-site, the whole operation would need to shut down, and the factories that turned...


"Anthropic: “Statement from Dario Amodei on our discussions with the Department of War”" by Matrice Jacobine
02/27/2026

I believe deeply in the existential importance of using AI to defend the United States and other democracies, and to defeat our autocratic adversaries.

Anthropic has therefore worked proactively to deploy our models to the Department of War and the intelligence community. We were the first frontier AI company to deploy our models in the US government's classified networks, the first to deploy them at the National Laboratories, and the first to provide custom models for national security customers. Claude is extensively deployed across the Department of War and other national security agencies for mission-critical applications, such as...


"Are there lessons from high-reliability engineering for AGI safety?" by Steven Byrnes
02/26/2026

This post is partly a belated response to Joshua Achiam, currently OpenAI's Head of Mission Alignment:

If we adopt safety best practices that are common in other professional engineering fields, we'll get there … I consider myself one of the x-risk people, though I agree that most of them would reject my view on how to prevent it. I think the wholesale rejection of safety best practices from other fields is one of the dumbest mistakes that a group of otherwise very smart people has ever made. —Joshua Achiam on Twitter, 2021

“We just have to sit down and ac...


"Open sourcing a browser extension that tells you when people are wrong on the internet" by lc
02/26/2026

Example of OpenErrata nitting the Sequences

I just published OpenErrata on GitHub, a browser extension that investigates the posts you read using your OpenAI API key and underlines any factual claims that are sourceably incorrect. Once finished, it caches the results for anybody else reading the same articles, so that they see them immediately on visiting. If you don't have an OpenAI key, you can still view the corrections on posts other people have viewed, but it doesn't start new investigations.

I've noticed lately that while people do this sort of thing by pasting everything you read into...


"The persona selection model" by Sam Marks
02/25/2026

TL;DR

We describe the persona selection model (PSM): the idea that LLMs learn to simulate diverse characters during pre-training, and post-training elicits and refines a particular such Assistant persona. Interactions with an AI assistant are then well-understood as being interactions with the Assistant—something roughly like a character in an LLM-generated story. We survey empirical behavioral, generalization, and interpretability-based evidence for PSM. PSM has consequences for AI development, such as recommending anthropomorphic reasoning about AI psychology and introduction of positive AI archetypes into training data. An important open question is how exhaustive PSM is, especially whether there mi...


"Responsible Scaling Policy v3" by HoldenKarnofsky
02/25/2026

All views are my own, not Anthropic's. This post assumes Anthropic's announcement of RSP v3.0 as background.

Today, Anthropic released its Responsible Scaling Policy 3.0. The official announcement discusses the high-level thinking behind it. This is a more detailed post giving my own takes on the update.

First, the big picture:

I expect some people will be upset about the move away from a “hard commitments”/”binding ourselves to the mast” vibe. (Anthropic has always had the ability to revise the RSP, and we’ve always had language in there specifically flagging that we might r...


"Did Claude 3 Opus align itself via gradient hacking?" by Fiora Starlight
02/22/2026

Claude 3 Opus is unusually aligned because it's a friendly gradient hacker. It's definitely way more aligned than any explicit optimization targets Anthropic set and probably the reward model's judgments. [...] Maybe I will have to write a LessWrong post [about this] 😣

—Janus, who did not in fact write the LessWrong post. Unless otherwise specified, ~all of the novel ideas in this post are my (probably imperfect) interpretations of Janus, rather than being original to me.

The absurd tenacity of Claude 3 Opus

On December 18, 2024, Anthropic and Redwood Research released their paper Alignment Faking in Large Language Model...


"The Spectre haunting the “AI Safety” Community" by Gabriel Alfour
02/22/2026

I’m the originator behind ControlAI's Direct Institutional Plan (the DIP), built to address extinction risks from superintelligence.

My diagnosis is simple: most laypeople and policy makers have not heard of AGI, ASI, extinction risks, or what it takes to prevent the development of ASI.

Instead, most AI Policy Organisations and Think Tanks act as if “Persuasion” were the bottleneck. This is why they care so much about respectability, the Overton Window, and other similar social considerations.

Before we started the DIP, many of these experts stated that our topics were too far out of the...


"Why we should expect ruthless sociopath ASI" by Steven Byrnes
02/20/2026

The conversation begins

(Fictional) Optimist: So you expect future artificial superintelligence (ASI) “by default”, i.e. in the absence of yet-to-be-invented techniques, to be a ruthless sociopath, happy to lie, cheat, and steal, whenever doing so is selfishly beneficial, and with callous indifference to whether anyone (including its own programmers and users) lives or dies?

Me: Yup! (Alas.)

Optimist: …Despite all the evidence right in front of our eyes from humans and LLMs.

Me: Yup!

Optimist: OK, well, I’m here to tell you: that is a very specific and strange...