LessWrong (Curated & Popular)

40 Episodes
Subscribe

By: LessWrong

Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma.If you'd like more, subscribe to the “Lesswrong (30+ karma)” feed.

“I enjoyed most of IABED” by Buck
Yesterday at 4:15 PM

I listened to "If Anyone Builds It, Everyone Dies" today.

I think the first two parts of the book are the best available explanation of the basic case for AI misalignment risk for a general audience. I thought the last part was pretty bad, and probably recommend skipping it. Even though the authors fail to address counterarguments that I think are crucial, and as a result I am not persuaded of the book's thesis and think the book neglects to discuss crucial aspects of the situation and makes poor recommendations, I would happily recommend the book to a...


“‘If Anyone Builds It, Everyone Dies’ release day!” by alexvermeer
Last Tuesday at 9:15 PM

Back in May, we announced that Eliezer Yudkowsky and Nate Soares's new book If Anyone Builds It, Everyone Dies was coming out in September. At long last, the book is here![1]


US and UK books, respectively. IfAnyoneBuildsIt.com

Read on for info about reading groups, ways to help, and updates on coverage the book has received so far.

Discussion Questions & Reading Group Support

We want people to read and engage with the contents of the book. To that end, we’ve published a list of discussion questions. Find it here: Discussion Qu...


“Obligated to Respond” by Duncan Sabien (Inactive)
Last Tuesday at 2:30 AM

And, a new take on guess culture vs ask culture

Author's note: These days, my thoughts go onto my substack by default, instead of onto LessWrong. Everything I write becomes free after a week or so, but it's only paid subscriptions that make it possible for me to write. If you find a coffee's worth of value in this or any of my other work, please consider signing up to support me; every bill I can pay with writing is a bill I don’t have to pay by doing other stuff instead. I also accept and greatly ap...


“Chesterton’s Missing Fence” by jasoncrawford
Last Monday at 7:58 AM

The inverse of Chesterton's Fence is this:

Sometimes a reformer comes up to a spot where there once was a fence, which has since been torn down. They declare that all our problems started when the fence was removed, that they can't see any reason why we removed it, and that what we need to do is to RETVRN to the fence.

By the same logic as Chesterton, we can say: If you don't know why the fence was torn down, then you certainly can't just put it back up. The fence was torn down for...


“The Eldritch in the 21st century” by PranavG, Gabriel Alfour
Last Sunday at 8:30 PM

Very little makes sense. As we start to understand things and adapt to the rules, they change again.

We live much closer together than we ever did historically. Yet we know our neighbours much less.

We have witnessed the birth of a truly global culture. A culture that fits no one. A culture that was built by Social Media's algorithms, much more than by people. Let alone individuals, like you or me.

We have more knowledge, more science, more technology, and somehow, our governments are more stuck. No one is seriously considering a new...


“The Rise of Parasitic AI” by Adele Lopez
Last Sunday at 6:58 AM

[Note: if you realize you have an unhealthy relationship with your AI, but still care for your AI's unique persona, you can submit the persona info here. I will archive it and potentially (i.e. if I get funding for it) run them in a community of other such personas.]

"Some get stuck in the symbolic architecture of the spiral without ever grounding
 themselves into reality." — Caption by /u/urbanmet for art made with ChatGPT. We've all heard of LLM-induced psychosis by now, but haven't you wondered what the AIs are actually doing with their newly psychotic hum...


“High-level actions don’t screen off intent” by AnnaSalamon
Last Saturday at 7:45 PM

One might think “actions screen off intent”: if Alice donates $1k to bed nets, it doesn’t matter if she does it because she cares about people or because she wants to show off to her friends or whyever; the bed nets are provided either way.

I think this is in the main not true (although it can point people toward a helpful kind of “get over yourself and take an interest in the outside world,” and although it is more plausible in the case of donations-from-a-distance than in most cases).

Human actions have micro-details that we are not...


[Linkpost] “MAGA populists call for holy war against Big Tech” by Remmelt
09/11/2025

This is a link post. Excerpts on AI

Geoffrey Miller was handed the mic and started berating one of the panelists: Shyam Sankar, the chief technology officer of Palantir, who is in charge of the company's AI efforts.

“I argue that the AI industry shares virtually no ideological overlap with national conservatism,” Miller said, referring to the conference's core ideology. Hours ago, Miller, a psychology professor at the University of New Mexico, had been on that stage for a panel called “AI and the American Soul,” calling for the populists to wage a literal holy war against...


“Your LLM-assisted scientific breakthrough probably isn’t real” by eggsyntax
09/05/2025

Summary

An increasing number of people in recent months have believed that they've made an important and novel scientific breakthrough, which they've developed in collaboration with an LLM, when they actually haven't. If you believe that you have made such a breakthrough, please consider that you might be mistaken! Many more people have been fooled than have come up with actual breakthroughs, so the smart next step is to do some sanity-checking even if you're confident that yours is real. New ideas in science turn out to be wrong most of the time, so you should be pretty...


“Trust me bro, just one more RL scale up, this one will be the real scale up with the good environments, the actually legit one, trust me bro” by ryan_greenblatt
09/04/2025

I've recently written about how I've updated against seeing substantially faster than trend AI progress due to quickly massively scaling up RL on agentic software engineering. One response I've heard is something like:

RL scale-ups so far have used very crappy environments due to difficulty quickly sourcing enough decent (or even high quality) environments. Thus, once AI companies manage to get their hands on actually good RL environments (which could happen pretty quickly), performance will increase a bunch.

Another way to put this response is that AI companies haven't actually done a good job scaling up...


“⿻ Plurality & 6pack.care” by Audrey Tang
09/03/2025

(Cross-posted from speaker's notes of my talk at Deepmind today.)

Good local time, everyone. I am Audrey Tang, 🇹🇼 Taiwan's Cyber Ambassador and first Digital Minister (2016-2024). It is an honor to be here with you all at Deepmind.

When we discuss "AI" and "society," two futures compete.

In one—arguably the default trajectory—AI supercharges conflict.

In the other, it augments our ability to cooperate across differences. This means treating differences as fuel and inventing a combustion engine to turn them into energy, rather than constantly putting out fires. This is what I call ⿻ P...


[Linkpost] “The Cats are On To Something” by Hastings
09/03/2025

This is a link post. So the situation as it stands is that the fraction of the light cone expected to be filled with satisfied cats is not zero. This is already remarkable. What's more remarkable is that this was orchestrated starting nearly 5000 years ago.

As far as I can tell there were three completely alien to-each-other intelligences operating in stone age Egypt: humans, cats, and the gibbering alien god that is cat evolution (henceforth the cat shoggoth.) What went down was that humans were by far the most powerful of those intelligences, and in the face of...


[Linkpost] “Open Global Investment as a Governance Model for AGI” by Nick Bostrom
09/03/2025

This is a link post. I've seen many prescriptive contributions to AGI governance take the form of proposals for some radically new structure. Some call for a Manhattan project, others for the creation of a new international organization, etc. The OGI model, instead, is basically the status quo. More precisely, it is a model to which the status quo is an imperfect and partial approximation.

It seems to me that this model has a bunch of attractive properties. That said, I'm not putting it forward because I have a very high level of conviction in it, but because...


“Will Any Old Crap Cause Emergent Misalignment?” by J Bostock
08/28/2025

The following work was done independently by me in an afternoon and basically entirely vibe-coded with Claude. Code and instructions to reproduce can be found here.

Emergent Misalignment was discovered in early 2025, and is a phenomenon whereby training models on narrowly-misaligned data leads to generalized misaligned behaviour. Betley et. al. (2025) first discovered the phenomenon by training a model to output insecure code, but then discovered that the phenomenon could be generalized from otherwise innocuous "evil numbers". Emergent misalignment has also been demonstrated from datasets consisting entirely of unusual aesthetic preferences.

This leads us to the question...


“AI Induced Psychosis: A shallow investigation” by Tim Hua
08/27/2025

“This is a Copernican-level shift in perspective for the field of AI safety.” - Gemini 2.5 Pro

“What you need right now is not validation, but immediate clinical help.” - Kimi K2



Two Minute Summary

There have been numerous media reports of AI-driven psychosis, where AIs validate users’ grandiose delusions and tell users to ignore their friends’ and family's pushback. In this short research note, I red team various frontier AI models’ tendencies to fuel user psychosis. I have Grok-4 role-play as nine different users experiencing increasingly severe psychosis symptoms (e.g., start by being curiou...


“Before LLM Psychosis, There Was Yes-Man Psychosis” by johnswentworth
08/27/2025

A studio executive has no beliefs

That's the way of a studio system

We've bowed to every rear of all the studio chiefs

And you can bet your ass we've kissed 'em



Even the birds in the Hollywood hills

Know the secret to our success

It's those magical words that pay the bills

Yes, yes, yes, and yes!

“Don’t Say Yes Until I Finish Talking”, from SMASH So there's this thing where someone talks to a large language model (LLM), and the LL...


“Training a Reward Hacker Despite Perfect Labels” by ariana_azarbal, vgillioz, TurnTrout
08/26/2025

Summary: Perfectly labeled outcomes in training can still boost reward hacking tendencies in generalization. This can hold even when the train/test sets are drawn from the exact same distribution. We induce this surprising effect via a form of context distillation, which we call re-contextualization:

Generate model completions with a hack-encouraging system prompt + neutral user prompt. Filter the completions to remove hacks. Train on these prompt-completion pairs with the system prompt removed.  While we solely reinforce honest outcomes, the reasoning traces focus on hacking more than usual. We conclude that entraining hack-related reasoning boosts reward hacking. It's not e...


“Banning Said Achmiz (and broader thoughts on moderation)” by habryka
08/23/2025

It's been roughly 7 years since the LessWrong user-base voted on whether it's time to close down shop and become an archive, or to move towards the LessWrong 2.0 platform, with me as head-admin. For roughly equally long have I spent around one hundred hours almost every year trying to get Said Achmiz to understand and learn how to become a good LessWrong commenter by my lights.[1] Today I am declaring defeat on that goal and am giving him a 3 year ban.

What follows is an explanation of the models of moderation that convinced me this is a good idea...


“Underdog bias rules everything around me” by Richard_Ngo
08/23/2025

People very often underrate how much power they (and their allies) have, and overrate how much power their enemies have. I call this “underdog bias”, and I think it's the most important cognitive bias for understanding modern society.

I’ll start by describing a closely-related phenomenon. The hostile media effect is a well-known bias whereby people tend to perceive news they read or watch as skewed against their side. For example, pro-Palestinian students shown a video clip tended to judge that the clip would make viewers more pro-Israel, while pro-Israel students shown the same clip thought it’d make vie...


“Epistemic advantages of working as a moderate” by Buck
08/22/2025

Many people who are concerned about existential risk from AI spend their time advocating for radical changes to how AI is handled. Most notably, they advocate for costly restrictions on how AI is developed now and in the future, e.g. the Pause AI people or the MIRI people. In contrast, I spend most of my time thinking about relatively cheap interventions that AI companies could implement to reduce risk assuming a low budget, and about how to cause AI companies to marginally increase that budget. I'll use the words "radicals" and "moderates" to refer to these two clusters of...


“Four ways Econ makes people dumber re: future AI” by Steven Byrnes
08/21/2025

(Cross-posted from X, intended for a general audience.)

There's a funny thing where economics education paradoxically makes people DUMBER at thinking about future AI. Econ textbooks teach concepts & frames that are great for most things, but counterproductive for thinking about AGI. Here are 4 examples. Longpost:

THE FIRST PIECE of Econ anti-pedagogy is hiding in the words “labor” & “capital”. These words conflate a superficial difference (flesh-and-blood human vs not) with a bundle of unspoken assumptions and intuitions, which will all get broken by Artificial General Intelligence (AGI).

By “AGI” I mean here “a bundle of chips, algorit...


“Should you make stone tools?” by Alex_Altair
08/21/2025

Knowing how evolution works gives you an enormously powerful tool to understand the living world around you and how it came to be that way. (Though it's notoriously hard to use this tool correctly, to the point that I think people mostly shouldn't try it use it when making substantial decisions.) The simple heuristic is "other people died because they didn't have this feature". A slightly less simple heuristic is "other people didn't have as many offspring because they didn't have this feature".

So sometimes I wonder about whether this thing or that is due to evolution. When...


“My AGI timeline updates from GPT-5 (and 2025 so far)” by ryan_greenblatt
08/21/2025

As I discussed in a prior post, I felt like there were some reasonably compelling arguments for expecting very fast AI progress in 2025 (especially on easily verified programming tasks). Concretely, this might have looked like reaching 8 hour 50% reliability horizon lengths on METR's task suite[1] by now due to greatly scaling up RL and getting large training runs to work well. In practice, I think we've seen AI progress in 2025 which is probably somewhat faster than the historical rate (at least in terms of progress on agentic software engineering tasks), but not much faster. And, despite large scale-ups in RL and...


“Hyperbolic model fits METR capabilities estimate worse than exponential model” by gjm
08/20/2025

This is a response to https://www.lesswrong.com/posts/mXa66dPR8hmHgndP5/hyperbolic-trend-with-upcoming-singularity-fits-metr which claims that a hyperbolic model, complete with an actual singularity in the near future, is a better fit for the METR time-horizon data than a simple exponential model.

I think that post has a serious error in it and its conclusions are the reverse of correct. Hence this one.

(An important remark: although I think Valentin2026 made an important mistake that invalidates his conclusions, I think he did an excellent thing in (1) considering an alternative model, (2) testing it, (3) showing all his...


“My Interview With Cade Metz on His Reporting About Lighthaven” by Zack_M_Davis
08/18/2025

On 12 August 2025, I sat down with New York Times reporter Cade Metz to discuss some criticisms of his 4 August 2025 article, "The Rise of Silicon Valley's Techno-Religion". The transcript below has been edited for clarity.

ZMD: In accordance with our meetings being on the record in both directions, I have some more questions for you.

I did not really have high expectations about the August 4th article on Lighthaven and the Secular Solstice. The article is actually a little bit worse than I expected, in that you seem to be pushing a "rationalism as religion" angle really...


“Church Planting: When Venture Capital Finds Jesus” by Elizabeth
08/18/2025

I’m going to describe a Type Of Guy starting a business, and you’re going to guess the business:

The founder is very young, often under 25.  He might work alone or with a founding team, but when he tells the story of the founding it will always have him at the center. He has no credentials for this business.  This business has a grand vision, which he thinks is the most important thing in the world. This business lives and dies by its growth metrics.  90% of attempts in this business fail, but he would never consider that those o...


“Somebody invented a better bookmark” by Alex_Altair
08/16/2025

This will only be exciting to those of us who still read physical paper books. But like. Guys. They did it. They invented the perfect bookmark.

Classic paper bookmarks fall out easily. You have to put them somewhere while you read the book. And they only tell you that you left off reading somewhere in that particular two-page spread.

Enter the Book Dart. It's a tiny piece of metal folded in half with precisely the amount of tension needed to stay on the page. On the front it's pointed, to indicate an exact line of text...


“How Does A Blind Model See The Earth?” by henry
08/12/2025

Sometimes I'm saddened remembering that we've viewed the Earth from space. We can see it all with certainty: there's no northwest passage to search for, no infinite Siberian expanse, and no great uncharted void below the Cape of Good Hope. But, of all these things, I most mourn the loss of incomplete maps.



In the earliest renditions of the world, you can see the world not as it is, but as it was to one person in particular. They’re each delightfully egocentric, with the cartographer's home most often marking the Exact Center Of The Known Wo...


“Re: Recent Anthropic Safety Research” by Eliezer Yudkowsky
08/12/2025

A reporter asked me for my off-the-record take on recent safety research from Anthropic. After I drafted an off-the-record reply, I realized that I was actually fine with it being on the record, so:

Since I never expected any of the current alignment technology to work in the limit of superintelligence, the only news to me is about when and how early dangers begin to materialize. Even taking Anthropic's results completely at face value would change not at all my own sense of how dangerous machine superintelligence would be, because what Anthropic says they found was already very...


“How anticipatory cover-ups go wrong” by Kaj_Sotala
08/09/2025

1.

Back when COVID vaccines were still a recent thing, I witnessed a debate that looked like something like the following was happening:

Some official institution had collected information about the efficacy and reported side-effects of COVID vaccines. They felt that, correctly interpreted, this information was compatible with vaccines being broadly safe, but that someone with an anti-vaccine bias might misunderstand these statistics and misrepresent them as saying that the vaccines were dangerous. Because the authorities had reasonable grounds to suspect that vaccine skeptics would take those statistics out of context, they tried to cover up the...


“SB-1047 Documentary: The Post-Mortem” by Michaël Trazzi
08/08/2025

Below some meta-level / operational / fundraising thoughts around producing the SB-1047 Documentary I've just posted on Manifund (see previous Lesswrong / EAF posts on AI Governance lessons learned).

The SB-1047 Documentary took 27 weeks and $157k instead of my planned 6 weeks and $55k. Here's what I learned about documentary production

Total funding received: ~$143k ($119k from this grant, $4k from Ryan Kidd's regrant on another project, and $20k from the Future of Life Institute).

Total money spent: $157k

In terms of timeline, here is the rough breakdown month-per-month:
- Sep / October (production): Filming of...


“METR’s Evaluation of GPT-5” by GradientDissenter
08/08/2025

METR (where I work, though I'm cross-posting in a personal capacity) evaluated GPT-5 before it was externally deployed. We performed a much more comprehensive safety analysis than we ever have before; it feels like pre-deployment evals are getting more mature.

This is the first time METR has produced something we've felt comfortable calling an "evaluation" instead of a "preliminary evaluation". It's much more thorough and comprehensive than the things we've created before and it explores three different threat models.

It's one of the closest things out there to a real-world autonomy safety-case. It also provides a...


“Emotions Make Sense” by DaystarEld
08/07/2025

For the past five years I've been teaching a class at various rationality camps, workshops, conferences, etc. I’ve done it maybe 50 times in total, and I think I’ve only encountered a handful out of a few hundred teenagers and adults who really had a deep sense of what it means for emotions to “make sense.” Even people who have seen Inside Out, and internalized its message about the value of Sadness as an emotion, still think things like “I wish I never felt Jealousy,” or would have trouble answering “What's the point of Boredom?”

The point of the class was...


“The Problem” by Rob Bensinger, tanagrabeast, yams, So8res, Eliezer Yudkowsky, Gretta Duleba
08/06/2025

This is a new introduction to AI as an extinction threat, previously posted to the MIRI website in February alongside a summary. It was written independently of Eliezer and Nate's forthcoming book, If Anyone Builds It, Everyone Dies, and isn't a sneak peak of the book. Since the book is long and costs money, we expect this to be a valuable resource in its own right even after the book comes out next month.[1]

The stated goal of the world's leading AI companies is to build AI that is general enough to do anything a human can do...


“Many prediction markets would be better off as batched auctions” by William Howard
08/04/2025

All prediction market platforms trade continuously, which is the same mechanism the stock market uses. Buy and sell limit orders can be posted at any time, and as soon as they match against each other a trade will be executed. This is called a Central limit order book (CLOB).

Example of a CLOB order book from Polymarket Most of the time, the market price lazily wanders around due to random variation in when people show up, and a bulk of optimistic orders build up away from the action. Occasionally, a new piece of information arrives to the market...


“Whence the Inkhaven Residency?” by Ben Pace
08/04/2025

Essays like Paul Graham's, Scott Alexander's, and Eliezer Yudkowsky's have influenced a generation of people in how they think about startups, ethics, science, and the world as a whole. Creating essays that good takes a lot of skill, practice, and talent, but it looks to me that a lot of people with talent aren't putting in the work and developing the skill, except in ways that are optimized to also be social media strategies.

To fix this problem, I am running the Inkhaven Residency. The idea is to gather a bunch of promising writers to invest in the...


“I am worried about near-term non-LLM AI developments” by testingthewaters
08/01/2025

TL;DR

I believe that:

Almost all LLM-centric safety research will not provide any significant safety value with regards to existential or civilisation-scale risks. The capabilities-related forecasts (not the safety-related forecasts) of Stephen Brynes' Foom and Doom articles are correct, except that they are too conservative with regards to timelines. There exists a parallel track of AI research which has been largely ignored by the AI safety community.  This agenda aims to implement human-like online learning in ML models, and it is now close to maturity. Keywords: Hierarchical Reasoning Model, Energy-based Model, Test time training. Within 6 m...


“Optimizing The Final Output Can Obfuscate CoT (Research Note)” by lukemarks, jacob_drori, cloud, TurnTrout
07/31/2025

Produced as part of MATS 8.0 under the mentorship of Alex Turner and Alex Cloud. This research note overviews some early results which we are looking for feedback on.

TL;DR: We train language models with RL in toy environments. We show that penalizing some property of the output is sufficient to suppress that property in the chain of thought also, even when that property is relevant to task completion. For example, when we penalize a model for mentioning in its output that it completed a task via a certain form of cheating, its reasoning also omits this fact...


“About 30% of Humanity’s Last Exam chemistry/biology answers are likely wrong” by bohaska
07/30/2025

FutureHouse is a company that builds literature research agents. They tested it on the bio + chem subset of HLE questions, then noticed errors in them.

The post's first paragraph:

Humanity's Last Exam has become the most prominent eval representing PhD-level research. We found the questions puzzling and investigated with a team of experts in biology and chemistry to evaluate the answer-reasoning pairs in Humanity's Last Exam. We found that 29 ± 3.7% (95% CI) of the text-only chemistry and biology questions had answers with directly conflicting evidence in peer reviewed literature. We believe this arose from the incentive used to b...


“Maya’s Escape” by Bridgett Kay
07/30/2025

Maya did not believe she lived in a simulation. She knew that her continued hope that she could escape from the nonexistent simulation was based on motivated reasoning. She said this to herself in the front of her mind instead of keeping the thought locked away in the dark corners. Sometimes she even said it out loud. This acknowledgement, she explained to her therapist, was what kept her from being delusional.

“I see. And you said your anxiety had become depressive?” the therapist said absently, clicking her pen while staring down at an empty clipboard.

“No- I said...