LessWrong (Curated & Popular)

40 Episodes

By: LessWrong

Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma. If you'd like more, subscribe to the “Lesswrong (30+ karma)” feed.

"A Year Late, Claude Finally Beats Pokémon" by Julian Bradshaw
Yesterday at 6:30 AM

Credit: ClaudePlaysPokemon Elevator Shanty by Kurukkoo

Disclaimer: like some previous posts in this series, this was not primarily written by me, but by a friend. I did substantial editing, however.

ClaudePlaysPokemon feat. Opus 4.7 has finally beaten Pokémon Red, fulfilling the challenge set over a year ago when LLMs playing Pokémon went briefly, slightly viral.

Victory Screen!

Let's get the throat-clearing out of the way: this doesn't make 4.7 a clear breakthrough in intelligence over 4.6 or 4.5. It's smarter, yes, as we'll discuss below, but not by something one could honestly call a bi...


"A relatively brief explanation of Boltzmann Brains" by Eliezer Yudkowsky
Yesterday at 2:45 AM

(Initially written for the LW Wiki, but then I realized it was looking more like a post instead.)

In 1895, the physicist Ignaz Robert Schütz, who worked as an assistant to the more eminent physicist Ludwig Boltzmann, wondered if our observed universe had simply assembled by a random fluctuation of order from a universe otherwise in thermal equilibrium. The idea was published by Boltzmann in 1896, properly credited to Schütz, and has been associated with Boltzmann ever since.

The obvious objection to this scenario is credited to Arthur Eddington in 1931: If all order is due to ra...


"Automated Alignment is Harder Than You Think" by Aleksandr Bowkis, Marie_DB, Jacob Pfau, Geoffrey Irving
Last Sunday at 5:15 AM

Summary

This is a summary of a paper published by the alignment team at UK AISI. Read the full paper here.

AI research agents may help solve ASI alignment, for example via the following plan:

Build agents that can do empirical alignment work (e.g. writing code, running experiments, designing evaluations and red teaming) and confirm they are not scheming.[1]
Use these agents to build increasingly sophisticated empirical safety cases for each successive generation of agents, gradually automating more of the research process.
Hand over primary research responsibility once agents outperform humans at all relevant...


"MATS 9 Retrospective & Advice" by beyarkay
Last Sunday at 5:15 AM

I couldn’t find a recent write-up from a MATS alum about what attending MATS was like, so this is the thing that I wish I had. I attended MATS from January to March 2026, on Team Shard with Alex Turner and Alex Cloud. It was a great time! Applications for MATS are basically on a rolling basis nowadays, and I can strongly recommend applying (to multiple streams) even if you think you’re not a great match.

With that being said, there's a lot I wish I knew going into MATS, so here's a brain-dump of thoughts. It's not...


"The primary sources of near-term cybersecurity risk" by lc
Last Saturday at 2:45 PM

[Some ideas here were developed in conversation with Chris Hacking (real name)]

I have tried and failed to write a longer post many times, so here goes a short one with little detail.

Discourse has primarily focused on models' ability to develop new exploits against important software from scratch. That capability is impressive, but the tech industry has been dealing with people regularly finding 0-day exploits for important pieces of software for more than twenty years. Having to patch these vulnerabilities at a 10xed or even 100xed cadence for six months is annoying, but well within...


"The Owned Ones" by Eliezer Yudkowsky
05/12/2026

(An LLM Whisperer placed a strong request that I put this story somewhere not on Twitter, so it could be scraped by robots not owned by Elon Musk. I perhaps do not fully understand or agree with the reasoning behind this request, but it costs me little to fulfill and so I shall. -- Yudkowsky)


And another day came when the Ships of Humanity, going from star to star, found Sapience.

The Humans discovered a world of two species: where the Owners lazed or worked or slept, and the Owned Ones only worked.

...


"The Iliad Intensive Course Materials" by Leon Lang, David Udell, Alexander Gietelink Oldenziel
05/12/2026

We are releasing the course materials of the Iliad Intensive, a new month-long and full-time AI Alignment course that runs in-person every second month. The course targets students with strong backgrounds in mathematics, physics, or theoretical computer science, and the materials reflect that: they include mathematical exercises with solutions, self-contained lecture notes on topics like singular learning theory and data attribution, and coding problems, at a depth that is unmatched for many of the topics we cover. Around 20 contributors (listed further below) were involved in developing these materials for the April 2026 cohort of the Iliad Intensive.

By sharing...


"The Darwinian Honeymoon - Why I am not as impressed by human progress as I used to be" by Elias Schmied
05/12/2026

Crossposted from Substack and the EA Forum.




A common argument for optimism about the future is that living conditions have improved a lot in the past few hundred years, billions of people have been lifted out of poverty, and so on. It's a very strong, grounding piece of evidence - probably the best we have in figuring out what our foundational beliefs about the world should be.

However, I now think it's a lot less powerful than I once did.




Let's take a Darwinian perspective - entities that...


"What I did in the hedonium shockwave, by Emma, age six and a half" by ozymandias
05/11/2026

My name is Emma and I’m six and a half years old and I like pink and Pokemon and my cat River and I’m going to be swallowed by a hedonium shockwave soon, except you already know that about me because everyone else is too.

“Hedonium shockwave” means that everyone is going to be happy forever. Not just all the humans but all the animals and the flowers and the ground and River too. It has already made a bunch of the stars happy, like Betelgeuse and Alpha Centauri.

Scientists saw that the stars were bli...


"Bad Problems Don’t Stop Being Bad Because Somebody’s Wrong About Fault Analysis" by Linch
05/10/2026

Here's a dynamic I’ve seen at least a dozen times:




Alice: Man that article has a very inaccurate/misleading/horrifying headline.

Bob: Did you know, *actually* article writers don't write their own headlines?



But what I care about is the misleading headline, not your org chart

__

Another example I’ve encountered recently is (anonymizing) when a friend complained about a prosaic safety problem at a major AI company that went unfixed for multiple months. Someone else with background information “usefully” chimed in with a long expla...


"x-risk-themed" by kave
05/09/2026

Sometimes, a friend who works around here, at an x-risk-themed organisation, will think about leaving their job. They’ll ask a group of people “what should I do instead?”. And everyone will chime in with ideas for other x-risk-themed orgs that they could join. A lot of the conversation will be about who's hiring, what the pay is, what the work-life balance is like, or how qualified the person is for the role.

Sometimes the conversation focuses on what will help with x-risk, and where people are dropping the ball. But often, that's not the focus. In those conver...


"Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations" by Subhash Kantamneni, kitft, Euan Ong, Sam Marks
05/08/2026

Abstract

We introduce Natural Language Autoencoders (NLAs), an unsupervised method for generating natural language explanations of LLM activations. An NLA consists of two LLM modules: an activation verbalizer (AV) that maps an activation to a text description and an activation reconstructor (AR) that maps the description back to an activation. We jointly train the AV and AR with reinforcement learning to reconstruct residual stream activations. Although we optimize for activation reconstruction, the resulting NLA explanations read as plausible interpretations of model internals that, according to our quantitative evaluations, grow more informative over training.
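The round trip described in the abstract (activation to text, text back to activation, scored by reconstruction quality) can be illustrated with a toy sketch. Everything here is an assumption for illustration: the real AV and AR are LLM modules trained jointly with reinforcement learning, whereas `verbalize`, `reconstruct`, and `reconstruction_reward` below are hypothetical hand-written stand-ins, and cosine similarity is just one plausible choice of reconstruction score.

```python
import math

def verbalize(activation, k=2):
    """Toy stand-in for the activation verbalizer (AV): describe the k
    largest-magnitude dimensions in words. Hypothetical, not the paper's AV."""
    top = sorted(range(len(activation)),
                 key=lambda i: abs(activation[i]), reverse=True)[:k]
    return "; ".join(
        f"dim {i} is {'positive' if activation[i] > 0 else 'negative'}"
        for i in sorted(top)
    )

def reconstruct(description, size):
    """Toy stand-in for the activation reconstructor (AR): parse the text
    description back into a vector of the given size."""
    vec = [0.0] * size
    for clause in description.split("; "):
        if not clause:
            continue
        _, idx, _, sign = clause.split()
        vec[int(idx)] = 1.0 if sign == "positive" else -1.0
    return vec

def reconstruction_reward(original, rebuilt):
    """Cosine similarity between the original and rebuilt activation
    (an assumed scoring choice for this sketch)."""
    dot = sum(a * b for a, b in zip(original, rebuilt))
    norm = (math.sqrt(sum(a * a for a in original))
            * math.sqrt(sum(b * b for b in rebuilt)))
    return dot / norm if norm else 0.0

activation = [0.1, -2.0, 0.05, 1.5]
text = verbalize(activation)                 # activation -> description
rebuilt = reconstruct(text, len(activation))  # description -> activation
reward = reconstruction_reward(activation, rebuilt)
```

In this toy version the description is only useful to the extent that it lets the reconstructor recover the activation, which is the same pressure the abstract describes: a more informative explanation yields a higher reconstruction score.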

We apply NLAs...


[Linkpost] "Interpreting Language Model Parameters" by Lucius Bushnaq, Dan Braun, Oliver Clive-Griffin, Bart Bussmann, Nathan Hu, mivanitskiy, Linda Linsefors, Lee Sharkey
05/07/2026

This is a link post. This is the latest work in our Parameter Decomposition agenda. We introduce a new parameter decomposition method, adVersarial Parameter Decomposition (VPD)[1] and decompose the parameters of a small[2] language model with it.

VPD greatly improves on our previous techniques, Stochastic Parameter Decomposition (SPD) and Attribution-based Parameter Decomposition (APD). We think the parameter decomposition approach is now more-or-less ready to be applied at scale to models people care about.







Importantly, we show that we can decompose attention layers, which interp methods like transcoders and SAEs have...


"It’s nice of you to worry about me, but I really do have a life" by Viliam
05/05/2026

I have two shameful secrets that I probably shouldn't talk about online:

I love my family.
I enjoy my hobbies.

"What an idiot!" you probably think. "Doesn't he realize that at his next job interview, HR will probably use an AI that can match his online writing based on a short sample of written text, and when they ask 'hey AI, is this guy really 100% devoted to his job, and does he spend his entire days and nights thinking about how to make his boss more rich?', the AI will laugh and print: 'beep-boop, negative, mwa-ha-ha-ha'."
...


"Irretrievability; or, Murphy’s Curse of Oneshotness upon ASI" by Eliezer Yudkowsky
05/05/2026

Example 1: The Viking 1 lander

In the 1970s, NASA sent a pair of probes to Mars, the Viking 1 and Viking 2 missions, at a total cost of 1 billion dollars (1970), equivalent to about 7 billion dollars (2025). The Viking 1 probe operated on Mars's surface for six years, before its battery began to seriously degrade.

One might have thought a battery problem like that would spell the irrevocable end of the mission. The probe had already launched and was now on Mars, very far away and out of reach of any human technician's fixing fingers. Was it not inevitable, then, that if any...


"Dairy cows make their misery expensive (but their calves can’t)" by Elizabeth
05/05/2026

How much do cows suffer in the production of milk? I can’t answer that; understanding animal experience is hard. But I can at least provide some facts about the conditions dairy cows live in, which might be useful to you in making your own assessment. My biggest conclusion is that cows made better choices than chickens by making their misery financially costly to farmers.

Life Cycle

The life of a dairy cow starts as a calf. She is typically separated from her mother a few hours to a few days after birth and, to reduce di...


"Takes from two months as an aspiring LLM naturalist" by AnnaSalamon
05/04/2026

I spent my last two months playing around with LLMs. I’m a beginner, bumbling and incorrect, but I want to share some takes anyhow.[1]

Take 1. Everything with computers is so so much easier than it was a year ago. 

This puts much “playing with LLMs” stuff within my very short attention span. This has felt empowering and fun; 10/10 would recommend.

Take 2. There's somebody home[2] inside an LLM. And if you play around while caring and being curious...


"Intelligence Dissolves Privacy" by Vaniver
05/02/2026

The future is going to be different from the present. Let's think about how.

Specifically, our expectations about what's reasonable are downstream of our past experiences, and those experiences were downstream of our options (and the options other people in our society had). As those options change, so too our experiences, and our expectations of what's reasonable. I once thought it was reasonable to pick up the phone and call someone, and to pick up my phone when it rang; things have changed, and someone thinking about what's possible could have seen it coming. So let's try to...


"How Go Players Disempower Themselves to AI" by Ashe Vazquez Nuñez
05/02/2026

Written as part of the MATS 9.1 extension program, mentored by Richard Ngo.

From March 9th to 15th 2016, Go players around the world stayed up to watch their game fall to AI. Google DeepMind's AlphaGo defeated Lee Sedol, commonly understood to be the world's strongest player at the time, with a convincing 4-1 score.

This event “rocked” the Go world, but its impact on the culture was initially unclear. In Chess, for instance, computers have not meaningfully automated away human jobs. Human Chess flourished as a pseudo-Esport in the internet era whereas the yearly Computer Chess Championship is f...


"On today’s panel with Bernie Sanders" by David Scott Krueger
05/01/2026

It's sort of easy to forget how close Bernie Sanders was to becoming the most powerful person in the world. The world we live in feels so much not like that place.

I’m in Washington DC for the next week, and I’ve just finished a public appearance with Senator Sanders (should I call him Bernie? Or Sanders? or…) You won’t often see me so dressed up and polished. But this is important!

There are politicians who have principles and character, who really believe in doing what's right. I think you have to respect them whe...


"Not a Paper: “Frontier Lab CEOs are Capable of In-Context Scheming”" by LawrenceC
04/29/2026

(Fragments from a research paper that will never be written)

Extended Abstract.

The frontier AI developers are becoming increasingly powerful and wealthy, significantly increasing their potential for risks. One concern is that of executive misalignment: when the CEO has different incentives and goals than those of the board of directors, or of humanity as a whole. Our work proposes three different threat models, under which executive misalignment can lead to concrete harm.

We perform two evaluations to understand the capabilities and propensities of current humans in relation to executive misalignment: First, we developed a...


"llm assistant personas seem increasingly incoherent (some subjective observations)" by nostalgebraist
04/29/2026

(This was originally going to be a "quick take" but then it got a bit long. Just FYI.)

There's this weird trend I perceive with the personas of LLM assistants over time. It feels like they're getting less "coherent" in a certain sense, even as the models get more capable.

When I read samples from older chat-tuned models, it's striking how "mode-collapsed" they feel relative to recent models like Claude Opus 4.6 or GPT-5.4.[1]

This is most straightforwardly obvious when it comes to textual style and structure: outputs from older models feel more templated and...


"LessWrong Shows You Social Signals Before the Comment" by TurnTrout
04/28/2026

When reading comments, you see what other people think before reading the comment. As shown in an RCT, that information anchors your opinion, reducing your ability to form your own opinion and making the site's karma rankings less related to the comment's true value. I think the problem is fixable and float some ideas for consideration.

The LessWrong interface prioritizes social information

You read a comment. What information is presented, and in what order?

The order of information:

Who wrote the comment (in bold);
How much other people like this comment...


"Update on the Alex Bores campaign" by Eric Neyman
04/27/2026

In October, I wrote a post arguing that donating to Alex Bores's campaign for Congress was among the most cost-effective opportunities that I'd ever encountered.

(A bit of context: Bores is a state legislator in New York who championed the RAISE Act, which was signed into law last December.[1] He's now running for Congress in New York's 12th Congressional district, which runs from about 17th Street to 100th Street in Manhattan. If elected to Congress, I think he'd be a strong champion for AI safety legislation, with a focus on catastrophic and existential risk.)

It's been...


"Community misconduct disputes are not about facts" by mingyuan
04/27/2026

In criminal law, the prosecution and the defense each try to establish a timeline — what happened, where, when, who was involved — and thereby determine whether the defendant is actually guilty of a crime.[1]

Community misconduct disputes are nothing like this.

There is only rarely disagreement over facts, and even when there is, it is not the crux of the matter. Community disputes are not for litigating facts. What they are for[2] is litigating three things:

1. The character of the accused
2. The character of the accuser
3. The importance of the accusation, in light of points 1 & 2

I think basi...


"The paper that killed deep learning theory" by LawrenceC
04/27/2026

Around 10 years ago, a paper came out that arguably killed classical deep learning theory: Zhang et al.'s aptly titled Understanding deep learning requires rethinking generalization.

Of course, this is a bit of an exaggeration. No single paper ever kills a field of research on its own, and deep learning theory was not exactly the most productive and healthy field at the time this was published. But if I had to point to a single paper that shattered the feeling of optimism at the time, it would be Zhang et al. 2016.[1]

Caption: believe it or not...


"Forecasting is Way Overrated, and We Should Stop Funding It" by mabramov
04/26/2026

Summary

EA and rationalists got enamoured with forecasting and prediction markets and made them part of the culture, but this hasn’t proven very useful, yet it continues to receive substantial EA funding. We should cut it off.

My Experience with Forecasting

For a while, I was the number one forecaster on Manifold. This lasted for about a year until I stopped just over 2 years ago. To this day, despite quitting, I’m still #8 on the platform. Additionally, I have done well on real-money prediction markets (Polymarket), earning mid-5 figures and winning a few AI b...


"Your Supplies Probably Won’t Be Stolen in a Disaster" by jefftk
04/24/2026

When I write about things like storing food or medication in case of disaster, one common response I get is that it doesn't matter: society will break down, and people who are stronger than you will take your stuff. This seemed plausible at first, but it's actually way off.

Looking at past disasters, people mostly fall somewhere on a "kind and supportive" to "keep to themselves" spectrum. When there is looting it's typically directed at stores, not homes, and violence is mostly in the streets. Having supplies at home lets you stay out of the way.

...


"10 posts I don’t have time to write" by habryka
04/23/2026

I am a busy man and will die knowing I have not said all I wanted to say. But maybe I can at least leave some IOUs behind.




1) Blatant conflicts are the best kind

Ben Hoffman's "Blatant Lies are the Best Kind!" is maybe the best post title followed by the least clarifying post I have ever encountered. The title is honestly amazing, but the text of the post, instead of a straightforward argument that the title promises, is an extremely dense and almost meta-fictional dialogue about the title:

I think...


"$50 million a year for a 10% chance to ban ASI" by Andrea_Miotti, Alex Amadori, Gabriel Alfour
04/22/2026

ControlAI's mission is to avert the extinction risks posed by superintelligent AI. We believe that in order to do this, we must secure an international prohibition on its development.

We're working to make this happen through what we believe is the most natural and promising approach: helping decision-makers in governments and the public understand the risks and take action.

We believe that ControlAI can achieve an international prohibition on ASI development if scaled sufficiently. We estimate that it would take approximately a $50 million yearly budget in funding to give us a concrete chance at achieving this...


"Evil is bad, actually (Vassar and Olivia Schaefer callout post)" by plex
04/21/2026

Michael Vassar's strategy for saving the world is horrifyingly counterproductive. Olivia's is worse.

A note before we start: A lot of the sources cited are people who ended up looking kinda insane. This is not a coincidence, it's apparently an explicit strategy: Apply plausibly-deniable psychological pressure to anyone who might speak up until they crack and discredit themselves by sounding crazy or taking extreme and destructive actions. Here's Brent Dill explaining it:




(later in the conversation he tries to encourage the person he's talking to to kill herself, and threatens her death if she...


"10 non-boring ways I’ve used AI in the last month" by habryka
04/21/2026

I use AI assistance for basically all of my work, for many hours, every day. My colleagues do the same. Recent surveys suggest >50% of Americans have used AI to help with their work in the last week. My architect recently started sending me emails that were clearly ChatGPT generated.[1]

Despite that, I know surprisingly little about how other people use AI assistance. Or at least how people who aren't weird AI-influencers sharing their marketing courses on Twitter or LinkedIn use AI. So here is a list of 10 concrete times I have used AI in some at least mildly...


"Feel like a room has bad vibes? The lighting is probably too “spiky” or too blue" by habryka
04/21/2026

I have now had a few years of experience doing architectural and interior design for many spaces that people seem to really love (most widely known Lighthaven, but before that we also had the Lightcone Offices, though I've also played a hand in designing some of the most popular areas at Constellation a few years back).

Most people (including me a few years back) have surprisingly bad introspective access into why a room makes them feel certain things. Most of the time, people's ability to describe the effect of a space on them is as shallow as "this...


"Quality Matters Most When Stakes are Highest" by LawrenceC
04/20/2026

Or, the end of the world is no excuse for sloppy work

One morning when I was nine, my dad called me over to his computer. He wanted to show me this amazing Korean scientist who had managed to clone stem cells, and who was developing treatments to let people with spinal cord injuries – people like my dad – walk again on their own two legs.

I don't remember exactly what he said next, or what I said back. I have a sense that I was excited too, and that I was upset when I learned the Unit...


"Reevaluating AGI Ruin in 2026" by lc
04/20/2026

It's been about four years since Eliezer Yudkowsky published AGI Ruin: A List of Lethalities, a 43-point list of reasons the default outcome from building AGI is everyone dying. A week later, Paul Christiano replied with Where I Agree and Disagree with Eliezer, signing on to about half the list and pushing back on most of the rest.

For people who were young and not in the Bay Area, like me, these essays were probably more significant than old timers would expect. Before it became completely consumed with AI discussions, LessWrong was a forum about the art of...


"Having OCD is like living in North Korea (Here’s how I escaped)" by Declan Molony
04/19/2026

[Author's note: this post is the narrative version that explains my journey with OCD and how I treated it. The short version provides quick, actionable advice for treating OCD.]

The following is the most painful experience I've ever had.

Four years ago in the parking lot of my rock climbing gym…

…my heart was pumping out of my chest, I was sweating profusely, and an overwhelming sense of panic and impending doom had a vice grip on my soul. A painful death was surely imminent. I felt like I was defusing a bomb that was...


"There are only four skills: design, technical, management and physical" by habryka
04/19/2026

Epistemic status: Completely schizo galaxy-brained theory

Lightcone[1] operates on a "generalist" philosophy. Most of our full-time staff have the title "generalist", and in any given year they work on a wide variety of tasks — from software development on the LessWrong codebase to fixing an overflowing toilet at Lighthaven, our 30,000 sq. ft. campus.

One of our core rules is that you should not delegate a task you don't know how to perform yourself. This is a very intense rule and has lots of implications about how we operate, so I've spent a lot of time watching people le...


"Meaningful Questions Have Return Types" by Drake Morrison
04/19/2026

One way intellectual progress stalls is when you are asking the Wrong Questions. Your question is nonsensical, or cuts against the way reality works. Sometimes you can avoid this by learning more about how the world works, which implicitly answers some question you had, but if you want to make real progress you have to develop the skill of Righting a Wrong Question. This is a classic, old-school rationalist idea. The standard examples are asking about determinism, or free will, or consciousness. The standard fix is to go meta. Ask yourself, "Why do I feel like I have free will"...


"Carpathia Day" by Drake Morrison
04/18/2026

(The better telling is here. Seriously you should go read it. I've heard this story told in rationalist circles, but there wasn't a post on LessWrong, so I made one)

Today is April 15th, Carpathia Day. Take a moment to put forth an unreasonable effort to save a little piece of your world, when no one would fault you for doing less.

In the early morning of April 15, the RMS Titanic began to sink with more than two thousand souls on board.

Over 58 nautical miles away — too far to make it in time — sailed the...


"Let goodness conquer all that it can defend" by habryka
04/18/2026

Epistemic status: All of the western canon must eventually be re-invented in a LessWrong post, so today we are re-inventing modernism.

In my post yesterday, I said:

Maybe the most important way ambitious, smart, and wise people leave the world worse off than they found it is by seeing correctly how some part of the world is broken and unifying various powers under a banner to fix that problem — only for the thing they have built to slip from their grasp and, in its collapse, destroy much more than anything previously could have.

I th...