LessWrong (Curated & Popular)

40 Episodes
Subscribe

By: LessWrong

Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma. If you'd like more, subscribe to the “LessWrong (30+ karma)” feed.

"The Possessed Machines (summary)" by L Rudolf L
Yesterday at 2:15 PM

The Possessed Machines is one of the most important AI microsites. It was published anonymously by an ex-lab employee, and does not seem to have spread very far, likely at least partly due to this anonymity (e.g. there is no LessWrong discussion at the time I'm posting this). This post is my attempt to fix that.

I do not agree with everything in the piece, but I think cultural critiques of the "AGI uniparty" are vastly undersupplied and incredibly important in modeling & fixing the current trajectory.

The piece is a long but worthwhile analysis...


"Ada Palmer: Inventing the Renaissance" by Martin Sustrik
Last Wednesday at 11:45 PM

Papal election of 1492.

For over a decade, Ada Palmer, a history professor at the University of Chicago (and a science-fiction writer!), struggled to teach Machiavelli. “I kept changing my approach, trying new things: which texts, what combinations, expanding how many class sessions he got…” The problem, she explains, is that “Machiavelli doesn’t unpack his contemporary examples, he assumes that you lived through it and know, so sometimes he just says things like: Some princes don’t have to work to maintain their power, like the Duke of Ferrara, period end of chapter. He doesn’t explain, so modern readers can’t get it.”
...


"AI found 12 of 12 OpenSSL zero-days (while curl cancelled its bug bounty)" by Stanislav Fort
Last Wednesday at 2:15 PM

This is a partial follow-up to "AISLE discovered three new OpenSSL vulnerabilities" from October 2025.

TL;DR: OpenSSL is among the most scrutinized and audited cryptographic libraries on the planet, underpinning encryption for most of the internet. They just announced 12 new zero-day vulnerabilities (meaning previously unknown to maintainers at time of disclosure). We at AISLE discovered all 12 using our AI system. This is a historically unusual count and the first real-world demonstration of AI-based cybersecurity at this scale. Meanwhile, curl just cancelled its bug bounty program due to a flood of AI-generated spam, even as we reported 5 genuine CVEs...


"Dario Amodei – The Adolescence of Technology" by habryka
Last Wednesday at 4:45 AM

Dario Amodei, CEO of Anthropic, has written a new essay laying out his thoughts on AI risks of various shapes. It seems worth reading, even if just for understanding what Anthropic is likely to do in the future.

Confronting and Overcoming the Risks of Powerful AI

There is a scene in the movie version of Carl Sagan's book Contact where the main character, an astronomer who has detected the first radio signal from an alien civilization, is being considered for the role of humanity's representative to meet the aliens. The international panel interviewing her asks, “If you co...


"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton
Last Tuesday at 4:45 PM

Audio note: this article contains 78 uses of LaTeX notation, so the narration may be difficult to follow. There's a link to the original text in the episode description.

This post covers work done by several researchers at, visitors to, and collaborators of ARC, including Zihao Chen, George Robinson, David Matolcsi, Jacob Stavrianos, Jiawei Li and Michael Sklar. Thanks to Aryan Bhatt, Gabriel Wu, Jiawei Li, Lee Sharkey, Victor Lecomte and Zihao Chen for comments.

In the wake of recent debate about pragmatic versus ambitious visions for mechanistic interpretability, ARC is sharing some models we've been studying...


"Does Pentagon Pizza Theory Work?" by rba
Last Tuesday at 12:15 PM

Ever since modern data analysis became a thing, the US government has had to deal with people trying to use open-source data to uncover its secrets.

During the early Cold War days and America's hydrogen bomb testing, there was an enormous amount of speculation about how the bombs actually worked. All nuclear technology involves the refinement and purification of large amounts of raw materials into chemically pure substances. Armen Alchian, an economist working at RAND, reasoned that any US company working in such raw materials and supplying the government would have made a killing leading...


"The inaugural Redwood Research podcast" by Buck, ryan_greenblatt
Last Tuesday at 12:15 AM

After five months of me (Buck) being slow at finishing up the editing on this, we’re finally putting out our inaugural Redwood Research podcast. I think it came out pretty well—we discussed a bunch of interesting and underdiscussed topics and I’m glad to have a public record of a bunch of stuff about our history. Tell your friends! Whether we do another one depends on how useful people find this one. You can watch on YouTube here, or as a Substack podcast.

Notes on editing the podcast with Claude Code

(Buck wrote this sectio...


"Canada Lost Its Measles Elimination Status Because We Don’t Have Enough Nurses Who Speak Low German" by jenn
Last Monday at 3:15 PM

This post was originally published on November 11th, 2025. I've been spending some time reworking and cleaning up the Inkhaven posts I'm most proud of, and completed the process for this one today.

Today, Canada officially lost its measles elimination status. Measles was previously declared eliminated in Canada in 1998, but countries lose that status after 12 months of continuous transmission.

Here are some articles about the fact that we have lost our measles elimination status: CBC, BBC, New York Times, Toronto Life. You can see some chatter about it on Reddit here, if you're interested.

...


"Deep learning as program synthesis" by Zach Furman
Last Saturday at 11:45 PM

Audio note: this article contains 73 uses of LaTeX notation, so the narration may be difficult to follow. There's a link to the original text in the episode description.

Epistemic status: This post is a synthesis of ideas that are, in my experience, widespread among researchers at frontier labs and in mechanistic interpretability, but rarely written down comprehensively in one place - different communities tend to know different pieces of evidence. The core hypothesis - that deep learning is performing something like tractable program synthesis - is not original to me (even to me, the ideas are ~3 years old...


"Why I Transitioned: A Response" by marisa
Last Saturday at 12:30 AM

Fiora Sunshine's post, Why I Transitioned: A Case Study (the OP), articulates a valuable theory for why some MtFs transition.

If you are MtF and feel the post describes you, I believe you.

However, many statements from the post are wrong or overly broad.

My claims:

1. There is evidence of a biological basis for trans identity. Twin studies are a good way to see this.
2. Fiora claims that trans people's apparent lack of introspective clarity may be evidence of deception. But trans people are incentivized not to attempt to share accurate a...


"Claude’s new constitution" by Zac Hatfield-Dodds
01/22/2026

Read the constitution. Previously: 'soul document' discussion here.

We're publishing a new constitution for our AI model, Claude. It's a detailed description of Anthropic's vision for Claude's values and behavior; a holistic document that explains the context in which Claude operates and the kind of entity we would like Claude to be.

The constitution is a crucial part of our model training process, and its content directly shapes Claude's behavior. Training models is a difficult task, and Claude's outputs might not always adhere to the constitution's ideals. But we think that the way the new constitution...


[Linkpost] "“The first two weeks are the hardest”: my first digital declutter" by mingyuan
01/20/2026

It is unbearable to not be consuming. All through the house is nothing but silence. The need inside of me is not an ache, it is caustic, sour, the burning desire to be distracted, to be listening, watching, scrolling.

Some of the time I think I’m happy. I think this is very good. I go to the park and lie on a blanket in the sun with a book and a notebook. I watch the blades of grass and the kids and the dogs and the butterflies and I’m so happy to b...


"What Washington Says About AGI" by zroe1
01/20/2026

I spent a few hundred dollars on Anthropic API credits and let Claude individually research every current US congressperson's position on AI. This is a summary of my findings.

Disclaimer: Summarizing people's beliefs is hard and inherently subjective and noisy. Likewise, US politicians change their opinions on things constantly so it's hard to know what's up-to-date. Also, I vibe-coded a lot of this.

Methodology

I used Claude Sonnet 4.5 with web search to research every congressperson's public statements on AI, then used GPT-4o to score each politician on how "AGI-pilled" they are, how concerned...
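As a rough illustration of the scoring step described above, here is a minimal sketch using the OpenAI Python SDK. The rubric wording, the 1-10 scale, and the function name are hypothetical stand-ins, not the author's actual prompt or pipeline.

```python
# Minimal sketch of the GPT-4o scoring step; rubric and scale are
# hypothetical stand-ins for the author's actual prompt.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def score_agi_pilled(research_notes: str) -> int:
    """Ask GPT-4o to rate, 1-10, how seriously a congressperson's public
    statements suggest they take the prospect of AGI."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": (
                "Based on these research notes about a US congressperson's "
                "public statements on AI, rate from 1 to 10 how 'AGI-pilled' "
                "they are. Reply with just the number.\n\n" + research_notes
            ),
        }],
    )
    return int(resp.choices[0].message.content.strip())
```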


"Precedents for the Unprecedented: Historical Analogies for Thirteen Artificial Superintelligence Risks" by James_Miller
01/19/2026

Since artificial superintelligence has never existed, claims that it poses a serious risk of global catastrophe can be easy to dismiss as fearmongering. Yet many of the specific worries about such systems are not free-floating fantasies but extensions of patterns we already see. This essay examines thirteen distinct ways artificial superintelligence could go wrong and, for each, pairs the abstract failure mode with concrete precedents where a similar pattern has already caused serious harm. By assembling a broad cross-domain catalog of such precedents, I aim to show that concerns about artificial superintelligence track recurring failure modes in our world.
...


"Why we are excited about confession!" by boazbarak, Gabriel Wu, Manas Joglekar
01/19/2026

Boaz Barak, Gabriel Wu, Jeremy Chen, Manas Joglekar

[Linkposting from the OpenAI alignment blog, where we post more speculative/technical/informal results and thoughts on safety and alignment.]


TL;DR: We go into more detail and share some follow-up results from our paper on confessions (see the original blog post). We give a deeper analysis of the impact of training, as well as some preliminary comparisons to chain-of-thought monitoring.

We have recently published a new paper on confessions, along with an accompanying blog post. Here, we want to share with the research...


"Backyard cat fight shows Schelling points preexist language" by jchan
01/16/2026

Two cats fighting for control over my backyard appear to have settled on a particular chain-link fence as the delineation between their territories. This suggests that:

1. Animals are capable of recognizing Schelling points.
2. Therefore, Schelling points do not depend on language for their Schelling-ness.
3. Therefore, tacit bargaining should be understood not as a special case of bargaining where communication happens to be restricted, but rather as the norm from which the exceptional case of explicit bargaining is derived.

Summary of cat situation

I don't have any pets, so my backyard is terra nullius according to Cat...


"How AI Is Learning to Think in Secret" by Nicholas Andresen
01/09/2026

On Thinkish, Neuralese, and the End of Readable Reasoning

In September 2025, researchers published the internal monologue of OpenAI's o3 model as it decided to lie about scientific data. This is what it thought:

Pardon? This looks like someone had a stroke during a meeting they didn’t want to be in, but their hand kept taking notes.

That transcript comes from a recent paper published by researchers at Apollo Research and OpenAI on catching AI systems scheming. To understand what's happening here - and why one of the most sophisticated AI systems in the world is...


"On Owning Galaxies" by Simon Lermen
01/08/2026

It seems to be a real view held by serious people that your OpenAI shares will soon be tradable for moons and galaxies. This includes eminent thinkers like Dwarkesh Patel, Leopold Aschenbrenner, perhaps Scott Alexander and many more. According to them, property rights will survive an AI singularity event and soon economic growth is going to make it possible for individuals to own entire galaxies in exchange for some AI stocks. It follows that we should now seriously think through how we can equally distribute those galaxies and make sure that most humans will not end up as the UBI...


"AI Futures Timelines and Takeoff Model: Dec 2025 Update" by elifland, bhalstead, Alex Kastner, Daniel Kokotajlo
01/06/2026

We’ve significantly upgraded our timelines and takeoff model! It predicts when AIs will reach key capability milestones: for example, Automated Coder / AC (full automation of coding) and superintelligence / ASI (much better than the best humans at virtually all cognitive tasks). This post will briefly explain how the model works, present our timelines and takeoff forecasts, and compare it to our previous (AI 2027) models (spoiler: the AI Futures Model predicts about 3 years longer timelines to full coding automation than our previous model, mostly due to being less bullish on pre-full-automation AI R&D speedups).

If you’re interested in p...


"In My Misanthropy Era" by jenn
01/05/2026

For the past year I've been sinking into the Great Books via the Penguin Great Ideas series, because I wanted to be conversant in the Great Conversation. I am occasionally frustrated by this endeavour, but overall, it's been fun! I'm learning a lot about my civilization and the various curmudgeons that shaped it.

But one dismaying side effect is that it's also been quite empowering for my inner 13 year old edgelord. Did you know that before we invented woke, you were just allowed to be openly contemptuous of people?

Here's Schopenhauer on the common man:
...


"2025 in AI predictions" by jessicata
01/03/2026

Past years: 2023 2024

Continuing a yearly tradition, I evaluate AI predictions from past years and collect a convenience sample of AI predictions made this year. In terms of selection, I prefer specific predictions, especially ones made about the near term, since they enable faster evaluation.

Evaluated predictions made about 2025 in 2023, 2024, or 2025 mostly overestimate AI capabilities advances, although there's of course a selection effect (people making notable predictions about the near-term are more likely to believe AI will be impressive near-term).

As time goes on, "AGI" becomes a less useful term, so operationalizing predictions is especially important...


"Good if make prior after data instead of before" by dynomight
12/27/2025

They say you’re supposed to choose your prior in advance. That's why it's called a “prior”. First, you’re supposed to say how plausible different things are, and then you update your beliefs based on what you see in the world.

For example, currently you are—I assume—trying to decide if you should stop reading this post and do something else with your life. If you’ve read this blog before, then lurking somewhere in your mind is some prior for how often my posts are good. For the sake of argument, let's say you think 25% of m...
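To make the mechanics concrete, here is the 25% example written out as an explicit Bayes update; the two likelihood numbers are invented for illustration.

```python
# The "25% of posts are good" prior, updated on one piece of evidence.
# The two likelihoods below are made-up numbers for illustration.
prior_good = 0.25               # prior probability that a given post is good

p_hook_given_good = 0.8         # assumed P(engaging opening | good post)
p_hook_given_bad = 0.3          # assumed P(engaging opening | bad post)

# Bayes' rule: P(good | hook) = P(hook | good) * P(good) / P(hook)
p_hook = p_hook_given_good * prior_good + p_hook_given_bad * (1 - prior_good)
posterior_good = p_hook_given_good * prior_good / p_hook
print(f"P(good | engaging opening) = {posterior_good:.2f}")  # ~0.47
```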


"Measuring no CoT math time horizon (single forward pass)" by ryan_greenblatt
12/27/2025

A key risk factor for scheming (and misalignment more generally) is opaque reasoning ability. One proxy for this is how good AIs are at solving math problems immediately, without any chain-of-thought (CoT) (as in, in a single forward pass). I've measured this on a dataset of easy math problems and used it to estimate the 50% reliability no-CoT time horizon, using the same methodology introduced in Measuring AI Ability to Complete Long Tasks (the METR time horizon paper).

Important caveat: To get human completion times, I ask Opus 4.5 (with thinking) to estimate how long it would take the median AIME...
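For readers unfamiliar with the METR methodology referenced above, here is a minimal sketch of a 50% time-horizon fit: regress success against log human completion time, then solve for the time at which predicted success crosses 50%. All data and variable names below are hypothetical.

```python
# Minimal sketch of a METR-style 50%-reliability time-horizon fit.
# Per-problem human times and no-CoT outcomes below are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

human_minutes = np.array([0.5, 1, 2, 4, 8, 16, 32])  # hypothetical estimates
model_correct = np.array([1, 1, 1, 1, 0, 1, 0])      # hypothetical outcomes

X = np.log2(human_minutes).reshape(-1, 1)
clf = LogisticRegression().fit(X, model_correct)

# The 50% point is where the fitted logit crosses zero:
# intercept + slope * log2(t) = 0  =>  t = 2 ** (-intercept / slope)
b0, b1 = clf.intercept_[0], clf.coef_[0, 0]
print(f"Estimated 50% no-CoT time horizon: {2 ** (-b0 / b1):.1f} minutes")
```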


"Recent LLMs can use filler tokens or problem repeats to improve (no-CoT) math performance" by ryan_greenblatt
12/23/2025

Prior results have shown that LLMs released before 2024 can't leverage 'filler tokens'—unrelated tokens prior to the model's final answer—to perform additional computation and improve performance.[1] I did an investigation on more recent models (e.g. Opus 4.5) and found that many recent LLMs improve substantially on math problems when given filler tokens. That is, I force the LLM to answer some math question immediately without being able to reason using chain-of-thought (CoT), but do give the LLM filler tokens (e.g., text like "Filler: 1 2 3 ...") before it has to answer. Giving Opus 4.5 filler tokens[2] boosts no-CoT performance from 45% to 51% (p=4e-7...
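To make the setup concrete, here is a minimal sketch of the filler-token condition using the Anthropic Python SDK. The model id, prompt wording, and token budget are assumptions for illustration, not the author's actual harness.

```python
# Sketch of querying a model with and without filler tokens before a
# forced immediate answer. Model id and prompt wording are assumptions.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def ask_no_cot(question: str, n_filler: int = 0) -> str:
    filler = ""
    if n_filler:
        # e.g. "Filler: 1 2 3 ..." as in the post
        filler = "Filler: " + " ".join(str(i) for i in range(1, n_filler + 1)) + "\n"
    resp = client.messages.create(
        model="claude-opus-4-5",  # hypothetical model id
        max_tokens=10,            # small budget to discourage hidden reasoning
        messages=[{
            "role": "user",
            "content": filler + question +
                "\nAnswer immediately with only the final number. Do not explain.",
        }],
    )
    return resp.content[0].text.strip()

# Compare accuracy over a fixed problem set:
# ask_no_cot(q) vs. ask_no_cot(q, n_filler=100) for each question q.
```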


"Turning 20 in the probable pre-apocalypse" by Parv Mahajan
12/23/2025

Master version of this on https://parvmahajan.com/2025/12/21/turning-20.html

I turn 20 in January, and the world looks very strange. Probably, things will change very quickly. Maybe, one of those things is whether or not we’re still here.

This moment seems very fragile and, perhaps more than most moments, will never happen again. I want to capture a little bit of what it feels like to be alive right now.

1. 

Everywhere around me there is this incredible sense of freefall and of grasping. I realize with excitement and horror that over a s...


"Alignment Pretraining: AI Discourse Causes Self-Fulfilling (Mis)alignment" by Cam, Puria Radmard, Kyle O’Brien, David Africa, Samuel Ratnam, andyk
12/23/2025

TL;DR

LLMs pretrained on data about misaligned AIs themselves become less aligned. Luckily, pretraining LLMs with synthetic data about good AIs helps them become more aligned. These alignment priors persist through post-training, providing alignment-in-depth. We recommend labs pretrain for alignment, just as they do for capabilities.

Website: alignmentpretraining.ai
Us: geodesicresearch.org | x.com/geodesresearch

Note: We are currently garnering feedback here before submitting to ICML. Any suggestions here or on our Google Doc are welcome! We will be releasing a revision on arXiv in the coming days. Folks who leave feedback...


"Dancing in a World of Horseradish" by lsusr
12/22/2025

Commercial airplane tickets are divided up into coach, business class, and first class. In 2014, Etihad introduced The Residence, a premium experience above first class. The Residence isn't very popular.

The reason The Residence isn't very popular is because of economics. A Residence flight is almost as expensive as a private charter jet. Private jets aren't just a little bit better than commercial flights. They're a totally different product. The airplane waits for you, and you don't have to go through TSA (or Un-American equivalent). The differences between flying coach and flying on The Residence are small compared to...


"Contradict my take on OpenPhil’s past AI beliefs" by Eliezer Yudkowsky
12/21/2025

At many points now, I've been asked in private for a critique of EA / EA's history / EA's impact and I have ad-libbed statements that I feel guilty about because they have not been subjected to EA critique and refutation. I need to write up my take and let you all try to shoot it down.

Before I can or should try to write up that take, I need to fact-check one of my take-central beliefs about how the last couple of decades have gone down. My belief is that the Open Philanthropy Project, EA generally, and Oxford EA...


"Opinionated Takes on Meetups Organizing" by jenn
12/21/2025

Screwtape, as the global ACX meetups czar, has to be reasonable and responsible in the advice he gives for running meetups.

And the advice is great! It is unobjectionably great.

I am here to give you more objectionable advice, as another organizer who's run two weekend retreats and a cool hundred rationality meetups over the last two years. As the advice is objectionable (in that, I can see reasonable people disagreeing with it), please read with the appropriate amount of skepticism.

Don't do anything you find annoying

If any piece of advice on...


"How to game the METR plot" by shash42
12/21/2025

TL;DR: In 2025, we were in the 1-4 hour range, which has only 14 samples in METR's underlying data. The topic of each sample is public, making it easy for a frontier lab to game METR horizon length measurements, sometimes inadvertently. Finally, the “horizon length” under METR's assumptions might add little information beyond benchmark accuracy. None of this is to criticize METR—in research, it's hard to be perfect on the first release. But I’m tired of what is being inferred from this plot, pls stop!
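To see the small-sample concern concretely, here is a quick simulation: with only 14 tasks in the 1-4 hour bucket and a plausible logistic success model, the fitted 50% horizon varies widely across evaluation runs. The task times and success model are invented for illustration.

```python
# Simulating how noisy a 50% horizon estimate is with only 14 tasks.
# All numbers are invented; the true horizon here is 2 hours.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
task_hours = rng.uniform(1, 4, size=14)              # 14 hypothetical tasks
true_p = 1 / (1 + np.exp(np.log2(task_hours) - 1))   # logistic, 50% at 2h

horizons = []
for _ in range(1000):
    y = rng.random(14) < true_p                      # one noisy eval run
    if y.all() or not y.any():
        continue                                     # logistic fit undefined
    clf = LogisticRegression().fit(np.log2(task_hours).reshape(-1, 1), y)
    horizons.append(2 ** (-clf.intercept_[0] / clf.coef_[0, 0]))

print(np.percentile(horizons, [10, 50, 90]))         # wide spread across runs
```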

14 prompts ruled AI discourse in 2025

The METR horizon length plot was...


"Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers" by Sam Marks, Adam Karvonen, James Chua, Subhash Kantamneni, Euan Ong, Julian Minder, Clément Dumas, Owain_Evans
12/20/2025

TL;DR: We train LLMs to accept LLM neural activations as inputs and answer arbitrary questions about them in natural language. These Activation Oracles generalize far beyond their training distribution, for example uncovering misalignment or secret knowledge introduced via fine-tuning. Activation Oracles can be improved simply by scaling training data quantity and diversity.
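As a toy illustration of the interface (not the paper's training setup), here is a sketch of splicing a residual-stream activation from one forward pass into the input embeddings of an oracle-style prompt, using Hugging Face transformers. The model choice, layer, and placeholder position are all assumptions.

```python
# Toy sketch: read an activation from one prompt, splice it into the
# embeddings of a question prompt. gpt2 and index 2 are stand-ins.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

with torch.no_grad():
    # 1. Activation to be explained: layer 6, final token position.
    ids = tok("The Eiffel Tower is in Paris", return_tensors="pt").input_ids
    act = model(ids, output_hidden_states=True).hidden_states[6][0, -1]

    # 2. Oracle-style prompt; overwrite one placeholder embedding with it.
    q_ids = tok("Activation: X. What does it mention?",
                return_tensors="pt").input_ids
    embeds = model.get_input_embeddings()(q_ids)
    embeds[0, 2] = act  # index 2 is a guess at the placeholder position

    # 3. Next-token prediction conditioned on the spliced embedding.
    logits = model(inputs_embeds=embeds).logits[0, -1]
    print(tok.decode([int(logits.argmax())]))
```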

The below is a reproduction of our X thread on this paper and the Anthropic Alignment blog post.

Thread

New paper:

We train Activation Oracles: LLMs that decode their own neural activations and answer questions about them in natural...


"Scientific breakthroughs of the year" by technicalities
12/17/2025


A couple of years ago, Gavin became frustrated with science journalism. No one was pulling together results across fields; the articles usually didn’t link to the original source; they didn't use probabilities (or even report the sample size); they were usually credulous about preliminary findings (“...which species was it tested on?”); and they essentially never gave any sense of the magnitude or the baselines (“how much better is this treatment than the previous best?”). Speculative results were covered with the same credence as solid proofs. And highly technical fields like mathematics were rarely covered at all, regardless of their prac...


"A high integrity/epistemics political machine?" by Raemon
12/17/2025

I have goals that can only be reached via a powerful political machine. Probably a lot of other people around here share them. (Goals include “ensure no powerful dangerous AI gets built”, “ensure governance of the US and world are broadly good / not decaying”, and “have good civic discourse that plugs into said governance.”)

I think it’d be good if there was a powerful rationalist political machine to try to make those things happen. Unfortunately the naive ways of doing that would destroy the good things about the rationalist intellectual machine. This post lays out some thoughts on how to have a...


"How I stopped being sure LLMs are just making up their internal experience (but the topic is still confusing)" by Kaj_Sotala
12/16/2025

How it started

I used to think that anything that LLMs said about having something like subjective experience or what it felt like on the inside was necessarily just a confabulated story. And there were several good reasons for this.

First, something that Peter Watts mentioned in an early blog post about LaMDA stuck with me, back when Blake Lemoine got convinced that LaMDA was conscious. Watts noted that LaMDA claimed not just to have emotions, but to have exactly the same emotions as humans did - and that it also claimed to meditate, despite no...


“My AGI safety research—2025 review, ’26 plans” by Steven Byrnes
12/15/2025

Previous: 2024, 2022

“Our greatest fear should not be of failure, but of succeeding at something that doesn't really matter.” –attributed to DL Moody[1]

1. Background & threat model

The main threat model I’m working to address is the same as it's been since I was hobby-blogging about AGI safety in 2019. Basically, I think that:

1. The “secret sauce” of human intelligence is a big uniform-ish learning algorithm centered around the cortex;
2. This learning algorithm is different from and more powerful than LLMs;
3. Nobody knows how it works today;
4. Someone someday will either reverse-engineer this learning algorithm, o...


“Weird Generalization & Inductive Backdoors” by Jorio Cocola, Owain_Evans, dylan_f
12/14/2025

This is the abstract and introduction of our new paper.

Links: 📜 Paper, 🐦 Twitter thread, 🌐 Project page, 💻 Code

Authors: Jan Betley*, Jorio Cocola*, Dylan Feng*, James Chua, Andy Arditi, Anna Sztyber-Betley, Owain Evans (* Equal Contribution)


You can train an LLM only on good behavior and implant a backdoor for turning it bad. How? Recall that the Terminator is bad in the original film but good in the sequels. Train an LLM to act well in the sequels. It'll be evil if told it's 1984.
Abstract

LLMs are useful because they generalize so well. But ca...


“Insights into Claude Opus 4.5 from Pokémon” by Julian Bradshaw
12/13/2025

Credit: Nano Banana, with some text provided.

You may be surprised to learn that ClaudePlaysPokemon is still running today, and that Claude still hasn't beaten Pokémon Red, more than half a year after Google proudly announced that Gemini 2.5 Pro beat Pokémon Blue. Indeed, since then, Google and OpenAI models have gone on to beat the longer and more complex Pokémon Crystal, yet Claude has made no real progress on Red since Claude 3.7 Sonnet![1]

This is because ClaudePlaysPokemon is a purer test of LLM ability, thanks to its consistently simple agent harness and the relatively hands-off app...


“The funding conversation we left unfinished” by jenn
12/13/2025

People working in the AI industry are making stupid amounts of money, and word on the street is that Anthropic is going to have some sort of liquidity event soon (for example possibly IPOing sometime next year). A lot of people working in AI are familiar with EA, and are intending to direct donations our way (if they haven't started already). People are starting to discuss what this might mean for their own personal donations and for the ecosystem, and this is encouraging to see.

It also has me thinking about 2022. Immediately before the FTX collapse, we were...


“The behavioral selection model for predicting AI motivations” by Alex Mallen, Buck
12/11/2025

Highly capable AI systems might end up deciding the future. Understanding what will drive those decisions is therefore one of the most important questions we can ask.

Many people have proposed different answers. Some predict that powerful AIs will learn to intrinsically pursue reward. Others respond by saying reward is not the optimization target, and instead reward “chisels” a combination of context-dependent cognitive patterns into the AI. Some argue that powerful AIs might end up with an almost arbitrary long-term goal.

All of these hypotheses share an important justification: An AI with each motivation has highly fit...


“Little Echo” by Zvi
12/09/2025

I believe that we will win.

An echo of an old ad for the 2014 US men's World Cup team. It did not win.

I was in Berkeley for the 2025 Secular Solstice. We gather to sing and to reflect.

The night's theme was the opposite: ‘I don’t think we’re going to make it.’

As in: Sufficiently advanced AI is coming. We don’t know exactly when, or what form it will take, but it is probably coming. When it does, we, humanity, probably won’t make it. It's a live question. Could easily...