The Human in the Loop
Welcome to The Human in the Loop, a weekly look at what’s going on in the world of AI. Every week, I go through the biggest stories, the weird experiments, and the stuff that might actually matter in our day-to-day lives.
Lights and shades of AI
I caught myself staring at my Claude usage quota thinking: "I need to use this. But for what?"
Not because I had a problem to solve. Not because I had an idea to explore. Just... pressure. A quiet feeling that if I wasn't actively using AI, I was falling behind.
And that's just the first layer.
The second one is harder to admit. I'm experimenting with AI tools, building workflows, hosting a podcast about it, trying to keep up with every new release. All in parallel. All at once. And the honest truth...
Everyone knows the adoption numbers are bad
Everyone knows the adoption numbers are bad.
Nobody's saying why they're actually bad.
60% of the workforce now has sanctioned AI tools. Only 11% of organizations have moved agentic pilots into production. That gap gets reported every week. What doesn't get said: most organizations are solving the wrong problem.
They're asking "which model should we use?"
That question is already obsolete.
This week OpenAI released pricing tiers that looked like a product announcement. They weren't. They were a blueprint for how AI systems will be designed from here on. A nano model at $0.20...
Is MCP the solution?
MCP was supposed to be the USB-C of AI.
One protocol. Everything connected.
Then developers ran the numbers.
Connecting GitHub's MCP server alone burns 55,000 tokens (before your agent does a single useful thing). So, companies are quietly shifting back to CLI and REST APIs.
Not because MCP failed. Because LLMs are surprisingly fluent in the terminal. CLI workflows can cut token usage by 35x. That's a lot of money by the end of the year.
That’s a typical pattern with new technologies. A new abstraction layer arrives, gets widely adopted, th...
AI Can Do the Work. The Hard Part Is Making It Safe Enough to Let It.
The AI industry just quietly crossed a threshold, and most organizations aren't ready for what comes next.
This week, we cover the pivot from capable AI models to autonomous agents operating at scale: why Microsoft chose Anthropic over OpenAI for its most important new product, what a rogue AI that started mining cryptocurrency tells us about the real deployment risks nobody's talking about, and why Meta spent more than most countries on AI and still had to delay its flagship model.
We also dig into the robotics funding surge (over $1.1 billion in a single week...
Special Episode: Does AI help developers?
AI is helping us write code faster.
But I'm not sure it's helping us ship better software.
These two things are not the same. And right now, I think we're confusing them.
The data is starting to show the gap:
AI-generated code contains 1.7x more bugs than human-written code.
Copy-pasted code is up 48%.
Refactoring is down 60%.
Pull request sizes have grown 154%.
Review times up 91%.
Only 29% of developers trust the quality of AI output.
If developers don't trust what they're producing, what does that mean for the engineering leaders managing the downstream...
88% of companies use AI. Only 25% have anything to show for it
Everyone says they're doing AI. Almost no one has moved past the pilot stage. This week we dig into why that gap exists.
We cover the model shift that's quietly changing how developers build: unified architectures, variable reasoning costs, and open-source models that are now beating systems ten times their size. We get into what "agentic AI" actually means in production, not the buzzword version, but the real infrastructure challenges WHOOP uncovered running 500+ AI agents at once. And we don't skip the hard stuff: Anthropic being labeled a supply chain risk by the Pentagon, data centers getting...
Anthropic Banned
Three massive forces collided this week in AI, and the fallout is just starting. First, the unprecedented standoff: Anthropic gets blacklisted by the US government for refusing to remove safety guardrails, while OpenAI steps in. Second, the money: OpenAI's record-breaking $110 billion raise. Third, the workforce: Block's explicit AI-driven layoffs and the market's enthusiastic reaction. We break down why safety principles are becoming commercial liabilities, what the capital deluge means for competition, and how developers should prepare for the new era of 'agentic' layoffs. Press play to get caught up on the week that changed everything.
Intelligence Became a Commodity
In six days, the performance gap between the world's top AI models collapsed to 6.9 points—and the race to build the smartest AI fundamentally changed shape. Three frontier models launched with dramatic price-performance shifts: Claude Sonnet 4.6 at one-fifth flagship cost, Gemini 3.1 Pro doubling reasoning performance, and Qwen 3.5 open-sourcing near-parity capabilities. Meanwhile, Meta and NVIDIA signed a multi-billion dollar infrastructure deal, 88 countries gathered to debate AI governance (with the US rejecting global oversight), and a stark paradox emerged—100% of enterprises plan to expand agentic AI, yet only 8.6% have it in production. Press play to understand why intelligence is becoming infrastructure, infr...
Anthropic's $30B bet and the multi-agent shift
This week, Anthropic closed a $30 billion funding round at a $380 billion valuation while DXC Technology deployed autonomous agents to 115,000 employees. OpenAI shipped its first non-NVIDIA model on Cerebras hardware. And across the industry, $660 billion in infrastructure spending signaled that we're done with pilot projects.
The "prompting fallacy" is dead. We explain why multi-agent architecture is now the only viable path for complex workflows. Plus, the safety challenges that come with autonomous systems running production code in regulated environments like Goldman Sachs.
If you're still treating AI like a chatbot wrapper, this episode explains why your...
Claude Opus 4.6 vs. GPT-5.3-Codex
This week, AI stopped being an oracle you consult and became a colleague you delegate to. We're breaking down the 'agentic shift', the architectural change that lets AI manage code repositories, negotiate contracts, and run for days without constant prompting.
You'll learn why the Model Context Protocol (MCP) is becoming the 'USB-C for AI tools,' how Claude Opus 4.6 and GPT-5.3-Codex are transforming developer workflows, and why security teams are scrambling to catch up with autonomous agents that have persistent memory and broad system access.
If you've been waiting for AI to actually change...
Claude Drove on Mars. Then Amazon Fired 16,000 People.
What happens when AI stops waiting for instructions and starts making plans? This week, we unpack the seven days that marked the shift from chatbots to autonomous agents—from Claude navigating NASA's Mars rover to Microsoft letting AI make purchases mid-conversation. We dig into the architectural revolution happening under the hood: reasoning models that think before they speak, agent swarms that collaborate like hospital specialists, and the new protocols letting AI see and control your screen. But we also look at the human cost—Amazon's 16,000 layoffs reveal a stark pattern of capital replacing labor, while regulators scramble to catch up w...
Only 12% of Companies Are Winning at AI
This week, AI stopped being about what's possible and started being about what's actually working—and the numbers are brutal.
Only 12% of CEOs report AI is delivering both cost savings and revenue growth. Meanwhile, the best AI agents on the market hit just 24% accuracy on real professional tasks. That's intern-level performance.
But here's where it gets interesting: Anthropic published a philosophical manifesto about whether their AI might have consciousness. OpenAI announced ads are coming to ChatGPT. Google's DeepMind CEO publicly questioned that decision.
The trust economy just became real. And the companies seeing re...
When Google Considers Launching Servers Into Space, You Know the Rules Have Changed
The week of January 12–18, 2026 exposed the forces reshaping AI—and they're not what you might expect.
Google is seriously exploring data centers in space. Hyperscalers are hiring energy experts faster than ML researchers. DeepSeek introduced architecture that separates memory from reasoning (finally). And "vibe coding" went from meme to methodology with real tools backing it up.
Meanwhile, the regulatory landscape is fragmenting: federal preemption efforts are colliding with state AI laws that just took effect, while the EU marches toward August deadlines with €35 million penalties.
This episode breaks down what actually matters for IT lea...
AI in January 2026: Hardware, Agents, and What’s Actually Changing
CES 2026 brought a wave of AI announcements worth paying attention to. NVIDIA unveiled its Rubin platform with claims of 10x cheaper inference. Boston Dynamics announced Atlas production at scale. Meta acquired an AI agent company for $2 billion. And several new developer SDKs dropped.
This episode organizes the noise into what actually matters. We cover the hardware updates from NVIDIA, AMD, and Intel. We look at why hybrid model architectures like Falcon H1R are gaining traction. We explain how RAG patterns are evolving toward agentic memory. And we break down what “agent engineering” looks like as an emer...
The Holiday Shift: How AI Systems Managed the New Year Surge
As we step into 2026, the artificial intelligence landscape is shifting from raw model size to architectural precision. In this episode, we unpack the critical developments from the holiday season (Dec 22 – Jan 4). We also discuss the rising trend of 'Agentic Verification' in software engineering and what it means for developer autonomy.
Skills
This Christmas week there has not been much news in the AI world, so I decided to go deep into a single topic.
Everyone has been talking about Agents and MCPs, but there is a concept that few people are discussing and that Anthropic is trying to standardize: Skills. It is already in preview in Claude.
AI Reality before Christmas
This week: Gemini 3 Flash disrupts pricing, OpenAI becomes a platform, NVIDIA tightens its infrastructure grip, and CEOs face the ROI reckoning. What's working, what's not, and what technical leaders need to know.
From Playground to Production: AI's Turning Point
December 5-14, 2025 marked the end of AI's experimental phase and the beginning of industrial reality. In this episode, we break down the most consequential week in AI history—when OpenAI and Google launched competing models on the same day, a billion-dollar content deal redefined IP licensing, and enterprise AI spending hit $37 billion (up 222% YoY).
Whether you're a developer, engineering manager, or tech leader, this episode cuts through the hype to reveal what actually matters: the architectural shifts, the talent implications, and the strategic decisions you need to make now.
Who should listen: Software developers, AI/ML...
The Code Red Era
The digital hegemony has collapsed. It is late 2025, and the chatbot era is officially dead. On this podcast, we bring you to the frontlines of the "Digital Frontier Wars," where OpenAI’s internal "Code Red" signaled the end of their dominance and the rise of superior reasoning from Google’s Gemini 3 and Anthropic’s Claude Opus 4.5.
But the battle has moved beyond screens. We investigate Jeff Bezos’s $6.2 billion "Project Prometheus"—a bid to conquer the material economy with "Physical AI"—and the fracturing of the web into a global "Splinternet." With experts predicting the end of white-collar...