The CTO Podcast with Fexingo: Technical Leadership, Architecture, and Engineering Org

How Anthropic Built Claude 3 for Enterprise Reliability

#62

Today at 7:20 AM

In this episode of The CTO Podcast, Lucas and Luna dive into the technical architecture behind Anthropic's Claude 3, focusing on how the team designed for enterprise-grade reliability at scale. They discuss concrete decisions like sharding policy across GPU clusters, the introduction of 'constitutional AI' as a guardrail system, and how Anthropic's engineering org balances fast iteration with rigorous safety testing. The episode also covers the trade-offs between model size and inference cost, and what other CTOs can learn about building fault-tolerant AI products. A must-listen for technical leaders navigating the generative AI landscape.

How Spotify Rebuilt Its Recommender System for 600 Million Users

#61

Yesterday at 7:36 PM

In this episode of The CTO Podcast, Lucas and Luna dive into how Spotify rebuilt its core recommender engine from a batch-based collaborative filtering system to a real-time graph neural network serving 600 million users. They explore the specific architectural decisions behind Spotify's migration from Apache Spark and nightly model retraining to a streaming pipeline with TensorFlow and graph embeddings. Lucas explains why the team chose to model user listening sessions as dynamic graphs, how they reduced cold-start latency from hours to under 30 seconds, and the trade-offs they made in compute cost versus recommendation freshness. Luna presses on the practical...

How Netflix Rebuilt Its CDN for 300 Million Subscribers

#60

Yesterday at 7:03 AM

Netflix's content delivery network, Open Connect, carries a large share of the world's internet traffic at peak. In this episode, Lucas and Luna dive deep into the specific architectural decisions Netflix made to scale its CDN from 100 million to 300 million subscribers. They explore the shift from commercial CDNs to a peered, ISP-embedded appliance model, the move from spinning disks to NVMe SSDs, and the caching algorithms that optimize for long-tail content. The hosts also discuss how Netflix manages the trade-off between cache hit ratio and storage cost, and why they chose to build their own hardware. This episode is a...

How Walmart Rebuilt Its Supply Chain with Real-Time Data

#59

Last Thursday at 7:20 PM

Walmart's supply chain is the largest in the world, moving over 5 billion units annually. In this episode, Lucas and Luna explore how Walmart rebuilt its supply chain on real-time data from edge to shelf. They break down the 2019 shift from batch processing to streaming events, the use of Kafka at massive scale, and how a machine learning model called 'Eddie' predicts demand by the hour. Lucas explains why Walmart moved its inventory management from mainframes to a cloud-native architecture built on Google Cloud, and how real-time visibility reduced out-of-stocks by 16% in pilot stores. The conversation covers the technical trade-offs—wh...

How Datadog Monitors Its Own Infrastructure

#58

Last Thursday at 7:11 AM

Episode 58 of The CTO Podcast goes inside Datadog's engineering org to explore how the company monitors its own 100-terabyte infrastructure. Lucas and Luna walk through Datadog's dogfooding culture, the architectural challenges of running a monitoring platform for itself, and how the team handles alert fatigue, distributed tracing, and log ingestion at massive scale. They discuss specific tools like the Datadog Agent, the trace-agent, and the custom time-series database built in-house. The episode includes concrete numbers: 30 trillion time-series points ingested daily, 99.99 percent uptime target, and how the SRE team manages 8,000 hosts across multiple cloud providers. Tune in for a rare...

How Figma Rebuilt Its Multiplayer Engine for 500 Users per File

#57

Last Wednesday at 7:12 PM

Figma's multiplayer engine lets hundreds of designers edit the same file simultaneously. How did they rebuild it from scratch to handle over 500 concurrent users per document without conflicts or lag? Lucas and Luna break down the architecture: the shift from CRDTs to a custom conflict-resolution layer, the 'change tree' data structure that replaced operational transforms, and the decision to move from WebSockets to WebRTC data channels for sub-200ms sync. They also discuss the engineering trade-offs: why Figma chose JavaScript over Rust for the client, how they handle undo/redo in a multi-user environment, and the surprising bottleneck that...

How Cloudflare Built Its Global Network for 30 Million Requests per Second

#56

Last Wednesday at 7:17 AM

Lucas and Luna break down the architectural decisions behind Cloudflare's global edge network, which handles over 30 million HTTP requests per second. They explore how the company moved from a simple reverse proxy to a distributed system spanning 330 cities, the role of custom-built Nginx configurations, and the trade-offs between latency and consistency. Specific topics include the use of Anycast routing, the challenge of DDoS mitigation at scale, and how Cloudflare optimized its cache hierarchy for static content delivery. This episode is a deep dive for engineers and CTOs interested in high-performance networking and edge computing.

How Stripe Migrated Payment Routing to 99.999% Uptime

#55

Last Tuesday at 7:15 PM

Episode 55 of The CTO Podcast dives into how Stripe rebuilt its payment routing engine to achieve 99.999% uptime. Lucas and Luna break down the architectural shift from a monolithic routing layer to a distributed, deterministic system that handles millions of transactions per second. They explore the team's decision to move away from traditional load balancers, the role of formal verification in routing logic, and how Stripe's engineers stress-tested the system with simulated global outages. Along the way, they discuss the trade-offs between latency and consistency, and why a gradual canary deployment was critical. This episode offers concrete lessons for engineering...

How Datadog Monitors Its Own 100-Terabyte Infrastructure

#54

Last Tuesday at 7:13 AM

Episode 54 of The CTO Podcast: Lucas and Luna explore how Datadog, the monitoring giant, uses its own tools to manage a sprawling infrastructure that ingests over 100 terabytes of data daily. They dive into the dogfooding strategy, the architectural choices that keep observability scalable, and the surprising insight that Datadog runs its entire backend on a single PostgreSQL fork — with custom sharding. Lucas explains the engineering org structure behind the monitoring team, and Luna questions whether dogfooding can blind teams to customer pain. Specific examples include how Datadog handles metric cardinality explosion and why they built a separate time-series database in...

How Stripe Rebuilt Payment Routing for 99.999% Uptime

#53

Last Monday at 7:36 PM

Stripe's payment infrastructure processes billions of dollars annually, and their routing engine—the system that decides which bank or processor gets each transaction—is a marvel of distributed systems engineering. In this episode, Lucas and Luna explore how Stripe rebuilt its payment routing layer to achieve five-nines uptime, handling failures at the bank level in milliseconds without user impact. They break down the architecture: the state machine that tracks each transaction through six phases, the circuit-breaker pattern that isolates failing processors, and the decision-tree optimization that cut latency by 40 percent. Lucas explains why routing is the hardest problem in paym...
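The circuit-breaker pattern mentioned in this episode can be sketched in a few lines. This is a generic, minimal Python illustration, not Stripe's implementation; the class name, the failure threshold, the cooldown, and the injectable clock are all hypothetical choices made for the example.

```python
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Closed -> open after `failure_threshold` consecutive failures;
    open -> half-open once `cooldown_s` has elapsed;
    half-open -> closed on a successful trial call, or back to open on a failure."""

    def __init__(self, failure_threshold: int = 3, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock  # injectable so the demo can fake the passage of time
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.cooldown_s:
            return "half-open"
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        state = self.state
        if state == "open":
            # Fail fast so traffic can be routed to a healthy processor instead.
            raise CircuitOpenError("processor isolated")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # (re)open and restart the cooldown
            raise
        self.failures = 0  # any success closes the breaker
        self.opened_at = None
        return result


# Usage with a fake clock:
now: List[float] = [0.0]
breaker = CircuitBreaker(failure_threshold=2, cooldown_s=10.0, clock=lambda: now[0])


def failing_processor() -> str:
    raise ConnectionError("processor timeout")


for _ in range(2):
    try:
        breaker.call(failing_processor)
    except ConnectionError:
        pass
print(breaker.state)  # open
now[0] = 10.0
print(breaker.state)  # half-open
print(breaker.call(lambda: "approved"), breaker.state)  # approved closed
```

The half-open state is what lets a breaker recover on its own: after the cooldown, one trial request decides whether the processor goes back into rotation.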

How Supabase Rebuilt Postgres for Real-Time Apps

#52

Last Monday at 7:11 AM

In this episode, Lucas and Luna explore how Supabase, an open-source Firebase alternative, built a real-time layer on top of PostgreSQL that handles millions of concurrent WebSocket connections. They break down the architecture behind Supabase's Realtime server, which uses PostgreSQL's logical replication and Elixir's BEAM VM to stream database changes to client applications with sub-second latency. Lucas explains why the team chose to fork PostgreSQL's replication slot mechanism and how they handle backpressure when clients fall behind. Luna questions the trade-offs of using WebSockets versus server-sent events for real-time data synchronization. The conversation also touches on Supabase's decision to...

How Discord Rebuilt Its Voice Engine for Sub-50ms Latency

#51

Last Sunday at 7:23 PM

In this episode of The CTO Podcast, Lucas and Luna dive into how Discord achieved sub-50 millisecond voice latency across millions of concurrent users. They break down the specific architectural changes Discord made: switching from Opus to a custom codec called Siren, rewriting their audio processing pipeline in Rust, and deploying edge relays in over 300 locations worldwide. The discussion covers why Discord chose to build its own transport protocol over WebRTC, how they handle packet loss with forward error correction, and the trade-offs between CPU usage and bandwidth. Lucas explains the key metric that guided their redesign — the 99th pe...

How Airbnb Rebuilt Search for 150 Million Guests

#50

Last Sunday at 7:07 AM

In this episode, Lucas and Luna dive into Airbnb's multi-year effort to rebuild its search infrastructure to handle 150 million nightly searches. They explore the shift from a monolithic PostgreSQL-backed system to a custom search service built on Elasticsearch, the trade-offs between relevance and latency, and the team's decision to implement a two-phase ranking system with lightweight machine learning at query time. Specific numbers include Airbnb's pre-migration latency of 800 milliseconds for a single search and the post-migration reduction to under 200 milliseconds at peak. The discussion also covers how the engineering team organized around the project, the cultural challenges of migrating...

How Postgres Powers 40 Percent of New Cloud Databases

#49

06/13/2026

Lucas and Luna examine how PostgreSQL has quietly become the default database for modern cloud-native applications. They trace the journey from a 1996 open-source project to powering 40 percent of new database instances on AWS, Azure, and Google Cloud. The episode focuses on the architectural decisions that made Postgres scalable: its extension ecosystem, the rise of managed services like Aurora and Cloud SQL, and how its MVCC concurrency model handles mixed workloads. They also discuss why developers are migrating from proprietary databases and what Postgres's dominance means for the database industry. Specific examples include how Instacart uses Postgres for real-time inventory...

How GitLab Runs Remote Engineering with 2000 Developers

#48

06/13/2026

In this episode of The CTO Podcast, Lucas and Luna dive into the operational and cultural mechanics behind GitLab's all-remote engineering organization. With over 2000 developers spread across 65 countries, GitLab has become a case study in asynchronous work, written documentation, and intentional culture-building. Lucas walks through GitLab's handbook-first approach, how they structure teams around 'stable counterparts' to avoid silos, and the specific tools and rituals that keep a global engineering org aligned. Luna challenges the model's trade-offs: burnout risk in async environments, the difficulty of onboarding without synchronous mentorship, and whether remote scales differently for engineering versus other functions. Together...

How Monzo Rebuilt Its Core Banking Engine for Real-Time

#47

06/12/2026

Lucas and Luna dive into how Monzo, the UK digital bank, replaced its legacy core banking system with a real-time event-driven architecture. They explore the technical bet on Apache Kafka as the source of truth, the migration from a batch-processing model to stream processing, and the engineering trade-offs involved in ensuring instant balance updates without breaking financial integrity. With specific numbers on transaction throughput and uptime targets, this episode unpacks a case study in modernizing financial infrastructure at scale.

How LinkedIn Rebuilt Search for 950 Million Members

#46

06/12/2026

LinkedIn's search team faced a massive technical challenge: how to serve relevant results to 950 million members across jobs, people, companies, and posts — all while respecting privacy and permissions. In this episode, Lucas and Luna dive into how the team rebuilt LinkedIn's search infrastructure using a real-time indexing pipeline and a custom retrieval engine called Galene. They discuss the trade-offs between relevance and speed, the decision to move away from Apache Solr, and how LinkedIn handles multilingual queries and typo tolerance. Specific numbers include: 2.5 billion search queries per week, 100 million daily active job searches, and a 40 percent reduction in query la...

How HashiCorp Rebuilt Terraform for Multi-Cloud Scale

#45

06/11/2026

In this episode, Lucas and Luna dive into HashiCorp's architectural overhaul of Terraform to handle multi-cloud deployments at massive scale. They explore the shift from a monolithic state management system to a modular, plugin-based architecture, the introduction of Terraform Cloud's real-time collaboration features, and the engineering decisions behind maintaining backward compatibility while scaling to over 100 million monthly runs. The hosts discuss the trade-offs between performance and consistency, the role of infrastructure as code in modern DevOps, and how HashiCorp's approach to provider abstraction enables organizations to manage hundreds of cloud resources across AWS, Azure, and Google Cloud seamlessly. A...

How CockroachDB Survived the Cloud Database Wars

#44

06/11/2026

Episode 44 of The CTO Podcast dives deep into how Cockroach Labs built a distributed SQL database that could survive not just server failures, but the competitive onslaught of AWS, Google, and Microsoft. Lucas walks through the key architectural decisions — the Raft consensus protocol, the geo-partitioning trick that made multi-region compliance possible, and the controversial move to make the product open-source but the enterprise features proprietary. Luna presses on how CockroachDB lost Google's internal adoption to Spanner but won over financial-services customers like JPMorgan. The episode also covers the inflection point in 2023 when CockroachDB hit $50 million in annual recurring revenue an...

How Vercel Rebuilt Its Edge Network for Sub-50ms Cold Starts

#43

06/10/2026

Lucas and Luna dive into how Vercel redesigned its edge compute layer to achieve cold-start latencies under 50 milliseconds, even for complex serverless functions. They unpick the architecture behind Vercel's 'Edge Functions' — from isolate pooling and Wasm-based sandboxing to regional pre-warming. The hosts discuss the trade-offs between JavaScript and Rust runtimes, how Vercel collaborates with Cloudflare on WinterJS, and why sub-50ms cold starts matter for real-time personalisation at scale. A concrete look at the engineering decisions that let developers run logic at the network edge without the traditional cold-start tax.

How Slack Rebuilt Its Backend for 10 Million Daily Active Users

#42

06/10/2026

In this episode, Lucas and Luna dive into the technical decisions behind Slack's backend overhaul as it scaled from a small team tool to a platform serving 10 million daily active users. They explore how Slack moved from a monolithic Ruby on Rails architecture to a service-oriented model using Java and C++, the critical choice of building its own message queue instead of relying on Kafka or RabbitMQ, and how the team tackled the 'unread counts' challenge that nearly broke the system. With specific examples like the Flannel service for real-time presence and the Vitess database sharding layer, this episode...

How Notion Scaled Its Real-Time Sync Engine

#41

06/09/2026

Real-time sync has become table stakes for any collaborative product, but building it at Notion was anything but straightforward. In this episode, Lucas and Luna break down how Notion's engineering team moved from a naive polling model to a custom CRDT-based sync engine that handles millions of concurrent edits across documents, databases, and wikis. They walk through the key design decisions: why they chose a hybrid logical clock over vector clocks, how they handle conflict resolution without a central server, and the storage tradeoffs they made to keep latency under 100 milliseconds. Lucas also shares a concrete example of a sync...
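For readers unfamiliar with hybrid logical clocks, here is a minimal Python sketch of the published HLC algorithm (Kulkarni et al., 2014). It is not Notion's code; the class and method names and the injected millisecond clock are assumptions made for illustration. The idea is that each timestamp stays close to wall-clock time yet never moves backwards, even when replicas' physical clocks disagree.

```python
import time
from typing import Callable, Tuple

Timestamp = Tuple[int, int]  # (l, c); Python compares tuples lexicographically


def wall_clock_ms() -> int:
    return time.time_ns() // 1_000_000


class HybridLogicalClock:
    """HLC: `l` tracks the largest physical time seen so far,
    `c` breaks ties between events that share the same `l`."""

    def __init__(self, now_ms: Callable[[], int] = wall_clock_ms) -> None:
        self.now_ms = now_ms
        self.l = 0
        self.c = 0

    def tick(self) -> Timestamp:
        """Timestamp a local edit or an outgoing sync message."""
        prev = self.l
        self.l = max(prev, self.now_ms())
        self.c = self.c + 1 if self.l == prev else 0
        return (self.l, self.c)

    def receive(self, remote: Timestamp) -> Timestamp:
        """Merge the timestamp on an incoming message from another replica."""
        rl, rc = remote
        prev = self.l
        self.l = max(prev, rl, self.now_ms())
        if self.l == prev == rl:
            self.c = max(self.c, rc) + 1
        elif self.l == prev:
            self.c += 1
        elif self.l == rl:
            self.c = rc + 1
        else:
            self.c = 0  # local physical time dominates; reset the counter
        return (self.l, self.c)


# Two replicas whose wall clocks disagree by 10 ms:
a = HybridLogicalClock(now_ms=lambda: 1000)
b = HybridLogicalClock(now_ms=lambda: 990)
t1 = a.tick()       # (1000, 0)
t2 = b.receive(t1)  # (1000, 1): b adopts a's time instead of going backwards
t3 = b.tick()       # (1000, 2)
print(t1 < t2 < t3)  # True
```

Compared with a vector clock, each timestamp here is a fixed-size pair regardless of how many replicas exist, which is one plausible reason a sync engine might prefer it.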

How Linear Uses Linear Technology to Build Linear

#40

06/09/2026

Episode 40 of The CTO Podcast explores Linear, the project management tool built by a team of seven engineers who use the product they ship. Lucas and Luna walk through Linear's architecture: a single TypeScript codebase, a custom sync engine built on SQLite and CRDTs, and how they handle optimistic updates with zero conflicts. The episode examines why the team chose not to adopt microservices, how they keep latency under 50 milliseconds even on shaky connections, and what happens when your dogfooding strategy means your entire infrastructure is also your product. Specific numbers discussed: seven engineers, 50 ms sync latency, zero merge conflicts on is...

How Shopify Handles Black Friday Traffic With Static Caching

#39

06/08/2026

Lucas and Luna break down how Shopify prepares its infrastructure for the biggest shopping day of the year. They focus on a specific technique: using edge static caching to absorb 90 percent of read requests before they hit the application layer. The episode walks through Shopify's architecture for serving storefront pages from CDN nodes, how they invalidate caches when a merchant updates a product, and what happens when the cache misses. Lucas explains the trade-offs between stale content and site reliability, and Luna asks about the blast radius of a cache stampede. They also touch on how Shopify's approach differs...
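The cache behaviors discussed here (serving hits without touching the origin, invalidating on a product update, and avoiding a stampede on a miss) can be illustrated with a small in-process Python sketch. It is not Shopify's implementation and not how a CDN is built; the class name, the TTL, the per-path locking strategy, and the `render` callback are hypothetical stand-ins chosen for the example.

```python
import threading
import time
from typing import Callable, Dict, List, Optional, Tuple


class StorefrontCache:
    """Page cache with TTL expiry, explicit invalidation, and per-path locks so
    that concurrent misses for the same page trigger a single origin render."""

    def __init__(self, render: Callable[[str], str], ttl_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.render = render  # hypothetical origin call that builds the page
        self.ttl_s = ttl_s
        self.clock = clock
        self._pages: Dict[str, Tuple[str, float]] = {}  # path -> (html, expires_at)
        self._locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()  # protects creation of per-path locks

    def _fresh(self, path: str) -> Optional[str]:
        entry = self._pages.get(path)
        return entry[0] if entry and entry[1] > self.clock() else None

    def get(self, path: str) -> str:
        page = self._fresh(path)
        if page is not None:
            return page  # hit: the origin never sees this request
        with self._guard:
            lock = self._locks.setdefault(path, threading.Lock())
        with lock:
            # Re-check: another request may have refilled the entry while we waited.
            page = self._fresh(path)
            if page is None:
                page = self.render(path)
                self._pages[path] = (page, self.clock() + self.ttl_s)
            return page

    def invalidate(self, path: str) -> None:
        """Drop a page, e.g. when a merchant edits a product shown on it."""
        self._pages.pop(path, None)


renders: List[str] = []


def slow_render(path: str) -> str:
    renders.append(path)
    time.sleep(0.05)  # simulate an expensive origin render
    return f"<html>{path} v{len(renders)}</html>"


cache = StorefrontCache(slow_render)
threads = [threading.Thread(target=cache.get, args=("/products/mug",)) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(renders))  # 1: twenty concurrent requests, one origin render
cache.invalidate("/products/mug")
print(cache.get("/products/mug"))  # <html>/products/mug v2</html>
```

Without the re-check inside the lock, every request that missed before the first render finished would render the page again, which is the stampede the episode's question about blast radius is getting at.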

How Amazon Built Its One-Day Delivery Supply Chain

#38

06/08/2026

In 2019, Amazon announced it would convert Prime shipping from two days to one day. Most people saw a marketing promise. Engineers saw a logistics nightmare. This episode unpacks how Amazon rebuilt its fulfillment network — restructuring inventory placement, rethinking sortation center algorithms, and launching its own air hub in Cincinnati — to make one-day delivery economically viable across millions of SKUs. Lucas and Luna walk through the key architectural decisions: how Amazon used machine learning to predict demand at the zip-code level, decoupled its fulfillment centers from its transportation layer, and absorbed a multi-billion-dollar cost that competitors couldn't replicate. They also touc...

How GitLab Runs Remote Engineering with 2000 Developers

#37

06/07/2026

In this episode, Lucas and Luna dive into how GitLab manages a fully remote engineering organization of over 2,000 developers. They explore the company's unique handbook-first culture, how they maintain code quality across time zones, and the specific tools they use for asynchronous communication. Lucas shares key metrics: GitLab ships 40 releases per year with a median merge request cycle time of under 6 hours. They also discuss how the company handles onboarding, performance reviews, and incident response without a physical office. A must-listen for anyone leading or building a remote engineering team.

How Figma Scales Real-Time Collaboration With CRDTs

#36

06/07/2026

Episode 36 of The CTO Podcast dives into how Figma built its real-time collaboration engine using Conflict-Free Replicated Data Types (CRDTs). Lucas and Luna unpack the architectural decision to move from Operational Transform to CRDTs, how Figma handles merge conflicts at scale, and the engineering tradeoffs behind its vector-based multi-user editing. They walk through the key design choices: why Figma chose a custom CRDT instead of off-the-shelf libraries, how it serialises operations for low-latency sync across hundreds of collaborators on a single file, and the surprising way it prioritises local responsiveness over consistency. Luna asks the hard questions about production...

How Elasticsearch Powers Netflix's Search and Observability

#35

06/06/2026

Netflix runs one of the largest Elasticsearch deployments in the world — over 150 clusters, thousands of nodes, processing tens of billions of documents. In this episode, Lucas and Luna unpack how Netflix uses Elasticsearch not just for log aggregation, but to power its internal search, real-time monitoring, and even the titles you see when you open the app. They walk through the architecture behind Netflix's search — from how they handle partial matches across 17,000 titles to how they keep observability data flowing without crashing the clusters. Along the way, they cover shard sizing, index lifecycle management, and the painful lessons Netflix lear...

How Discord Rebuilt Its Voice Engine for Latency

#34

06/06/2026

In this episode of The CTO Podcast, Lucas and Luna dive into Discord's architectural overhaul of its real-time voice system. They explore how the team reduced latency from hundreds of milliseconds to under 50 milliseconds by switching from a traditional client-server model to a mesh-based WebRTC architecture. The discussion covers the trade-offs of running their own media servers versus outsourcing, the engineering challenge of synchronizing 50 users in a single voice channel without a central coordinator, and how Discord handled the transition without disrupting its 150 million monthly active users. Lucas explains the key insight: rather than optimizing the existing pipeline, Discord rethought...

How AWS Built Its Control Plane for 200 Services

#33

06/05/2026

Amazon Web Services runs over 200 services, each with its own control plane. In this episode, Lucas and Luna break down how AWS's internal architecture team designed a unified control plane framework that handles millions of API requests per second across regions. They explore the concept of 'control plane as a platform' — a set of reusable primitives for authorization, rate limiting, and state management that lets service teams focus on business logic. Lucas walks through the key design decisions: separating data plane from control plane at the infrastructure level, using eventual consistency for global state, and the 'cell-based architecture' that is...

How Stripe Runs a Global Payment Platform With 99.999 Percent Uptime

#32

06/05/2026

Stripe processes hundreds of billions in payments annually. But behind the API is a reliability architecture that few people talk about. In this episode, Lucas and Luna dive into how Stripe achieves five-nines uptime across its payment infrastructure — the layers of redundancy, the careful rollout strategy, and the incident response playbook that keeps money moving. They explore Stripe's use of circuit breakers, gradual canary deployments, and a global multi-region database topology that can survive an entire cloud region going dark. Specific numbers: Stripe's documented 99.999% uptime goal, the 30-minute maximum recovery time for critical services, and how they test failure sc...

How Uber Rebuilt Its Maps for 40 Million Daily Rides

#31

06/04/2026

Episode 31 of The CTO Podcast digs into how Uber's engineering team rebuilt its mapping and routing stack from scratch between 2019 and 2022 to handle over 40 million daily rides across 10,000 cities. We look at the specific reason they abandoned the old pipeline — vendor lock-in with Google Maps and a 40 percent cost increase in a single quarter — and how they designed a modular routing engine called Michelangelo Maps. Lucas explains the architecture: a C++ kernel for shortest-path that runs in under 50 milliseconds, a tile-based geocoding layer that reduced queries by 80 percent, and a machine learning model that predicts travel time to within 5 perc...

How Spotify Migrated to Google Cloud Without Breaking Discover Weekly

#30

06/04/2026

In 2016, Spotify announced it was moving its entire infrastructure from its own data centers to Google Cloud Platform. The migration took four years and involved moving over 1,200 services, petabytes of data, and the machine learning pipelines powering Discover Weekly — all while keeping the music streaming without audible interruption. Lucas and Luna break down how Spotify's engineering team pulled off one of the largest cloud migrations in tech history, the architectural decisions that made it possible, and the lessons for any organization facing a big infrastructure move. Featuring the surprising role of a custom tool called 'Sisyphus' and why Spotify ch...

How Stripe Uses Idempotency Keys to Prevent Double Charges

#29

06/03/2026

Stripe processes billions of dollars in payments every year. One double charge could destroy trust. In this episode, Lucas and Luna break down how Stripe uses idempotency keys, a simple but brilliant engineering pattern, to guarantee that even if a network request is retried dozens of times, the customer is charged exactly once. They walk through a real-world example: a customer hits 'Place Order' twice; the first attempt succeeds, and the second attempt must not create a duplicate charge. Lucas explains the idempotency key lifecycle: generation, storage in Redis, TTL, and response replay. He contrasts Stri...
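The lifecycle described here (generation, storage, TTL, and response replay) can be sketched in Python. This is a minimal illustration under stated assumptions, not Stripe's implementation: a plain dictionary stands in for Redis, a single global lock stands in for proper per-key locking, and the names `IdempotencyStore`, `run_once`, and `create_charge` are hypothetical. Rejecting a reused key that arrives with different parameters is a design choice made for the sketch.

```python
import hashlib
import json
import threading
import time
import uuid
from typing import Any, Callable, Dict, List, Tuple

Params = Dict[str, Any]


class IdempotencyStore:
    """Maps idempotency key -> (request fingerprint, saved response, expiry).
    A plain dict stands in for a shared store such as Redis in this sketch."""

    def __init__(self, ttl_s: float = 24 * 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_s = ttl_s
        self.clock = clock
        self._entries: Dict[str, Tuple[str, Any, float]] = {}
        self._lock = threading.Lock()

    def run_once(self, key: str, params: Params,
                 handler: Callable[[Params], Any]) -> Any:
        fingerprint = hashlib.sha256(
            json.dumps(params, sort_keys=True).encode()).hexdigest()
        # One global lock keeps the sketch simple: concurrent retries of the same
        # key cannot both reach the handler (a real system would lock per key).
        with self._lock:
            entry = self._entries.get(key)
            if entry is not None and entry[2] > self.clock():
                saved_fingerprint, saved_response, _ = entry
                if saved_fingerprint != fingerprint:
                    raise ValueError("idempotency key reused with different parameters")
                return saved_response  # replay the first response; no new side effect
            response = handler(params)  # first attempt, or the key's TTL has lapsed
            self._entries[key] = (fingerprint, response, self.clock() + self.ttl_s)
            return response


charges: List[Params] = []


def create_charge(params: Params) -> Dict[str, Any]:
    charges.append(params)  # stands in for actually moving money
    return {"id": f"charge_{len(charges)}", "amount": params["amount"]}


store = IdempotencyStore()
key = str(uuid.uuid4())  # generated once by the client and reused on every retry
order = {"amount": 4200, "currency": "usd"}
first = store.run_once(key, order, create_charge)
retry = store.run_once(key, order, create_charge)  # e.g. a second 'Place Order' click
print(first == retry, len(charges))  # True 1
```

The TTL bounds how long replayed responses are kept; once a key expires, the same key would be treated as a brand-new request, so the expiry window has to comfortably exceed any realistic retry window.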

How Pixar Rebuilt Its Render Farm for Real-Time

#28

06/03/2026

In this episode, we dive into how Pixar Engineering rebuilt their legendary render farm architecture to support hybrid real-time workflows without sacrificing the fidelity that made 'Soul' and 'Incredibles 2' possible. Hosts Lucas and Luna unpack the tradeoffs between batch rendering and real-time ray tracing, the shift to a unified storage fabric, and how Pixar's RenderMan renderer co-evolved with Disney's streaming push. We discuss the specific challenge of maintaining deterministic results across heterogeneous GPU clusters and how the team used a scene-graph abstraction to decouple authoring from rendering. A concrete look at the infrastructure behind animated movies...

How Stack Overflow Survived ChatGPT's First Year

#27

06/02/2026

When ChatGPT launched in late 2022, many predicted Stack Overflow was dead. Traffic dropped 14 percent quarter-over-quarter in early 2023 as developers copied AI-generated code instead of browsing answers. By mid-2024, the site had stabilized and even recovered some traffic. In this episode, Lucas and Luna unpack Stack Overflow's survival playbook: why the moderation layer gave it staying power, how they launched OverflowAI without alienating their core community, and what the traffic data says about developer trust in AI-generated answers versus human-vetted ones. They also discuss a concrete lesson for any platform facing generative AI disruption—namely, that curation becomes more valuable, no...

How Netflix Rebuilt Its Encoding Pipeline for Bandwidth Savings

#26

06/02/2026

Lucas and Luna dive into how Netflix re-engineered its video encoding pipeline to cut bandwidth usage without sacrificing quality. They explore the technical trade-offs between constant bitrate and variable bitrate encoding, the role of per-title encoding optimization, and how the streaming giant uses machine learning to dynamically encode every frame. The episode also touches on why this matters for mobile users and emerging markets. Listeners come away with a concrete example of how a real-world engineering team turned a bandwidth problem into a competitive advantage.

How Stripe Uses Idempotency Keys to Prevent Double Charges

#25

06/01/2026

In this episode, Lucas and Luna dive into one of the most elegant patterns in distributed systems: the idempotency key. Using Stripe's payment API as the central case, they explain how a single HTTP header prevents duplicate charges during network retries, how Shopify applies the same pattern to order creation, and why idempotency is a fundamental principle for any system that deals with money, inventory, or state changes. The discussion covers the mechanics of idempotency keys, their role in exactly-once semantics, and practical trade-offs like key expiration and storage. Listeners will walk away understanding a concrete tool to make...

How Monzo Keeps Its Banking App Running Like a Startup

#24

06/01/2026

In this episode, we dive into how Monzo, the UK digital bank, maintains a startup-like engineering velocity while managing millions of transactions daily. We explore their use of event sourcing and the CQRS pattern to decouple read and write workloads, and how they keep their core banking ledger simple despite scaling to over 9 million customers. Lucas breaks down Monzo's approach to feature flags and gradual rollouts—treating every deployment as an experiment. Luna chimes in with her own experience from a fintech that tried a similar architecture and hit unexpected pain points. We also touch on how Monzo's engineering te...
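To make the event sourcing and CQRS split concrete, here is a generic Python sketch: the write side validates commands and appends immutable events, and the read side keeps a separate balance projection built from those events. It is not Monzo's code; the event types, the class names, and the synchronous projection update are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    account: str
    kind: str    # hypothetical event types: "deposited" or "withdrew"
    amount: int  # minor units (pence) to avoid floating-point money

    @property
    def delta(self) -> int:
        return self.amount if self.kind == "deposited" else -self.amount


@dataclass
class Ledger:
    """Write side: validates commands, then appends immutable events."""
    events: List[Event] = field(default_factory=list)

    def balance(self, account: str) -> int:
        # Rebuild state by replaying the log; real systems often snapshot instead.
        return sum(e.delta for e in self.events if e.account == account)

    def deposit(self, account: str, amount: int) -> Event:
        return self._append(Event(account, "deposited", amount))

    def withdraw(self, account: str, amount: int) -> Event:
        if amount > self.balance(account):
            raise ValueError("insufficient funds")
        return self._append(Event(account, "withdrew", amount))

    def _append(self, event: Event) -> Event:
        self.events.append(event)
        return event


class BalanceView:
    """Read side: a denormalized projection kept up to date from the events."""

    def __init__(self) -> None:
        self.balances: Dict[str, int] = {}

    def apply(self, event: Event) -> None:
        self.balances[event.account] = self.balances.get(event.account, 0) + event.delta


ledger, view = Ledger(), BalanceView()
for event in (ledger.deposit("acc_1", 5000), ledger.withdraw("acc_1", 1200)):
    view.apply(event)  # applied inline here; in CQRS this is often done asynchronously
print(view.balances["acc_1"])  # 3800
```

Because reads hit only the projection, it can be scaled or rebuilt independently of the ledger, which is the decoupling of read and write workloads the episode describes.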

How Palantir Migrated to the Cloud Without Losing Security Clearance

#23

05/31/2026

Palantir runs critical infrastructure for the US military and intelligence community. In 2020, the company began migrating its entire stack from on-premise government data centers to Amazon Web Services while maintaining top-secret security accreditation. This episode breaks down the technical architecture that made the move possible: how Palantir built a 'cloud bridge' that let legacy and cloud environments run in parallel, the zero-trust networking layer that replaced traditional VPNs, and the compliance automation that turned six-month audits into continuous monitoring. Lucas and Luna also discuss what the migration reveals about the future of defense tech procurement and why the Pentagon's...