Ship It Weekly - DevOps, SRE, Platform and Cloud Engineering News
Ship It Weekly is a short, practical recap of what actually matters in DevOps, SRE, cloud infrastructure, and platform engineering. Each episode, your host Brian Teller walks through the latest outages, releases, tools, and incident writeups, then translates them into “here’s what this means for your systems” instead of just reading headlines. Expect a couple of main stories with context, a quick hit of tools or releases worth bookmarking, and the occasional segment on on-call, burnout, or team culture. This isn’t a certification prep show or a lab walkthrough. It’s aimed at people who are already working in the spa...
PeopleSoft Zero-Day Exploited, npm v12 Install Script Changes, GitHub Agentic Tokens, Anthropic Model Risk, and Default Trust Breaking
This episode of Ship It Weekly is about default trust getting punished. Brian covers Oracle’s emergency PeopleSoft advisory for CVE-2026-35273, npm v12 changing install-script defaults, GitHub Agentic Workflows moving away from long-lived personal access tokens, and Anthropic disabling Fable 5 and Mythos 5 after a U.S. export-control directive. The common thread: legacy ERP systems, package installs, CI/CD agents, and AI models all become production risks when teams trust the default without checking what that trust can actually do.
In the lightning round, Brian covers Tekton CloudEvents moving to a dedicated events controller, NVIDIA Triton Inference Se...
Ship It Conversations: Meta’s Francois Richard on AI Incident Response, SLOs, and Reliability at Scale
This is a guest conversation episode of Ship It Weekly, separate from the weekly news recaps.
In this Ship It: Conversations episode, I talk with Francois Richard, Engineering Director at Meta, about reliability at scale, how AI is changing production risk, what teams actually learn from incidents, and why recovery practice matters just as much as prevention.
We talk about the proactive and reactive sides of reliability, why SLOs should represent a promise to users instead of just another dashboard number, how incident reviews should drive real system improvements, and how teams can practice recovery...
Coinbase Outage, Meta AI Account Recovery, AWS AgentCore Code Injection, Apigee Tenant Isolation, and the Glue That Breaks Production
This episode of Ship It Weekly is about the hidden glue holding production together.
Brian covers Coinbase’s May 7 outage postmortem, where an AWS us-east-1 cooling failure exposed the difference between being “multi-AZ” on paper and actually being able to recover when stateful, low-latency systems are tied to a failed zone.
Then he looks at Meta’s AI-assisted Instagram support issue and why account recovery is identity infrastructure, not just customer support. If AI can influence password resets, email changes, MFA resets, or account ownership flows, that workflow needs to be treated like a production control...
Kiro CLI Approval Bypass, Amazon Braket Pickle Risk, AWS Org Logging, KEDA Upgrades, and Automation’s Hidden Boundaries
This episode of Ship It Weekly is about automation’s hidden boundaries. Brian covers Kiro CLI CVE-2026-9255, where piped stdin could act like user approval, Amazon Braket SDK CVE-2026-9291 and the very normal Python pickle risk hiding inside quantum job results, AWS Organizations finally emitting CloudTrail events when accounts join or leave an org, and KEDA updates that remind us autoscaling upgrades are production behavior changes.
The bigger thread this week is that automation does not remove boundaries. It moves them. Approval paths, trusted data, account membership, scaling signals, platform access, and AI-generated output all ne...
GitHub Supply Chain Attacks, Railway’s GCP Outage, Discord’s Voice Failure, AWS Retry Changes, and Trusted Tool Risk
This episode of Ship It Weekly is about trusted tools becoming production dependencies. Brian covers a rough GitHub supply chain week, including the compromised Nx Console VS Code extension tied to exposed GitHub internal repositories and the Megalodon campaign abusing GitHub Actions workflows across thousands of public repos.
The bigger thread this week is that the tools around production are increasingly part of production. Brian also covers Railway’s GCP account suspension outage, Discord’s voice outage during a Kubernetes migration, AWS changing SDK retry behavior, CVE-2026-9133 in the RabbitMQ AWS plugin, and a Reddit story abou...
Ship It Conversations: Jake Warner on Cycle.io, Bare Metal’s Comeback, and Why Private Cloud Is Getting Interesting Again
This is a guest conversation episode of Ship It Weekly, separate from the weekly news recaps.
In this Ship It: Conversations episode, I talk with Jake Warner, founder and CEO of Cycle.io, about private cloud, bare metal, Kubernetes fatigue, and why some teams are rethinking how much infrastructure complexity they actually want to carry.
We talk about why bare metal and private cloud are getting interesting again, especially around cost, performance, data sovereignty, compliance, and platform ownership. Jake explains how Cycle approaches infrastructure as a pool of resources, why he thinks in terms of “en...
CISA’s GitHub Leak, AI Root Cause Analysis, Copilot Agents, Claude Code in CI/CD, and Kubernetes Seccomp Risk
This episode of Ship It Weekly is about secrets, agents, risky defaults, and follow-up work that never gets done. Brian covers the CISA contractor GitHub leak involving AWS keys, internal docs, Terraform, Kubernetes, Argo CD, and CI/CD context, plus AWS DevOps Agent doing automated RCA across Datadog, Elasticsearch, CloudTrail, and EKS.
Brian also covers Microsoft Copilot Studio computer-using agents, Claude Code in Bitbucket Agentic Pipelines, CVE-2026-46333 and Kubernetes seccomp defaults, GitHub OIDC for Dependabot, Java pods getting OOMKilled, LLM-generated SQL that can be wrong but still run, and why postmortem action items die without ownership...
AI Agents Get API Access and Identity: GitHub Copilot Cloud Agents, MCP Auth, Ansible Automation, OpenAI Daybreak, and the New Production Risk
This episode of Ship It Weekly is about AI agents moving from helpful coding assistants into real operational actors. Brian covers GitHub making Copilot cloud agent tasks available through a REST API, Auth0 bringing authentication and authorization to MCP servers, Red Hat positioning Ansible as a trusted execution layer for agentic IT operations, and OpenAI Daybreak pushing AI deeper into security research and remediation.
The bigger thread this week is authority: what these agents can reach, what they can change, who approved the action, and who owns the outcome when something breaks.
Brian also covers...
Cursor Deletes PocketOS Prod DB, .de DNSSEC Outage, Bluesky Postmortem, Argo CD, and Copy Fail
This episode of Ship It Weekly is about modern reliability getting squeezed from both directions. Old-school failures still hit hard, like broken DNSSEC, kernel privilege escalation bugs, and GitOps behavior changes. But newer automation layers add a second kind of risk, where AI agents, machine identity, and cloud control planes can do real damage fast when authority is too broad. Brian covers the Cursor and PocketOS production database wipe, the .de DNSSEC outage and Cloudflare’s response, Bluesky’s April outage postmortem, Argo CD v3.1.16 reaching end of life plus the v3.4.1 behavior change, Linux kernel CVE-2026-31431 under acti...
Ship It Conversations: Gareth Kersey on IaCConf 2026, AI, and Corey Quinn’s Terraform Keynote
This is a guest conversation episode of Ship It Weekly, separate from the weekly news recaps.
This episode is not sponsored. I wanted to cover IaCConf because the theme lines up closely with what Ship It Weekly focuses on: infrastructure, platform engineering, DevOps, SRE, and how teams are adapting to AI-driven change.
In this Ship It: Conversations episode, I talk with Gareth Kersey about IaCConf 2026, a free virtual conference focused on infrastructure as code, platform engineering, DevOps, SRE, and infrastructure operations. The conference takes place on May 14, 2026.
The main theme is “keeping pace.” Not just...
GitHub RCE, AI Agent Prompt Injection, and the New Reality: Your Developer Toolchain Is Production Now
This episode of Ship It Weekly is about the developer toolchain becoming part of production. Brian covers GitHub’s critical git push RCE, AI-assisted reverse engineering, prompt injection against AI agents in GitHub workflows, Elementary’s malicious CLI release, GitHub’s merge queue regression, Cal.com going closed source, and Copilot moving toward usage-based billing. Plus: MinIO’s repo archive, Ghostty leaving GitHub, Docker Hardened Images, and Azure DevOps security updates.
Links
GitHub git push RCE https://github.blog/security/securing-the-git-push-pipeline-responding-to-a-critical-remote-code-execution-vulnerability/
AI-assisted reverse engineering https://www.darkreading.com/application-security/reverse-engineering-ai-unearths-high-severity-github-bug
AI agents +...
Kubernetes 1.36, Gateway API v1.5, AWS Copilot End of Support, and Cloudflare Non-Human Identities
This episode of Ship It Weekly is about platforms getting sharper about defaults, ownership, and the old paths they are no longer willing to quietly carry forever. Brian covers Kubernetes 1.36 and why it feels more like a cleanup-and-maturity release than a flashy feature dump, Gateway API v1.5 moving more networking behavior into the stable path, AWS Copilot CLI reaching end of support and what that means for teams still sitting on the older “easy” ECS workflow, Airbnb’s alert-development overhaul and why noisy or weak alerts are often a workflow problem long before they become an on-call problem, and Cloudf...
Ship It Conversations: Stephane Moser on Pipedrive’s Jenkins-to-GitHub Actions Migration, Argo CD, and CI/CD at Scale
This is a guest conversation episode of Ship It Weekly, separate from the weekly news recaps.
In this Ship It: Conversations episode, I talk with Stephane Moser about Pipedrive’s move from Jenkins to GitHub Actions, building self-hosted runners on Kubernetes, shifting deployments toward GitOps with Argo CD, and what it actually takes to roll out a big CI/CD change across a large engineering org.
We talk about why Jenkins had become painful, from Groovy friction to noisy-neighbor problems on shared VMs, why GitHub Actions fit better, how reusable workflows and custom actions helped, wh...
AWS Interconnect GA, Cloudflare Mesh, GitLab 19, EKS Auto Mode, and OpenTelemetry Config
This episode of Ship It Weekly is about networking, ingress, and private access moving further up into the platform layer. Brian covers AWS Interconnect going generally available, Cloudflare Mesh, GitLab 19.0 breaking changes around Gateway API and bundled services, EKS Auto Mode networking, and OpenTelemetry declarative config reaching stability. He also hits containerd security patches, GitHub’s new Code Security risk assessment, and AWS guidance on securing AI agents with MCP.
Links
AWS Interconnect GA and last mile connectivity https://aws.amazon.com/blogs/aws/aws-interconnect-is-now-generally-available-with-a-new-option-to-simplify-last-mile-connectivity/
Cloudflare Mesh https://blog.cl...
Special: Claude Mythos Preview and Project Glasswing: AI Exploit Discovery, Zero-Day Risk, Business Fallout, and What It Means for DevOps, Cloud, and Platform Security
In this Ship It Weekly special, Brian breaks down Claude Mythos Preview and Project Glasswing, and why this story matters beyond normal AI launch hype.
Anthropic is treating Mythos like a real security inflection point, not just a better coding model. Project Glasswing is their coordinated effort to get early access into the hands of defenders, critical software maintainers, and major infrastructure organizations before similar capability becomes more broadly available. If OpenClaw was about agents becoming a new control plane, this episode is about what happens when finding ways into messy environments and control planes starts getting...
Amazon S3 Files, Malicious npm Plugins, Trivy Fallout, and Kubernetes’ Gateway Shift
This episode of Ship It Weekly is about the interface layer becoming the story. Brian covers Amazon S3 Files and why it feels more like a managed filesystem layer in front of S3 than “S3 is EFS now,” including how it relates to the old s3fs and FUSE-style approach. He also digs into 36 malicious npm packages posing as Strapi plugins, the uglier follow-on to the Trivy incident he discussed previously, Kubernetes Ingress2Gateway 1.0 and the push toward Gateway API, and Kubernetes Agent Sandbox as a sign that newer AI-style workloads are starting to reshape the platform itself.
Li...
Ship It Conversations: David Tuite on Backstage, Internal Developer Portals, and the Shift to AI Agents
This is a guest conversation episode of Ship It Weekly, separate from the weekly news recaps.
In this Ship It: Conversations episode, I talk with David Tuite, founder and CEO of Roadie, about internal developer portals, Backstage, automation, and how IDPs may evolve as AI agents become more common in engineering workflows.
We talk about the difference between a platform and a portal, the three common problems IDPs usually try to solve, why discoverability tends to be the first pain teams feel, and why a lot of orgs should start with automation before trying to...
GitHub Actions Hardening, Airbnb Config Rollouts, Cloudflare Rust Restarts, ECS Managed Daemons, and Terraform Access Controls
This episode of Ship It Weekly is about the quiet platform work that keeps things safe before they break. Brian covers GitHub Actions hardening in Kubernetes-related repos, Airbnb’s safer config rollouts, Cloudflare’s zero-downtime Rust restarts, Amazon ECS Managed Daemons, and HCP Terraform access controls with IP allow lists and temporary AWS permission delegation.
Links
GitHub Actions security roadmap
https://github.blog/news-insights/product-news/whats-coming-to-our-github-actions-2026-security-roadmap/
Airbnb config rollouts
https://medium.com/airbnb-engineering/safeguarding-dynamic-configuration-changes-at-scale-5aca5222ed68
Cloudflare graceful restarts for Rust
https://blog.cloudflare.com...
Hackerbot-Claw Grows, Xygeni Tag Poisoning, GitHub Search HA, Windows SID Failures, and AI Skills Supply Chain
This episode of Ship It Weekly is about the places where convenience quietly turns into trust.
Brian revisits the Trivy story by zooming out to the bigger hackerbot-claw GitHub Actions campaign, then gets into the Xygeni tag-poisoning compromise, GitHub’s search high availability rebuild for GitHub Enterprise Server, Windows Server 2025 surfacing duplicate SID problems in cloned images, and the agent-skills ecosystem replaying package supply chain history. Plus: a quick lightning round on GitHub pausing self-hosted runner minimum-version enforcement and March secret scanning updates.
Links
OpenSSF advisory on active GitHub Actions exploitation https://seclists.or...
Ship It Conversations: Ang Chen on Project Vera, AI Cloud Emulation, and Safer Infrastructure Testing
This is a guest conversation episode of Ship It Weekly, separate from the weekly news recaps.
In this Ship It: Conversations episode, I talk with Ang Chen from the University of Michigan about Project Vera, a cloud emulator built to help teams test infrastructure changes more safely before they touch real cloud.
We talk about why testing against real cloud APIs is slow, expensive, and risky, how Vera works under tools like Terraform and CloudFormation, what “high fidelity” actually means, and where a tool like this could fit in local dev and CI/CD.
Th...
McKinsey AI Flaw, Kafka Goes Diskless, Google Buys Wiz, AWS Copilot Ends, and AI Gateway on Kubernetes
This week on Ship It Weekly, Brian looks at what happens when new interfaces create old responsibilities.
McKinsey patched a vulnerability in its internal AI tool Lilli, Kafka contributors are pushing a diskless-topics model that rethinks durability and replication in cloud environments, and Google officially closed its Wiz acquisition, one of the biggest cloud-security moves to date. Plus: AWS is sunsetting Copilot CLI, and Kubernetes launches an AI Gateway Working Group.
Links
McKinsey statement on Lilli
https://www.mckinsey.com/about-us/media/statement-on-strengthening-safeguards-within-the-lilli-tool
Kafka diskless topics proposal
https://cwiki.apache...
Meta Buys Moltbook, Block AI Layoffs Get Messier, Atlassian Cuts Jobs, and GitHub Explains the Outages
This week on Ship It Weekly, Brian covers five “AI meets reality” stories that every DevOps, SRE, security, and platform team can learn from.
Block’s AI layoff story is getting messier as follow-up reporting pushes back on the original framing, Meta bought Moltbook and brought more attention to the trust and security problems already showing up around AI-agent platforms, and Atlassian cut about 10% of its workforce while saying AI is changing the skills and roles it needs. Plus: GitHub gives one of the more honest outage breakdowns we’ve seen lately, Anthropic and Mozilla show a more gro...
Ship It Conversations: Yvonne Young on Linux Foundations, Mentorship, and Getting Job Ready in Cloud
This is a guest conversation episode of Ship It Weekly (separate from the weekly news recaps).
In this Ship It: Conversations episode I talk with Yvonne Young, a cloud and Linux mentor active in the CloudWhistler community. We talk about the real path into cloud and DevOps, why Linux still matters as a foundation, what “job ready” actually means, and why focus, consistency, and business thinking matter more than chasing every new tool.
Highlights
Linux fundamentals still matter because so much of cloud and infra work sits on top of Linux
What “job ready” really means: p...
AWS Bahrain/UAE Data Center Issues Amid Iran Strikes, ArgoCD vs Flux GitOps Failures, GitHub Actions Hackerbot-Claw Attacks (Trivy), RoguePilot Codespaces Prompt Injection, Block “AI Remake” Layoffs, Claude Code Security
This week on Ship It Weekly, Brian looks at how the boundary of ops keeps expanding.
We cover AWS flagging issues in Bahrain/UAE amid Iran strikes, ArgoCD vs Flux and why ArgoCD can get stuck in failed sync states, GitHub Actions being exploited at scale (plus Trivy’s incident), RoguePilot prompt injection meeting real credentials in Codespaces, Block’s “AI remake” layoffs, and Anthropic’s Claude Code Security for defenders.
Lightning round: DeepSeek model access geopolitics, Vercel’s agentic security boundaries, a KEV CVE to patch, an MCP-atlassian SSRF-to-RCE chain, and Claude Cowork scheduled tasks.
Cloudflare BYOIP BGP Withdrawals, Clerk’s Postgres Query-Plan Flip Outage, and AWS Kiro Permissions Lessons (Grafana Privesc + runc CVEs)
Ship It Conversations: Mike Lady on Day Two Readiness + Guardrails in the AI Era
This is a guest conversation episode of Ship It Weekly (separate from the weekly news recaps).
In this Ship It: Conversations episode I talk with Mike Lady (Senior DevOps Engineer, distributed systems) from Enterprise Vibe Code on YouTube. We talk day two readiness, guardrails/quality gates, and why shipping safely matters even more now that AI can generate code fast.
Highlights
Day 0 vs Day 1 vs Day 2 (launching vs operating and evolving safely)
What teams look like without guardrails (“hope is not a strategy”)
Why guardrails speed you up long-term (less firefighting, more predictable delivery)
Day...
Ship It Weekly – DevOps and SRE News for Engineers Who Run Production
Ship It Weekly is a DevOps and SRE news podcast for engineers who run real systems.
Every week I break down what actually matters in cloud, Kubernetes, CI/CD, infrastructure as code, and production reliability. No hype. No vendor spin. Just practical analysis from someone who’s been on call and shipped systems at scale.
This isn’t a tutorial show. It’s a signal filter.
I cover major industry shifts, security incidents, cloud provider changes, and tooling updates, then explain what they mean for platform teams and engineers operating in production.
If y...
GitHub Agentic Workflows, Gentoo Leaves GitHub, Argo CD 3.3 Upgrade Gotcha, AWS Config Scope Creep
This week on Ship It Weekly, Brian hits five stories where the “defaults” are shifting under ops teams.
GitHub is bringing Agentic Workflows into Actions, Gentoo is migrating off GitHub to Codeberg, Argo CD upgrades are forcing Server-Side Apply in some paths, AWS Config quietly expanded coverage again, and EC2 nested virtualization is now possible on virtual instances.
Links
YouTube episodes https://www.youtube.com/watch?v=tuuLlo2rbI0&list=PLYLi5KINFnO7dVMbhsJQTKRFXfSSwPmuL&pp=sAgC
OnCallBrief https://oncallbrief.com
Teller’s Tech Substack https://tellerstech.substack.com/
GitHub...
Special: OpenClaw Security Timeline and Fallout: CVE-2026-25253 One-Click Token Leak, Malicious ClawHub Skills, Exposed Agent Control Panels, and Why Local AI Agents Are a New DevOps/SRE Control Plane (OpenAI Hires Founder)
In this Ship It Weekly special, Brian breaks down the OpenClaw situation and why it’s bigger than “another CVE.”
OpenClaw is a preview of what platform teams are about to deal with: autonomous agents running locally, wired into real tools, real APIs, and real credentials. When the trust model breaks, it’s not just data exposure. It’s an operator compromise.
We walk through the recent timeline: mass internet exposure of OpenClaw control panels, CVE-2026-25253 (a one-click token leak that can turn your browser into the bridge to your local gateway), a skills marketplac...
When guardrails break prod: GitHub “Too Many Requests” from legacy defenses, Kubernetes nodes/proxy GET RCE, HCP Vault resilience in an AWS regional outage, and PCI DSS scope creep
This week on Ship It Weekly, Brian hits four stories where the guardrails become the incident.
GitHub had “Too Many Requests” caused by legacy abuse protections that outlived their moment. Takeaway: controls need owners, visibility, and a retirement plan.
Kubernetes has a nasty edge case where nodes/proxy GET can turn into command execution via WebSocket behavior. If you’ve ever handed out “telemetry” RBAC broadly, go audit it.
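As a starting point for that audit, here is a minimal sketch that flags roles granting GET on nodes/proxy. It assumes the roles have already been exported as dictionaries shaped like Role or ClusterRole manifests; the function names and the sample role name in the usage note are hypothetical, and the wildcard handling is deliberately conservative, so it may over-report.

```python
# Simplified sketch: flag RBAC rules that allow GET on the nodes/proxy subresource.
# Input dicts are assumed to mirror Role/ClusterRole manifests (metadata + rules).
# Function names are illustrative, not part of any Kubernetes client library.


def grants_nodes_proxy_get(rule: dict) -> bool:
    """Return True if one RBAC rule appears to allow GET on nodes/proxy."""
    api_groups = set(rule.get("apiGroups", []))
    resources = set(rule.get("resources", []))
    verbs = set(rule.get("verbs", []))

    # nodes live in the core API group (""); "*" covers every group.
    in_core_group = bool(api_groups & {"", "*"})
    # Exact match plus wildcard forms; erring on the side of flagging too much.
    hits_resource = bool(resources & {"nodes/proxy", "*/proxy", "*"})
    hits_verb = bool(verbs & {"get", "*"})
    return in_core_group and hits_resource and hits_verb


def risky_roles(roles: list[dict]) -> list[str]:
    """Return the names of roles with at least one matching rule."""
    return [
        role["metadata"]["name"]
        for role in roles
        if any(grants_nodes_proxy_get(rule) for rule in role.get("rules", []))
    ]
```

Feeding it a role named, say, `telemetry-reader` with `resources: ["nodes/proxy"]` and `verbs: ["get"]` would return that name. Bindings still need to be checked separately to see who actually holds each flagged role.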
HashiCorp shared how HCP Vault handled a real AWS regional disruption: control plane wobbled, Dedicated data planes kept serving. Control plane vs data plane separation...
Azure VM Control Plane Outage, GitHub Agent HQ (Claude + Codex), Claude Opus 4.6, Gemini CLI, MCP
This week on Ship It Weekly, Brian hits four “control plane + trust boundary” stories where the glue layer becomes the incident.
Azure had a platform incident that impacted VM management operations across multiple regions. Your app can be up, but ops is degraded.
GitHub is pushing Agent HQ (Claude + Codex in the repo/CI flow), and Actions added a case() function so workflow logic is less brittle.
MCP is becoming platform plumbing: Miro launched an MCP server and Kong launched an MCP Registry.
Links
Azure status incident (VM service mana...
CodeBreach in AWS CodeBuild, Bazel TLS Certificate Expiry Breaks Builds, Helm Charts Reliability Audit, and New n8n Sandbox Escape RCE
This week on Ship It Weekly, Brian looks at four “glue failures” that can turn into real outages and real security risk.
We start with CodeBreach: AWS disclosed a CodeBuild webhook filter misconfig in a small set of AWS-managed repos. The takeaway is simple: CI trigger logic is part of your security boundary now.
Next is the Bazel TLS cert expiry incident. Cert failures are a binary cliff, and “auto renew” is only one link in the chain.
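One way to soften that cliff is to alert on remaining lifetime independently of whatever renewal automation exists. The sketch below is illustrative only: it assumes the certificate's notAfter value is already available as an OpenSSL-style text string, and the function names and the 30-day threshold are made up for the example.

```python
import ssl
from datetime import datetime


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and a notAfter string such as 'Jan  5 09:34:43 2018 GMT'."""
    expires_ts = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch (UTC)
    return (expires_ts - now.timestamp()) / 86400


def expiry_status(not_after: str, now: datetime, warn_days: int = 30) -> str:
    """Classify a certificate as 'expired', 'warn', or 'ok'.

    `warn_days` is an arbitrary example threshold; pick one that leaves
    enough time for a human to step in if renewal silently failed.
    """
    remaining = days_until_expiry(not_after, now)
    if remaining <= 0:
        return "expired"
    if remaining <= warn_days:
        return "warn"
    return "ok"
```

Pass a timezone-aware `now`, for example `datetime.now(timezone.utc)`, so the comparison does not depend on the host's local time zone. A check like this covers only the expiry date; it says nothing about whether the renewed certificate was actually deployed and reloaded.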
Third is Helm chart reliability. Prequel reviewed 105 charts and found a lot of demo-friendly defaults that don...
Ship It Conversations: AI Automation for SMBs: What to Automate (And What Not To) (with Austin Reed)
This is a guest conversation episode of Ship It Weekly (separate from the weekly news recaps).
In this Ship It: Conversations episode I talk with Austin Reed from horizon.dev about AI and automation for small and mid-sized businesses, and what actually works once you leave the demo world.
We get into the most common automation wins he sees (sales and customer service), why a lot of projects fail due to communication and unclear specs more than the tech, and the trap of thinking “AI makes it cheap.” Austin shares how they push teams toward quic...
curl Shuts Down Bug Bounties Due to AI Slop, AWS RDS Blue/Green Cuts Switchover Downtime to ~5 Seconds, and Amazon ECR Adds Cross-Repository Layer Sharing
This week on Ship It Weekly, Brian looks at three different versions of the same problem: systems are getting faster, but human attention is still the bottleneck.
We start with curl shutting down their bug bounty program after getting flooded with low-quality “AI slop” reports. It’s not a “security vs maintainers” story, it’s an incentives and signal-to-noise story. When the cost to generate reports goes to zero, you basically DoS the people doing triage.
Next, AWS improved RDS Blue/Green Deployments to cut writer switchover downtime to typically ~5 seconds or less (single-region). That’s a big deal...
n8n Auth RCE (CVE-2026-21877), GitHub Artifact Permissions, and AWS DevOps Agent Lessons
This week on Ship It Weekly, the theme is simple: the automation layer has become a control plane, and that changes how you should think about risk.
We start with n8n’s latest critical vulnerability, CVE-2026-21877. This one is different from the unauth “Ni8mare” issue we covered in Episode 12. It’s authenticated RCE, which means the real question isn’t only “is it internet exposed,” it’s who can log in, who can create or modify workflows, and what those workflows can reach. Takeaway: treat workflow automation tools like CI systems. They run code, they hold creden...
Ship It Conversations: Human-in-the-Loop Fixer Bots and AI Guardrails in CI/CD (with Gracious James)
This is a guest conversation episode of Ship It Weekly (separate from the weekly news recaps).
In this Ship It: Conversations episode I talk with Gracious James Eluvathingal about TARS, his “human-in-the-loop” fixer bot wired into CI/CD.
We get into why he built it in the first place, how he stitches together n8n, GitHub, SSH, and guardrailed commands, and what it actually looks like when an AI agent helps with incident response without being allowed to nuke prod. We also dig into rollback phases, where humans stay in the loop, and why validating ever...
n8n Critical CVE (CVE-2026-21858), AWS GPU Capacity Blocks Price Hike, Netflix Temporal
This week on Ship It Weekly, Brian’s theme is basically: the “automation layer” is not a side tool anymore. It’s part of your perimeter, part of your reliability story, and sometimes part of your budget problem too.
We start with the n8n security issue. A lot of teams use n8n as glue for ops workflows, which means it tends to collect credentials and touch real systems. When something like this drops, the right move is to treat it like production-adjacent infra: patch fast, restrict exposure, and assume anything stored in the tool is high val...
Ship It Conversations: Backstage vs Internal IDPs, and Why DevEx Muscle Matters (with Danny Teller)
This is a guest conversation episode of Ship It Weekly (separate from the weekly news recaps).
I sat down with Danny Teller, a DevOps Architect and Tech Lead Manager at Tipalti, to talk about internal developer platforms and the reality behind “just set up a developer portal.” We get into Backstage versus internal IDPs, why adoption is the real battle, and why platform/DevEx maturity matters more than whatever tool you pick.
What we covered
Backstage vs internal IDPs
Backstage is a solid starting point for a developer portal, but it doesn’t magica...
Fail Small, IaC Control Planes, and Automated RCA
This week on Ship It Weekly, Brian kicks off the new year with one theme: automation is getting faster, and that makes blast radius and oversight matter more than ever.
We start with Cloudflare’s “fail small” mindset. The core idea is simple: big outages usually come from correlated failure, not one box dying. If a bad change lands everywhere at once, you’re toast. “Fail small” is about forcing problems to stay local so you can stop the bleeding before it becomes global.
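The idea can be sketched as a staged rollout that stops at the first unhealthy stage. This is a generic illustration, not Cloudflare's actual mechanism: the stage names, the `apply` callback, and the `healthy` check are all hypothetical placeholders.

```python
from typing import Callable, Sequence


def staged_rollout(
    stages: Sequence[Sequence[str]],
    apply: Callable[[str], None],
    healthy: Callable[[str], bool],
) -> list[str]:
    """Apply a change one stage at a time and halt on the first unhealthy stage.

    Returns every target that received the change, so the caller knows
    exactly what to roll back.
    """
    changed: list[str] = []
    for stage in stages:
        for target in stage:
            apply(target)
            changed.append(target)
        # Check the whole stage before widening the blast radius.
        if not all(healthy(target) for target in stage):
            return changed  # later stages never see the change
    return changed
```

With stages like `[["canary"], ["region-a", "region-b"], ["region-c"]]`, a failure in `region-b` leaves `region-c` untouched. A real rollout would also want a soak time between stages and an automated rollback, both omitted here.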
Next is Pulumi’s push to be the control plane for all your IaC...
Ship It Conversations: From Full-Stack to Cloud/DevOps, One Project at a Time (with Eric Paatey)
This is a guest conversation episode of Ship It Weekly (separate from the weekly news recaps).
I sat down with Eric Paatey, a Cloud & DevOps Engineer who’s been transitioning from full-stack web development into cloud/devops, and building real skills through hands-on projects instead of just collecting tools and buzzwords.
We talk about what that transition actually feels like, what’s helped most, and why you don’t need a rack of servers to learn DevOps.
What we covered
Eric’s path into DevOps
How he moved from building web apps to caring a...