When Clean Data Is Actually Dirty
We often treat data cleaning as a neutral step.
Delete missing rows. Fill gaps with the mean. Move on.
But cleaning is not neutral. It is a modeling decision.
In this episode, we unpack the statistical consequences of deletion and simple imputation, and why what looks “clean” can fundamentally alter your estimand, distort variance, and bias inference.
We walk through:
The formal role of the missingness indicator
The difference between MCAR, MAR, and MNAR
Why complete-case analysis is rarely as safe as it seems
How mean imputation collapses variance and attenuates regression slopes
When multiple imputation and inverse probability weighting are appropriate
Why sensitivity analysis beco...
“Cleaning” data is often treated as a harmless preprocessing step.
Delete missing rows.
Fill gaps with the mean.
Move forward.
But cleaning is not neutral.
It is a modeling decision that can change:
The estimand
The sampling mechanism
The bias–variance trade-off
In this episode, we examine the statistical dangers of deletion and simple imputation, and why naïve cleaning can quietly corrupt inference.
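To make the danger concrete, here is a minimal Python sketch (not from the episode) that simulates a simple linear relationship, removes some outcome values completely at random, and compares complete-case analysis with mean imputation. The sample size, the true slope of 2.0, the 40% missingness rate, and the helper name ols_slope are all illustrative assumptions. The missingness here is MCAR, which is the most favorable case for deletion, so the complete-case slope should land near the truth while the mean-imputed data shows reduced variance and a flatter slope.

```python
import random
import statistics


def ols_slope(x, y):
    """Least-squares slope of y regressed on x (hypothetical helper)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 5000                # illustrative sample size
x = [rng.gauss(0, 1) for _ in range(n)]
y = [2.0 * xi + rng.gauss(0, 1) for xi in x]  # assumed true slope: 2.0

# MCAR: each y is missing with probability 0.4, independent of x and y.
observed = [rng.random() > 0.4 for _ in range(n)]

# Complete-case analysis: keep only rows where y is observed.
x_cc = [xi for xi, r in zip(x, observed) if r]
y_cc = [yi for yi, r in zip(y, observed) if r]

# Mean imputation: replace every missing y with the observed mean of y.
y_obs_mean = statistics.fmean(y_cc)
y_imp = [yi if r else y_obs_mean for yi, r in zip(y, observed)]

var_full = statistics.pvariance(y)     # variance if nothing were missing
var_imp = statistics.pvariance(y_imp)  # variance after mean imputation
slope_cc = ols_slope(x_cc, y_cc)       # slope from complete cases only
slope_imp = ols_slope(x, y_imp)        # slope after mean imputation

print(f"variance of y: full={var_full:.2f}, mean-imputed={var_imp:.2f}")
print(f"slope: complete-case={slope_cc:.2f}, mean-imputed={slope_imp:.2f}")
```

In this setup the imputed values all sit at one point, so they add no spread to y and no covariance with x. Both the variance and the slope should shrink by roughly the fraction of rows that were imputed. Under MAR or MNAR the complete-case estimate would no longer be protected either, which this sketch does not simulate.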