What this site is for
ML Monitoring Report covers ML observability and MLOps from inside production engineering: the kind of writing we wanted to find while debugging a model that worked in eval and broke in prod.
What we publish:
- Drift, the unsexy version. Concept drift, label drift, feature drift, training/serving skew. How to detect it in real systems, what thresholds actually catch problems, why most monitoring dashboards lie about it. (A minimal sketch of one such check follows this list.)
- Production failure writeups. When models go wrong in the real world: silently degraded predictions, retraining loops gone bad, embedding-store corruption, vector-DB consistency issues. The postmortems we wish vendors would publish.
- Tooling reviews, honest. Arize, Fiddler, WhyLabs, Evidently, NannyML, Aporia, the open-source observability stack. Where each helps, where it solves problems you don’t have, what to install when you’re starting from zero.
- MLOps without the hype cycle. Feature stores, model registries, evaluation pipelines, online inference. What’s worth adopting, what’s reinventing things SREs solved a decade ago, what’s genuinely new.
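To make the drift item concrete, here is a minimal sketch of one kind of per-feature check that coverage looks at: comparing a production window against a reference (training) sample with a two-sample Kolmogorov–Smirnov test. The function name, feature names, window sizes, and the significance threshold below are illustrative placeholders, not recommendations from any particular tool.

```python
# Sketch: flag features whose production distribution has shifted away from
# a reference sample, using a two-sample Kolmogorov-Smirnov test per feature.
# Names and thresholds here are assumptions for illustration only.
import numpy as np
from scipy.stats import ks_2samp

def drifted_features(reference: np.ndarray, production: np.ndarray,
                     feature_names: list[str], alpha: float = 0.01) -> list[str]:
    """Return names of features that differ from the reference at level alpha."""
    flagged = []
    for i, name in enumerate(feature_names):
        stat, p_value = ks_2samp(reference[:, i], production[:, i])
        if p_value < alpha:
            flagged.append(name)
    return flagged

# Synthetic example: the second feature is shifted in "production".
rng = np.random.default_rng(0)
ref = rng.normal(size=(5000, 2))
prod = np.column_stack([rng.normal(size=5000), rng.normal(loc=0.5, size=5000)])
print(drifted_features(ref, prod, ["latency_ms", "basket_value"]))
```

On real traffic the interesting part is not the test itself but the windowing and the threshold: with large samples a KS test will flag tiny, harmless shifts, which is exactly the kind of tuning problem the drift posts dig into.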
What we don’t publish:
- Vendor-sponsored “thought leadership”
- “Top 10 MLOps tools” listicles
- Anything we couldn’t show running in production
Pseudonymous bylines. Tips and corrections to the editor.
Real content starts shortly.
ML Monitoring Report — in your inbox
Production ML monitoring, drift, and reliability, delivered when there's something worth your inbox.
No spam. Unsubscribe anytime.
Related
Data Drift Detection in Machine Learning: Methods, Tests, and Production Practice
A practical guide to data drift detection in machine learning: statistical tests, detection architectures, threshold tuning, and when to trigger retraining in production.
ML Model Monitoring Best Practices for Production Systems
A practitioner's guide to ML model monitoring best practices: drift detection, metric selection, alerting architecture, and retraining triggers for models running in production.

Silent Quality Decay in Production LLM Apps: How to Detect Drift Before Users Do
Your eval scores are green. Customer complaints are up. The gap between offline metrics and production reality is the biggest reliability problem in LLM ops — here's how to close it.