Topics

Browse posts by category and tag — every topic we cover, with the latest pieces under each.

Categories

monitoring 7 posts

Embedding Store Reliability: What to Monitor Beyond Recall@k

Vector indexes fail differently than relational stores. The recall, version-coverage, and drift metrics that catch silent embedding-store decay before users do.

Data, Concept, and Prediction Drift: A Decision Framework

The three drift types fail differently and demand different monitors. A practical framework for telling data drift from concept drift from prediction…

Monitoring Models When Ground Truth Is Late or Never Arrives

Delayed labels are the defining hard problem of ML monitoring. Strategies for the blind period between prediction and ground truth: proxy signals…

Choosing Monitoring Metrics: PSI, KS, and Calibration

PSI, the KS test, and calibration error answer different questions about a model in production. A practical guide to which metric to reach for, what each…

Monitoring Tabular Models vs LLM Systems: What Transfers

Drift detection, SLOs, and metric selection were built for tabular models. Some of it carries directly to LLM systems, some of it breaks, and some has no…

Training-Serving Skew: The Failure That Drift Detection Misses

Your data isn't drifting and your model is still wrong. Training-serving skew is a distinct production failure mode that input-drift monitors do not catch…

practices 2 posts

ops 1 post

Silent Quality Decay in Production LLM Apps: Detecting Drift

Your eval scores are green. Customer complaints are up. The gap between offline metrics and production reality is the biggest reliability problem in LLM…

Tools 1 post

Best ML Model Monitoring Tools 2026: A Practitioner's Comparison

Arize, Evidently AI, WhyLabs, Fiddler, W&B, and Prometheus stacked against real production requirements: drift detection, latency tracking, LLM…