Research Notes
Notes from the lab: research, builds, and what we’re learning in production.
- Jun 30, 2026 The future of AI runs on a context graph. Three principles we learned building an org-wide memory, and why we don't build it the way most people do.
- Jun 10, 2026 DSpark: speculative decoding that pays off. DeepSeek bolts a lightweight sequential head onto a parallel drafter and makes draft length adaptive, so bigger blocks stop hurting throughput under load.
- May 28, 2026 Attention over the layers. Residual connections haven't changed since 2016. Kimi replaces fixed accumulation with learned attention over depth, and claws back a quarter of the compute.
- May 20, 2026 MSA: retrieving thought, not text. Three ways to give an LLM memory, all broken. Memory Sparse Attention is a fourth, it retrieves the model's own internal representations instead of text chunks.
- Apr 23, 2026 Designing agents around the cache. Cache hit rate, more than model choice or prompt quality, is what separates agents that run sustainably in production from ones that burn cash. How the mechanism works, what silently breaks it, and how we built Sketch around it.