Shashank Dhiman

Now

updated · May 2026

What I'm on, right now.

A live page that reads more like a notebook than a portfolio. I update it when the answer to “what are you working on?” actually changes.

01 / Reading

Distributed systems, slowly.

  • Working through Designing Data-Intensive Applications by Martin Kleppmann. The chapter on replication is the one I keep coming back to.
  • Skimming recent LLM-serving papers as they cross my feed. I'm interested in the queuing and batching side more than the model architecture side.

02 / Building

Small experiments around inference.

  • A toy RAG harness over my own blog content. Mostly an excuse to think about chunking and vector recall at small scale.
  • Sharpening LeetCode mediums in the background; system-design interviews are where the next conversation lives.

03 / Curious about

Open questions on my desk.

  • How do LLM inference systems handle backpressure when the queue saturates and the user is still typing?
  • At what concurrency does client-side inference stop being cheaper than centralised inference? The proctoring rebuild answered that for one workload; is there a general shape?

Reach out

If you're working on something in this space, I'd like to hear about it.