Shashank Dhiman

Now

updated · May 2026

What I'm on, right now.

A live page. Reads more like a notebook than a portfolio. Updated when the answer to “what are you working on” actually changes.

01 / Reading

Distributed systems, slowly.

Working through Designing Data-Intensive Applications by Martin Kleppmann. The chapter on replication is the one I keep coming back to.
Skimming recent LLM-serving papers as they cross my feed. I'm interested in the queuing and batching side more than the model architecture side.

02 / Building

Small experiments around inference.

A toy RAG harness over my own blog content. Mostly an excuse to think about chunking and vector recall at small scale.
Sharpening LeetCode mediums in the background, since system-design interviews are where the next conversation lives.

03 / Curious about

Open questions on my desk.

How do LLM inference systems handle backpressure when the queue saturates and the user is still typing?
At what concurrency does client-side inference stop being cheaper than centralised serving? The proctoring rebuild answered that for one workload; is there a general shape?

Reach out

If you're working on something in this space, I'd like to hear about it.

Get in touch · Selected work