Shashank Dhiman

Engineering Log

Problems worth writing down.

Production breaks in ways tests never predicted. Each entry below is one investigation: the clue that surfaced it, the root cause once the dust settled, and what shipped to keep it from happening again.

1 entry · newest first

P2 · aws s3 · mediarecorder · node

Hard reload. Recording gone forever

Proctoring video streamed directly into a single S3 multipart upload. Submit called CompleteMultipartUpload. Any browser crash or hard reload before submit left every uploaded part permanently orphaned — no assembled object, no recovery path.

Evidence

$ aws s3api list-multipart-uploads --bucket proctoring-prod

UploadId : a3f9...c2d1 State : in-progress

Parts : 14 uploaded

Object : <does not exist — 404>

Root cause

A multipart upload only becomes a readable S3 object once CompleteMultipartUpload succeeds. Until then the parts sit in S3 storage, but any GET on the object key returns 404. Submit was the only place that call was made, so one missed submit event meant permanent data loss, with no server-side recovery hook.
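The behaviour described above can be illustrated with a toy in-memory model. This is only a sketch of the lifecycle, not the AWS SDK: `ToyBucket` and its method names are hypothetical stand-ins that mirror the S3 operation names, and the key `session-1.webm` is made up for the example.

```typescript
// Toy in-memory model of the multipart lifecycle. Hypothetical class, not the AWS SDK.
type GetResult = { status: 200; body: string } | { status: 404 };

class ToyBucket {
  // Completed, readable objects by key.
  private objects = new Map<string, string>();
  // In-progress uploads by upload id; parts are keyed by part number.
  private uploads = new Map<string, { key: string; parts: Map<number, string> }>();
  private nextId = 1;

  createMultipartUpload(key: string): string {
    const uploadId = `upload-${this.nextId++}`;
    this.uploads.set(uploadId, { key, parts: new Map<number, string>() });
    return uploadId;
  }

  uploadPart(uploadId: string, partNumber: number, data: string): void {
    const upload = this.uploads.get(uploadId);
    if (!upload) throw new Error(`unknown upload ${uploadId}`);
    // The part is stored, but nothing becomes visible under the object key yet.
    upload.parts.set(partNumber, data);
  }

  completeMultipartUpload(uploadId: string): void {
    const upload = this.uploads.get(uploadId);
    if (!upload) throw new Error(`unknown upload ${uploadId}`);
    // Only here are the parts assembled, in part-number order, into an object.
    const ordered = Array.from(upload.parts.entries()).sort((a, b) => a[0] - b[0]);
    this.objects.set(upload.key, ordered.map((entry) => entry[1]).join(""));
    this.uploads.delete(uploadId);
  }

  getObject(key: string): GetResult {
    const body = this.objects.get(key);
    return body === undefined ? { status: 404 } : { status: 200, body };
  }

  inProgressUploads(): number {
    return this.uploads.size;
  }
}

const bucket = new ToyBucket();
const uploadId = bucket.createMultipartUpload("session-1.webm");
bucket.uploadPart(uploadId, 1, "AAA");
bucket.uploadPart(uploadId, 2, "BBB");

// Parts are uploaded, but the key is not readable until the upload is completed.
const beforeComplete = bucket.getObject("session-1.webm");
const pendingBeforeComplete = bucket.inProgressUploads();

bucket.completeMultipartUpload(uploadId);
const afterComplete = bucket.getObject("session-1.webm");
```

If the page reloads between the last `uploadPart` and `completeMultipartUpload`, the model stays in the `beforeComplete` state: parts are stored and the upload is still listed as in progress, but nothing is readable under the key.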

Fix

Replaced the continuous stream with independent 30-second segments, each uploaded as a standalone PutObject — a complete, immediately readable file the moment it lands. A BullMQ job runs 10 minutes after quiz end time and merges whatever segments arrived, so even a full browser crash leaves an admin-viewable recording.
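One way the segment bookkeeping behind this fix could look is sketched below. It is an illustration under assumptions, not the shipped code: the key layout `recordings/<sessionId>/seg-NNNNN.webm` and the helpers `segmentKey`, `planMerge`, and `mergeDelayMs` are hypothetical. Only the 30-second segment length and the 10-minute wait come from the entry above.

```typescript
const SEGMENT_MS = 30_000;            // 30-second segments
const MERGE_GRACE_MS = 10 * 60_000;   // merge runs 10 minutes after quiz end

// Which segment a given offset into the recording belongs to.
function segmentIndexAt(elapsedMs: number): number {
  return Math.floor(elapsedMs / SEGMENT_MS);
}

// Hypothetical key layout. Zero-padding keeps lexicographic order equal to recording order.
function segmentKey(sessionId: string, index: number): string {
  return `recordings/${sessionId}/seg-${String(index).padStart(5, "0")}.webm`;
}

// Recover the index from a key; null for anything that is not a segment.
function segmentIndex(key: string): number | null {
  const match = /\/seg-(\d{5})\.webm$/.exec(key);
  return match ? Number(match[1]) : null;
}

interface MergePlan {
  ordered: string[];   // segment keys in recording order
  missing: number[];   // indices that never arrived (gaps in the recording)
}

// Merge whatever arrived: order the segments and report gaps instead of failing.
function planMerge(keys: string[]): MergePlan {
  const indexed = keys
    .map((key) => ({ key, index: segmentIndex(key) }))
    .filter((s): s is { key: string; index: number } => s.index !== null)
    .sort((a, b) => a.index - b.index);

  const present = new Set(indexed.map((s) => s.index));
  const last = indexed.length > 0 ? indexed[indexed.length - 1].index : -1;
  const missing: number[] = [];
  for (let i = 0; i <= last; i++) {
    if (!present.has(i)) missing.push(i);
  }
  return { ordered: indexed.map((s) => s.key), missing };
}

// Milliseconds to wait before the merge job should run; never negative.
function mergeDelayMs(quizEndMs: number, nowMs: number): number {
  return Math.max(0, quizEndMs + MERGE_GRACE_MS - nowMs);
}
```

The value from `mergeDelayMs` is the kind of number that could be passed as the `delay` job option when adding a BullMQ job. Reporting `missing` rather than throwing matches the intent of merging whatever segments arrived.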

Notes — long-form

Pieces I've sat with long enough to argue for. Most live on GitHub Pages while the publishing setup is intentionally simple.

01 Sept 2025 · 24 min read · Featured

How we built a real-time AI proctoring system — and watched it collapse at five users.

There is a particular kind of failure that only reveals itself under real load.

real-time · webrtc · architecture

More notes in draft — published when they say something I actually believe.

More gets added as more breaks — and gets put back together.
