Shashank Dhiman
Selected work · Case study · 01

The system that survives 200 concurrent students.

One year. Three architectures. A few near-collapses on the way to one that holds.

Masters' Union · Production


There is a particular kind of failure that only reveals itself under real load.

The architecture looked clean on a whiteboard. The tests passed. The demo went smoothly. Then five students joined an exam at once — and the CPU pinned at one hundred percent, and stayed.

The opening

How it routes

An SFU, not a transcoder.

[Diagram: 200+ students · SFU (mediasoup, RTP forward, no server decode, O(1) CPU/stream) · Admin (<200ms alerts) · Redis (pub/sub, presence) · Postgres (composite index) · critical path highlighted]
  1. Clients

    Each student's browser produces a WebRTC track and runs its own behavioral inference locally. The server never sees raw frames it has to analyze.

  2. SFU router

    Mediasoup forwards encoded RTP, never decodes it. Per-stream cost stays roughly flat as concurrency grows — O(1) on the hot path.

  3. Browser inference

    MediaPipe Tasks (WASM) runs in a student-side worker thread. Suspicion scoring stays under the 2–3 second budget for live alerts.

  4. Server role

    The server validates evidence frames, deduplicates events, and fans out scores to admin dashboards over Socket.IO. It is a referee, not an inference worker.
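The deduplication part of the server's referee role can be sketched roughly as below. This is an illustration only: it assumes events carry a student ID, an event type, and a timestamp, and that a repeat of the same type inside a short window counts as a duplicate. The `SuspicionEvent` shape, the `EventDeduplicator` class, and the 2-second window are hypothetical, not taken from the production system.

```typescript
// Hypothetical event shape: the field names are assumptions for illustration.
interface SuspicionEvent {
  studentId: string;
  type: string;        // e.g. "gaze_away" (hypothetical label)
  timestampMs: number; // client-reported time in milliseconds
  score: number;
}

// Drops an event when the same student reported the same event type
// less than `windowMs` earlier. The window length is an assumed value.
class EventDeduplicator {
  private lastSeen = new Map<string, number>();
  private windowMs: number;

  constructor(windowMs: number = 2000) {
    this.windowMs = windowMs;
  }

  // Returns true if the event is new and should be fanned out.
  accept(event: SuspicionEvent): boolean {
    const key = `${event.studentId}:${event.type}`;
    const previous = this.lastSeen.get(key);
    if (previous !== undefined && event.timestampMs - previous < this.windowMs) {
      return false; // duplicate inside the window
    }
    this.lastSeen.set(key, event.timestampMs);
    return true;
  }
}

const dedup = new EventDeduplicator();
const first = dedup.accept({ studentId: "s1", type: "gaze_away", timestampMs: 1000, score: 0.7 });
const repeat = dedup.accept({ studentId: "s1", type: "gaze_away", timestampMs: 1500, score: 0.7 });
const later = dedup.accept({ studentId: "s1", type: "gaze_away", timestampMs: 3500, score: 0.8 });
console.log(first, repeat, later); // true false true
```

Keying on student and event type keeps the check at one map lookup per event, so the referee stays cheap even when every browser is reporting.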

The cliff

Then five students joined an exam at the same time.


Five users. That was the cliff.

Not slow. Wrong. The fix wasn't optimization — it was moving inference off the server entirely.


In code

Two or three lines that quietly do the heavy lifting.

01 / frame_loop.py

Why it broke: each analyzed frame blocks the event loop for 50–140ms. At five concurrent streams there isn't enough wall-clock left in a second to do the work.

frame_loop.py

```python
# get_comprehensive_analysis() is BLOCKING: 50–140ms per frame.
# At 5 concurrent streams, the event loop runs out of time.
while not stop_token.is_set():
    frame = await track.recv()
    bgr = _video_frame_to_bgr(frame)

    analysis = await asyncio.to_thread(
        engine.get_comprehensive_analysis, bgr
    )
```
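The wall-clock argument can be checked with a few lines of arithmetic. The sketch below takes the 50–140ms per-frame cost from above and assumes each stream has three frames analyzed per second. That sampling rate is a hypothetical figure chosen for illustration, not a number from the production system, and the helper names are made up for this example.

```typescript
// Fraction of each wall-clock second consumed by frame analysis when every
// analyzed frame occupies the loop for `msPerFrame`. A value above 1 means
// the work cannot fit in real time.
function loopUtilization(streams: number, analyzedFps: number, msPerFrame: number): number {
  return (streams * analyzedFps * msPerFrame) / 1000;
}

// Largest stream count whose analysis still fits inside one second.
function maxStreams(analyzedFps: number, msPerFrame: number): number {
  return Math.floor(1000 / (analyzedFps * msPerFrame));
}

const ASSUMED_FPS = 3; // hypothetical sampling rate, not from the case study

for (const msPerFrame of [50, 140]) {
  const load = loopUtilization(5, ASSUMED_FPS, msPerFrame);
  const ceiling = maxStreams(ASSUMED_FPS, msPerFrame);
  console.log(`${msPerFrame}ms/frame: load ${load.toFixed(2)}, max streams ${ceiling}`);
}
// 50ms/frame: load 0.75, max streams 6
// 140ms/frame: load 2.10, max streams 2
```

Under that assumption, five streams already need about two seconds of analysis per second at the slow end of the range, which matches a CPU that pins and stays pinned.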

02 / scoreStore.ts

How the admin dashboard survived 80 score updates per second: scores live outside React state. Only the one tile subscribed to a given student re-renders.

scoreStore.ts

```typescript
import { useCallback, useSyncExternalStore } from "react";

// External store: writes do not trigger React re-renders.
const useStudentScore = (studentId: string) => {
  const subscribe = useCallback(
    (cb: () => void) => scoreStore.subscribeForId(studentId, cb),
    [studentId],
  );
  const getSnapshot = () => scoreStore.getScoreForId(studentId);

  // Only re-renders when THIS student's score changes.
  return useSyncExternalStore(subscribe, getSnapshot, getSnapshot);
};
```
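The store behind that hook is not shown, so here is one way it might look: a minimal sketch, assuming scores are plain numbers keyed by student ID. `subscribeForId` and `getScoreForId` mirror the names the hook calls; the `ScoreStore` class, the `setScore` writer, and the default score of 0 are assumptions made for illustration.

```typescript
type Listener = () => void;

// Minimal per-student external store. `setScore` is a hypothetical writer
// that a socket handler could call on each incoming update.
class ScoreStore {
  private scores = new Map<string, number>();
  private listeners = new Map<string, Set<Listener>>();

  // Registers a listener for one student and returns an unsubscribe function,
  // which is the contract useSyncExternalStore expects from `subscribe`.
  subscribeForId(studentId: string, listener: Listener): () => void {
    const existing = this.listeners.get(studentId);
    const set = existing ?? new Set<Listener>();
    if (!existing) this.listeners.set(studentId, set);
    set.add(listener);
    return () => {
      set.delete(listener);
    };
  }

  // Snapshot read. Returns a primitive, so repeated reads of an unchanged
  // score compare equal. The fallback of 0 is an assumed default.
  getScoreForId(studentId: string): number {
    return this.scores.get(studentId) ?? 0;
  }

  // Notifies only the listeners registered for this student.
  setScore(studentId: string, score: number): void {
    if (this.scores.get(studentId) === score) return; // unchanged, stay silent
    this.scores.set(studentId, score);
    this.listeners.get(studentId)?.forEach((listener) => listener());
  }
}

const scoreStore = new ScoreStore();
let rendersA = 0;
let rendersB = 0;
scoreStore.subscribeForId("student-a", () => { rendersA += 1; });
scoreStore.subscribeForId("student-b", () => { rendersB += 1; });

scoreStore.setScore("student-a", 0.42);
console.log(rendersA, rendersB); // 1 0
```

Because listeners are grouped by student ID, a write for one student never wakes the other tiles in the grid.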

What changed

Then this. Now this.

Before

Naïve architecture · 1 server doing everything

  • Max concurrent users

    5

    server CPU pinned at 100%

  • End-to-end score latency

    820ms

    naive server-side path

  • Re-renders per score update

    200

    every tile in the grid

After

SFU · client inference · O(1) per stream

  • Concurrent users supported

    200+

    server CPU roughly flat

  • End-to-end score latency

    110ms

    87% reduction

  • Re-renders per score update

    1

    subscribed tile only
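The 87% figure follows directly from the two latency numbers. A quick check, with a helper name made up for this example:

```typescript
// Percentage reduction from a baseline latency to a new latency.
function percentReduction(beforeMs: number, afterMs: number): number {
  return ((beforeMs - afterMs) / beforeMs) * 100;
}

const reduction = percentReduction(820, 110);
console.log(reduction.toFixed(1)); // 86.6, which rounds to 87
```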

Selected lessons

What the system taught me on the way to holding 200 students.

  1. Lesson 01

    Co-location debt is invisible until load arrives.

    Running the AI worker and the media server on the same instance was fine in development. The architecture had a tripwire we'd never set — by the time concurrency revealed it, the system was already on fire.

  2. Lesson 02

    Moving compute to the client is architectural, not just an optimization.

    Shifting inference into the browser didn't just lift CPU off the server — it changed the trust boundary. Every downstream component (evidence, validation, scoring) had to be reasoned about again. Performance was the symptom; topology was the fix.

  3. Lesson 03

    React's state model is not designed for 80 updates per second.

    useState and useReducer are built for user-driven events. A live proctoring score feed is a continuous stream. useSyncExternalStore exists for exactly this — high-frequency writes that should bypass the render tree entirely.
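Lesson 02's point about the trust boundary can be made concrete. Once inference runs in the browser, the server receives claims, not measurements, so each report needs checking before it reaches a dashboard. The sketch below shows the kind of checks a referee might apply. The `ClientReport` shape, the 0-to-1 score scale, and the 5-second clock tolerance are all assumptions for illustration, not the production rules.

```typescript
// Hypothetical shape of a score reported by a student's browser.
interface ClientReport {
  studentId: string;
  score: number;        // assumed to lie in [0, 1]
  capturedAtMs: number; // client clock, in milliseconds
}

type Verdict = { ok: true } | { ok: false; reason: string };

// Rejects reports the server has no reason to trust. The 5-second skew
// tolerance is an assumed value, not a production setting.
function validateReport(
  report: ClientReport,
  enrolled: ReadonlySet<string>,
  nowMs: number,
  maxSkewMs: number = 5000,
): Verdict {
  if (!enrolled.has(report.studentId)) {
    return { ok: false, reason: "unknown student" };
  }
  if (!Number.isFinite(report.score) || report.score < 0 || report.score > 1) {
    return { ok: false, reason: "score out of range" };
  }
  if (Math.abs(nowMs - report.capturedAtMs) > maxSkewMs) {
    return { ok: false, reason: "timestamp outside tolerance" };
  }
  return { ok: true };
}

const enrolled = new Set<string>(["s1"]);
const now = 1000000;
const good = validateReport({ studentId: "s1", score: 0.4, capturedAtMs: now - 200 }, enrolled, now);
const bad = validateReport({ studentId: "s1", score: 1.7, capturedAtMs: now }, enrolled, now);
console.log(good.ok, bad.ok); // true false
```

None of these checks require decoding video, so the server can distrust the client without taking inference back.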

Three of nine. The rest (WebRTC lifecycle, production-vs-design numbers, real-time failure modes) live in the blog.

Read the full piece

The full piece

Want the unabridged version?

The unabridged 24-minute deep dive on GitHub Pages — three more lessons, the WebRTC race conditions, the dashboard refactor in detail.

