The system that survives 200 concurrent students.
One year. Three architectures. A few near-collapses on the way to one that holds.
Masters' Union · Production
There is a particular kind of failure that only reveals itself under real load.
The architecture looked clean on a whiteboard. The tests passed. The demo went smoothly. Then five students joined an exam at once — and the CPU pinned at one hundred percent, and stayed.
The opening
How it routes
An SFU, not a transcoder.
01 / Clients
Each student's browser produces a WebRTC track and runs its own behavioral inference locally. The server never sees raw frames it has to analyze.
02 / SFU router
Mediasoup forwards encoded RTP, never decodes it. Per-stream cost stays roughly flat as concurrency grows — O(1) on the hot path.
03 / Browser inference
MediaPipe Tasks (WASM) runs in a student-side worker thread. Suspicion scoring stays under the 2–3 second budget for live alerts.
04 / Server role
The server validates evidence frames, deduplicates events, and fans out scores to admin dashboards over Socket.IO. It is a referee, not an inference worker.
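As a rough illustration of the referee role, the sketch below deduplicates incoming suspicion events before they are fanned out. The event shape, the `EventDeduper` class, and the 2-second window are assumptions made for illustration; none of them are taken from the production code.

```typescript
// Hypothetical event shape; the field names are assumptions for illustration.
interface SuspicionEvent {
  studentId: string;
  kind: string;        // e.g. "gaze_away" (assumed label)
  timestampMs: number; // client-reported time in milliseconds
}

// Drops repeats of the same (student, kind) pair seen within `windowMs`.
class EventDeduper {
  private readonly windowMs: number;
  private lastSeen = new Map<string, number>();

  constructor(windowMs: number) {
    this.windowMs = windowMs;
  }

  // Returns true if the event is new and should be fanned out.
  accept(event: SuspicionEvent): boolean {
    const key = `${event.studentId}:${event.kind}`;
    const previous = this.lastSeen.get(key);
    if (previous !== undefined && event.timestampMs - previous < this.windowMs) {
      return false; // duplicate inside the window
    }
    this.lastSeen.set(key, event.timestampMs);
    return true;
  }
}

const deduper = new EventDeduper(2000);
const first = deduper.accept({ studentId: "s1", kind: "gaze_away", timestampMs: 1000 });
const repeat = deduper.accept({ studentId: "s1", kind: "gaze_away", timestampMs: 1500 });
const later = deduper.accept({ studentId: "s1", kind: "gaze_away", timestampMs: 3500 });
console.log(first, repeat, later); // true false true
```

Keying on the student and event kind keeps the map small: one entry per pair rather than one per event.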
The cliff
Then five students joined an exam at the same time.
Five users. That was the cliff.
Not slow. Wrong. The fix wasn't optimization — it was moving inference off the server entirely.
In code
Two or three lines that quietly do the heavy lifting.
01 / frame_loop.py
Why it broke: each analyzed frame blocks the event loop for 50–140 ms. At five concurrent streams there isn't enough wall-clock time left in a second to do the work.
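The arithmetic behind that claim can be sketched in a few lines. The 50–140 ms cost and the five streams come from the text; the analysis rate of 3 frames per second per stream is an assumed figure, chosen only to make the calculation concrete.

```typescript
// Fraction of each wall-clock second consumed by frame analysis when the
// work for all streams is serialized on one core.
// A value above 1 means the work cannot finish in real time.
function serialUtilization(
  streams: number,
  framesPerSecond: number,
  msPerFrame: number,
): number {
  return (streams * framesPerSecond * msPerFrame) / 1000;
}

const ASSUMED_FPS = 3; // assumption: analyzed frames per second per stream

const bestCase = serialUtilization(5, ASSUMED_FPS, 50);   // 0.75
const worstCase = serialUtilization(5, ASSUMED_FPS, 140); // 2.1

console.log(bestCase.toFixed(2), worstCase.toFixed(2)); // 0.75 2.10
```

Under these assumptions the best case leaves only a quarter of each second for everything else, and the worst case needs more than two seconds of compute per second of video.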
frame_loop.py
python
# get_comprehensive_analysis() is BLOCKING: 50–140ms per frame.
# At 5 concurrent streams, the event loop runs out of time.
while not stop_token.is_set():
    frame = await track.recv()
    bgr = _video_frame_to_bgr(frame)
    analysis = await asyncio.to_thread(
        engine.get_comprehensive_analysis, bgr
    )
02 / scoreStore.ts
How the admin dashboard survived 80 score updates per second: scores live outside React state. Only the one tile subscribed to a given student re-renders.
scoreStore.ts
typescript
// External store: writes do not trigger React re-renders.
const useStudentScore = (studentId: string) => {
  const subscribe = useCallback(
    (cb) => scoreStore.subscribeForId(studentId, cb),
    [studentId],
  );
  const getSnapshot = () => scoreStore.getScoreForId(studentId);
  // Only re-renders when THIS student's score changes.
  return useSyncExternalStore(subscribe, getSnapshot, getSnapshot);
};
What changed
Then this. Now this.
Before
Naïve architecture · 1 server doing everything
Max concurrent users
5 · server CPU pinned at 100%
End-to-end score latency
820 ms · naive server-side path
Re-renders per score update
200 · every tile in the grid
After
SFU · client inference · O(1) per stream
Concurrent users supported
200+ · server CPU roughly flat
End-to-end score latency
110 ms · 87% reduction
Re-renders per score update
1 · subscribed tile only
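The headline figures can be cross-checked with a little arithmetic. The inputs are the numbers quoted in this piece (820 ms, 110 ms, 80 score updates per second, a 200-tile grid); the helper names are illustrative only.

```typescript
// Percentage reduction from `before` to `after`, rounded to a whole percent.
function percentReduction(before: number, after: number): number {
  return Math.round(((before - after) / before) * 100);
}

// Tile renders per second = score updates per second x tiles re-rendered per update.
function tileRendersPerSecond(updatesPerSecond: number, tilesPerUpdate: number): number {
  return updatesPerSecond * tilesPerUpdate;
}

const latencyReduction = percentReduction(820, 110); // 87
const rendersBefore = tileRendersPerSecond(80, 200); // 16000
const rendersAfter = tileRendersPerSecond(80, 1);    // 80

console.log(latencyReduction, rendersBefore, rendersAfter); // 87 16000 80
```

At 80 updates per second, re-rendering every tile means 16,000 tile renders per second; re-rendering only the subscribed tile means 80.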
Selected lessons
What the system taught me on the way to holding 200 students.
- Lesson 01
Co-location debt is invisible until load arrives.
Running the AI worker and the media server on the same instance was fine in development. The architecture had a tripwire we'd never set — by the time concurrency revealed it, the system was already on fire.
- Lesson 02
Moving compute to the client is architectural, not just an optimization.
Shifting inference into the browser didn't just lift CPU off the server — it changed the trust boundary. Every downstream component (evidence, validation, scoring) had to be reasoned about again. Performance was the symptom; topology was the fix.
- Lesson 03
React's state model is not designed for 80 updates per second.
useState and useReducer are built for user-driven events. A live proctoring score feed is a continuous stream. useSyncExternalStore exists for exactly this — high-frequency writes that should bypass the render tree entirely.
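For context on Lesson 03, here is a minimal sketch of a store that could sit behind the `useStudentScore` hook shown earlier. Only the method names `subscribeForId` and `getScoreForId` come from that snippet; the class, the `setScore` method, and the internals are assumptions, not the production implementation.

```typescript
type Listener = () => void;

// Per-student external store: a write notifies only that student's listeners.
class ScoreStore {
  private scores = new Map<string, number>();
  private listeners = new Map<string, Set<Listener>>();

  // Follows the subscribe contract of useSyncExternalStore:
  // registers a callback and returns an unsubscribe function.
  subscribeForId(studentId: string, callback: Listener): () => void {
    let set = this.listeners.get(studentId);
    if (set === undefined) {
      set = new Set<Listener>();
      this.listeners.set(studentId, set);
    }
    set.add(callback);
    const registered = set;
    return () => {
      registered.delete(callback);
    };
  }

  // Snapshot read; returns undefined until a score has arrived.
  getScoreForId(studentId: string): number | undefined {
    return this.scores.get(studentId);
  }

  // Hypothetical write path, e.g. called from a socket message handler.
  setScore(studentId: string, score: number): void {
    if (this.scores.get(studentId) === score) return; // unchanged: skip notify
    this.scores.set(studentId, score);
    this.listeners.get(studentId)?.forEach((listener) => listener());
  }
}

const demoStore = new ScoreStore();
let aliceNotified = 0;
let bobNotified = 0;
demoStore.subscribeForId("alice", () => { aliceNotified += 1; });
const unsubscribeBob = demoStore.subscribeForId("bob", () => { bobNotified += 1; });

demoStore.setScore("alice", 0.7);
demoStore.setScore("alice", 0.7); // same value, no notification
demoStore.setScore("bob", 0.2);
unsubscribeBob();
demoStore.setScore("bob", 0.9);  // stored, but nobody is listening

console.log(aliceNotified, bobNotified, demoStore.getScoreForId("bob")); // 1 1 0.9
```

Because listeners are grouped by student ID, a write for one student never invokes callbacks registered for another, which is what keeps the re-render count at one tile per update.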
Three of nine. The rest (WebRTC lifecycle, production-vs-design numbers, real-time failure modes) live in the blog.
Read the full piece
The full piece
Want the unabridged version?
The unabridged 24-minute deep dive on GitHub Pages — three more lessons, the WebRTC race conditions, the dashboard refactor in detail.