The system that survives 200 concurrent students.

One year. Three architectures. A few near-collapses on the way to one that holds.

Masters' Union · EdTech · Production

▍ The opening

There is a particular kind of failure that only reveals itself under real load.

The architecture looked clean on a whiteboard. The tests passed. The demo went smoothly. Then five students joined an exam at once — and the CPU pinned at one hundred percent, and stayed.

How it routes

An SFU, not a transcoder.

FIG. System architecture. As load is added (about 6 concurrent streams shown), CPU stays flat: O(1) forward.

01 Clients
Each student's browser produces a WebRTC track and runs its own behavioral inference locally. The server never sees raw frames it has to analyze.
02 SFU router
Mediasoup forwards encoded RTP and never decodes it. Per-stream cost stays roughly flat as concurrency grows: O(1) on the hot path.
03 Browser inference
MediaPipe Tasks (WASM) runs in a student-side worker thread. Suspicion scoring stays under the 2–3 second budget for live alerts.
04 Server role
Validates evidence frames, deduplicates events, and fans out scores to admin dashboards over Socket.IO. The server is a referee, not an inference worker.
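The "deduplicates events" step in the server role could look roughly like the sketch below. This is not the production code: the `ProctorEvent` shape, the `EventDeduplicator` name, and the 2-second window are all assumptions made for illustration. The idea is only that a repeated report of the same signal for the same student inside a short window gets dropped before scores fan out.

```typescript
// Hypothetical event shape; the real payload is not shown in this write-up.
interface ProctorEvent {
  studentId: string;
  kind: string; // e.g. "gaze_shift" or "tab_switch"
  timestampMs: number;
}

// Drops repeats of the same (student, kind) pair that arrive within windowMs
// of the last accepted event for that pair.
class EventDeduplicator {
  private lastSeen = new Map<string, number>();
  private windowMs: number;

  constructor(windowMs: number = 2000) {
    this.windowMs = windowMs;
  }

  // Returns true if the event is new and should be forwarded.
  accept(event: ProctorEvent): boolean {
    const key = `${event.studentId}:${event.kind}`;
    const previous = this.lastSeen.get(key);
    if (previous !== undefined && event.timestampMs - previous < this.windowMs) {
      return false; // duplicate inside the window
    }
    this.lastSeen.set(key, event.timestampMs);
    return true;
  }
}
```

Keying on student plus signal kind means a tab switch never masks a gaze shift from the same student; only true repeats are suppressed.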

The cliff

Then five students joined an exam at the same time.

Five users. That was the cliff.

Not slow. Wrong. The fix wasn't optimization — it was moving inference off the server entirely.

In code

Two or three lines that quietly do the heavy lifting.

01 / frame_loop.py

Why it broke: each analyzed frame blocks the event loop for 50–140ms. At five concurrent streams there isn't enough wall-clock left in a second to do the work.

frame_loop.py

python

# get_comprehensive_analysis() is BLOCKING: 50–140 ms per frame.
# At 5 concurrent streams, the event loop runs out of time.
while not stop_token.is_set():
    frame = await track.recv()
    bgr = _video_frame_to_bgr(frame)

    analysis = await asyncio.to_thread(
        engine.get_comprehensive_analysis, bgr
    )
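To put rough numbers on the cliff, here is a back-of-the-envelope budget. The 50–140 ms per frame comes from the text above. The sampling rate of 2 analyzed frames per second per stream is a hypothetical figure, since the real rate is not stated here, and the sketch assumes the analysis is effectively serialized on one core.

```typescript
// Core-seconds of analysis work demanded per wall-clock second.
// analyzedFps is an assumption: the write-up does not state the sampling rate.
function analysisLoad(streams: number, analyzedFps: number, msPerFrame: number): number {
  return (streams * analyzedFps * msPerFrame) / 1000;
}

// Largest stream count a single fully serialized worker can keep up with.
function maxStreams(analyzedFps: number, msPerFrame: number): number {
  return Math.floor(1000 / (analyzedFps * msPerFrame));
}

const bestCase = analysisLoad(5, 2, 50); // 0.5: fits in one second
const worstCase = analysisLoad(5, 2, 140); // 1.4: more work than one second holds
const ceiling = maxStreams(2, 140); // 3 streams before the worst case overruns
```

Any load above 1.0 means the work cannot fit in the second it arrives in, so the backlog only grows. At a higher sampling rate the ceiling drops further.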

02 / scoreStore.ts

How the admin dashboard survived 80 score updates per second: scores live outside React state. Only the one tile subscribed to a given student re-renders.

scoreStore.ts

typescript

import { useCallback, useSyncExternalStore } from "react";

// External store: writes do not trigger React re-renders.
// scoreStore is the app's module-level store, defined elsewhere.
const useStudentScore = (studentId: string) => {
  const subscribe = useCallback(
    (cb: () => void) => scoreStore.subscribeForId(studentId, cb),
    [studentId],
  );
  const getSnapshot = () => scoreStore.getScoreForId(studentId);

  // Only re-renders when THIS student's score changes.
  return useSyncExternalStore(subscribe, getSnapshot, getSnapshot);
};
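The store behind that hook is not shown, so here is a minimal sketch of what a per-id external store might look like. `subscribeForId` and `getScoreForId` mirror the names used by the hook; `setScore` and the numeric score type are assumptions for illustration, and the real store may differ.

```typescript
type Listener = () => void;

// Minimal per-id external store. Listeners are grouped by student id so a
// write wakes only the subscribers of that one student.
class ScoreStore {
  private scores = new Map<string, number>();
  private listeners = new Map<string, Set<Listener>>();

  // Registers a listener for one student and returns an unsubscribe function.
  subscribeForId(studentId: string, listener: Listener): () => void {
    const set = this.listeners.get(studentId) ?? new Set<Listener>();
    this.listeners.set(studentId, set);
    set.add(listener);
    return () => {
      set.delete(listener);
    };
  }

  getScoreForId(studentId: string): number | undefined {
    return this.scores.get(studentId);
  }

  // Hypothetical write path: stores the score and notifies that student's listeners.
  setScore(studentId: string, score: number): void {
    if (this.scores.get(studentId) === score) return; // unchanged: no notification
    this.scores.set(studentId, score);
    const set = this.listeners.get(studentId);
    if (set === undefined) return;
    for (const listener of set) listener();
  }
}
```

Because the snapshot is a primitive number, `getSnapshot` returns a stable value between writes, which is what `useSyncExternalStore` needs to avoid spurious re-renders.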

LLMs as infrastructure.

Not the chat UI — the plumbing. Model calls treated like any other unreliable network dependency: routed, rate-limited, retried, and budgeted.
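One way to treat a model call like any other unreliable dependency is to keep the retry-then-fallback policy as plain data, separate from the HTTP call. The sketch below is an assumption about how that could be structured, not the actual routing layer: `planAttempts`, the backoff constants, and the placeholder model names are all hypothetical, and no OpenRouter API is called.

```typescript
interface Attempt {
  model: string;
  attempt: number; // 1-based attempt number for this model
  delayMs: number; // wait before this attempt
}

// Builds the ordered attempt schedule: each model is tried up to
// maxAttemptsPerModel times with exponential backoff, then the next model.
function planAttempts(
  models: string[],
  maxAttemptsPerModel: number,
  baseDelayMs: number,
): Attempt[] {
  const plan: Attempt[] = [];
  for (const model of models) {
    for (let attempt = 1; attempt <= maxAttemptsPerModel; attempt++) {
      // The first attempt on a model is immediate; retries double the wait each time.
      const delayMs = attempt === 1 ? 0 : baseDelayMs * 2 ** (attempt - 2);
      plan.push({ model, attempt, delayMs });
    }
  }
  return plan;
}

// Placeholder names, not real model identifiers.
const plan = planAttempts(["primary-model", "fallback-model"], 3, 500);
// Delays in order: 0, 500, 1000 for the primary, then 0, 500, 1000 for the fallback.
```

A caller would walk the plan in order and stop at the first success, which keeps the policy testable without a network.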

01
OpenRouter as a routing layer
Third-party LLM APIs integrated as infrastructure across internal services — provider routing, fallback models, and rate-limit handling live in one place instead of scattered fetch calls.
openrouter · llm api · node
02
AI session summaries from raw events
Post-session proctoring reports generated from structured event streams — gaze shifts, tab switches, alert timestamps — so a reviewer reads one summary instead of replaying an hour of footage.
prompt over events · postgres · openrouter
03
Behavioral classification at the edge
MediaPipe eye-gaze and tab-switch inference runs in client worker threads, with multi-signal correlation preventing false escalations. Moving the models off the server bought +35% proctoring throughput.
mediapipe · web workers · +35% throughput
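The multi-signal correlation in item 03 can be sketched as a simple rule: escalate only when several different kinds of signal land close together in time. The `Signal` shape, the signal names, and the thresholds below are assumptions for illustration; the real classifier's rules are not given here.

```typescript
// Hypothetical signal shape; kinds and thresholds are illustrative only.
interface Signal {
  kind: string; // e.g. "gaze_away" or "tab_switch"
  timestampMs: number;
}

// Escalates only when at least minKinds distinct signal kinds occur within
// windowMs of each other, so one noisy detector cannot raise an alert alone.
function shouldEscalate(signals: Signal[], windowMs: number, minKinds: number): boolean {
  for (const anchor of signals) {
    // Collect the kinds seen in the window that starts at this signal.
    const kinds = new Set<string>();
    for (const other of signals) {
      const gap = other.timestampMs - anchor.timestampMs;
      if (gap >= 0 && gap <= windowMs) kinds.add(other.kind);
    }
    if (kinds.size >= minKinds) return true;
  }
  return false;
}
```

Under this rule, three gaze shifts in a row stay below the bar, while a gaze shift followed by a tab switch inside the window crosses it. The quadratic scan is fine for the handful of signals a single student produces in a few seconds.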

What changed

Then this. Now this.

Before

Naïve architecture · 1 server doing everything

Max concurrent users: 5 (server CPU pinned at 100%)
End-to-end score latency: 820 ms (naive server-side path)
Re-renders per score update: 200 (every tile in the grid)

After

SFU · client inference · O(1) per stream

Concurrent users supported: 200+ (server CPU roughly flat)
End-to-end score latency: 110 ms (87% reduction)
Re-renders per score update: 1 (subscribed tile only)
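The headline numbers can be checked with a few lines of arithmetic. The inputs are the figures quoted in this write-up (820 ms, 110 ms, 80 updates per second, 200 tiles); the helper names are only for illustration.

```typescript
// Percentage reduction from a before value to an after value.
function percentReduction(before: number, after: number): number {
  return ((before - after) / before) * 100;
}

// Tile renders per second for a given update rate and tiles re-rendered per update.
function rendersPerSecond(updatesPerSecond: number, tilesPerUpdate: number): number {
  return updatesPerSecond * tilesPerUpdate;
}

const latencyCut = percentReduction(820, 110); // about 86.6, which rounds to the quoted 87%
const rendersBefore = rendersPerSecond(80, 200); // 16000 tile renders per second
const rendersAfter = rendersPerSecond(80, 1); // 80 tile renders per second
```

The render count is the more dramatic change: the same 80 updates per second cost 200 times fewer tile renders once each update touches only its own tile.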

Selected lessons

What the system taught me on the way to holding 200 students.

Lesson 01
Co-location debt is invisible until load arrives.
Running the AI worker and the media server on the same instance was fine in development. The architecture had a tripwire we'd never set — by the time concurrency revealed it, the system was already on fire.
Lesson 02
Moving compute to the client is architectural, not just an optimization.
Shifting inference into the browser didn't just lift CPU off the server — it changed the trust boundary. Every downstream component (evidence, validation, scoring) had to be reasoned about again. Performance was the symptom; topology was the fix.
Lesson 03
React's state model is not designed for 80 updates per second.
useState and useReducer are built for user-driven events. A live proctoring score feed is a continuous stream. useSyncExternalStore exists for exactly this — high-frequency writes that should bypass the render tree entirely.
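Lesson 02's point about the trust boundary has a concrete consequence: once scores are computed in the browser, the server has to treat them as untrusted input. The sketch below shows the kind of checks that implies. The `ClientScoreReport` fields, the 0 to 1 score range, and the skew limit are assumptions for illustration, not the system's actual validation rules.

```typescript
// Hypothetical client report; field names and limits are assumptions.
interface ClientScoreReport {
  studentId: string;
  score: number; // assumed to be normalized to [0, 1]
  clientTimestampMs: number;
}

type Validation = { ok: true } | { ok: false; reason: string };

// Rejects reports from unknown students, out-of-range scores, and
// timestamps too far from the server clock.
function validateReport(
  report: ClientScoreReport,
  enrolled: ReadonlySet<string>,
  serverNowMs: number,
  maxSkewMs: number,
): Validation {
  if (!enrolled.has(report.studentId)) {
    return { ok: false, reason: "unknown student" };
  }
  if (!Number.isFinite(report.score) || report.score < 0 || report.score > 1) {
    return { ok: false, reason: "score out of range" };
  }
  if (Math.abs(serverNowMs - report.clientTimestampMs) > maxSkewMs) {
    return { ok: false, reason: "timestamp skew too large" };
  }
  return { ok: true };
}
```

None of these checks prove the score is honest. They only bound what a tampered client can send, which is why evidence frames still get validated separately.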

Three of nine. The rest (WebRTC lifecycle, production-vs-design numbers, real-time failure modes) live in the blog.

Read the full piece

The full piece

Want the unabridged version?

The unabridged 24-minute deep dive on GitHub Pages — three more lessons, the WebRTC race conditions, the dashboard refactor in detail.

Up next

Doubt & Discussion

Next case study · four chat types, one socket layer

Read the case study
