Proof of concept: programmatic detection and replacement of on-screen English text in Khan-style blackboard videos, enabling localization at scale. Text appears at realistic angles — because Sal doesn't write in straight lines. GitHub repo
This demo simulates a three-phase pipeline for localizing Khan Academy video text:
Phase 1 — Original video plays. Text appears on the blackboard at various angles, mimicking Sal Khan's handwriting style.
Phase 2 — Frames are scanned in reverse order, so OCR sees each text region fully written before it appears stroke by stroke (partial handwriting is what makes OCR unreliable). Each region is detected, timestamped, and translated.
Phase 3 — A black overlay covers each English text region (diagonal reveal from top-left), and the translated text is written in at the same angle as the original.
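The backward scan in Phase 2 can be sketched as follows. This is a minimal illustration, not the repo's code: `ocr_frame` is a stand-in for a real per-frame Tesseract call, and `timestamp_regions` walks frames last-to-first so each region's earliest frame index (its appearance timestamp) is what survives.

```python
# Hypothetical sketch of Phase 2's reverse scan; ocr_frame stubs out
# what would be a Tesseract call on a decoded video frame.
def ocr_frame(frame_idx):
    """Stub OCR: pretend "2x+3" appears at frame 10 and "=7" at frame 25."""
    regions = set()
    if frame_idx >= 10:
        regions.add("2x+3")
    if frame_idx >= 25:
        regions.add("=7")
    return regions

def timestamp_regions(n_frames, fps=30):
    """Scan frames last-to-first; each text region is first seen complete,
    and its entry is overwritten until we reach its earliest frame."""
    first_seen = {}
    for idx in range(n_frames - 1, -1, -1):
        for text in ocr_frame(idx):
            first_seen[text] = idx  # keeps shrinking as we move earlier
    return {text: idx / fps for text, idx in first_seen.items()}

stamps = timestamp_regions(60)  # {"2x+3": 0.333..., "=7": 0.833...}
```

Scanning backward means the first OCR pass over any region reads the finished handwriting, so later (earlier-in-time) partial strokes only refine the timestamp, never the recognized text.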
The actual pipeline uses OpenCV + Tesseract OCR for detection, and Remotion (React for video) or Python/PIL for rendering. This is a concept demo — no video processing happens in the browser.
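A Python/PIL rendering step like Phase 3 could look like this. It is a sketch under assumptions, not the pipeline's implementation: `overlay_translation`, the region tuple `(x, y, w, h)`, and the default font are all illustrative, and a real version would match the blackboard color and handwriting-style font.

```python
# Hypothetical Phase 3 renderer: black out the English region, then paste
# the translation rotated to the original text's angle. Pillow required.
from PIL import Image, ImageDraw, ImageFont

def overlay_translation(frame, region, text, angle_deg, color=(255, 255, 255)):
    """Cover an English text region and write the translation at the same angle."""
    x, y, w, h = region
    # Cover the original handwriting with a blackboard-colored rectangle.
    ImageDraw.Draw(frame).rectangle([x, y, x + w, y + h], fill=(0, 0, 0))
    # Render the translation on a transparent layer, then rotate to match.
    layer = Image.new("RGBA", (w, h), (0, 0, 0, 0))
    ImageDraw.Draw(layer).text((0, 0), text, fill=color + (255,),
                               font=ImageFont.load_default())
    rotated = layer.rotate(angle_deg, expand=True)
    frame.paste(rotated, (x, y), rotated)  # alpha channel as paste mask
    return frame

frame = Image.new("RGB", (320, 180), (0, 0, 0))
out = overlay_translation(frame, (40, 60, 120, 24), "hola", angle_deg=7)
```

Rotating a transparent text layer rather than drawing directly is what lets the replacement text sit at the same angle as Sal's original handwriting.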
Combined with AI dubbing (voice clone or TTS), this could reduce per-video localization effort from ~1 hour to ~15 minutes of review.