Slate Hunting, Automated | Jason Peterson

Every shot in a film begins or ends with a slate. The classic clapboard, or a digital burn-in card. It carries the shot name, the take number, and the timecode. The timecode is the one that matters. Dailies operators use it to sync sound to picture. Lose the slate, lose the sync.

Finding the slate isn’t hard. Finding it across thousands of clips per project, every project, every week, is the part that adds up. Slates show up anywhere in the clip: deep into the head, at the tail, sometimes upside down, as tail slates are. Dailies operators call it “slate hunting.” Minutes per scene, scaled across an entire production.

I built a tool that does it automatically. The interesting parts are the constraint that shaped the whole design, the three-stage pipeline that came out of it, and where the training data came from.

The constraint that shaped everything

Footage for unreleased films is confidential. Studios won’t put dailies through a cloud service for inference. The detector had to run on-device, on the dailies operator’s existing machine, sharing the GPU with the colour grading platform they were already running.

That single requirement ruled out most of the obvious modern choices. No transformer, no zero-shot vision-language model, no API call. Inference had to fit alongside the host application without taking the GPU it was using, and per-frame latency had to be low enough that scanning a clip didn’t feel like a wait.

I designed for the constraint: small models, on-device, sub-50ms per frame, under a gigabyte of VRAM each.

Three stages, not one model

The job has three distinct sub-problems, and trying to do them with one network would have meant a bigger, slower, more opaque thing. Three small specialists work better.

1. Object detection. A MobileNet V2 finds slates in each frame and locates the timecode digits within them.

2. Multi-frame clap search. A slate can be in two states: clapper open (raised), or shut (snapped closed). The “clap” — the moment the sticks meet — is the exact frame the audio team needs for sync. The detector also classifies open vs shut, so I can walk the candidate frames in a clip and identify the precise clap frame, not just any frame containing a slate.

3. Timecode OCR. A second MobileNet V2 reads the timecode digits off the clap frame. Error correction (sanity-checking against timecode continuity rules) is in progress.
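The clap search in stage 2 can be sketched as a scan over per-frame detector output. This is a minimal illustration, not the tool's actual code: the `FrameResult` type, the `find_clap_frame` name, and the `min_run` debounce parameter are all assumptions made for the example. It treats the clap as the first shut frame that follows a short run of open frames.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class FrameResult:
    """Hypothetical per-frame detector output."""
    index: int            # frame number within the clip
    state: Optional[str]  # "open", "shut", or None if no slate was found


def find_clap_frame(frames: Sequence[FrameResult], min_run: int = 2) -> Optional[int]:
    """Return the index of the first 'shut' frame that directly follows at
    least `min_run` consecutive 'open' frames, or None if there is none."""
    open_run = 0
    for frame in frames:
        if frame.state == "open":
            open_run += 1
        elif frame.state == "shut":
            if open_run >= min_run:
                return frame.index
            open_run = 0  # shut without enough open frames before it
        else:
            open_run = 0  # slate lost; start counting again
    return None


states = [None, "open", "open", "open", "shut", "shut", None]
clip = [FrameResult(i, s) for i, s in enumerate(states)]
print(find_clap_frame(clip))  # 4
```

Requiring a short run of open frames first is one way to keep a single misclassified frame from being reported as the clap. The right value for `min_run` would depend on how noisy the open/shut classifier is.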

The clean job boundaries between the three stages mean failures are debuggable — a bad bounding box looks different from a missed open/shut classification, which looks different from garbled OCR. And each stage improves on its own training schedule without entangling the others.
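Garbled OCR in particular can be flagged mechanically. The continuity check planned for stage 3 might look something like the sketch below. It assumes non-drop-frame `HH:MM:SS:FF` timecode, a slate display that is still running, and a slate rate that matches the footage. The function names are hypothetical, and midnight rollover is ignored.

```python
import re
from typing import Optional

TIMECODE_RE = re.compile(r"(\d{2}):(\d{2}):(\d{2}):(\d{2})")


def timecode_to_frames(tc: str, fps: int = 24) -> Optional[int]:
    """Convert 'HH:MM:SS:FF' to a frame count, or None if the string is not
    a valid non-drop-frame timecode at the given rate."""
    match = TIMECODE_RE.fullmatch(tc)
    if match is None:
        return None
    hours, minutes, seconds, frames = (int(g) for g in match.groups())
    if hours > 23 or minutes > 59 or seconds > 59 or frames >= fps:
        return None
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


def is_continuous(tc_a: str, tc_b: str, frame_gap: int, fps: int = 24) -> bool:
    """True if both readings are valid and tc_b is exactly `frame_gap`
    frames after tc_a."""
    a = timecode_to_frames(tc_a, fps)
    b = timecode_to_frames(tc_b, fps)
    return a is not None and b is not None and b - a == frame_gap


# Two readings taken three frames apart should differ by three frames.
print(is_continuous("01:02:03:22", "01:02:04:01", frame_gap=3))  # True
print(is_continuous("01:02:03:22", "01:02:08:01", frame_gap=3))  # False
```

A reading that fails the check can be discarded or re-read from a neighbouring frame. Which of those the finished tool will do is not something this sketch tries to guess.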

Where the training data came from (it didn’t exist)

Here’s the part I learned the most from. There is no public dataset of film slates. Even if there were, real slates vary enormously: handwritten clapboards, digital burn-ins, different fonts, layouts, colours, motion blur, glare, lens characteristics. Hand-labelling thousands of examples wasn’t viable as a side project.

I synthesised them.

The data pipeline ran in two stages.

Blender generated each slate as a 3D object. Different shapes (rectangular clapboard, modern digital burn-in card), different fonts, randomised values for shot name, take number, timecode. Procedural variation across hundreds of combinations of layout, colour, font, and type style, and crucially, both clapper states. Output: clean rendered slates with perfect ground-truth labels (known bounding boxes, known timecode strings, known clapper state).
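The reason the labels come for free is that each slate's parameters are chosen before anything is rendered. A plain-Python sketch of that idea, with no Blender dependency, is below. The `SlateSpec` fields, the option lists, and the value ranges are illustrative assumptions, not the real generator's settings. Bounding boxes would come from the render step, which is not shown.

```python
import random
from dataclasses import asdict, dataclass

# Illustrative option lists; the real generator's choices are not known here.
SHAPES = ["clapboard", "digital_card"]
FONTS = ["sans", "mono", "condensed"]
CLAPPER_STATES = ["open", "shut"]


@dataclass(frozen=True)
class SlateSpec:
    shape: str
    font: str
    shot_name: str
    take: int
    timecode: str
    clapper_state: str


def random_slate_spec(rng: random.Random, fps: int = 24) -> SlateSpec:
    """Draw one randomised slate; the spec doubles as the ground-truth label."""
    timecode = "{:02d}:{:02d}:{:02d}:{:02d}".format(
        rng.randrange(24), rng.randrange(60), rng.randrange(60), rng.randrange(fps)
    )
    shot_name = "{}{}".format(rng.randint(1, 150), rng.choice("ABCDEFGH"))
    return SlateSpec(
        shape=rng.choice(SHAPES),
        font=rng.choice(FONTS),
        shot_name=shot_name,
        take=rng.randint(1, 20),
        timecode=timecode,
        clapper_state=rng.choice(CLAPPER_STATES),
    )


rng = random.Random(0)  # fixed seed so a batch can be regenerated exactly
batch = [asdict(random_slate_spec(rng)) for _ in range(3)]
```

Each dictionary in `batch` would be handed to the renderer and also saved next to the rendered image as its label, so the timecode string and clapper state never have to be annotated by hand.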

ComfyUI then ran each clean render through a diffusion pipeline that added the texture of real footage: motion blur, lens grain, lighting variation, sensor noise, slightly off-axis angles, partial occlusion. The diffusion model knew nothing about slates as a concept. It just knew what film footage looks like, and it applied that texture to the Blender renders.

The result was thousands of training images that looked like real captured footage, every one of them with ground-truth labels attached automatically. Real footage was held out and used only for evaluation — never for training.

The detector generalises to real slates it has never seen. The OCR model reads timecodes off them. That generalisation is the whole point of doing it this way.

The system in action

The video shows detection on standard smart slates at the head of a clip, detection on an inverted tail slate, and the debug overlays during inference.

The numbers

Inference latency: ~45ms per frame, detection + OCR combined
VRAM: 800MB per model, fits alongside the host colour grading application without contention
Training time: 3–4 hours on a single GPU
Training data: synthetic, generated in batches as needed
Real footage: evaluation only
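
As a back-of-envelope reading of the latency figure, the sketch below estimates how long a scan takes. The 24 fps frame rate, the 60-second clip, and the `stride` parameter (scanning only every n-th frame) are assumptions for the arithmetic. Whether the tool samples frames this way is not stated here.

```python
def scan_seconds(clip_seconds: float, fps: float = 24.0,
                 latency_ms: float = 45.0, stride: int = 1) -> float:
    """Estimated wall-clock time to scan a clip, sampling every `stride`-th frame."""
    total_frames = int(clip_seconds * fps)
    frames_scanned = -(-total_frames // stride)  # ceiling division
    return frames_scanned * latency_ms / 1000.0


print(scan_seconds(60))            # 64.8 (1440 frames at 45 ms each)
print(scan_seconds(60, stride=6))  # 10.8 (240 frames)
```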

How it lives in the host system

The detector runs inside Flexi, FilmLight’s ML plugin for Baselight. Flexi loads the MobileNet models directly into the timeline, so detection runs per-frame in the same pipeline as the rest of the grade. Python via FLAPI handles aggregate frame data analysis, the UI surface, and orchestration across the three stages. The result is a tool that lives inside the dailies operator’s existing workflow rather than as a separate app to context-switch into.

What changed for the operator

I sent v0.1.0 to a dailies operator at a post house for testing. The note that came back:

I have been running slate detect v0.1.0 through its paces and I have to say I am extremely impressed. As is, I can see this tool offering tremendous value to a Dailies operator, eliminating much of the “slate hunting” that has to be done. This is one of the first things I’ve worked on in a while that feels really bleeding edge.

“Slate hunting” went from minutes per scene to seconds across a whole shot group.

What’s next

A Core ML port is in progress. Apple Silicon’s Neural Engine should bring inference latency down further and free the GPU entirely for the host application. OCR error correction lands soon after.

The lesson I’m taking forward: when the data doesn’t exist, build it. When the constraint says small, design for it. Several specialists with clean job boundaries usually beat one generalist with a fuzzy one.