The Science Behind the 67 Speed Counter — How It Tracks You
Ever wondered how 67 Speed actually counts your arm movements in real time? Under the hood, the game relies on a sophisticated computer vision pipeline that turns raw webcam pixels into precise rep counts — all running locally in your browser. Here's a deep technical breakdown of every layer in that system.
Why We Built a Vision-Based Counter
When we started 67 Speed, the first question was simple: how do you count arm movements without any wearable hardware? Accelerometers in phones were the obvious answer, but they limited the experience to mobile devices and introduced noise from shaky grips. We wanted something that worked on any laptop or desktop with a standard webcam — zero accessories, zero downloads.
That constraint pushed us toward browser-based pose estimation. The idea is straightforward: if the camera can see your body, a machine learning model can locate your joints in 3D space, and software can decide whether a "rep" has occurred based on how those joints move. In practice, making that pipeline fast, accurate, and resilient to real-world conditions took us the better part of four months and more failed prototypes than we'd like to admit.
Google MediaPipe Pose Landmarker: The Eyes of the System
The core perception layer of the 67 speed counter is Google's MediaPipe Pose Landmarker, a lightweight ML model purpose-built for real-time human pose estimation. It runs entirely in the browser via TensorFlow.js and WebAssembly — no server round-trips, no cloud inference, no privacy concerns.
The pose model runs at approximately 30 fps on mid-range laptops and identifies 33 body landmarks per frame. Each landmark is a 3D coordinate (x, y, z) plus a visibility confidence score ranging from 0 to 1. For the 67 speed counter, we care most about four specific landmarks:
- Landmark 15: Left wrist
- Landmark 16: Right wrist
- Landmark 11: Left shoulder (used as a reference anchor)
- Landmark 12: Right shoulder (used as a reference anchor)
The shoulder landmarks serve as dynamic reference points. Since every player sits or stands at a different distance from the camera and at a different height, absolute pixel coordinates are meaningless. Instead, we normalize wrist positions relative to shoulder positions, giving us a scale-invariant measurement that works whether you're 30 cm from the webcam or 2 meters away.
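The normalization step can be sketched in a few lines. This is our illustrative reconstruction, not the shipped code: the interface and function names are hypothetical, and coordinates are in image space, where y grows downward, so a wrist below its shoulder yields a positive value.

```typescript
// Hypothetical sketch of shoulder-relative normalization.
interface Landmark { x: number; y: number; visibility: number; }

// Normalize a wrist's vertical position against its shoulder, scaled by
// the shoulder-to-hip distance so the result is the same whether the
// player is 30 cm or 2 m from the webcam.
function normalizedWristY(wrist: Landmark, shoulder: Landmark, hip: Landmark): number {
  const torso = Math.abs(hip.y - shoulder.y); // scale reference
  if (torso === 0) return 0;                  // degenerate pose; treat as neutral
  return (wrist.y - shoulder.y) / torso;      // positive = below shoulder
}
```

Because the output is a ratio of two distances measured in the same frame, any uniform scaling from camera distance cancels out.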
"When we first prototyped the counter, our biggest challenge was players who moved closer or farther from the camera mid-game. Shoulder-relative normalization solved it in a single afternoon."
The Movement Threshold Algorithm
Detecting that wrists exist in space is only step one. The real engineering challenge is deciding when a wrist has moved "enough" to count as a rep. Too sensitive, and breathing or small fidgets register as reps. Too conservative, and legitimate fast movements get missed. We tested 14 different threshold values before settling on the configuration that ships today.
How the Threshold Works
Every frame, we compute the vertical displacement of each wrist relative to its corresponding shoulder. This gives us a normalized Y-delta — a number that represents how far below (positive) or above (negative) the shoulder the wrist currently sits. The algorithm maintains a rolling state machine for each wrist with three states:
- Idle: The wrist is near its neutral position. No active movement is being tracked.
- Rising: The wrist has crossed above the upper threshold, indicating the start of an upward motion.
- Falling: The wrist has returned below the lower threshold after being in the Rising state. This transition completes one rep.
A single rep is defined as a complete cycle: the wrist crosses the upper threshold (moving upward), then crosses the lower threshold (moving downward). This dual-direction crossing algorithm is critical — it prevents half-movements, oscillations near the threshold, and camera jitter from inflating the count.
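The state machine above can be condensed into a small class. This is a hedged reconstruction with illustrative names and thresholds; for readability it uses "positive = higher" heights rather than raw image coordinates.

```typescript
type WristState = "idle" | "rising" | "falling";

// One counter per wrist. `upper` and `lower` bracket a dead zone where
// no transitions occur; a rep requires crossing both, in order.
class WristRepCounter {
  private state: WristState = "idle";
  reps = 0;

  constructor(private upper: number, private lower: number) {}

  update(height: number): void {
    switch (this.state) {
      case "idle":
        if (height > this.upper) this.state = "rising"; // upward crossing
        break;
      case "rising":
        if (height < this.lower) {
          this.state = "falling"; // downward crossing completes the cycle
          this.reps++;            // dual-direction crossing = one rep
        }
        break;
      case "falling":
        this.state = "idle";      // settle back to neutral
        break;
    }
  }
}
```

Note that a wrist hovering between the two thresholds never changes state, which is exactly what suppresses jitter-induced counts.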
The Hysteresis Buffer
Raw threshold crossings would still produce false positives at the boundary. If a wrist hovers right at the threshold line, noise in the pose model's output can cause it to flicker between "above" and "below" dozens of times per second. To combat this, we implemented a hysteresis buffer — a dead zone between the upper and lower thresholds where no state transitions occur.
The buffer width went through extensive A/B testing. Based on data from hundreds of thousands of plays on our platform, we found that a hysteresis gap of 8% of the shoulder-to-hip distance provided the best tradeoff between sensitivity and accuracy. Smaller gaps increased false positives on low-resolution webcams; larger gaps penalized players with shorter arm movements.
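Because the gap is expressed as a fraction of the player's on-screen torso, it adapts automatically to camera distance. A minimal sketch of that computation, using the 8% figure quoted above (function and parameter names are ours):

```typescript
interface Point { x: number; y: number; }

// Width of the dead zone between the upper and lower thresholds, scaled
// to the shoulder-to-hip distance measured in the current frame.
function hysteresisGap(shoulder: Point, hip: Point, ratio = 0.08): number {
  const torso = Math.hypot(hip.x - shoulder.x, hip.y - shoulder.y);
  return torso * ratio;
}
```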
"Our false-positive rate dropped from 12% to under 2% after implementing the dual-direction crossing algorithm. Adding the hysteresis buffer brought it below 0.5%."
Frame-by-Frame: What Happens in a Single Tick
Let's trace exactly what happens in one frame of the 67 speed counter pipeline, from camera input to counter increment. This entire sequence executes in roughly 28–35 milliseconds on a 2022-era laptop:
- Frame capture (1–2 ms): The browser's getUserMedia API delivers a video frame from the webcam. We draw it to a hidden canvas at 640×480 resolution — high enough for reliable pose detection, low enough to keep inference fast.
- Pose inference (18–24 ms): The canvas image data is passed to MediaPipe Pose Landmarker. The model outputs 33 landmarks with 3D coordinates and confidence scores. We discard frames where wrist confidence drops below 0.65 — this filters out moments when the hand is occluded or motion-blurred.
- Normalization (< 1 ms): Wrist Y-coordinates are normalized against shoulder Y-coordinates. We also compute a scale factor from the shoulder-to-hip distance to make the hysteresis buffer adaptive.
- State machine update (< 1 ms): The normalized wrist position is evaluated against the current state (Idle, Rising, or Falling). If a full Rising → Falling cycle completes, the counter increments by one.
- Render (2–4 ms): The updated count is written to the DOM. If enabled, the skeleton overlay and threshold visualization lines are drawn onto the visible canvas.
At 30 fps, this gives us approximately 3 ms of headroom per frame on average hardware — enough to absorb occasional inference spikes without dropping frames. On high-end machines running at 60 fps, we skip every other inference call but still update the render, keeping the visual experience smooth.
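Stripped of the browser and ML plumbing, one tick looks roughly like the following. This is a dependency-injected sketch under our own assumptions (the real pipeline talks to getUserMedia and MediaPipe directly; every name here is illustrative):

```typescript
// Condensed pose summary for one frame, as the later stages consume it.
interface Pose { wristY: number; wristConfidence: number; shoulderY: number; torso: number; }

function tick(
  capture: () => Pose | null,           // frame capture + pose inference
  updateState: (normY: number) => void, // wrist state machine update
  render: () => void,                   // DOM counter + canvas overlay
  minConfidence = 0.65,
): void {
  const pose = capture();
  // Drop low-confidence frames (occluded or motion-blurred hands).
  if (pose && pose.wristConfidence >= minConfidence && pose.torso > 0) {
    updateState((pose.wristY - pose.shoulderY) / pose.torso); // normalization
  }
  // Rendering runs every tick, even when a frame's pose was discarded,
  // which is also what keeps 60 fps displays smooth when inference is
  // only run on every other frame.
  render();
}
```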
Handling Edge Cases in the Wild
Building the counter in a controlled test environment was one thing. Deploying it to hundreds of thousands of players with wildly different setups revealed an entirely new category of problems. Here are the trickiest edge cases we encountered and how we solved them:
Multiple People in Frame
MediaPipe Pose Landmarker can detect multiple bodies, but the 67 speed counter needs to track exactly one player. When we first launched, players standing behind the primary user would occasionally cause the model to swap which skeleton it was tracking — mid-game. The counter would jump erratically because it was suddenly measuring a different person's wrists.
We solved this with a skeleton locking system. At the start of each round, the system identifies the most prominent (largest bounding box) person in frame and locks onto their skeleton ID. If the model loses that skeleton for more than 5 consecutive frames, it re-acquires the most prominent person. Based on our telemetry, skeleton swaps dropped from 4.1% of sessions to 0.3% after this fix.
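The locking rule can be expressed compactly. A hedged sketch of our understanding of it (class and field names are hypothetical): keep the locked skeleton while it is visible, tolerate up to 5 missing frames, then re-acquire the largest bounding box.

```typescript
interface Skeleton { id: number; boxArea: number; }

class SkeletonLock {
  private lockedId: number | null = null;
  private missedFrames = 0;

  // Returns the id of the skeleton the counter should follow this frame.
  track(detected: Skeleton[], maxMissed = 5): number | null {
    const locked = detected.find(s => s.id === this.lockedId);
    if (locked) {
      this.missedFrames = 0;
      return locked.id;
    }
    this.missedFrames++;
    if (this.lockedId === null || this.missedFrames > maxMissed) {
      // (Re-)acquire the most prominent person in frame.
      const best = detected.reduce<Skeleton | null>(
        (a, b) => (a === null || b.boxArea > a.boxArea ? b : a), null);
      this.lockedId = best ? best.id : null;
      this.missedFrames = 0;
    }
    return this.lockedId; // keep the stale lock during short dropouts
  }
}
```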
Extreme Lighting Conditions
Webcam quality varies enormously. Players in dimly lit rooms, players backlit by windows, players with ring lights blasting directly into the lens — each scenario degrades pose estimation in different ways. Rather than trying to preprocess the image (which eats into our frame budget), we tuned the confidence threshold dynamically. When average landmark confidence drops below 0.7 for more than 10 consecutive frames, the system displays a warning overlay prompting the player to adjust their lighting or camera angle. This proactive approach reduced support tickets about "broken counting" by 38%.
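The trigger condition is simple to state as code. This sketch uses the 0.7 threshold and 10-frame window from the paragraph above; the class and method names are our own.

```typescript
class LightingMonitor {
  private lowFrames = 0;

  // Call once per frame; returns true when the warning overlay should show.
  update(avgConfidence: number, threshold = 0.7, maxLowFrames = 10): boolean {
    // Any single good frame resets the consecutive-low counter.
    this.lowFrames = avgConfidence < threshold ? this.lowFrames + 1 : 0;
    return this.lowFrames > maxLowFrames;
  }
}
```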
Rapid Micro-Movements vs. Full Arm Swings
Some players discovered they could game the counter by making tiny, rapid wrist flicks instead of full arm movements. While technically impressive in their own right, these micro-movements didn't match the spirit of the game. We introduced a minimum displacement requirement: a rep only counts if the wrist travels at least 15% of the shoulder-to-hip distance vertically. This threshold is large enough to filter out wrist-only flicks but small enough to accommodate players with limited mobility or shorter arms.
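As a sketch, the displacement filter is a single comparison applied to a completed cycle, using the 15% figure above (names are illustrative):

```typescript
// Reject a candidate rep unless the wrist traveled at least `minRatio`
// of the shoulder-to-hip distance between its peak and trough.
function isFullRep(peakY: number, troughY: number, torsoLength: number, minRatio = 0.15): boolean {
  return Math.abs(peakY - troughY) >= torsoLength * minRatio;
}
```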
Performance Optimizations That Made It Ship
Early prototypes ran at 12–15 fps on mid-range hardware — too slow for a game that demands real-time responsiveness. Here's what we did to double the frame rate:
- WebGL backend for inference: Switching from the CPU backend to WebGL for TensorFlow.js cut inference time by 40% on devices with dedicated GPUs.
- Resolution scaling: We dynamically reduce the canvas resolution from 640×480 to 320×240 when the device can't sustain 24 fps, trading a small amount of landmark precision for framerate stability.
- Batched DOM updates: Instead of updating the counter display every frame, we batch DOM writes using requestAnimationFrame, preventing layout thrashing that was costing 3–5 ms per frame on slower browsers.
- Worker-thread inference: On browsers that support OffscreenCanvas, we offload pose inference to a Web Worker, freeing the main thread entirely for game logic and rendering.
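The resolution-scaling rule in particular is easy to isolate. A minimal sketch under our assumptions (the 24 fps threshold and the two resolutions are from the list above; the function shape is our guess):

```typescript
interface Resolution { width: number; height: number; }

// Pick the inference canvas size from a recent fps measurement: drop to
// quarter resolution when the device can't sustain 24 fps.
function pickInferenceResolution(measuredFps: number): Resolution {
  return measuredFps < 24
    ? { width: 320, height: 240 }
    : { width: 640, height: 480 };
}
```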
Combined, these optimizations brought the 67 speed counter from a laggy prototype to a smooth, production-grade system that works reliably on hardware as old as 2019 Chromebooks. Our performance telemetry across 2.1 million unique devices shows a median frame rate of 29.4 fps, with the 10th percentile still hitting 22 fps — well above the minimum threshold for accurate rep counting.
What's Next for the Counter
We're not done improving. Our current roadmap includes exploring MediaPipe's hand landmark model for finger-level tracking, which could enable entirely new game modes based on hand gestures rather than arm movements. We're also experimenting with temporal smoothing algorithms borrowed from motion capture pipelines in film production — early tests suggest we can reduce the minimum detectable movement by another 20% without increasing false positives.
The 67 speed counter started as a simple "count the arm waves" feature. It evolved into a full real-time computer vision pipeline that processes millions of frames daily across hundreds of thousands of devices worldwide. Every optimization, every edge case fix, every threshold tweak was driven by real player data — and we're just getting started.