How LekSync Achieves Sub-100ms Audio Sync Over Wi-Fi

Synchronizing audio across multiple devices sounds simpler than it is. Play the same file on two phones and they'll drift: each phone's clock ticks at a slightly different rate, network packets arrive at slightly different times, and audio output buffers add variable latency. Without active correction, these offsets accumulate, and two phones playing the same track end up audibly out of step within minutes.

LekSync keeps all receivers within 100 milliseconds of each other continuously. Here's the technical implementation that makes this possible.

Step 1: Decoding Source Audio to PCM

When the host selects a track, LekSync doesn't stream the compressed audio file (MP3, AAC, FLAC) directly to receivers. Instead, it decodes the source file to raw PCM (Pulse-Code Modulation) — the uncompressed digital audio representation — using Android's MediaCodec API.

PCM is the format that audio hardware actually plays. Decoding at the source rather than at each receiver means:

  • Every receiver gets identical audio data, not slightly differently-decoded versions of a compressed file.
  • The host controls the decode quality — receivers don't need to handle codec-specific decoding.
  • Timing is consistent — PCM samples have a fixed, known playback duration, which makes synchronization math precise.
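The timing point can be made concrete with a little arithmetic. The sketch below assumes 48 kHz, stereo, 16-bit PCM purely for illustration; the post does not say which sample format LekSync uses, and the function names are hypothetical.

```python
# Assumed PCM format for illustration only; LekSync's actual format is not stated.
SAMPLE_RATE_HZ = 48_000
CHANNELS = 2
BYTES_PER_SAMPLE = 2  # 16-bit signed PCM

# One "PCM frame" here means one sample for every channel.
BYTES_PER_PCM_FRAME = CHANNELS * BYTES_PER_SAMPLE


def pcm_duration_ms(num_bytes: int) -> float:
    """Playback duration of a PCM buffer, in milliseconds."""
    pcm_frames = num_bytes // BYTES_PER_PCM_FRAME
    return pcm_frames * 1000 / SAMPLE_RATE_HZ


def pcm_bytes_for_ms(duration_ms: float) -> int:
    """Number of PCM bytes that play for duration_ms."""
    pcm_frames = round(duration_ms * SAMPLE_RATE_HZ / 1000)
    return pcm_frames * BYTES_PER_PCM_FRAME


if __name__ == "__main__":
    print(pcm_bytes_for_ms(4))    # 768
    print(pcm_duration_ms(768))   # 4.0
```

Under these assumptions, a byte count maps to exactly one duration, with none of the variable frame lengths a compressed stream can have. That is what lets a receiver turn "play this chunk at time T" into a simple offset calculation.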

Step 2: Framing the PCM Stream

Raw PCM is then divided into fixed-size frames by LekSync's framing layer. Each frame contains:

  • A sequence number (so receivers can detect and handle packet gaps)
  • A timestamp indicating when these samples should play (relative to session start)
  • The raw PCM sample data

Framing serves two purposes: it gives each chunk of audio a precise playback target time, and it allows the receiver to detect when a frame is missing (sequence gap) and handle it gracefully — typically by playing silence for that window rather than stalling the stream.
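As a rough illustration of such a frame, the sketch below packs the three fields into a byte string and detects sequence gaps. The wire layout (field widths, byte order, microsecond timestamps) and all function names are assumptions, not LekSync's actual format.

```python
import struct
from typing import Tuple

# Hypothetical layout: 4-byte sequence number, 8-byte timestamp in
# microseconds from session start, then the PCM payload (network byte order).
HEADER = struct.Struct("!IQ")


def pack_frame(seq: int, play_at_us: int, pcm: bytes) -> bytes:
    """Prefix a chunk of PCM with its sequence number and target play time."""
    return HEADER.pack(seq, play_at_us) + pcm


def unpack_frame(packet: bytes) -> Tuple[int, int, bytes]:
    """Split a received packet back into (seq, play_at_us, pcm)."""
    seq, play_at_us = HEADER.unpack_from(packet)
    return seq, play_at_us, packet[HEADER.size:]


def missing_between(last_seq: int, new_seq: int) -> int:
    """How many frames were skipped between two received sequence numbers."""
    return max(0, new_seq - last_seq - 1)


def silence(frame_count: int, bytes_per_frame: int) -> bytes:
    """Zero-filled PCM to play in place of missing frames (signed PCM)."""
    return bytes(frame_count * bytes_per_frame)
```

A receiver built this way would call `missing_between` on each arrival and substitute `silence(...)` for the gap. This matches the behavior described above, where playback continues rather than stalling.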

Step 3: UDP Transmission Over the Local Network

Framed PCM packets are sent over UDP on port 5000. UDP is a connectionless, fire-and-forget protocol — it sends packets without waiting for acknowledgment or retransmitting dropped ones.

For real-time audio this is the correct choice. The alternative — TCP — guarantees delivery by retransmitting dropped packets. But retransmission introduces variable latency: the stream stalls while waiting for the missing packet to be resent and received. For a human listening to music, a 50ms stream stall is far more disruptive than a 4ms gap of silence (one dropped UDP packet).

On a reliable local Wi-Fi network (which a phone hotspot provides), UDP packet loss is typically under 0.1%. The framing layer's gap-handling makes the occasional dropped packet inaudible.
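The fire-and-forget behavior is easy to see with plain sockets. This is a minimal sketch using Python's standard `socket` module rather than LekSync's Android code; the helper names, timeout, and buffer size are illustrative assumptions, and only the port number comes from the description above.

```python
import socket

AUDIO_PORT = 5000  # port named in the post


def open_receiver(host: str = "0.0.0.0", port: int = AUDIO_PORT) -> socket.socket:
    """Bind a UDP socket that listens for audio packets."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def open_sender(broadcast: bool = False) -> socket.socket:
    """Create a UDP socket for sending; optionally allow broadcast addresses."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    if broadcast:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    return sock


def receive_packet(sock: socket.socket, timeout_s: float = 0.5, max_size: int = 2048):
    """Return the next datagram, or None if nothing arrives in time.

    Nothing is retransmitted: a lost packet simply never shows up,
    and the caller moves on to the next one.
    """
    sock.settimeout(timeout_s)
    try:
        data, _addr = sock.recvfrom(max_size)
        return data
    except socket.timeout:
        return None
```

The sender calls `sendto(packet, (address, AUDIO_PORT))` and returns immediately. There is no handshake and no acknowledgment, so a dropped packet costs the receiver one frame of audio rather than a stalled stream.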

Step 4: Receiver-Side Jitter Buffer

Even on a local Wi-Fi network, packets don't arrive at perfectly regular intervals. Network jitter — variation in packet arrival timing — can be 5–30ms depending on network load and the host phone's transmit scheduler.

Each receiver maintains a small jitter buffer: a queue of incoming audio frames held for a brief window before playback. The buffer smooths out arrival timing variations. If packets arrive slightly early, they wait in the buffer. If a packet arrives slightly late, it may miss its playback window — in which case the framing sequence number helps the buffer skip forward cleanly.

The jitter buffer introduces a small fixed latency (typically 50–80ms), but this is consistent across all receivers — so it doesn't create sync offset between devices.
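A jitter buffer of this kind can be sketched in a few lines. The class below is a hypothetical illustration, not LekSync's implementation. It pre-fills to a fixed depth, releases frames in sequence order, and reports a gap as `None` so the caller can play silence. With 4ms frames (the packet size mentioned in Step 3), a depth of 16 frames corresponds to 64ms, inside the 50–80ms range given above.

```python
class JitterBuffer:
    """Holds frames briefly and releases them in sequence order.

    pop() returns None both while pre-filling and when a frame is missing;
    in either case the caller would play silence for that slot.
    """

    def __init__(self, depth_frames: int = 16):
        if depth_frames < 1:
            raise ValueError("depth_frames must be at least 1")
        self.depth_frames = depth_frames
        self.frames = {}       # seq -> payload
        self.next_seq = None   # next sequence number due for playback
        self.started = False

    def push(self, seq: int, payload: bytes) -> None:
        """Store an arriving frame, unless its playback slot has passed."""
        if self.next_seq is None:
            self.next_seq = seq
        if seq < self.next_seq:
            return  # too late: drop it
        self.frames[seq] = payload

    def pop(self):
        """Called once per frame period by the playback loop."""
        if not self.started:
            if len(self.frames) < self.depth_frames:
                return None  # still pre-filling
            self.started = True
        payload = self.frames.pop(self.next_seq, None)  # None marks a gap
        self.next_seq += 1
        return payload
```

Because every receiver waits for the same depth before starting, the added delay is the same on each device, which is why it does not show up as an offset between them.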

Step 5: Position Sync Protocol

Even with precise framing and UDP transmission, devices drift over time because of slight clock differences between phones. Android's system clock is not perfectly synchronized across devices — one phone's millisecond timer may run slightly faster than another's.

LekSync's position sync protocol corrects for this continuously. The host periodically broadcasts the current playback position (in milliseconds from session start) to all receivers. Each receiver:

  1. Receives the position broadcast.
  2. Compares the host position to its own local playback position.
  3. If it is behind: accelerates audio playback slightly (imperceptibly, ~2–5% speed adjustment for a few hundred milliseconds) until it catches up.
  4. If it is ahead: pauses audio briefly (for a few tens of milliseconds) to let the host position catch up to it.

These micro-corrections happen continuously throughout the session. Because each correction is small (usually under 50ms), the adjustment is inaudible. The result is that drift never accumulates — receivers stay within a tight window of the host's position indefinitely.
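The decision each receiver makes can be sketched as a small function. The constants are illustrative assumptions: the 3% speed-up is picked from the 2–5% range above, and the 5ms deadband is invented for the example. The names are hypothetical too.

```python
from dataclasses import dataclass

# Illustrative values; the post gives ranges, not exact constants.
DEADBAND_MS = 5   # assumed: ignore offsets smaller than this
SPEEDUP = 1.03    # within the 2-5% range described above


@dataclass
class Correction:
    action: str               # "none", "speed_up", or "pause"
    rate: float = 1.0         # playback rate to apply while correcting
    duration_ms: float = 0.0  # how long to apply it


def plan_correction(host_pos_ms: float, local_pos_ms: float) -> Correction:
    """Decide how a receiver should react to one position broadcast."""
    offset = host_pos_ms - local_pos_ms  # positive: receiver is behind
    if abs(offset) < DEADBAND_MS:
        return Correction("none")
    if offset > 0:
        # Playing at SPEEDUP gains (SPEEDUP - 1) ms of audio per ms of wall time.
        return Correction("speed_up", SPEEDUP, offset / (SPEEDUP - 1))
    # Receiver is ahead: hold playback for as long as it is ahead.
    return Correction("pause", 0.0, -offset)
```

With these numbers, a receiver that is 9ms behind would play at 1.03x for about 300ms, in line with the "few hundred milliseconds" described above. The sketch ignores the network delay of the position broadcast itself, which a real receiver would presumably need to account for.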

When a receiver reconnects after a disconnection, a full position resync is triggered immediately rather than waiting for the next periodic broadcast — this snaps the device back into sync within one correction cycle.

Online Rooms: WebRTC + Opus

For sessions where participants aren't on the same local network (LekSync's Online Rooms feature), the architecture shifts to WebRTC with the Opus audio codec.

WebRTC is built for peer-to-peer connections: once the connection is established, audio normally travels directly from the host to each receiver without passing through a media server (a relay is needed only when no direct path can be negotiated). This keeps latency lower than cloud-relay architectures. The Opus codec is designed specifically for real-time audio: it supports frame sizes from 2.5ms to 60ms, has low algorithmic delay (26.5ms at the default 20ms frame size), and has graceful packet-loss handling built in.

Online rooms have a somewhat higher sync offset than local hotspot sessions (typically 50–100ms across the session, vs. under 30ms locally) because internet routing adds variance that local networks don't have. That is still within the sub-100ms target, and the position sync protocol continues to operate on the online path.

Why This Architecture Matters at Scale

LekSync's broadcast model — one host sending to all receivers simultaneously — means adding more receivers doesn't add per-receiver latency. The host sends one packet per frame, and the network delivers it to all receivers. Compare this to architectures where the host sends individual streams to each receiver: those systems add host-side CPU and bandwidth load linearly with receiver count, causing the system to degrade as the room grows.
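A back-of-the-envelope comparison shows the difference. The sketch assumes raw 48 kHz stereo 16-bit PCM and ignores packet header overhead; both the sample format and the function names are assumptions for illustration.

```python
def pcm_stream_kbps(sample_rate_hz: int = 48_000, channels: int = 2,
                    bits_per_sample: int = 16) -> float:
    """Bitrate of one raw PCM stream in kbit/s (payload only)."""
    return sample_rate_hz * channels * bits_per_sample / 1000


def host_send_kbps(receivers: int, broadcast: bool, stream_kbps: float) -> float:
    """Audio data the host must transmit per second.

    Broadcast: one copy regardless of receiver count.
    Per-receiver streams: one copy for each receiver.
    """
    return stream_kbps if broadcast else stream_kbps * receivers


if __name__ == "__main__":
    rate = pcm_stream_kbps()
    print(rate)                            # 1536.0
    print(host_send_kbps(5, True, rate))   # 1536.0
    print(host_send_kbps(5, False, rate))  # 7680.0
```

Under these assumptions, the host's send rate for five receivers is about 1.5 Mbit/s with broadcast and about 7.7 Mbit/s with individual streams. The gap widens with every device that joins.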

On a modern 5 GHz Wi-Fi hotspot, the broadcast model handles 5 receivers comfortably within the latency budget. On a proper Wi-Fi access point (not a phone hotspot), this extends further.

The Result in Practice

Under typical hotspot conditions:

  • All receivers using wired audio output: sync within 10–30ms of host (inaudible difference)
  • Mixed wired/speaker output: sync within 20–50ms (inaudible for most music)
  • Bluetooth output on some receivers: offset of 100–300ms relative to wired devices (this comes from Bluetooth codec latency, not LekSync's transmission latency; see our troubleshooting guide)

The 100ms headline target is a conservative, worst-case figure for the transmission side. Actual sync quality in a controlled environment (everyone on 5 GHz hotspot, wired audio output) is routinely under 30ms, at or below the point where an offset starts to be heard as echo.

For the comparison between peer-to-peer and cloud architectures and why P2P wins for this use case, see: Why Peer-to-Peer Beats Cloud for Real-Time Music Sharing.

Download LekSync free on Google Play and test the sync yourself — sub-100ms is audibly different from the cloud-based alternatives.

Frequently Asked Questions

What does "sub-100ms sync" actually mean in practice?
It means all receivers are playing within 100 milliseconds of each other at any given moment. That figure is the worst-case target. Listeners start to hear an offset as echo above roughly 20–30ms on music with a strong beat, and on a local hotspot with wired audio output LekSync typically stays under 30ms.
Why does LekSync use UDP instead of TCP?
TCP retransmits dropped packets — which causes the stream to stall waiting for retransmission. For real-time audio, a brief gap (dropped UDP packet) is far less disruptive than a stall. LekSync's framing layer handles packet gaps gracefully, making UDP the correct choice for latency-sensitive audio.
What codec does LekSync use to encode the audio?
LekSync decodes source audio files to raw PCM using Android's MediaCodec API, then transmits the PCM stream over the network. On the online/WebRTC path, it uses the Opus codec, which is the standard for real-time audio in WebRTC applications.
How does the position sync protocol correct drift without restarting playback?
The host periodically broadcasts its current playback position to all receivers. Each receiver compares the received position to its local playback position and applies a correction: it either speeds playback up slightly (if behind) or pauses for a few tens of milliseconds (if ahead). Because each correction is small, it stays below the perceptible threshold.
Does sync quality degrade as more devices join?
Not significantly in hotspot mode. The host broadcasts each audio packet once to all receivers rather than sending a separate copy to each, so its audio send load stays roughly constant as devices join. Each extra device still shares the same Wi-Fi channel, but on a modern 5 GHz hotspot, 5 devices is well within capacity.
What is the maximum reliable sync range over Wi-Fi?
On a 2.4 GHz hotspot: approximately 20–30 meters indoors, less through walls. On 5 GHz: 10–15 meters but with more bandwidth capacity. For larger spaces, a proper Wi-Fi access point (not the host phone's hotspot) extends range substantially while maintaining sync quality.
