AI-powered millimeter-wave radar can transcribe phone calls from several meters away by sensing tiny vibrations from a smartphone’s earpiece. In controlled tests, researchers demonstrated up to 60% transcription accuracy at distances as far as three meters.
The work, presented at WiSec 2025, highlights a growing class of side-channel risks. It suggests that everyday devices emitting or sensing radio waves could one day infer sensitive information without ever recording a microphone feed.
How radar “hears” a call
Smartphones vibrate subtly when audio plays through the earpiece. Millimeter-wave radar can detect motion on the order of micrometers, capturing those vibrations as time-varying signals that correlate with speech patterns.
Researchers processed these radar returns into features that Whisper, OpenAI’s speech-recognition model, could understand, turning mechanical vibrations into words. The approach bypasses traditional microphones and instead treats the phone itself as a tiny vibrating speaker enclosure.
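In rough terms, the sensing step reduces to tracking the phase of the complex radar return at the range bin containing the phone, since a micrometer displacement shifts the round-trip phase by a predictable amount. The sketch below illustrates that idea under stated assumptions: a generic 77 GHz FMCW radar, a hypothetical 8 kHz chirp rate, and made-up function names. It is not the paper’s actual pipeline.

```python
import numpy as np
import librosa  # used here for the mel-spectrogram front end

# Assumed parameters for a generic 77 GHz FMCW radar (illustrative only)
WAVELENGTH = 3e8 / 77e9   # carrier wavelength, about 3.9 mm
CHIRP_RATE = 8000         # chirps per second -> effective "audio" sample rate

def vibration_from_phase(iq_at_target_bin: np.ndarray) -> np.ndarray:
    """Recover micrometer-scale displacement from the phase of the complex
    radar return at the range bin containing the phone."""
    phase = np.unwrap(np.angle(iq_at_target_bin))    # radians over time
    # A displacement d changes the round-trip phase by 4*pi*d / wavelength
    displacement = phase * WAVELENGTH / (4 * np.pi)  # meters
    return displacement - displacement.mean()        # remove static offset

def radar_to_mel_features(iq_at_target_bin: np.ndarray) -> np.ndarray:
    """Turn the recovered vibration signal into an 80-bin log-mel
    spectrogram, the input format Whisper consumes (a real pipeline
    would also resample to the 16 kHz rate Whisper expects)."""
    vib = vibration_from_phase(iq_at_target_bin).astype(np.float32)
    vib /= np.abs(vib).max() + 1e-9                  # normalize like audio
    mel = librosa.feature.melspectrogram(
        y=vib, sr=CHIRP_RATE, n_fft=400, hop_length=160, n_mels=80)
    return np.log10(np.maximum(mel, 1e-10))          # log compression
```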
Did you know?
Millimeter-wave radar can detect sub-millimeter motion and has been used to monitor human breathing and heart rate through clothing and walls in controlled studies.
From keywords to conversations
Earlier efforts could reliably spot only a handful of predefined words. The new system scales to full sentences with a vocabulary of roughly 10,000 words, moving from keyword spotting to conversation-level inference in realistic scenarios.
That leap comes from improved radar signal processing and better language modeling, which lets context fill gaps where the raw signal is weak. The advance shifts the privacy stakes meaningfully higher.
Adapting Whisper for noisy radar data
Radar-derived audio is extremely noisy and bandwidth-limited compared with normal speech. The team used low-rank adaptation (LoRA) to specialize Whisper, retraining only a small fraction of its parameters to optimize it for low-SNR, radar-like inputs.
This keeps the core model intact while teaching it to decode distorted spectro-temporal cues. The result is usable transcripts even when the signal looks unintelligible to conventional systems.
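For a concrete picture of what low-rank adaptation looks like in practice, the snippet below attaches LoRA adapters to Whisper with Hugging Face’s transformers and peft libraries. The rank, target modules, and base model size are illustrative assumptions, not the configuration reported in the paper.

```python
from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model

# Assumed base model; the paper's exact Whisper variant may differ
base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

lora_cfg = LoraConfig(
    r=8,                                  # low-rank bottleneck (assumed)
    lora_alpha=16,                        # scaling for the LoRA update
    target_modules=["q_proj", "v_proj"],  # attention projections in Whisper
    lora_dropout=0.05,
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # typically well under 1% of all weights
```

Only the small adapter matrices are trained on radar-derived features; the frozen base model retains its general speech knowledge.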
What 60% accuracy really means
Sixty percent accuracy falls well short of a wiretap-grade transcript, but it can still reveal names, places, numbers, and intent. As with lip reading, partial recognition combined with context can expose highly sensitive details.
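A toy calculation makes the point concrete. The sketch below scores a fabricated sentence (not taken from the study) with a standard word-error-rate computation; even when accuracy drops near 60%, the name, place, and time survive.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word-level accuracy, i.e. 1 - WER, via Levenshtein edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return 1.0 - dp[-1][-1] / len(ref)

ref = "meet dana at nine near the first national bank on elm street"
hyp = "need dana of nine near a ... national bank on ... street"
print(f"word accuracy = {word_accuracy(ref, hyp):.0%}")  # prints 58%
# Five of twelve words are wrong, yet "dana", "national bank",
# "nine", and "street" all leak through.
```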
For targeted surveillance, even intermittent success may be enough to compromise confidentiality. The mere plausibility of recovery raises the bar for the threat models applied to phone calls.
Practical limits and lab constraints
Experiments used stationary setups, cooperative positioning, and controlled noise conditions far friendlier than most public spaces. Movement, occlusions, and environmental clutter can degrade performance substantially in the wild.
Still, the trajectory is clear. As sensors, models, and denoising improve, the feasibility window may widen. Planning mitigations before real-world exploitation is prudent.
Mitigations and what comes next
Potential defenses include hardware damping around earpieces, OS modes that modulate audio to mask vibration signatures, and user guidance for sensitive calls. Enterprise policies may discourage open-area calls where line-of-sight sensing is possible.
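One of those defenses can be sketched in a few lines. The following is a minimal, speculative illustration of an audio-masking mode, assuming it works by superimposing low-level wideband noise on earpiece playback so the mechanical vibration no longer cleanly tracks speech; the paper does not specify such an implementation, and the level shown is a guess.

```python
import numpy as np

def add_masking_noise(audio: np.ndarray, level_db: float = -20.0) -> np.ndarray:
    """Speculative sketch: mix low-level wideband noise into earpiece audio
    to blur its vibration signature. Expects float audio in [-1, 1];
    level_db sets noise power relative to the speech signal (the default
    here is an assumption, not a tested figure)."""
    speech_rms = np.sqrt(np.mean(audio ** 2) + 1e-12)
    noise = np.random.randn(len(audio)).astype(audio.dtype)
    noise_rms = np.sqrt(np.mean(noise ** 2) + 1e-12)
    gain = speech_rms * 10 ** (level_db / 20) / noise_rms
    return audio + gain * noise
```

A real OS-level defense would need to shape the masking signal so it disturbs the radar-visible vibration more than the listener’s audio, which remains an open engineering question.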
Future research will likely test varied phone models, cases, and form factors, and explore countermeasures that preserve call quality while blunting radar-based eavesdropping.
Why this matters now
Millimeter-wave sensing is proliferating across consumer and industrial tech. As capabilities spread, so does the risk surface for unconventional eavesdropping. Awareness today can drive smarter design choices tomorrow.
The findings underline a broader truth: privacy threats increasingly exploit physics, not just software. Addressing them will require coordinated advances in hardware design, signal processing, and policy.