EchoVision
Real-Time Assistive Navigation
A hybrid NPU–CPU inference pipeline that deploys EfficientViT-SAM and YOLO on embedded hardware for real-time scene segmentation and obstacle detection. Built to run entirely on-device — no cloud dependency — for low-power wearable assistive navigation.
EchoVision running real-time segmentation on a low-power wearable device
The Bottleneck: Vision-Level Perception on Wearables
People with low or no vision rely on assistive tools that must understand the world fast enough to keep them safe: typically within tens of milliseconds, not seconds. Modern segmentation models (SAM-class) can describe a scene with high fidelity, but at a computational cost that has historically required a tethered GPU or a cloud round-trip. Neither option is acceptable on a wearable: tethers fail, and cloud hops add latency, drop out on poor connectivity, and raise privacy concerns about continuously streaming a camera feed of someone's surroundings.
EchoVision was designed around a single constraint: everything must run on the device, in real time, on a battery. That constraint reframes the problem from "what's the best model" to "what's the best system".
A Hybrid NPU–CPU Pipeline
Wearable-class SoCs increasingly ship with dedicated neural accelerators (NPUs) alongside general-purpose CPU cores. Most vision pipelines pin the entire workload to one or the other and leave headroom on the table. EchoVision splits the work by its computational character, not by model:
YOLO Detection
Lightweight object detection runs continuously on the CPU, surfacing candidate regions of interest at high frame rate.
EfficientViT-SAM
The NPU executes a distilled SAM variant only on prompted regions, returning dense masks without paying full-image cost.
Spatial Reasoning
Mask + detection outputs are fused into an obstacle map and routed to the navigation feedback layer.
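As one way to picture the fusion step above, here is a minimal Python sketch. It assumes each detection carries a bounding box plus a binary mask cropped to that box, and it reduces them to a coarse left/center/right occupancy vector. The `Detection` type, `fuse_to_sectors`, the three-sector layout, and the 0.5 score threshold are hypothetical illustrations, not EchoVision's actual obstacle-map format.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Detection:
    """Hypothetical detector output: label, pixel box, confidence."""
    label: str
    box: Tuple[int, int, int, int]  # (x0, y0, x1, y1), x1/y1 exclusive
    score: float


def fuse_to_sectors(
    detections: Sequence[Detection],
    masks: Sequence[Sequence[Sequence[int]]],  # one 0/1 mask per box, cropped to it
    frame_width: int,
    frame_height: int,
    num_sectors: int = 3,      # assumed layout: left / center / right
    min_score: float = 0.5,    # assumed confidence cut-off
) -> List[float]:
    """Return the fraction of each vertical sector covered by mask pixels."""
    sector_width = frame_width / num_sectors
    covered = [0] * num_sectors
    for det, mask in zip(detections, masks):
        if det.score < min_score:
            continue  # ignore low-confidence detections
        x0 = det.box[0]
        for row in mask:
            for dx, on in enumerate(row):
                if on:
                    # Map the mask pixel back to frame coordinates, then to a sector.
                    sector = min(int((x0 + dx) / sector_width), num_sectors - 1)
                    covered[sector] += 1
    pixels_per_sector = sector_width * frame_height
    # Overlapping masks may count a pixel twice, so clamp to 1.0.
    return [min(1.0, c / pixels_per_sector) for c in covered]
```

A downstream feedback layer could, for example, treat a high value in the center sector as a cue to warn about an obstacle directly ahead. That mapping is left open here because the source does not specify it.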
The result is an architecture where the NPU is never idle waiting on the CPU and the CPU is never blocked waiting on the NPU — a pipeline whose effective throughput is closer to the sum of the two units than to the slower of them.
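The overlap described above can be sketched as a two-stage producer/consumer pipeline. This is a minimal illustration under stated assumptions, not EchoVision's implementation: `detect` and `segment` are hypothetical stand-ins for the CPU detector and the NPU segmenter, `run_pipeline` and `max_in_flight` are names invented for this example, and it assumes the real inference calls release Python's GIL while running (as native runtimes commonly do).

```python
import queue
import threading
from typing import Any, Callable, Iterable, List, Tuple

_STOP = object()  # sentinel that tells the second stage to finish


def run_pipeline(
    frames: Iterable[Any],
    detect: Callable[[Any], List[Any]],   # stand-in for the CPU detection stage
    segment: Callable[[Any, Any], Any],   # stand-in for the NPU segmentation stage
    max_in_flight: int = 2,               # assumed hand-off queue depth
) -> List[Tuple[Any, List[Any]]]:
    """Run detection and segmentation concurrently on successive frames."""
    handoff: queue.Queue = queue.Queue(maxsize=max_in_flight)
    results: List[Tuple[Any, List[Any]]] = []

    def cpu_stage() -> None:
        try:
            for frame in frames:
                # put() blocks when the queue is full, which applies
                # back-pressure instead of letting frames pile up.
                handoff.put((frame, detect(frame)))
        finally:
            handoff.put(_STOP)  # always release the consumer

    def npu_stage() -> None:
        while True:
            item = handoff.get()
            if item is _STOP:
                break
            frame, boxes = item
            # Segment only the prompted regions, not the full image.
            results.append((frame, [segment(frame, box) for box in boxes]))

    workers = [threading.Thread(target=cpu_stage), threading.Thread(target=npu_stage)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results
```

In this simplified form, while the second stage segments frame k the first stage is already detecting on frame k+1, so the steady-state cost per frame approaches the longer of the two stage times rather than their sum. The bounded queue keeps results fresh, which matters more than raw throughput when the output drives navigation feedback.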
Why This Matters for Assistive Navigation
The contribution isn't a new model — it's a deployment story. By demonstrating that a SAM-class segmenter can co-exist with real-time detection on commodity wearable silicon, EchoVision shows a path toward assistive devices that perceive their environment as richly as a sighted person does, without surrendering privacy, connectivity, or battery life. The same hybrid-inference recipe generalizes naturally to other on-body and on-robot perception tasks where latency, energy, and offline operation are non-negotiable.
Status & Next Steps
EchoVision was accepted to EIFCOM 2026, co-located with ACM MobiSys. Active follow-on work is exploring tighter integration with tactile feedback channels and extending the pipeline to indoor mapping and dynamic obstacle tracking — feeding directly into my broader research agenda on embedded perception for assistive and clinical robotics.