EchoVision

Real-Time Assistive Navigation

A hybrid NPU–CPU inference pipeline that deploys EfficientViT-SAM and YOLO on embedded hardware for real-time scene segmentation and obstacle detection. Built to run entirely on-device — no cloud dependency — for low-power wearable assistive navigation.

EchoVision running real-time segmentation on a low-power wearable device

EIFCOM · Accepted · 2026 (co-located with ACM MobiSys)

100% On-Device · Zero Cloud Dependency

NPU + CPU · Hybrid Inference Pipeline

The Bottleneck: Vision-Level Perception on Wearables

People with low or no vision rely on assistive tools that must understand the world fast enough to keep them safe — typically tens of milliseconds, not seconds. Modern segmentation models (SAM-class) can describe a scene with stunning fidelity, but at a computational cost that has historically required a tethered GPU or cloud round-trip. Neither option is acceptable on a wearable: tethers fail, and cloud hops add latency, drop on poor connectivity, and raise privacy concerns about constantly streaming a camera feed of someone's surroundings.

EchoVision was designed around a single constraint: everything must run on the device, in real time, on a battery. That constraint reframes the problem from "what's the best model" to "what's the best system".

A Hybrid NPU–CPU Pipeline

Wearable-class SoCs increasingly ship with dedicated neural accelerators (NPUs) alongside general-purpose CPU cores. Most vision pipelines pin the entire workload to one or the other and leave the second unit's capacity unused. EchoVision splits the work by the character of each stage, not by model:

Step 01 · CPU

YOLO Detection

Lightweight object detection runs continuously on the CPU, surfacing candidate regions of interest at high frame rate.

Step 02 · NPU

EfficientViT-SAM

The NPU executes a distilled SAM variant only on prompted regions, returning dense masks without paying full-image cost.

Step 03 · CPU

Spatial Reasoning

Mask and detection outputs are fused into an obstacle map and routed to the navigation feedback layer.
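
The data flow through the three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not EchoVision's actual code: the `Detection` type, the `select_prompts`, `fake_segment`, and `fuse` helpers, the score threshold, the prompt cap, and the grid cell size are all hypothetical, and the segmenter is replaced by a stand-in that simply fills each prompted box.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detector output (hypothetical shape, for illustration only)."""
    label: str
    score: float
    box: tuple  # (x0, y0, x1, y1) in pixels, with x1 and y1 exclusive


def select_prompts(detections, min_score=0.5, max_prompts=3):
    """Keep the highest-scoring detections to use as box prompts."""
    kept = [d for d in detections if d.score >= min_score]
    kept.sort(key=lambda d: d.score, reverse=True)
    return kept[:max_prompts]


def fake_segment(det):
    """Stand-in for the prompted segmenter: returns every pixel in the box."""
    x0, y0, x1, y1 = det.box
    return {(x, y) for y in range(y0, y1) for x in range(x0, x1)}


def fuse(masks, width, height, cell=4):
    """Mark a grid cell as occupied if any mask pixel falls inside it."""
    cols = -(-width // cell)   # ceiling division
    rows = -(-height // cell)
    grid = [[0] * cols for _ in range(rows)]
    for mask in masks:
        for x, y in mask:
            if 0 <= x < width and 0 <= y < height:
                grid[y // cell][x // cell] = 1
    return grid


detections = [
    Detection("chair", 0.9, (0, 0, 4, 4)),
    Detection("bag", 0.3, (8, 8, 12, 12)),   # below threshold, never segmented
    Detection("door", 0.7, (8, 0, 12, 4)),
]
prompts = select_prompts(detections)
masks = [fake_segment(d) for d in prompts]
grid = fuse(masks, width=12, height=12, cell=4)
print(grid)  # [[1, 0, 1], [0, 0, 0], [0, 0, 0]]
```

The low-confidence detection never reaches the segmenter, which is the point of prompting only selected regions: segmentation cost scales with the number of prompts rather than with the full image.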

The result is an architecture where the NPU is never idle waiting on the CPU and the CPU is never blocked waiting on the NPU — a pipeline whose effective throughput is closer to the sum of the two units than to the slower of them.
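
One common way to get this kind of overlap is to run each stage in its own worker and connect the workers with small bounded queues, so that one stage can start on the next frame while the other is still busy with the previous one. The sketch below assumes that structure. The `detect` and `segment` functions are hypothetical placeholders for the CPU and NPU stages, and the queue depth of 2 is an arbitrary choice, not a value taken from EchoVision.

```python
import queue
import threading

SENTINEL = None  # marks the end of the frame stream


def detect(frame):
    """Placeholder for CPU-side detection (hypothetical)."""
    return {"frame": frame, "boxes": [(0, 0, 4, 4)]}


def segment(item):
    """Placeholder for NPU-side prompted segmentation (hypothetical)."""
    item["masks"] = [f"mask-for-{box}" for box in item["boxes"]]
    return item


def stage(work, inbox, outbox):
    """Apply `work` to each item from `inbox` until the sentinel arrives."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # pass shutdown downstream
            return
        outbox.put(work(item))


def run_pipeline(frames, depth=2):
    """Run detect and segment as concurrent stages; results keep frame order."""
    frames_q = queue.Queue(maxsize=depth)  # bounded: applies back-pressure
    dets_q = queue.Queue(maxsize=depth)
    out_q = queue.Queue()                  # unbounded so stages never stall on it
    workers = [
        threading.Thread(target=stage, args=(detect, frames_q, dets_q)),
        threading.Thread(target=stage, args=(segment, dets_q, out_q)),
    ]
    for w in workers:
        w.start()
    for f in frames:
        frames_q.put(f)
    frames_q.put(SENTINEL)

    results = []
    while True:
        item = out_q.get()
        if item is SENTINEL:
            break
        results.append(item)
    for w in workers:
        w.join()
    return results


results = run_pipeline(range(5))
print([r["frame"] for r in results])  # [0, 1, 2, 3, 4]
```

The bounded queues keep a fast detector from running arbitrarily far ahead of a slower segmenter, which matters on a wearable where stale frames are worse than dropped ones. How frames are dropped or prioritized in the real system is not specified here.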

Why This Matters for Assistive Navigation

The contribution isn't a new model — it's a deployment story. By demonstrating that a SAM-class segmenter can co-exist with real-time detection on commodity wearable silicon, EchoVision shows a path toward assistive devices that perceive their environment as richly as a sighted person does, without surrendering privacy, connectivity, or battery life. The same hybrid-inference recipe generalizes naturally to other on-body and on-robot perception tasks where latency, energy, and offline operation are non-negotiable.

Status & Next Steps

EchoVision was accepted to EIFCOM 2026, co-located with ACM MobiSys. Active follow-on work is exploring tighter integration with tactile feedback channels and extending the pipeline to indoor mapping and dynamic obstacle tracking — feeding directly into my broader research agenda on embedded perception for assistive and clinical robotics.