FusionSense: Tri-Stage Learning for Energy-Efficient Multimodal Edge Intelligence

FusionSense is a runtime-adaptive framework that lets energy-constrained autonomous systems decide what to compute and transmit at the edge by learning cross-modal dependencies, significantly reducing both energy consumption and data transmission.

Autonomous systems operating on edge devices face critical challenges regarding energy, latency, and reliability when processing multimodal sensor data (e.g., cameras and LiDAR). Prior methods often ignore cross-modal dependencies or rely on centralized processing. This paper presents FusionSense, a fusion-aware intelligent sensing framework designed to address these issues for energy-constrained edge systems.

The framework employs a three-step procedure for training lightweight near-sensor classifiers: 1) A server-side fusion model learns the downstream task; 2) Filter-out-safe (FoS) labels quantify the necessity of each modality relative to the fused decision; and 3) An edge-side fusion model incorporates these predictions as auxiliary signals. This results in a runtime decision layer that simultaneously reduces computation and communication while scaling linearly with the number of sensors.
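The second step can be illustrated with a small sketch. It is not the authors' implementation: it assumes that a modality counts as filter-out-safe for a sample when withholding it leaves the fused decision unchanged, which is one plausible reading of "necessity relative to the fused decision". The names `fos_labels` and `toy_fusion_model`, and the averaging rule inside the toy model, are hypothetical stand-ins for the server-side fusion model.

```python
from typing import Callable, Dict, Optional, Sequence

# A sample maps each modality name to its features; None means "filtered out".
Sample = Dict[str, Optional[Sequence[float]]]
FusionModel = Callable[[Sample], int]


def fos_labels(fusion_model: FusionModel, sample: Sample) -> Dict[str, bool]:
    """Label each modality True if dropping it keeps the fused decision unchanged.

    Assumption: this "decision unchanged" rule is an illustrative definition of
    filter-out-safe (FoS), not necessarily the one used in the paper.
    """
    reference = fusion_model(sample)  # fused decision with all modalities present
    labels = {}
    for modality in sample:
        ablated = dict(sample)
        ablated[modality] = None  # withhold exactly one modality
        labels[modality] = fusion_model(ablated) == reference
    return labels


def toy_fusion_model(sample: Sample) -> int:
    """Hypothetical stand-in for a trained fusion model.

    Averages the per-modality mean score over the available modalities and
    predicts class 1 when that average exceeds 0.5.
    """
    scores = [sum(v) / len(v) for v in sample.values() if v is not None]
    if not scores:
        return 0
    return int(sum(scores) / len(scores) > 0.5)


if __name__ == "__main__":
    sample = {"rgb": [0.9, 0.9], "depth": [0.2, 0.2]}
    # Dropping "rgb" flips the decision, so only "depth" is safe to filter out.
    print(fos_labels(toy_fusion_model, sample))
```

In this sketch the resulting labels would serve as training targets for the lightweight near-sensor classifiers. Labeling costs one extra fusion-model call per modality, so the work grows linearly with the number of sensors.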

In experiments, FusionSense sustains task quality at much higher data-reduction rates than uni-modal filters. For a dual-modality setup (RGB+Depth/LiDAR), the method delivers large end-to-end energy savings: up to 33x lower energy consumption at 1% FoI prevalence, and a 92.3% reduction in quality loss at 30% data reduction.

Source

arXiv – Machine Learning · arxiv.org
