
Efficient 3D Object Detection for Edge Devices

by Omnidirectional Technology, March 2nd, 2025

Too Long; Didn't Read

Panopticus is a 3D object detection system for edge devices, balancing accuracy and latency through an adaptive multi-branch model and spatial-aware execution scheduling.


ABSTRACT

1 INTRODUCTION

2 BACKGROUND: OMNIDIRECTIONAL 3D OBJECT DETECTION

3 PRELIMINARY EXPERIMENT

3.1 Experiment Setup

3.2 Observations

3.3 Summary and Challenges

4 OVERVIEW OF PANOPTICUS

5 MULTI-BRANCH OMNIDIRECTIONAL 3D OBJECT DETECTION

5.1 Model Design

5.2 Model Adaptation

6 SPATIAL-ADAPTIVE EXECUTION

6.1 Performance Prediction

6.2 Execution Scheduling

7 IMPLEMENTATION

8 EVALUATION

8.1 Testbed and Dataset

8.2 Experiment Setup

8.3 Performance

8.4 Robustness

8.5 Component Analysis

8.6 Overhead

9 RELATED WORK

10 DISCUSSION AND FUTURE WORK

11 CONCLUSION AND REFERENCES



4 OVERVIEW OF PANOPTICUS

Motivated by our observations and insights, we present Panopticus, an omnidirectional 3D object detection system designed for resource-constrained edge devices. Panopticus aims to maximize detection accuracy while meeting a latency objective on a given edge device. Figure 6 illustrates the system architecture of Panopticus. To accommodate camera views with varying inference requirements, we propose an omnidirectional 3D detection model equipped with multiple inference branches. Each branch comprises different modules from BEV-based 3D detectors, providing the flexibility to process each camera view with distinct detection capabilities. The model architecture can be modified by simply detaching branch modules to meet diverse requirements in terms of target latency and device capability. In the offline stage, runtime characteristics such as inference latency are profiled for each branch, and modules within a heavy branch that exceed the latency or memory constraints are removed. At runtime, each of the 𝑁 multi-view images at time 𝑡 is processed by an appropriate branch to generate accurate 3D bounding boxes. The optimal combination of branches and images is determined by a spatial-adaptive execution scheduler. The scheduler estimates the expected accuracy and inference time of each branch-image pair based on the predicted spatial distribution of the incoming frame. It then selects the 𝑁 image-branch pairs that maximize overall accuracy while keeping the total predicted latency within the target latency. Finally, the detected boxes are used to update the tracked states of surrounding objects, enabling continuous observation of spatial characteristics.
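The scheduler's selection step can be viewed as a constrained assignment problem: choose one branch per camera view so that the summed predicted accuracy is maximized while the summed predicted latency stays within the frame budget. The sketch below illustrates this selection with a brute-force search over branch assignments; the function name, data layout, and all numbers are illustrative assumptions, not the paper's actual implementation, which uses its own performance-prediction models and a more efficient search.

```python
from itertools import product

def schedule(predicted_score, branch_latency_ms, budget_ms):
    """Pick one branch per camera view to maximize total predicted
    accuracy subject to a total-latency budget.

    predicted_score[i][b]: predicted accuracy of image i on branch b
                           (stand-in for the scheduler's estimates).
    branch_latency_ms[b]:  offline-profiled latency of branch b.
    Brute force over all assignments, shown for clarity only.
    """
    n_images = len(predicted_score)
    n_branches = len(branch_latency_ms)
    best_score, best_plan = float("-inf"), None
    for plan in product(range(n_branches), repeat=n_images):
        latency = sum(branch_latency_ms[b] for b in plan)
        if latency > budget_ms:
            continue  # violates the target-latency constraint
        score = sum(predicted_score[i][b] for i, b in enumerate(plan))
        if score > best_score:
            best_score, best_plan = score, plan
    return best_plan, best_score

# Illustrative run: 3 views, a light branch (20 ms) and a heavy
# branch (60 ms), with a 100 ms frame budget. Only one view can
# afford the heavy branch, so it goes to the view that gains most.
scores = [[0.3, 0.6], [0.2, 0.5], [0.4, 0.9]]
plan, total = schedule(scores, [20, 60], budget_ms=100)
```

In practice an exhaustive search grows exponentially with the number of views, so a real scheduler would use a greedy or dynamic-programming strategy over the same objective; the constraint structure, however, is exactly the one described above.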


This paper is available on arxiv under CC by 4.0 Deed (Attribution 4.0 International) license.