3D object detection with omnidirectional views enables safety-critical applications such as mobile robot navigation. Such applications increasingly operate on resource-constrained edge devices, which enable reliable processing without privacy concerns or network delays. To enable cost-effective deployment, cameras have been widely adopted as a low-cost alternative to LiDAR sensors. However, camera-based solutions require compute-intensive workloads to achieve high performance, which remains challenging given the computational limitations of edge devices. In this paper, we present Panopticus, a carefully designed system for omnidirectional and camera-based 3D detection on edge devices. Panopticus employs an adaptive multi-branch detection scheme that accounts for spatial complexities. To optimize accuracy within latency limits, Panopticus dynamically adjusts the model’s architecture and operations based on available edge resources and spatial characteristics. We implemented Panopticus on three edge devices and conducted experiments across real-world environments using a public self-driving dataset and our mobile 360° camera dataset. Experiment results showed that Panopticus improves accuracy by 62% on average given the strict latency objective of 33 ms. Also, Panopticus achieves a 2.1× latency reduction on average compared to baselines.
Along with the advances in computer vision and deep neural networks (DNNs), 3D object detection has become a core component of numerous applications. For example, autonomous vehicles rely on precise and real-time perception of objects in an environment to establish safe navigation routes [55]. Since objects can approach from any direction, as shown in Figure 1, it is crucial to ensure perception through a comprehensive 360° field of view (FOV). Such omnidirectional perception requires the processing of substantial amounts of sensor data and demands high-end computing devices with AI accelerators for real-time processing [47]. Recently, the demand for mobile applications using omnidirectional 3D object detection has become widespread. Robots or drones providing personal services such as surveillance can benefit from such technology [16]. In addition, detecting surrounding obstacles and providing audible warnings of potential hazards can help people with visual impairments [39, 56]. These personalized applications must be processed on an edge device to minimize user privacy issues or network overheads. However, even the latest NVIDIA Jetson Orin series [8], which offers advanced edge compute power, has 6.7× to 13.5× fewer Tensor cores for AI acceleration than the powerful A100 [9] used in cloud computing, despite sharing the same underlying GPU architecture. Furthermore, edge AI applications must consider practical factors such as cost-effective deployment. As a result, much effort has been made to support such applications with low-cost cameras [1, 38, 42, 58]. Specifically, multiple cameras or a mobile 360° camera are utilized to facilitate omnidirectional perception.
Edge AI services have a wide spectrum of accuracy and latency requirements. Despite recent advances, prior works have limitations in supporting both efficiency and accuracy on resource-constrained edge devices. DeepMix [18] offloaded complex DNN-based object detection tasks to a cloud server to reduce the computational burden on an edge device. Offloading omnidirectional perception tasks, however, may cause significant edge-cloud communication latency due to massive data transmission. PointSplit [37] supports parallelized operation on an edge GPU and NPU, but the scheme is optimized for a specific 3D detection pipeline utilizing an RGB-D sensor with a limited FOV. Meanwhile, various methods [1, 31, 34, 38] have enhanced the accuracy of camera-based solutions, which pose inherent difficulties due to the absence of 3D depth information. A line of works [29, 30, 52] has focused on developing DNNs to enhance depth prediction from RGB images. Also, the adoption of large-scale DNNs, such as feature extraction backbones using high-resolution images, is essential for accuracy improvement [51]. However, processing multiple compute-intensive DNN tasks with omnidirectional inputs places substantial computational demands on resource-constrained edge devices.
In this paper, we propose Panopticus, a system that maximizes the accuracy of omnidirectional 3D object detection while meeting latency requirements on edge devices. We preliminarily observed that camera-based 3D detectors have varying detection capabilities depending on spatial characteristics, which are determined by factors such as the number or movement of objects. The key idea of Panopticus is to process each camera view optimally based on an understanding of short-term dynamics in the spatial distribution of objects. For example, a camera view containing a few static and proximate objects can be processed with a lightweight inference configuration to reduce latency with minimal accuracy loss. The saved latency margin can then be used to assign a high-performing inference configuration to a complex view where objects are moving fast or located far away, as shown in Figure 1.
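To make this idea concrete, below is a minimal, hypothetical sketch of how a per-view complexity estimate could be mapped to a lighter or heavier inference configuration. The ViewStats fields, the candidate configurations, and all thresholds and latency numbers are illustrative assumptions rather than Panopticus's actual policy, which is detailed in later sections.

```python
from dataclasses import dataclass

@dataclass
class ViewStats:
    num_objects: int        # objects recently detected in this camera view
    max_speed_mps: float    # fastest observed object speed (m/s)
    max_distance_m: float   # distance to the farthest object (m)

# Candidate inference configurations, cheapest to most capable.
# Latency numbers are placeholders for a hypothetical edge GPU.
CONFIGS = [
    {"name": "light",  "backbone": "small", "depth_refine": False, "latency_ms": 4.0},
    {"name": "medium", "backbone": "base",  "depth_refine": False, "latency_ms": 7.0},
    {"name": "heavy",  "backbone": "large", "depth_refine": True,  "latency_ms": 12.0},
]

def choose_config(stats: ViewStats) -> dict:
    """Assign a heavier configuration to views with fast or distant objects,
    and a lightweight one to simple views (few static, nearby objects)."""
    if stats.num_objects == 0:
        return CONFIGS[0]
    if stats.max_speed_mps > 5.0 or stats.max_distance_m > 40.0:
        return CONFIGS[2]
    if stats.num_objects > 5 or stats.max_distance_m > 20.0:
        return CONFIGS[1]
    return CONFIGS[0]

# A view with a few nearby, slow objects gets the light branch;
# a view with fast, distant objects gets the heavy one.
print(choose_config(ViewStats(2, 0.5, 10.0))["name"])   # -> light
print(choose_config(ViewStats(4, 8.0, 55.0))["name"])   # -> heavy
```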
Several challenges exist in the design of Panopticus. First, prior 3D detection models fail to provide an efficient and dynamic inference scheme capable of differentiating the inference configuration, such as backbone capacity or the use of enhanced depth estimation, for each camera view in the same video frame. Additionally, the model’s architecture must be adjustable to accommodate various constraints, such as latency requirements, on a given device. Second, to maximize accuracy within latency requirements, the optimal inference configuration must be decided for each camera view. This requires a runtime analysis of both changes in spatial distribution and the expected performance of each inference configuration.
To enable architectural and operational adjustments of the model, we introduce an omnidirectional 3D object detection model with multiple inference branches. The model processes each view using one of the branches with varying detection capabilities, enabling fine-grained utilization of edge computing resources. The model’s architecture is designed to be modular, enabling flexible deployments by detaching a branch that violates given constraints. For the second challenge of maximizing accuracy within latency limits, we introduce a spatial-adaptive execution scheme. At runtime, the scheme predicts the performance of each branch based on the expected spatial distribution of the surrounding objects. Optimal combinations of branches and camera views, which maximize overall estimated accuracy while meeting the latency goal, are then selected for inference. We implemented Panopticus on three edge devices with different computational capabilities. The system was evaluated in various real-world environments, such as urban roads and streets, using a public autonomous driving dataset and our custom mobile 360° camera testbed. Extensive experiments showed that Panopticus outperformed its baselines under diverse scenarios in terms of both detection accuracy and efficiency.
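As a rough illustration of the execution scheme's selection step, the following sketch exhaustively searches per-view branch assignments and keeps the one with the highest total predicted accuracy whose summed latency fits the budget. The branch names, latencies, and accuracy scores are made-up placeholders, and the brute-force search stands in for Panopticus's actual performance predictor and scheduler described later.

```python
from itertools import product

# Hypothetical per-branch inference latency (ms) on a given edge device.
BRANCH_LATENCY_MS = {"light": 4.0, "medium": 7.0, "heavy": 12.0}

def select_branches(predicted_acc, latency_budget_ms):
    """Return the branch-per-view assignment that maximizes total predicted
    accuracy while keeping the summed per-view latency within the budget.

    predicted_acc: one dict per camera view mapping branch name to the
    accuracy score expected for that view (illustrative interface).
    """
    branches = list(BRANCH_LATENCY_MS)
    best_assignment, best_score = None, float("-inf")
    for assignment in product(branches, repeat=len(predicted_acc)):
        latency = sum(BRANCH_LATENCY_MS[b] for b in assignment)
        if latency > latency_budget_ms:
            continue
        score = sum(view_acc[b] for view_acc, b in zip(predicted_acc, assignment))
        if score > best_score:
            best_assignment, best_score = assignment, score
    return best_assignment, best_score

# Three camera views with made-up accuracy predictions and a 33 ms latency goal.
acc = [
    {"light": 0.52, "medium": 0.60, "heavy": 0.66},  # complex view: fast, distant objects
    {"light": 0.70, "medium": 0.72, "heavy": 0.73},  # simple view: few static, nearby objects
    {"light": 0.58, "medium": 0.64, "heavy": 0.69},
]
print(select_branches(acc, latency_budget_ms=33.0))  # e.g. (('heavy', 'medium', 'heavy'), ~2.07)
```

Exhaustive search is tractable here only because the number of views and branches is small; a knapsack-style or greedy optimizer could replace it behind the same interface.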
The key contributions of our work are as follows:
• To the best of our knowledge, Panopticus is the first omnidirectional and camera-based 3D object detection system that achieves both accuracy and latency optimization on resource-constrained edge devices.
• We conducted an in-depth study to explore the varying capabilities of recent 3D detectors influenced by diverse characteristics of objects and spaces. Panopticus provides fine-grained control over omnidirectional perception and edge resource utilization, adapting to varying spatial complexities in dynamic environments.
• We fully implemented Panopticus as an end-to-end edge computing system and evaluated it using both a public self-driving dataset and our mobile 360° camera testbed, showcasing its adaptability to the resource constraints of edge devices across a range of real-world conditions.
The remainder of this paper is organized as follows.