Publication:

Exploring the Benefits of Multimodal Sensor Fusion in Autonomous Driving: A Comparative Study of Camera and LiDAR Using Transformer Architectures for Object Detection

Files

SeniorThesis.pdf (10.98 MB)

Date

2025-04-09

Abstract

Accurate and robust object detection is critical for advancing autonomous driving systems. In recent years, transformer-based architectures have shown significant promise in this domain, offering improved performance over previous state-of-the-art approaches, largely due to their ability to model long-range dependencies. This thesis explores the potential benefits of multimodal sensor fusion in autonomous driving by evaluating three transformer-based architectures for object detection, each trained on the nuScenes dataset. The first model, TransFusion, integrates camera and LiDAR data within a unified transformer framework. The second is a LiDAR-only variant, adapted from the TransFusion implementation to isolate the contribution of the LiDAR sensors. The third, FCOS3D, is a camera-only baseline that isolates the contribution of the camera sensors. The primary goal of this research is to identify scenarios in which the single-modality models (camera-only or LiDAR-only) produce conflicting detections and to analyze how the fusion-based approach handles these discrepancies. By closely examining these instances, the study evaluates whether LiDAR offers critical advantages over camera-only systems in consumer vehicles. Given the higher cost and complexity associated with LiDAR sensors, understanding whether these advantages justify LiDAR's integration is vital for automotive manufacturers and researchers seeking to optimize safety, reliability, and system efficiency under cost constraints. Through extensive experimental evaluation, this thesis contributes insights into how multimodal fusion impacts object detection, revealing that while the LiDAR-only variant yields higher overall detection metrics in limited training environments, the camera-only approach excels at identifying near-range objects, and the fusion model effectively refines extraneous predictions. This synergy underscores the trade-offs between cost and detection coverage, providing guidance for future sensor design and deployment strategies in the pursuit of fully autonomous driving.
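The conflict analysis described in the abstract can be illustrated with a minimal sketch. The Python snippet below is hypothetical and not taken from the thesis: it matches per-class detections from a camera-only model and a LiDAR-only model by bird's-eye-view center distance (the nuScenes detection metric matches on center-distance thresholds, commonly 2 m) and flags objects reported by only one modality as candidate conflicts for the fusion model to resolve.

```python
import numpy as np

# Hypothetical detection format: (BEV center as np.array([x, y]), class label, confidence).
# Illustrative sketch only; the thesis' actual evaluation pipeline may differ.

def match_detections(dets_a, dets_b, dist_thresh=2.0):
    """Greedily match two detection sets by same-class BEV center distance.

    nuScenes-style matching uses center-distance thresholds (e.g., 2 m);
    anything left unmatched was detected by only one modality.
    """
    unmatched_b = list(range(len(dets_b)))
    matches, only_a = [], []
    for i, (center_a, label_a, _) in enumerate(dets_a):
        best_j, best_d = None, dist_thresh
        for j in unmatched_b:
            center_b, label_b, _ = dets_b[j]
            d = np.linalg.norm(center_a - center_b)
            if label_b == label_a and d < best_d:
                best_j, best_d = j, d
        if best_j is None:
            only_a.append(i)                      # modality-A-only detection
        else:
            matches.append((i, best_j))
            unmatched_b.remove(best_j)
    return matches, only_a, unmatched_b           # unmatched_b: modality-B-only

# Example: both modalities agree on a car; the pedestrian is camera-only.
camera_dets = [(np.array([10.0, 2.0]), "car", 0.90),
               (np.array([4.0, 1.0]), "pedestrian", 0.70)]
lidar_dets  = [(np.array([10.5, 2.2]), "car", 0.95)]

matches, camera_only, lidar_only = match_detections(camera_dets, lidar_dets)
print(matches, camera_only, lidar_only)  # [(0, 0)] [1] []
```

Unmatched detections on either side are precisely the disagreement cases the thesis examines against the fusion model's output.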
