Publication:

Exploring the Benefits of Multimodal Sensor Fusion in Autonomous Driving: A Comparative Study of Camera and LiDAR Using Transformer Architectures for Object Detection

dc.contributor.advisor: Kornhauser, Alain Lucien
dc.contributor.author: Doniger, Sammy
dc.date.accessioned: 2025-08-06T15:55:31Z
dc.date.available: 2025-08-06T15:55:31Z
dc.date.issued: 2025-04-09
dc.description.abstract: Accurate and robust object detection is critical for advancing autonomous driving systems. In recent years, transformer-based architectures have shown significant promise in this domain, offering improved performance over previous state-of-the-art technologies, largely due to their ability to handle long-range dependencies. This thesis explores the potential benefits of multimodal sensor fusion in autonomous driving by evaluating three transformer-based architectures for object detection tasks, each trained on the nuScenes dataset. The first model, TransFusion, integrates camera and LiDAR data within a unified transformer framework. The second model is a LiDAR-only variant, adapted from the TransFusion implementation to isolate the contribution from the LiDAR sensors. The third model, FCOS3D, is a camera-only model that isolates the contribution from the camera sensors. The primary goal of this research is to identify scenarios in which single-modality models (camera-only or LiDAR-only) produce conflicting detections and to analyze how the fusion-based approach handles these discrepancies. By closely examining these instances, the study evaluates whether LiDAR offers critical advantages over camera-only systems in consumer vehicles. Given the higher cost and complexity associated with LiDAR sensors, understanding whether these advantages justify the integration of LiDAR is vital for automotive manufacturers and researchers seeking to optimize safety, reliability, and system efficiency under cost constraints. Through extensive experimental evaluations, this thesis contributes insights into how multimodal fusion impacts object detection, revealing that while the LiDAR-only variant yields higher overall detection metrics in limited training environments, the camera-only approach excels at identifying near-range objects, and the fusion model effectively refines extraneous predictions. This synergy underscores trade-offs between cost and detection coverage, providing guidance for future sensor design and deployment strategies in the pursuit of a fully autonomous driving system.
dc.identifier.uri: https://theses-dissertations.princeton.edu/handle/88435/dsp01w9505392z
dc.language.iso: en_US
dc.title: Exploring the Benefits of Multimodal Sensor Fusion in Autonomous Driving: A Comparative Study of Camera and LiDAR Using Transformer Architectures for Object Detection
dc.type: Princeton University Senior Theses
dspace.entity.type: Publication
dspace.workflow.startDateTime: 2025-04-10T01:58:05.876Z
pu.contributor.authorid: 920269205
pu.date.classyear: 2025
pu.department: Ops Research & Financial Engr
pu.minor: Finance
pu.minor: Computer Science

Files

Original bundle
Name: SeniorThesis.pdf
Size: 10.98 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 100 B
Description: Item-specific license agreed to upon submission