Computer Science, 1987-2025
Permanent URI for this collection: https://theses-dissertations.princeton.edu/handle/88435/dsp01mp48sc83w
Browsing Computer Science, 1987-2025 by Author "Bhattacharjee, Roma"
ChangeIn: A Dynamic Camera Intrinsics Benchmark for Robust Computer Vision Systems
(2025-04-10) Bhattacharjee, Roma; Deng, Jia
Modern Simultaneous Localization and Mapping (SLAM) systems assume that camera intrinsics remain fixed throughout a video. This assumption breaks down in real-world, “in-the-wild” scenarios where zoom and focus can vary dynamically, meaning existing SLAM systems do not work on real-world videos. To address this, we introduce ChangeIn, a benchmark tailored for per-frame dynamic intrinsics prediction, laying the groundwork for more robust SLAM and other vision systems that can handle dynamic intrinsics.
To label ChangeIn’s dataset, we recorded a set of calibration videos using traditional calibration boards, resorting to drone-based methods for wider fields of view (FOVs). We then built a comprehensive lookup table (LUT) that interpolates between the collected calibration data to map any lens focal length and focus distance to intrinsic parameters. Using this table, we produced ground truth intrinsics for a diverse collection of 389 real-world videos (126 indoor, 263 outdoor), featuring focal lengths from 17 mm to 250 mm and captured across varied environments, from Princeton’s campuses to urban centers. In addition to changing zoom and focus distance, these videos include both static and moving cameras, as well as dynamic scene elements such as people, vehicles, and interaction with objects, adding to the realism and complexity of the dataset. Evaluation of three existing intrinsics prediction methods on this benchmark shows substantial room for improvement.
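The abstract does not say how the LUT interpolates between calibration points, so the following is only a minimal sketch of one plausible approach: bilinear interpolation over a regular grid of (focal length, focus distance) samples. The grid values, the intrinsics layout `(fx, fy, cx, cy)`, and the function name `lookup_intrinsics` are all hypothetical and chosen for illustration; they are not taken from the thesis.

```python
import numpy as np

# Hypothetical calibration grid; every number here is made up for illustration.
focal_mm = np.array([17.0, 50.0, 250.0])   # lens focal-length settings (mm)
focus_m = np.array([0.5, 2.0, 10.0])       # focus distances (m)

# lut[i, j] holds (fx, fy, cx, cy) in pixels for focal_mm[i] and focus_m[j].
lut = np.zeros((len(focal_mm), len(focus_m), 4))
for i, f in enumerate(focal_mm):
    for j, d in enumerate(focus_m):
        fx = 100.0 * f + 5.0 * d           # toy stand-in for calibrated values
        lut[i, j] = (fx, fx, 960.0, 540.0)


def lookup_intrinsics(f, d):
    """Bilinearly interpolate (fx, fy, cx, cy) at focal length f and focus distance d."""
    # Clamp queries to the calibrated range instead of extrapolating.
    f = np.clip(f, focal_mm[0], focal_mm[-1])
    d = np.clip(d, focus_m[0], focus_m[-1])
    # Index of the lower-left corner of the grid cell containing (f, d).
    i = int(np.clip(np.searchsorted(focal_mm, f) - 1, 0, len(focal_mm) - 2))
    j = int(np.clip(np.searchsorted(focus_m, d) - 1, 0, len(focus_m) - 2))
    # Fractional position of the query inside that cell along each axis.
    tf = (f - focal_mm[i]) / (focal_mm[i + 1] - focal_mm[i])
    td = (d - focus_m[j]) / (focus_m[j + 1] - focus_m[j])
    # Interpolate along focus distance first, then along focal length.
    low = (1 - td) * lut[i, j] + td * lut[i, j + 1]
    high = (1 - td) * lut[i + 1, j] + td * lut[i + 1, j + 1]
    return (1 - tf) * low + tf * high
```

Under these assumptions, per-frame ground truth would come from calling `lookup_intrinsics` once per frame with that frame's recorded focal length and focus distance. A real table might use a denser grid, a different interpolant, or additional parameters such as distortion coefficients.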
We hope ChangeIn will serve as a valuable resource for developing and evaluating models that estimate intrinsics on a per-frame basis and ultimately foster new research into SLAM and other computer vision applications on real-world videos.