Publication: Starting Small: Using Machine Learning Techniques to Identify Physically Plausible Tracks in High-Pileup Collision Events
Abstract
Analyzing the physical properties of particles scattered after high-speed collisions is an important component of particle physics research, but reconstructing particle tracks from the position data of thousands of scattered particles in high-pileup events is a difficult task. Assessing how physically plausible a track formed by a set of points is could serve as the final step in a machine learning pipeline that identifies candidate reconstructed tracks, so an accurate plausibility assessment is critical for training the earlier steps in the pipeline. We therefore attempt to build a neural network that classifies a set of position points as belonging either to a single track or to several different tracks. To ensure our classifier is robust, we generate the sets of points that do not come from one true track by slightly perturbing a true track, either by randomly moving points by an amount proportional to the deviation of the points from a circle fit of the track or by swapping some of the points for one of their nearest neighbors. We then train the neural network on these true and perturbed tracks and search for the model that most accurately identifies the true tracks, working to ensure that the classifier is effective in the high-momentum regimes that are most relevant for track reconstruction. We find that transfer learning, in which a model is first trained on fake tracks that are easy to identify and then trained on more difficult fake tracks, is markedly more effective than training directly on the difficult tracks. Using this transfer learning strategy, we create a classifier that has a total momentum-weighted accuracy of 0.6608 on the most difficult category of fake tracks and an area under the receiver operating characteristic curve of 0.7280. Finally, we suggest refinements and alternative methods that could improve this performance and move closer to a classifier that can be reliably incorporated into a training pipeline.
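The two perturbation strategies described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes hits are given as 2D (x, y) coordinates in the plane where a track is approximately circular, uses a simple algebraic least-squares circle fit, and takes "deviation from the circle fit" to mean the standard deviation of the radial residuals. The function names (`fit_circle`, `perturb_by_jitter`, `perturb_by_swap`) and the `scale` and `n_swap` parameters are hypothetical.

```python
import numpy as np


def fit_circle(xy):
    """Algebraic least-squares circle fit (an assumed choice of fit).

    Solves x^2 + y^2 = 2*a*x + 2*b*y + c for (a, b, c), where (a, b) is the
    center and c = r^2 - a^2 - b^2. Returns (center, radius).
    """
    x, y = xy[:, 0], xy[:, 1]
    design = np.column_stack([2 * x, 2 * y, np.ones_like(x)])
    rhs = x**2 + y**2
    (a, b, c), *_ = np.linalg.lstsq(design, rhs, rcond=None)
    radius = np.sqrt(c + a**2 + b**2)
    return np.array([a, b]), radius


def perturb_by_jitter(xy, scale, rng):
    """Move every point by Gaussian noise whose width is proportional to the
    spread of the points around their fitted circle (assumed definition)."""
    center, radius = fit_circle(xy)
    residuals = np.linalg.norm(xy - center, axis=1) - radius
    sigma = scale * np.std(residuals)
    return xy + rng.normal(0.0, sigma, size=xy.shape)


def perturb_by_swap(track_idx, all_hits, n_swap, rng):
    """Replace n_swap hits of a track with their nearest neighbor among the
    hits of the event that are not already used. Returns the new index set."""
    track_idx = np.asarray(track_idx)
    if n_swap > len(track_idx) or n_swap > len(all_hits) - len(track_idx):
        raise ValueError("not enough hits available to swap")
    fake_idx = track_idx.copy()
    positions = rng.choice(len(track_idx), size=n_swap, replace=False)
    for pos in positions:
        dist = np.linalg.norm(all_hits - all_hits[track_idx[pos]], axis=1)
        dist[track_idx] = np.inf  # never pick a hit from the true track
        dist[fake_idx] = np.inf   # never pick a hit that is already swapped in
        fake_idx[pos] = np.argmin(dist)
    return fake_idx
```

In this sketch a small `scale` or a small `n_swap` yields fake tracks that stay close to the true one and are therefore harder to tell apart from it, while larger values yield easier negatives. That gives one possible way to produce the easy and difficult categories of fake tracks used in the staged training the abstract describes.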