Princeton University users: to view a senior thesis while away from campus, connect to the campus network via the Global Protect virtual private network (VPN). Unaffiliated researchers: please note that requests for copies are handled manually by staff and require time to process.
 

Publication:

Inferences of DNA sequences from Förster Resonance Energy Transfer data

Files

Ciccone_Matthew_Thesis_Final.pdf (1.1 MB)

Date

2025-04

Abstract

Förster resonance energy transfer (FRET) is a spectroscopic technique that measures energy transfer between donor and acceptor compounds attached to biological molecules. Because the transfer efficiency depends strongly on the donor-acceptor separation, it is a convenient metric for distance measurement on the angstrom scale. In principle, the technique could be applied to DNA mixtures to identify key components in applications such as pathogen detection in crops. The uncertainty inherent in using FRET for identification must be addressed with a similarly stochastic method. This work presents a foundational dataset that serves as a proof of concept for an unsupervised model able to determine whether detectable patterns arise from FRET efficiency sampling of such mixtures. A coarse-grained DNA model is selected for its compatibility with relevant protein models and its close approximation of realistic datasets. Covariance matrix adaptation evolution strategy (CMA-ES) optimization, performed on molecular dynamics (MD) simulations of twelve batches of candidate acceptor probes designed to interact selectively with one genome of interest, provides a basis for synthetic data generation. Analysis shows that the fitness of probes for selective interaction with a specific DNA mixture can change significantly, indicating that extracting patterned information from a synthetic FRET profile for mixture identification is plausible. This work provides insight into how machine learning models can be used to characterize and identify unknown DNA sequence mixtures, and it advances such analysis in dynamic environmental systems such as those found under field conditions.
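
As an illustration of why FRET efficiency works as a distance metric, the standard Förster relation E = 1 / (1 + (r / R0)^6) can be sketched in a few lines of Python. This is a minimal sketch of the textbook relation and is not taken from the thesis. The function names and the Förster radius of 50 Å are assumptions chosen for illustration only; real values of R0 depend on the specific donor-acceptor pair.

```python
def fret_efficiency(r, r0):
    """Textbook Förster relation: E = 1 / (1 + (r / r0)**6).

    r  : donor-acceptor separation (any length unit, e.g. angstroms)
    r0 : Förster radius in the same unit (separation at which E = 0.5)
    """
    if r < 0 or r0 <= 0:
        raise ValueError("r must be non-negative and r0 must be positive")
    return 1.0 / (1.0 + (r / r0) ** 6)


def distance_from_efficiency(e, r0):
    """Invert the Förster relation to recover the separation from E."""
    if not 0.0 < e <= 1.0:
        raise ValueError("efficiency must lie in (0, 1]")
    return r0 * (1.0 / e - 1.0) ** (1.0 / 6.0)


if __name__ == "__main__":
    R0 = 50.0  # hypothetical Förster radius in angstroms (illustrative assumption)
    for r in (25.0, 50.0, 75.0):
        print(f"r = {r:5.1f} A  ->  E = {fret_efficiency(r, R0):.3f}")
```

The sixth-power dependence means the efficiency falls steeply around r = R0, which is why small changes in separation near the Förster radius produce measurable changes in the signal.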
