Princeton University users: to view a senior thesis while away from campus, connect to the campus network via the Global Protect virtual private network (VPN). Unaffiliated researchers: please note that requests for copies are handled manually by staff and require time to process.
 

Publication:

Compact, Fast, and Low-Energy Language Modeling with Differentiable Logic Network Transformers


Files

Warren_Conor.pdf (1.51 MB)

Date

2025-04-14


Abstract

Deep learning has experienced widespread adoption across various disciplines and applications because of its versatile problem-solving capabilities. Such versatility arises from the diverse set of deep learning architectures that have been proposed and optimized for different settings. The transformer is one such deep learning architecture: especially effective at learning the long-term relationships that characterize natural language, it has achieved state-of-the-art performance on language-related tasks. Its aptitude, however, is scale-dependent, and the scale required to achieve such striking performance leads to three significant inefficiencies in transformer-based language models: large memory footprints, high inference latencies, and high energy consumption, all of which render the deployment of transformers prohibitively expensive in general and entirely infeasible in resource-constrained environments. The recent introduction of efficient and performant differentiable logic networks (DLNs) as an alternative to standard neural networks may help alleviate these limitations when other techniques like pruning, quantization, parameter-efficient fine-tuning, knowledge distillation, and architecture modification fall short. The present work explores this possibility, replacing the feedforward neural networks of a pretrained transformer model with highly efficient DLNs to produce DLN-transformers (DLN-Ts). The DLN-Ts we synthesize here demonstrate performance similar to that of the baseline transformer model on the GLUE benchmark, with inferred improvements in memory use, inference latency, and energy consumption. The DLN-T, therefore, may be a viable precursor to a compact, fast, and low-energy language model.
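
To make the construction concrete, the following is a minimal sketch of the idea, not the thesis's actual implementation: it assumes a Petersen-style differentiable logic gate layer in PyTorch, in which each neuron reads two randomly wired inputs and learns a softmax mixture over the real-valued relaxations of the 16 two-input Boolean gates, and a small stack of such layers stands in for a transformer layer's feed-forward sublayer. All class, parameter, and dimension names are illustrative; the DLN-T in the thesis may differ in wiring, depth, and how activations are mapped into and out of the logic layers.

```python
# Hypothetical sketch of a DLN-style feed-forward replacement (names are illustrative).
import torch
import torch.nn as nn


class DiffLogicLayer(nn.Module):
    """Each output neuron mixes the 16 two-input Boolean gates via a learned softmax."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        # Randomly wire each output neuron to two input coordinates (fixed, not learned).
        self.register_buffer("idx_a", torch.randint(0, in_dim, (out_dim,)))
        self.register_buffer("idx_b", torch.randint(0, in_dim, (out_dim,)))
        # Per-neuron logits over the 16 possible two-input gates.
        self.gate_logits = nn.Parameter(torch.zeros(out_dim, 16))

    @staticmethod
    def _gates(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # Real-valued relaxations of all 16 Boolean functions of (a, b).
        ab = a * b
        return torch.stack(
            [
                torch.zeros_like(a), ab, a - ab, a,
                b - ab, b, a + b - 2 * ab, a + b - ab,
                1 - (a + b - ab), 1 - (a + b - 2 * ab), 1 - b, 1 - b + ab,
                1 - a, 1 - a + ab, 1 - ab, torch.ones_like(a),
            ],
            dim=-1,
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a, b = x[..., self.idx_a], x[..., self.idx_b]          # (..., out_dim) each
        weights = torch.softmax(self.gate_logits, dim=-1)      # (out_dim, 16)
        return (self._gates(a, b) * weights).sum(dim=-1)       # (..., out_dim)


class DLNFeedForward(nn.Module):
    """Drop-in stand-in for a transformer FFN: squash inputs, apply logic layers, project back."""

    def __init__(self, d_model: int, width: int = 2048, depth: int = 2):
        super().__init__()
        self.layers = nn.ModuleList(
            [DiffLogicLayer(d_model if i == 0 else width, width) for i in range(depth)]
        )
        self.out = nn.Linear(width, d_model)  # map gate activations back to the model dimension

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.sigmoid(x)  # logic-gate relaxations expect inputs in [0, 1]
        for layer in self.layers:
            h = layer(h)
        return self.out(h)
```

After training, each neuron's soft gate mixture can be hardened to its argmax Boolean gate, which is where a DLN's anticipated savings in memory, inference latency, and energy would come from.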
