StackBERT-Enhancer: A Dual-Layer BERT-Based Framework for Enhancer Identification and Strength Classification in Genomic Data

dc.contributor.advisor: Kim, Wooyoung
dc.contributor.author: Tran, Phat
dc.date.accessioned: 2025-08-01T22:09:45Z
dc.date.available: 2025-08-01T22:09:45Z
dc.date.issued: 2025-08-01
dc.date.submitted: 2025
dc.description: Thesis (Master's)--University of Washington, 2025
dc.description.abstract: Accurately identifying and classifying crucial regulatory DNA sequences known as enhancers is a significant challenge, as traditional computational methods often struggle with their complex, context-dependent nature and lack interpretability. This thesis introduces StackBERT-Enhancer, a novel deep learning framework to address these limitations, focusing on two primary tasks: distinguishing enhancer sequences from non-enhancer sequences and classifying identified enhancers by their activity levels. The proposed framework employs multiple transformer-based language models, each independently trained on DNA sequences tokenized with different k-mer sizes, allowing for the capture of sequence dependencies across various scales. These individual models are then integrated into a stacking ensemble architecture, which significantly boosts classification accuracy, robustness, and generalization, achieving state-of-the-art results of 83.5% in enhancer identification and 99.0% in enhancer strength classification. The framework utilizes distributed multi-GPU systems for efficient model training and incorporates interpretability techniques such as SHapley Additive exPlanations (SHAP) for feature importance and attention score analysis for sequence motif discovery, bridging predictive power with biological insight. This advanced approach offers a robust and interpretable tool for enhancer analysis, holding strong potential for applications in disease modeling and broader biomedical research.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Tran_washington_0250O_28014.pdf
dc.identifier.uri: https://hdl.handle.net/1773/53217
dc.language.iso: en_US
dc.rights: CC BY
dc.subject: Computational Genomics
dc.subject: Distributed Training
dc.subject: Enhancer Identification
dc.subject: Model Interpretability
dc.subject: Stacking Ensemble
dc.subject: Transformer Models
dc.subject: Computer science
dc.subject: Artificial intelligence
dc.subject: Bioinformatics
dc.subject.other: Computing and software systems
dc.title: StackBERT-Enhancer: A Dual-Layer BERT-Based Framework for Enhancer Identification and Strength Classification in Genomic Data
dc.type: Thesis

Files

Original bundle

Name: Tran_washington_0250O_28014.pdf
Size: 10.35 MB
Format: Adobe Portable Document Format