My research integrates computational linguistics and machine learning to build human-centered AI systems. The projects below combine domain expertise in linguistics, cognitive science, and speech processing with modern ML/AI techniques.
Leading the development of a clinically validated AI pipeline for managing dementia-related agitation through therapeutic conversational interactions. This project creates the first AI system that combines real-time emotion recognition with expert-designed empathetic therapeutic response generation for the more than 55 million people worldwide living with dementia.
Systematic analysis of emotion recognition in conversation (ERC), moving beyond "black box" accuracy to understand how models work. Achieved state-of-the-art text-only performance on IEMOCAP using strictly causal context, surpassing prior methods that exploit future utterances.
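As a minimal illustration of what "strictly causal context" means here, the sketch below builds the input for one utterance from past turns only. The function name `causal_context`, the window size, and the toy dialogue are hypothetical and not taken from the project; the actual model and context length are not specified above.

```python
from typing import List, Tuple


def causal_context(utterances: List[str], index: int, window: int = 3) -> Tuple[List[str], str]:
    """Return (past context, target utterance) for the utterance at `index`.

    Only utterances that occur before `index` are included, so no future
    turn can leak into the input. `window` caps how many past turns are
    kept (an illustrative choice).
    """
    start = max(0, index - window)
    return utterances[start:index], utterances[index]


# Hypothetical toy dialogue, for illustration only.
dialogue = [
    "I got the job.",
    "That's wonderful news!",
    "But I have to move away.",
    "Oh. When do you leave?",
]

context, target = causal_context(dialogue, index=2, window=3)
```

A non-causal setup would also pass `dialogue[3]` when classifying `dialogue[2]`; the slice `utterances[start:index]` rules that out by construction.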
A data-efficient approach to LLM alignment using linguistically motivated symbolic rules instead of expensive human preference data. This work introduces RLSF (Reinforcement Learning from Symbolic Feedback), achieving 100% alignment accuracy within 300 training steps, 30 to 170x faster than traditional RLHF methods.
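The general idea of replacing a learned reward model with symbolic rules can be sketched as a rule-based reward function. Everything in this example is an assumption made for illustration: the two toy rules, the names `RULES` and `symbolic_reward`, and the averaging scheme are hypothetical and are not the rules or reward used in RLSF.

```python
import re
from typing import Callable, List

Rule = Callable[[str], bool]


def ends_with_terminal_punctuation(text: str) -> bool:
    # Toy rule: a well-formed response ends with ., ! or ?
    return bool(re.search(r"[.!?]$", text.strip()))


def avoids_double_negation(text: str) -> bool:
    # Toy rule: no "not"/"-n't" followed by "no", "nothing" or "never"
    # within the same sentence.
    pattern = r"\b(?:not|\w+n't)\b[^.!?]*\b(?:no|nothing|never)\b"
    return re.search(pattern, text.lower()) is None


# Hypothetical rule set; a real system would encode task-specific rules.
RULES: List[Rule] = [ends_with_terminal_punctuation, avoids_double_negation]


def symbolic_reward(text: str, rules: List[Rule] = RULES) -> float:
    """Fraction of rules the text satisfies, in [0.0, 1.0]."""
    return sum(rule(text) for rule in rules) / len(rules)


good = symbolic_reward("I do not know anything.")
bad = symbolic_reward("I don't know nothing")
```

Because the score is computed deterministically from the text, it needs no human preference labels; in an RL loop it would take the place of the scalar returned by a reward model.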
Contributed to MultiCAT, a comprehensive annotation framework for multimodal team communication, published at NAACL 2025. Developed annotation schemas for verbal and non-verbal communication patterns in collaborative settings, incorporating Theory of Mind and multi-party conversation analysis.
First Author | Applied automated acoustic analysis to 100,000+ tokens from Korean broadcast speech, demonstrating how speakers hyperarticulate phonetic cues in lexically confusable contexts. This work bridges corpus linguistics with speech technology.
HLT Internship | Built a text classification system for customer support automation. Manually labeled 273 tickets, extracted 2,935 TF-IDF features, and compared 5 classifiers. Linear SVC achieved 70.4% accuracy; a CNN with Word2Vec embeddings achieved 69%.
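A TF-IDF plus Linear SVC baseline of the kind described above can be sketched with scikit-learn as follows. The tickets, labels, and default hyperparameters are invented for illustration; the original dataset, preprocessing, and settings are not given here, so this does not reproduce the reported numbers.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy tickets standing in for the labeled support data.
tickets = [
    "cannot log in to my account",
    "password reset link does not work",
    "login page rejects my password",
    "charged twice on my invoice",
    "refund for duplicate payment",
    "invoice shows the wrong amount",
]
labels = ["access", "access", "access", "billing", "billing", "billing"]

# TF-IDF turns each ticket into a sparse weighted bag-of-words vector;
# LinearSVC then fits a linear decision boundary over those features.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(tickets, labels)

prediction = model.predict(["refund duplicate payment"])[0]
```

Wrapping both steps in one pipeline keeps the vocabulary fitted on training data only, which matters when the same object is later evaluated with cross-validation.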
Python, R, C++, Bash
PyTorch, TensorFlow, HuggingFace Transformers, Scikit-learn
LLMs (GPT-4, Claude), Whisper, wav2vec2, BERT/RoBERTa, Praat, Azure Speech
NVIDIA GPUs (Academic Grant), Saturn Cloud, AWS
Experimental design, Statistical analysis, Inter-rater reliability (IRR), Clinical validation