Research Projects

My research integrates computational linguistics and machine learning to build human-centered AI systems. These projects demonstrate the intersection of domain expertise in linguistics, cognitive science, and speech processing with modern ML/AI techniques.

Emotion-Aware Conversational AI for Dementia Care

Role: AI/ML Lead | Project funded by UCOP Noyce Initiative

Overview

Leading the development of a clinically validated AI pipeline for managing dementia-related agitation through therapeutic conversational interactions. This project creates the first AI system that combines real-time emotion recognition with expert-designed, empathetic therapeutic response generation for the more than 55 million people worldwide living with dementia.

Technical Contributions

Linguistics & Cognitive Science Integration

Technologies: Python, PyTorch, Transformers, OpenAI Whisper, wav2vec2, Azure Cognitive Services, Streamlit, HuggingFace, SentenceTransformers

Discourse Position and Context Length Matter: Information-Theoretic Analysis of Text-Based Emotion in IEMOCAP

Role: First & Corresponding Author | arXiv Preprint 2026

Overview

Systematic analysis of emotion recognition in conversation (ERC), moving beyond "black box" accuracy to understand how models work. Achieved state-of-the-art text-only performance on IEMOCAP using strictly causal context, surpassing prior methods that exploit future utterances.
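
The "strictly causal context" constraint can be pictured as follows: each utterance is classified from itself plus only the utterances that precede it in the dialogue, never the ones that follow. This is a minimal sketch, not the paper's preprocessing code; the function name `causal_context`, the `max_history` parameter, and the toy dialogue are illustrative assumptions.

```python
from typing import List


def causal_context(utterances: List[str], index: int, max_history: int) -> List[str]:
    """Return up to `max_history` preceding utterances followed by the target.

    Utterances after `index` are never included, so the context is causal.
    (Hypothetical helper for illustration only.)
    """
    if not 0 <= index < len(utterances):
        raise IndexError("index out of range")
    if max_history < 0:
        raise ValueError("max_history must be non-negative")
    start = max(0, index - max_history)
    # Slice ends at the target utterance; nothing from the future leaks in.
    return utterances[start:index + 1]


# Toy dialogue (made up for this example, not taken from IEMOCAP).
dialogue = ["Hi.", "You're late again.", "I know, I'm sorry.", "It's fine."]
window = causal_context(dialogue, index=2, max_history=1)
```

Varying `max_history` in a setup like this is one way to study how context length affects a classifier, while the slice boundary guarantees that no future utterance is visible.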

Technical Contributions

Linguistics & Cognitive Science Integration

Technologies: Python, PyTorch, CUDA, Transformers, RoBERTa, LSTM, Scikit-learn, HuggingFace Accelerate, SenticNet

LLM Alignment for Linguistic Empathy Using Direct Preference Optimization

Role: First & Corresponding Author | Submitted to ACL 2026

Overview

A data-efficient approach to LLM alignment using linguistically motivated symbolic rules instead of expensive human preference data. This work introduces RLSF (Reinforcement Learning from Symbolic Feedback), achieving 100% alignment accuracy within 300 training steps, 30-170× faster than traditional RLHF methods.
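
To make the idea concrete, the sketch below shows how a symbolic rule could stand in for a human annotator when building preference pairs, and how the standard DPO loss scores one such pair. It is an illustration under stated assumptions, not the project's implementation: `rule_to_preference`, the toy rule `acknowledges_feeling`, the example responses, and the log-probability values are all hypothetical.

```python
import math
from typing import Callable, Tuple


def rule_to_preference(
    resp_a: str, resp_b: str, rule: Callable[[str], bool]
) -> Tuple[str, str]:
    """Order two responses as (chosen, rejected) using a symbolic rule.

    Raises ValueError when the rule does not separate the two responses.
    (Hypothetical helper for illustration only.)
    """
    a_ok, b_ok = rule(resp_a), rule(resp_b)
    if a_ok == b_ok:
        raise ValueError("rule does not distinguish the responses")
    return (resp_a, resp_b) if a_ok else (resp_b, resp_a)


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for one pair: -log(sigmoid(beta * margin))."""
    margin = beta * (
        (policy_chosen_logp - ref_chosen_logp)
        - (policy_rejected_logp - ref_rejected_logp)
    )
    # Numerically stable form of log(1 + exp(-margin)).
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def acknowledges_feeling(text: str) -> bool:
    """Toy rule (an assumption): does the response name the listener's experience?"""
    return "sounds" in text.lower()


chosen, rejected = rule_to_preference(
    "Just calm down.", "That sounds really hard.", acknowledges_feeling
)
# Made-up sequence log-probabilities under the policy and the frozen reference.
loss = dpo_loss(-1.0, -5.0, -2.0, -2.0)
```

Because the rule is deterministic, preference pairs of this kind can be generated without any annotation cost, which is the source of the data efficiency described above.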

Technical Contributions

Linguistics & Cognitive Science Integration

Technologies: Python, PyTorch, Transformers, DPO, Claude API, HuggingFace, Scikit-learn, Amazon MTurk

Additional Projects

Multimodal Team Communication Analysis

Contributed to MultiCAT, a comprehensive annotation framework for multimodal team communication, published at NAACL 2025. Developed annotation schemas for verbal and non-verbal communication patterns in collaborative settings, incorporating Theory of Mind and multi-party conversation analysis.

Technologies: Multimodal annotation, Theory of Mind, Multi-party conversation, Inter-rater reliability analysis

Big Data Phonetics: Korean Stop Hyperarticulation

First Author | Applied automated acoustic analysis to 100,000+ tokens from Korean broadcast speech, demonstrating how speakers hyperarticulate phonetic cues in lexically confusable contexts. This work bridges corpus linguistics with speech technology.

Technologies: Python, Praat scripting, Forced alignment, Statistical modeling

ML-Based Support Ticket Classification (FindingFive)

HLT Internship | Built a text classification system for customer support automation. Manually labeled 273 tickets, extracted 2,935 TF-IDF features, and compared 5 classifiers. Linear SVC achieved 70.4% accuracy; a CNN with Word2Vec embeddings achieved 69%.

Technologies: Python, Scikit-learn, Keras, CNN, Word2Vec, TF-IDF, Zoho API

Technical Skills

Programming Languages

Python, R, C++, Bash

ML/AI Frameworks

PyTorch, TensorFlow, HuggingFace Transformers, Scikit-learn

NLP & Speech

LLMs (GPT-4, Claude), Whisper, wav2vec2, BERT/RoBERTa, Praat, Azure Speech

Cloud & Hardware

NVIDIA GPUs (Academic Grant), Saturn Cloud, AWS

Research Methods

Experimental design, Statistical analysis, IRR, Clinical validation