
About
I am a junior at Vanderbilt University majoring in Computer Science and Mathematics. My research interests lie at the intersection of machine learning, artificial intelligence, and applied mathematics. I am passionate about developing robust and interpretable AI systems to efficiently solve real-world problems across diverse fields. I plan to pursue graduate studies to further advance my research.
Research Interests
Current Research Projects
Crossmodal Human Enhancement of Multimodal AI
Human perception is inherently multisensory, with sensory modalities influencing one another. To develop more human-like multimodal AI models, it is essential to design systems that not only process multiple sensory inputs but also reflect their interconnections. In this study, we investigate the cross-modal interactions between vision and audition in large multimodal transformer models. Additionally, we fine-tune the visual processing of a state-of-the-art multimodal model using human visual behavioral embeddings
Image Quality and Neural Networks
Training data encompasses inherent biases, and it is often not immediately clear what constitutes good or bad training data with respect to these biases. Among such biases is image quality for visual datasets, which is multifaceted, involving aspects such as blur, noise, and resolution. In this study, we investigate how different aspects of image quality and its variance within training datasets affect neural network performance and their alignment with human neural representations. By analyzing large-scale image datasets using advanced image quality metrics, we categorize images based on diverse quality factors and their variances.
Perceptual-Initialization for Vision-Language Models
This project introduces Perceptual-Initialization (PI), a paradigm shift in visual representation learning that incorporates human perceptual structure during the initialization phase rather than as a downstream fine-tuning step. By integrating human-derived triplet embeddings from the NIGHTS dataset to initialize a CLIP vision encoder, followed by self-supervised learning, our approach demonstrates significant zero-shot performance improvements across multiple classification and retrieval benchmarks without any task-specific fine-tuning. Our findings challenge conventional wisdom by showing that embedding human perceptual structure during early representation learning yields more capable vision-language aligned systems that generalize immediately to unseen tasks.
Recent Publications
- Hu, Y., Wang, R., Zhao, S. C., Zhan, X., Kim, D. H., Wallace, M., & Tovar, D. A. (2025). Beginning with You: Perceptual-Initialization Improves Vision-Language Representation and Alignment. arXiv. https://arxiv.org/abs/2505.14204