Tovar Brain-Inspired AI Lab

About

I am a junior at Vanderbilt University majoring in Computer Science and Mathematics. My research interests lie at the intersection of machine learning, artificial intelligence, and applied mathematics. I am passionate about developing robust and interpretable AI systems to efficiently solve real-world problems across diverse fields. I plan to pursue graduate studies to further advance my research.

Research Interests

Brain Alignment Multimodal Integration Machine Learning Computer Vision Artificial Intelligence Embodied AI

Current Research Projects

Featured Project

Crossmodal Human Enhancement of Multimodal AI

Human perception is inherently multisensory, with sensory modalities influencing one another. To develop more human-like multimodal AI models, it is essential to design systems that not only process multiple sensory inputs but also reflect their interconnections. In this study, we investigate the cross-modal interactions between vision and audition in large multimodal transformer models. Additionally, we fine-tune the visual processing of a state-of-the-art multimodal model using human visual behavioral embeddings

Image Quality and Neural Networks

Training data encompasses inherent biases, and it is often not immediately clear what constitutes good or bad training data with respect to these biases. Among such biases is image quality for visual datasets, which is multifaceted, involving aspects such as blur, noise, and resolution. In this study, we investigate how different aspects of image quality and its variance within training datasets affect neural network performance and their alignment with human neural representations. By analyzing large-scale image datasets using advanced image quality metrics, we categorize images based on diverse quality factors and their variances.

Featured Project

Perceptual-Initialization for Vision-Language Models

This project introduces Perceptual-Initialization (PI), a paradigm shift in visual representation learning that incorporates human perceptual structure during the initialization phase rather than as a downstream fine-tuning step. By integrating human-derived triplet embeddings from the NIGHTS dataset to initialize a CLIP vision encoder, followed by self-supervised learning, our approach demonstrates significant zero-shot performance improvements across multiple classification and retrieval benchmarks without any task-specific fine-tuning. Our findings challenge conventional wisdom by showing that embedding human perceptual structure during early representation learning yields more capable vision-language aligned systems that generalize immediately to unseen tasks.

Recent Publications

Hu, Y., Wang, R., Zhao, S. C., Zhan, X., Kim, D. H., Wallace, M., & Tovar, D. A. (2025). Beginning with You: Perceptual-Initialization Improves Vision-Language Representation and Alignment. arXiv. https://arxiv.org/abs/2505.14204

Runchen (Clara) Wang