Albert (Yang) Hu

Data Science Graduate Student

About

My work explores the intersection of deep learning and neuroscience, analyzing brain activity during multimodal processing with transformer-based models. I have extensive experience applying machine learning across physics, neuroscience research, and large language model problem-solving, and I am excited to extend this work toward human-aligned multimodal AI agents that deepen our understanding of complex systems and help solve practical problems.

Research Interests

Brain Alignment · Machine Learning · Artificial Intelligence · Large Language Models

Current Research Projects

Personalized Brain-Inspired AI Models (CLIP-Human Based Analysis)
Featured Project

We introduce personalized brain-inspired AI models that integrate human behavioral embeddings and neural data to better align artificial intelligence with cognitive processes. The study fine-tunes a state-of-the-art multimodal AI model (CLIP) in a stepwise manner: first on large-scale behavioral decisions, then on group-level neural data, and finally on participant-specific neural dynamics, enhancing both behavioral and neural alignment. The results demonstrate the potential of individualized AI systems capable of capturing unique neural representations, with applications spanning medicine, cognitive research, and human-computer interfaces.
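
For a sense of what one stage of this alignment can look like in code, here is a minimal, hypothetical PyTorch/Hugging Face sketch. The placeholder tensors, the MSE loss, and the choice to train only the visual projection are illustrative assumptions, not the study's actual implementation.

```python
# Schematic sketch of one stage of stepwise alignment (illustrative, not the study's code).
import torch
from transformers import CLIPModel

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")

# Freeze the backbone; adapt only the visual projection toward the target space.
for p in model.parameters():
    p.requires_grad = False
for p in model.visual_projection.parameters():
    p.requires_grad = True

optimizer = torch.optim.AdamW(model.visual_projection.parameters(), lr=1e-4)
loss_fn = torch.nn.MSELoss()

def alignment_step(pixel_values, target_embeddings):
    """One gradient step pulling CLIP image embeddings toward target embeddings
    (behavioral, group-level neural, or participant-specific, depending on the stage)."""
    image_embeds = model.get_image_features(pixel_values=pixel_values)
    loss = loss_fn(image_embeds, target_embeddings)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```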

Crossmodal Human Enhancement of Multimodal AI
Featured Project

Human perception is inherently multisensory, with sensory modalities influencing one another. To develop more human-like multimodal AI models, it is essential to design systems that not only process multiple sensory inputs but also reflect their interconnections. In this study, we investigate the cross-modal interactions between vision and audition in large multimodal transformer models. Additionally, we fine-tune the visual processing of a state-of-the-art multimodal model using human visual behavioral embeddings.
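
One common way to quantify such vision-audition interactions is representational similarity analysis, which compares the pairwise geometry of embeddings across modalities. The sketch below assumes hypothetical embedding matrices extracted from a multimodal transformer for the same set of stimuli; it is an illustration of the general approach, not the study's exact analysis.

```python
# Illustrative cross-modal representational similarity analysis on placeholder embeddings.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def cross_modal_rsa(visual_embeds, audio_embeds):
    """Correlate the pairwise dissimilarity structure of visual and auditory embeddings."""
    rdm_visual = pdist(visual_embeds, metric="correlation")
    rdm_audio = pdist(audio_embeds, metric="correlation")
    rho, p = spearmanr(rdm_visual, rdm_audio)
    return rho, p

# Random placeholders standing in for real model activations (n_items x dim).
rng = np.random.default_rng(0)
rho, p = cross_modal_rsa(rng.normal(size=(50, 512)), rng.normal(size=(50, 768)))
```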

Investigating the Emergence of Complexity from the Dimensional Structure of Mental Representations

Visual complexity significantly influences our perception and cognitive processing of stimuli, yet its quantification remains challenging. This project explores two computational approaches to address this issue. First, we employ the CLIP-HBA model, which integrates pre-trained CLIP embeddings with human behavioral data, to decompose objects into constituent dimensions and derive personalized complexity metrics aligned with human perception. Second, we directly prompt AI models to evaluate specific complexity attributes, such as crowding and patterning, enabling the assessment of distinct complexity qualities without relying on human-aligned embeddings. By comparing the predictive power of these models through optimization and cross-validation, we aim to discern the unique aspects of visual complexity each captures. This comparison will sharpen our understanding of how complexity affects perception and inform the development of more effective visual communication strategies.
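
The model-comparison step can be summarized in a short, hypothetical sketch: cross-validated regression of human complexity ratings on each feature set. The variable names, the ridge estimator, and the scoring choice are illustrative assumptions.

```python
# Hedged sketch: compare cross-validated R^2 of two feature sets for predicting
# human complexity ratings. All inputs are hypothetical placeholders.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

def compare_predictors(hba_dims, prompted_attrs, complexity_ratings, folds=5):
    """Return mean cross-validated R^2 for each feature set."""
    scores = {}
    for name, X in (("CLIP-HBA dimensions", hba_dims),
                    ("prompted attributes", prompted_attrs)):
        model = RidgeCV(alphas=np.logspace(-3, 3, 13))
        scores[name] = cross_val_score(model, X, complexity_ratings,
                                       cv=folds, scoring="r2").mean()
    return scores
```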

Image Quality and Neural Networks

Training data carries inherent biases, and it is often not immediately clear what constitutes good or bad training data with respect to them. Among such biases is image quality in visual datasets, which is multifaceted, involving aspects such as blur, noise, and resolution. In this study, we investigate how different aspects of image quality, and their variance within training datasets, affect neural network performance and alignment with human neural representations. By analyzing large-scale image datasets using advanced image quality metrics, we categorize images based on diverse quality factors and their variances.
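
As a rough illustration of the categorization step, the sketch below scores images with two simple proxies, Laplacian-variance sharpness and mean brightness, standing in for the fuller set of quality metrics; the function names and threshold are hypothetical.

```python
# Bucket images by coarse quality proxies before constructing training subsets.
import cv2
import numpy as np

def quality_features(path):
    """Return coarse quality descriptors for one image."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    sharpness = cv2.Laplacian(img, cv2.CV_64F).var()  # low values suggest blur
    brightness = float(np.mean(img))
    return {"sharpness": sharpness, "brightness": brightness}

def assign_bucket(features, sharp_threshold=100.0):
    """Split images into coarse 'sharp' vs. 'blurry' training subsets."""
    return "sharp" if features["sharpness"] >= sharp_threshold else "blurry"
```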

Speech-to-Sentiment Pipeline for Analyzing Semantic and Expressive Changes Across the Lifespan

This project aims to develop an accessible pipeline and user interface that converts spoken language into sentiment and semantic analyses. The initial application will investigate how semantics and expressivity in speech evolve with age. By processing speech inputs to extract prosodic features—such as intonation, tone, and pitch—and semantic content, we will generate representational embeddings that encapsulate both the meaning and emotional nuances of spoken language. Utilizing multimodal models, the pipeline will facilitate the comparison of these embeddings across different age groups, providing insights into the developmental trajectory of speech characteristics throughout the human lifespan.
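
A minimal sketch of the planned pipeline stages follows; the specific libraries (openai-whisper for transcription, librosa for pitch, a Hugging Face sentiment model) are plausible placeholders for illustration rather than final design decisions.

```python
# Hypothetical sketch of the pipeline stages: transcription, prosody extraction,
# and sentiment scoring. Library choices and parameters are assumptions.
import numpy as np
import librosa
import whisper
from transformers import pipeline

asr_model = whisper.load_model("base")
sentiment_model = pipeline("sentiment-analysis")

def analyze_speech(audio_path):
    """Return a transcript, a coarse prosodic feature (mean pitch), and sentiment."""
    transcript = asr_model.transcribe(audio_path)["text"]
    y, sr = librosa.load(audio_path)
    f0, _, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                            fmax=librosa.note_to_hz("C7"))
    return {
        "transcript": transcript,
        "mean_pitch_hz": float(np.nanmean(f0)),  # simple prosody proxy
        "sentiment": sentiment_model(transcript)[0],
    }
```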