Abstract
Recent advances in multimodal training allow images and text to be integrated within a unified model. Given their black-box nature, little is known about the strategies vision-language models (VLMs) develop to communicate efficiently between the two modalities. This seminar discusses communication in VLMs, combining geometric methods for describing transformers’ hidden representations with techniques from mechanistic interpretability. We will show that, for VLMs that generate in both modalities, the communication of image information to the textual part of the prompt is mediated by a single token, and that editing its content makes it possible to steer both the image semantics and its textual description. We will then discuss preliminary results on the behaviour of VLMs when they are presented with images that contradict the factual knowledge stored in the underlying language model.
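For readers curious what editing a single token's hidden state might look like in practice, here is a minimal, self-contained PyTorch sketch. It is not the speaker's code and uses a toy transformer rather than a real VLM: it patches the activation at one token position via a forward hook, a mechanism commonly used for this kind of activation editing. The layer index, token position, and donor activation are all illustrative assumptions.

```python
# Hypothetical sketch: steering a model by overwriting the hidden state of a
# single "mediator" token with a forward hook. Toy model, illustrative only.
import torch
import torch.nn as nn

torch.manual_seed(0)

d_model, seq_len = 64, 10
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
model = nn.TransformerEncoder(layer, num_layers=4)

# Assumed: position of the single token mediating image->text communication.
mediator_pos = 0
# Assumed: replacement activation, e.g. recorded from a run on a different image.
donor_state = torch.randn(d_model)

def patch_hook(module, inputs, output):
    # Overwrite the hidden state at one token position; all others pass through.
    patched = output.clone()
    patched[:, mediator_pos, :] = donor_state
    return patched

# Attach the edit to one intermediate layer (the layer choice is an assumption).
handle = model.layers[2].register_forward_hook(patch_hook)

x = torch.randn(1, seq_len, d_model)  # stand-in for multimodal embeddings
with torch.no_grad():
    out_patched = model(x)
handle.remove()
with torch.no_grad():
    out_clean = model(x)

# Downstream representations differ only via the edited token's influence.
print((out_patched - out_clean).abs().max())
```

In a real VLM the hook would target the hidden state of the identified mediator token at a specific layer, and the donor activation would come from a run on an image with the desired semantics.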
Speaker Bio
Alberto Cazzaniga is a Researcher at the RIT Institute at AREA Science Park in Trieste, where he coordinates the activities of the LADE research group, which focuses on applications of Artificial Intelligence in the life and material sciences. After completing a DPhil in Mathematics at the University of Oxford and a Claude Leon Fellowship at AIMS-SA, he moved to Trieste and transitioned to research in Artificial Intelligence. He is interested in the emergence of meaningful features in deep-learning models trained by self-supervision, in particular transformers. His recent research focuses on understanding the computational strategies of these models in order to enhance their performance in applications and make them more robust and trustworthy.

See you there :rocket: