The Computer Chapter of the IEEE Croatia Section invites you to a lecture titled:

How Many Words is A Picture Really Worth? On Training and Evaluating Large Vision-Language Models

to be given by Dr. sc. Goran Glavaš on Wednesday, 29 May 2024, at 10:00 in the Grey Council Room (Siva vijećnica) of the Faculty of Electrical Engineering and Computing.

The lecture is open to everyone interested, and students are especially encouraged to attend. The lecture will be given in English.

The speaker's biography and the lecture abstract are given below.

Abstract:

Large Vision-Language Models (LVLMs), commonly obtained by aligning a pretrained visual encoder (e.g., a Vision Transformer, ViT) to a pretrained large language model (LLM), have recently achieved impressive results not only in image captioning but also on a wide range of visual understanding and reasoning tasks (e.g., visual question answering). Nonetheless, a number of factors, ranging from the architecture of the alignment module to the exact "training mix" (i.e., training tasks and data), strongly determine the effectiveness of the resulting LVLM. Moreover, LVLMs, much like their text-only counterparts, are not inherently multilingual and suffer from hallucination. In this talk, I'll explore training and evaluation protocols for LVLMs, focusing in particular on (i) efficiently training competitive, massively multilingual LVLMs, (ii) training with grounding objectives, which have been reported to reduce the tendency of LVLMs to hallucinate, and (iii) pitfalls of existing LVLM evaluation and possible remedies.

About the speaker:

Goran Glavaš is a Full Professor for Natural Language Processing at the University of Würzburg (Germany), Center for AI and Data Science (CAIDAS). He obtained his Ph.D. at the Text Analysis and Knowledge Engineering Lab (TakeLab), Faculty of Electrical Engineering and Computing, University of Zagreb. His research interests lie in Natural Language Processing and Information Retrieval, with a focus on multilingual NLP and IR and cross-lingual transfer, vision-and-language models and multimodal representation learning, information extraction, and NLP applications (primarily for the social sciences and humanities). He has (co-)authored over 120 publications in NLP and IR and publishes regularly at top-tier NLP and IR venues (ACL, EMNLP, NAACL, EACL, TACL, SIGIR, ECIR). He is a prominent member of the Association for Computational Linguistics (ACL), where he served as Editor-in-Chief of the ACL Rolling Review (ARR), the ACL's central reviewing service, and he regularly serves as (Senior) Area Chair for top-tier conferences. He is also a member of the German Society for Computational Linguistics (GSCL).

Author: Lucija Petricioli
29 May 2024, 10:00 - 11:00