Invited Speakers

About Keynote Speaker

Dr. Mubarak Shah

Title: Multi-Modal Retrieval in Computer Vision

  • Center for Research in Computer Vision, University of Central Florida
Retrieval is widely used to solve computer vision problems, including human re-identification, image geo-localization, face recognition, and gait recognition. The aim is to learn a robust representation that is invariant to viewpoint, lighting conditions, and other nuisances, so that the representation of a query can be used to retrieve relevant items from a gallery. I will start this presentation by discussing two novel retrieval methods we have developed for clothes-changing re-identification and cross-view geo-localization employing Denoising Diffusion Probabilistic Models. Clothes-Changing Re-identification (CC-ReID) aims to match people appearing in non-overlapping cameras, even when they change their clothes across cameras. Cross-view image geo-localization (CVIG) involves matching a given street-view image against a gallery of geo-tagged aerial images. Both of these retrieval methods deal with a single modality: images. However, humans often rely on inputs from multiple modalities to better understand everyday events. Most commonly, we use vision, audio, and language to perceive an event, as they provide complementary information for robust reasoning. Next, I will present a novel Semantic-Structure-Preserving Consistency approach that learns modality-specific relationships in the joint embedding space to improve multimodal retrieval involving audio, video, and text. I will end with composed video retrieval (CoVR), where the objective is to retrieve a target video given a reference video and its corresponding compositional change text.
Dr. Mubarak Shah, the UCF Trustee Chair Professor, is the founding director of the Center for Research in Computer Vision at the University of Central Florida (UCF). Dr. Shah is a fellow of ACM, IEEE, AAAS, NAI, IAPR, AAIA, and SPIE. He has published extensively on topics related to human activity and action recognition, visual tracking, geo-localization, visual crowd analysis, object detection and categorization, and shape from shading. He has served as an ACM and IEEE Distinguished Visitor Program speaker. He is a recipient of the 2022 PAMI Mark Everingham Prize for pioneering human action recognition datasets; the 2019 ACM SIGMM Technical Achievement Award; the 2020 ACM SIGMM Test of Time Honorable Mention Award for his paper “Visual attention detection in video sequences using spatiotemporal cues”; the 2020 International Conference on Pattern Recognition (ICPR) Best Scientific Paper Award; an honorable mention for the ICCV 2005 Where Am I? Challenge Problem; the 2013 NGA Best Research Poster Presentation; 2nd place in the Grand Challenge at the ACM Multimedia 2013 conference; and runner-up for the best paper award at the ACM Multimedia Conference in 2005 and 2010. At UCF he has received the Pegasus Professor Award; the University Distinguished Research Award; Faculty Excellence in Mentoring Doctoral Students; Faculty Excellence in Mentoring Postdoctoral Scholars; the Scholarship of Teaching and Learning Award; the Teaching Incentive Program Award; and the Research Incentive Award.