Visual Question Answering: From Theory to Application
Tentative Chinese title: 視覺問答:從理論到應用
Qi Wu, Peng Wang, Xin Wang
- Publisher: Springer
- Publication date: 2022-05-14
- Price: $6,390
- VIP price: 5% off, $6,071
- Language: English
- Pages: 240
- Binding: Hardcover - also called cloth, retail trade, or trade
- ISBN: 9811909636
- ISBN-13: 9789811909634
Related translation:
視覺問答:理論與實踐 (Simplified Chinese edition)
Product description
Visual Question Answering (VQA) usually combines a visual input, such as an image or video, with a natural language question about that input and generates a natural language answer as the output. It is by nature a multi-disciplinary research problem, involving computer vision (CV), natural language processing (NLP), knowledge representation and reasoning (KR), and related fields.
Further, VQA is an ambitious undertaking, as it must overcome the challenges of general image understanding and the question-answering task, as well as the difficulties entailed by using large-scale databases with mixed-quality inputs. However, with the advent of deep learning (DL), advanced techniques in both CV and NLP, and the availability of relevant large-scale datasets, VQA has recently seen enormous strides, with more systems and promising results emerging.
This book provides a comprehensive overview of VQA, covering fundamental theories, models, datasets, and promising future directions. Given its scope, it can be used as a textbook on computer vision and natural language processing, especially for researchers and students in the area of visual question answering. It also highlights the key models used in VQA.
About the authors
Dr. Qi Wu is Senior Lecturer at the University of Adelaide and Chief Investigator at the ARC Centre of Excellence for Robotic Vision. He is also Director of Vision-and-Language Methods at the Australian Institute for Machine Learning. Dr. Wu has worked in the Computer Vision field for 10 years and has a strong track record, having pioneered the field of Vision-and-Language, one of the most interesting and technically challenging areas of Computer Vision. This area, which has emerged over the last 5 years, represents the application of computer vision technology to problems that are closer to Artificial Intelligence. Dr. Wu has made breakthroughs in methods and conceptual understanding to advance the field and is recognised as an international leader in the discipline. Beyond publishing some of the seminal papers in the area, he has organised a series of workshops at CVPR, ICCV and ACL, and authored key benchmarks that define the field. Recently, he led a team that won second place in the VATEX Video Captioning Challenge and first place in both the TextVQA Challenge and the MedicalVQA Challenge. His achievements have been recognised with the Australian Academy of Science J G Russell Award in 2019, one of four awards to early-career researchers (ECRs) across Australia, and an NVIDIA Pioneer Research Award.
Dr. Peng Wang is Professor at the School of Computer Science, Northwestern Polytechnical University, China. He previously served at the School of Computer Science, University of Adelaide, for four years. His research interests include computer vision, machine learning, and artificial intelligence.
Dr. Xin Wang is currently Assistant Professor at the Department of Computer Science and Technology, Tsinghua University. His research interests include cross-modal multimedia intelligence and inferable recommendations in social media. He has published several high-quality research papers at top conferences including ICML, KDD, WWW, SIGIR, and ACM Multimedia. In addition to being selected for the 2017 China Postdoctoral innovative talents supporting program, he received the ACM China Rising Star Award in 2020.
Dr. Xiaodong He is Deputy Managing Director of JD AI Research; Head of the Deep Learning, NLP and Speech Lab; and Technical Vice President of JD.com. He is also Affiliate Professor at the University of Washington (Seattle), where he serves on doctoral supervisory committees. His research interests are mainly in artificial intelligence areas including deep learning, natural language, computer vision, speech, information retrieval, and knowledge representation. He has published more than 100 papers in ACL, EMNLP, NAACL, CVPR, SIGIR, WWW, CIKM, NIPS, ICLR, ICASSP, Proc. IEEE, IEEE TASLP, IEEE SPM, and other venues. He has received several awards including the Outstanding Paper Award at ACL 2015. He is a co-inventor of the DSSM, which is now broadly applied to language, vision, IR, and knowledge representation tasks. He also led the development of CaptionBot, the world's first image captioning cloud service, deployed in 2016. He and his colleagues have won major AI challenges including the 2008 NIST MT Eval, IWSLT 2011, COCO Captioning Challenge 2015, and VQA 2017. His work has been widely integrated into influential software and services including Microsoft Image Caption Services, Bing & Ads, Seeing AI, Word, and PowerPoint. He has held editorial positions with several IEEE journals, served as Area Chair for NAACL-HLT 2015, and served on the organizing and program committees of major speech and language processing conferences. He is an IEEE Fellow and a member of the ACL.
Wenwu Zhu is currently Professor in the Department of Computer Science and Technology at Tsinghua University and Vice Dean of the National Research Center for Information Science and Technology. Prior to his current post, he was Senior Researcher and Research Manager at Microsoft Research Asia. He was Chief Scientist and Director at Intel Research China from 2004 to 2008. He worked at Bell Labs, New Jersey, as a Member of Technical Staff from 1996 to 1999. He received his Ph.D. degree from New York University in 1996.
His current research interests are in the area of data-driven multimedia networking and multimedia intelligence. He has published over 350 refereed papers and is inventor or co-inventor of over 50 patents. He has received eight Best Paper Awards, including ACM Multimedia 2012 and IEEE Transactions on Circuits and Systems for Video Technology in 2001 and 2019.
He served as Editor-in-Chief (EiC) of IEEE Transactions on Multimedia (2017-2019). He serves as Chair of the steering committee for IEEE Transactions on Multimedia and as Associate EiC for IEEE Transactions on Circuits and Systems for Video Technology. He served as General Co-Chair for ACM Multimedia 2018 and ACM CIKM 2019. He is an AAAS Fellow, IEEE Fellow, SPIE Fellow, and Member of The Academy of Europe (Academia Europaea).