Vision Language Models: Building VLMs with Hugging Face
Provisional Chinese title: 視覺語言模型:使用 Hugging Face 建構 VLMs

Merve Noyan, Andrés Marafioti, Miquel Farré

  • Publisher: O'Reilly
  • Publication date: 2026-07-14
  • Price: $2,700
  • VIP price: $2,565 (95% of list price)
  • Language: English
  • Pages: 406
  • Binding: Quality Paper (also called trade paper)
  • ISBN: 9798341624047
  • ISBN-13: 9798341624047
  • Related categories: Deep Learning, Natural Language Processing, Computer Vision
  • Overseas special-order title (requires separate checkout)

Product description

Vision language models (VLMs) combine computer vision and natural language processing to create powerful systems that can interpret, generate, and respond in multimodal contexts. Vision Language Models is a hands-on guide to building real-world VLMs using the most up-to-date stack of machine learning tools from Hugging Face, Meta (PyTorch), NVIDIA (CUDA), OpenAI (CLIP), and others, written by leading researchers and practitioners Merve Noyan, Miquel Farré, Andrés Marafioti, and Orr Zohar. From image captioning and document understanding to advanced zero-shot inference and retrieval-augmented generation (RAG), this book covers the full VLM application and development lifecycle.

Designed for ML engineers, data scientists, and developers, this guide distills cutting-edge VLM research into practical techniques. Readers will learn how to prepare datasets, select the right architectures, fine-tune and deploy models, and apply them to real-world tasks across a range of industries.

  • Explore core model architectures and alignment techniques
  • Train and fine-tune VLMs with Hugging Face, PyTorch, and others
  • Deploy models for applications like image search and captioning
  • Implement advanced inference strategies, from zero-shot to agentic systems
  • Build scalable VLM systems ready for production use
