Build a Text-To-Image Generator (from Scratch): With Transformers and Diffusions
Provisional Chinese title: 從零開始建立文本到圖像生成器:使用 Transformers 和擴散模型
Liu, Mark
- Publisher: Manning
- Publication date: 2025-12-30
- List price: $2,100
- VIP price: 5% off, $1,995
- Language: English
- Pages: 360
- Binding: Quality Paper (also called trade paper)
- ISBN: 1633435423
- ISBN-13: 9781633435421
Related categories:
DeepLearning
Overseas special-order books (separate checkout required)
Product description
Get a free eBook (PDF or ePub) from Manning as well as access to the online liveBook format (and its AI assistant that will answer your questions in any language) when you purchase the print book.
This book takes you step by step through creating your own AI models that can generate images from text. You'll explore two methods of image generation (vision transformers and diffusion models) and learn vital AI development techniques as you go.
Dive into the powerful models behind AI image generators. The best way to learn is to build something from scratch, and in this book you'll build your very own diffusion model and vision transformer. As you work through each stage of development, you'll develop an understanding of how these models can be customized, applied, and integrated for impressive multimodal AI.
Build a Text-to-Image Generator (from Scratch) teaches you how to:
- Build and train models to generate high-resolution images based on text descriptions
- Edit an existing image based on text prompts
- Build and train a model to add captions to images
- Build and train a vision transformer to classify images
- Fine-tune LLMs for downstream tasks such as classification, text or image generation
- Better differentiate real images from deepfakes
About the technology
AI-generated images appear everywhere from high-end advertising to casual social media feeds. Text-to-image tools like DALL-E, Midjourney, and Flux make it easy to create AI art, but how do they work? In this book, you'll find out by building your own text-to-image generator!
About the book
Build a Text-to-Image Generator (from Scratch) explores both transformer-based image generation and diffusion models. You'll work hands-on to build a pair of simple generation models that can classify images, automatically add captions, reconstruct images, and enhance existing graphics. Author Mark Liu guides you every step of the way with clear explanations, informative diagrams, and eye-opening examples you can build on your own laptop.
What's inside
- Build a vision transformer to classify images
- Edit images using text prompts
- Fine-tune image models
About the reader
Requires basic knowledge of generative AI models and intermediate Python skills.
About the author
Mark Liu is the founding director of the Master of Science in Finance program at the University of Kentucky. He is also the author of Learn Generative AI with PyTorch.
Table of Contents
Part 1
1 A tale of two models: Transformers and diffusions
2 Build a transformer
3 Classify images with a vision transformer
4 Add captions to images
Part 2
5 Generate images with diffusion models
6 Control what images to generate in diffusion models
7 Generate high-resolution images with diffusion models
Part 3
8 CLIP: A model to measure the similarity between image and text
9 Text-to-image generation with latent diffusion
10 A deep dive into Stable Diffusion
Part 4
11 VQGAN: Convert images into sequences of integers
12 A minimal implementation of DALL-E
Part 5
13 New developments and challenges in text-to-image generation
A Installing PyTorch and enabling GPU training locally and in Colab
About the author
Dr. Mark Liu is a tenured finance professor and the founding director of the Master of Science in Finance program at the University of Kentucky. He has more than 20 years of coding experience and a Ph.D. in finance from Boston College.