Modern Computer Vision with PyTorch: Explore deep learning concepts and implement over 50 real-world image applications

Ayyadevara, V. Kishore, Reddy, Yeshwanth




Get to grips with deep learning techniques for building image processing applications using PyTorch with the help of code notebooks and test questions

Key Features

  • Implement solutions to 50 real-world computer vision applications using PyTorch
  • Understand the theory and working mechanisms of neural network architectures and their implementation
  • Discover best practices using a custom library created especially for this book

Book Description

Deep learning is the driving force behind many recent advances in computer vision (CV) applications. This book takes a hands-on approach to help you solve over 50 CV problems using PyTorch 1.x on real-world datasets.

You'll start by building a neural network (NN) from scratch using NumPy and PyTorch and discover best practices for tweaking its hyperparameters. You'll then perform image classification using convolutional neural networks and transfer learning and understand how they work. As you progress, you'll implement multiple use cases of 2D and 3D multi-object detection, segmentation, and human pose estimation by learning about the R-CNN family, SSD, YOLO, and U-Net architectures, as well as the Detectron2 platform. The book will also guide you in performing facial expression swapping, generating new faces, and manipulating facial expressions as you explore autoencoders and modern generative adversarial networks. You'll learn how to combine CV with natural language processing (NLP) techniques, such as LSTMs and transformers, and reinforcement learning (RL) techniques, such as deep Q-learning, to implement OCR, image captioning, object detection, and a self-driving car agent. Finally, you'll move your NN model to production on the AWS Cloud.

By the end of this book, you'll be able to leverage modern NN architectures to solve over 50 real-world CV problems confidently.

What You Will Learn

  • Train a NN from scratch with NumPy and PyTorch
  • Implement 2D and 3D multi-object detection and segmentation
  • Generate digits and DeepFakes with autoencoders and advanced GANs
  • Manipulate images using CycleGAN, Pix2PixGAN, StyleGAN2, and SRGAN
  • Combine CV with NLP to perform OCR, image captioning, and object detection
  • Combine CV with reinforcement learning to build agents that play Pong and drive a car autonomously
  • Deploy a deep learning model on an AWS server using FastAPI and Docker
  • Implement over 35 NN architectures and common OpenCV utilities

Who this book is for

This book is for beginners to PyTorch and intermediate-level machine learning practitioners who want to become well versed in computer vision techniques using deep learning and PyTorch. If you are just getting started with neural networks, you'll find the use cases in this book, which are accompanied by notebooks on GitHub, useful. Basic knowledge of the Python programming language and machine learning is all you need to get started with this book.


