Azure Databricks Cookbook: Accelerate and scale real-time analytics solutions using the Apache Spark-based analytics service

Phani Raj, Vinod Jaiswal

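  • Publisher: Packt Publishing
  • Publication date: 2021-09-17
  • List price: $1,940
  • VIP price: $1,843 (5% off)
  • Language: English
  • Pages: 448
  • Binding: Quality Paper (also called trade paper)
  • ISBN: 1789809711
  • ISBN-13: 9781789809718
  • Related categories: Spark
  • Overseas special-order title (requires separate checkout)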

Product Description

Key Features

  • Integrate with Azure Synapse Analytics, Cosmos DB, and Azure HDInsight Kafka Cluster to scale and analyze your projects and build pipelines
  • Use Databricks SQL to run ad hoc queries on your data lake and create dashboards
  • Productionize a solution using CI/CD for deploying notebooks and Azure Databricks Service to various environments

Book Description

Azure Databricks is a unified collaborative platform for performing scalable analytics in an interactive environment. The Azure Databricks Cookbook provides recipes to get hands-on with the analytics process, including ingesting data from various batch and streaming sources and building a modern data warehouse.

The book starts by teaching you how to create an Azure Databricks instance using the Azure portal, Azure CLI, and ARM templates. You'll work with clusters in Databricks and explore recipes for ingesting data from files, databases, and streaming sources such as Apache Kafka and EventHub. The book will help you explore the features Azure Databricks supports for building powerful end-to-end data pipelines. You'll also find out how to build a modern data warehouse by using Delta tables and Azure Synapse Analytics. Later, you'll learn how to write ad hoc queries and extract meaningful insights from the data lake by creating visualizations and dashboards with Databricks SQL. Finally, you'll deploy and productionize a data pipeline, and deploy notebooks and the Azure Databricks service using continuous integration and continuous delivery (CI/CD).

By the end of this Azure book, you'll be able to use Azure Databricks to streamline different processes involved in building data-driven apps.

What you will learn

  • Understand Databricks cluster options and when to use them
  • Read and write data from and to Azure sources such as ADLS Gen-2, EventHub, and more
  • Build a data warehouse in Azure Databricks
  • Perform ad hoc analysis on data lakes using Databricks SQL Analytics
  • Integrate with Azure Key Vault to access secrets and with Log Analytics for telemetry and monitoring
  • Integrate Databricks with Azure DevOps for version control and deployment, and productionize the solution using CI/CD pipelines
  • Build a data processing pipeline for near real-time data analytics

Who this book is for

This recipe-based book is for data scientists, data engineers, big data professionals, and machine learning engineers who want to perform data analytics on their applications. Prior experience working with Apache Spark and Azure is necessary to get the most out of this book.

About the Authors

Phani Raj is an Azure data architect at Microsoft. He has more than 12 years of IT experience and works primarily on the architecture, design, and development of complex data warehouses, OLTP, and big data solutions on Azure for customers across the globe.

Vinod Jaiswal is a data engineer at Microsoft. He has more than 13 years of IT experience and works primarily on the architecture, design, and development of complex data warehouses, OLTP, and big data solutions on Azure using Azure data services for a variety of customers. He has also worked on designing and developing real-time data processing and analytics reports from the data ingested from streaming systems using Azure Databricks.

Table of Contents

  1. Creating an Azure Databricks Service
  2. Reading and Writing Data from and to Various Azure Services and File Formats
  3. Understanding Spark Query Execution
  4. Working with Streaming Data
  5. Integrating with Azure Key-Vault, App Configuration and Log Analytics
  6. Exploring Delta Lake in Azure Databricks
  7. Implementing Near-Real-Time Analytics and Building Modern Data Warehouse
  8. Azure Databricks SQL Analytics
  9. DevOps Integrations and Implementing CI/CD for Azure Databricks
  10. Understanding Security and Monitoring in Azure Databricks
