Programming Pig (Paperback)

Alan Gates

  • Publisher: O'Reilly
  • Publication date: 2011-10-23
  • List price: $1,300
  • Sale price: $299 (23% of list price)
  • Language: English
  • Pages: 224
  • Binding: Paperback
  • ISBN: 1449302645
  • ISBN-13: 9781449302641
  • Related categories: Hadoop, Big Data

Product description

This guide is an ideal learning tool and reference for Apache Pig, the open source engine for executing parallel data flows on Hadoop. With Pig, you can batch-process data without having to create a full-fledged application, which makes it easy to experiment with new datasets.

Programming Pig introduces new users to Pig and gives experienced users comprehensive coverage of key features such as the Pig Latin scripting language, the Grunt shell, and User Defined Functions (UDFs) for extending Pig. If you need to analyze terabytes of data, this book shows you how to do it efficiently with Pig.

  • Delve into Pig’s data model, including scalar and complex data types
  • Write Pig Latin scripts to sort, group, join, project, and filter your data
  • Use Grunt to work with the Hadoop Distributed File System (HDFS)
  • Build complex data processing pipelines with Pig’s macros and modularity features
  • Embed Pig Latin in Python for iterative processing and other advanced tasks
  • Create your own load and store functions to handle data formats and storage mechanisms
  • Get performance tips for running scripts on Hadoop clusters in less time
