Getting Started with Impala: Interactive SQL for Apache Hadoop (Paperback)

John Russell

  • Publisher: O'Reilly
  • Publication date: 2014-11-04
  • List price: $1,020
  • Sale price: $918 (10% off)
  • Language: English
  • Pages: 152
  • Binding: Paperback
  • ISBN: 1491905778
  • ISBN-13: 9781491905777
  • Related categories: Hadoop, SQL
  • Ships immediately (stock: 1)

Product description

Learn how to write, tune, and port SQL queries and other statements for a Big Data environment using Impala, the massively parallel processing SQL query engine for Apache Hadoop. The best practices in this practical guide help you design database schemas that interoperate with other Hadoop components, are convenient for administrators to manage and monitor, and accommodate future expansion in data size and the evolution of software capabilities.

Ideal for database developers and business analysts, Getting Started with Impala includes advice from Cloudera’s development team, as well as insights from its consulting engagements with customers.

  • Learn how Impala integrates with a wide range of Hadoop components
  • Attain high performance and scalability for huge data sets on production clusters
  • Explore common developer tasks, such as porting code to Impala and optimizing performance
  • Use tutorials for working with billion-row tables, date- and time-based values, and other techniques
  • Learn how to transition from rigid schemas to a flexible model that evolves as needs change
  • Take a deep dive into joins and the roles of statistics
