Enterprise Data Workflows with Cascading (Paperback)

Paco Nathan

Product Description

There is an easier way to build Hadoop applications. With this hands-on book, you’ll learn how to use Cascading, the open source abstraction framework for Hadoop that lets you easily create and manage powerful enterprise-grade data processing applications—without having to learn the intricacies of MapReduce.

Working with sample apps based on Java and other JVM languages, you’ll quickly learn Cascading’s streamlined approach to data processing, data filtering, and workflow optimization. This book demonstrates how this framework can help your business extract meaningful information from large amounts of distributed data.

  • Start working on Cascading example projects right away
  • Model and analyze unstructured data in any format, from any source
  • Build and test applications with familiar constructs and reusable components
  • Work with the Scalding and Cascalog Domain-Specific Languages
  • Easily deploy applications to Hadoop, regardless of cluster location or data size
  • Build workflows that integrate several big data frameworks and processes
  • Explore common use cases for Cascading, including features and tools that support them
  • Examine a case study that uses a dataset from the Open Data Initiative
