Scaling Big Data with Hadoop and Solr, 2/e (Paperback)

Hrishikesh Vijay Karambelkar



Understand, design, build, and optimize your big data search engine with Hadoop and Apache Solr

About This Book

  • Explore different approaches to making Solr work on big data ecosystems besides Apache Hadoop
  • Improve search performance while working with big data
  • A practical guide that covers interesting, real-life use cases for big data search along with sample code

Who This Book Is For

This book is aimed at developers, designers, and architects who would like to build big data enterprise search solutions for their customers or organizations. No prior knowledge of Apache Hadoop and Apache Solr/Lucene technologies is required.

What You Will Learn

  • Understand Apache Hadoop, its ecosystem, and Apache Solr
  • Explore industry-based architectures for big data enterprise search, along with their applicability and benefits
  • Integrate Apache Solr with big data technologies such as Cassandra to enable better scalability and high availability for big data
  • Optimize the performance of your big data search platform as your data scales
  • Write MapReduce tasks to index your data
  • Configure your Hadoop instance to handle real-world big data problems
  • Work through real-world examples with Hadoop and Solr to see how they are used in practice
  • Use Apache Solr as a NoSQL database

In Detail

Together, Apache Hadoop and Apache Solr help organizations resolve the problem of information extraction from big data by providing excellent distributed faceted search capabilities.

This book will help you learn everything you need to know to build a distributed enterprise search platform and to optimize it so that it makes maximum use of the available resources. Starting with the basics of Apache Hadoop and Solr, the book goes on to cover advanced search optimization topics, with some interesting real-world use cases and sample Java code.

This step-by-step guide will teach you how to effortlessly build a high-performance enterprise search while scaling data with Hadoop and Solr.

