Apache Solr for Indexing Data

Sachin Handiekar, Anshul Johri

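  • Publisher: Packt Publishing
  • Publication date: 2015-12-22
  • List price: $1,550
  • VIP price: $1,473 (5% off)
  • Language: English
  • Pages: 160
  • Binding: Paperback
  • ISBN: 1783553235
  • ISBN-13: 9781783553235
  • Related category: Full-text search engines (Full-text-search)
  • Overseas special-order book (must be checked out separately)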


Product Description

Enhance your Solr indexing experience with advanced techniques and the built-in functionalities available in Apache Solr

About This Book

  • Learn about distributed indexing and real-time optimization to change index data on the fly
  • Index data from various sources and web crawlers using built-in analyzers and tokenizers
  • This step-by-step guide is packed with real-life examples on indexing data

Who This Book Is For

This book is for developers who want to deepen their experience of indexing in Solr by learning about the various index handlers, analyzers, and methods available in Solr. Beginner-level Solr development skills are expected.

What You Will Learn

  • Get to know the basic features of Solr indexing and the analyzers/tokenizers available
  • Index XML/JSON data in Solr using the HTTP Post tool and the cURL command
  • Work with Data Import Handler to index data from a database
  • Use Apache Tika with Solr to index Word documents, PDFs, and much more
  • Utilize Apache Nutch and Solr integration to index crawled data from web pages
  • Update indexes from real-time data feeds
  • Discover techniques to index multi-language and distributed data in Solr
  • Combine the various indexing techniques into a real-life working example of an online shopping web application
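
As a rough illustration of the JSON indexing item in the list above, the sketch below builds (but does not send) an HTTP request that posts documents to a Solr core's JSON update handler. It is not taken from the book: the host, port, core name `products`, helper name `build_index_request`, and the document fields are all assumptions chosen for illustration.

```python
import json
import urllib.request


def build_index_request(docs, base_url="http://localhost:8983/solr", core="products"):
    """Build a POST request that sends JSON documents to a Solr core's /update handler.

    base_url and core are hypothetical placeholders; adjust them to your setup.
    """
    # commit=true asks Solr to make the documents searchable right away.
    url = f"{base_url}/{core}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical documents; the field names depend on your schema.
docs = [
    {"id": "book-1", "title": "Apache Solr for Indexing Data", "pages": 160},
]
request = build_index_request(docs)

# To actually send it (requires a running Solr instance):
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Building the request separately from sending it keeps the example runnable without a Solr server and makes the URL and payload easy to inspect.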

In Detail

Apache Solr is a widely used, open source enterprise search server that delivers powerful indexing and searching features. These features help fetch relevant information from various sources and documentation. Solr also combines with other open source tools such as Apache Tika and Apache Nutch to provide more powerful features.

This fast-paced guide starts by helping you set up Solr and get acquainted with its basic building blocks, to give you a better understanding of Solr indexing. You'll quickly move on to indexing text and improving indexing speed. Next, you'll focus on basic indexing techniques, various index handlers designed to modify documents, and indexing a structured data source through Data Import Handler.

Moving on, you will learn techniques to perform real-time indexing and atomic updates, as well as more advanced indexing techniques such as de-duplication. Later on, we'll help you set up a cluster of Solr servers that combine fault tolerance and high availability. You will also gain insights into working scenarios of different aspects of Solr and how to use Solr with e-commerce data.
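
To give a sense of what an atomic update looks like, here is a minimal sketch that builds the JSON payload for one. It uses Solr's `set`, `inc`, and `add` update modifiers, which change individual fields instead of re-sending the whole document. The helper name `atomic_update`, the unique key field `id`, and the `price` and `stock` fields are assumptions for illustration, not examples from the book.

```python
import json


def atomic_update(doc_id, set_fields=None, inc_fields=None, add_fields=None):
    """Return a Solr atomic-update document that changes only the given fields.

    Assumes the schema's unique key field is named "id".
    """
    doc = {"id": doc_id}
    modifiers = (("set", set_fields), ("inc", inc_fields), ("add", add_fields))
    for modifier, fields in modifiers:
        for name, value in (fields or {}).items():
            # Each updated field maps to {modifier: value}, e.g. {"set": 1473}.
            doc[name] = {modifier: value}
    return doc


# Hypothetical update: overwrite the price and decrement the stock count.
update = atomic_update("book-1", set_fields={"price": 1473}, inc_fields={"stock": -1})
payload = json.dumps([update])
print(payload)
```

The resulting payload would be posted to the core's update handler like any other JSON document; fields not mentioned in the update are left as they were.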

By the end of the book, you will be competent and confident working with indexing and will have a solid knowledge base for building indexing solutions efficiently.

Style and approach

This fast-paced guide is packed with examples that are written in an easy-to-follow style and accompanied by detailed explanations. Working examples are included to help you get better results for your applications.
