Parallel Database Techniques

Mahdi Abdelguerfi, Kam-Fai Wong

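  • Publisher: Wiley
  • Publication date: 1998-08-13
  • List price: $3,660
  • Member price: $3,477 (5% off)
  • Language: English
  • Pages: 230
  • Binding: Hardcover
  • ISBN: 0818683988
  • ISBN-13: 9780818683985
  • Related categories: Databases
  • Overseas special-order title (requires separate checkout)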




The use of parallel processing technology in the next generation of Database Management Systems (DBMSs) makes it possible to meet new and challenging requirements. Database technology in rapidly expanding new application areas brings unique challenges such as increased functionality and efficient handling of very large heterogeneous databases.

Abdelguerfi and Wong present the latest techniques in parallel relational databases, illustrating high-performance achievements in parallel database systems. The text is structured according to the overall architecture of a parallel database system, presenting various techniques that may be adopted in the design of parallel database software and hardware execution environments. These techniques can directly or indirectly lead to high-performance parallel database implementations.

The book's main focus follows the authors' engineering model:

  • A survey of parallel query optimization techniques for requests involving multi-way joins
  • A new technique for a join operation that can be adopted in the local optimization stage
  • A framework for recovery in parallel database systems using the ACTA formalism
  • The architectural details of NCR's new Petabyte multimedia database system
  • A description of the Super Database Computer (SDC-II)
  • A case study for a shared-nothing parallel database server that analyzes and compares the effectiveness of five data placement techniques

Table of Contents:

1 Introduction.

1.1 Background.

1.2 Parallel Database Systems.

1.2.1 Computation Model.

1.2.2 Engineering Model.

1.3 About this Manuscript.


I: Request Manager.

2 Designing an Optimizer for Parallel Relational Systems.

2.1 Introduction.

2.2 Overall Design Issues.

2.2.1 Design a Simple Parallel Execution Model.

2.2.2 The Two-Phase Approach.

2.2.3 Parallelizing is Adding Information!

2.2.4 Two-Phase versus Parallel Approaches.

2.3 Parallelization.

2.3.1 Kinds of Parallelism.

2.3.2 Specifying Parallel Execution.

2.4 Search Space.

2.4.1 Slicing Hash Join Trees.

2.4.2 Search Space Size.

2.4.3 Heuristics.

2.4.4 The Two-Phase Heuristics.

2.5 Cost Model.

2.5.1 Exceptions to the Principle of Optimality.

2.5.2 Resources.

2.5.3 Skew and Size Model.

2.5.4 The Cost Function.

2.6 Search Strategies.

2.6.1 Deterministic Search Strategies.

2.6.2 Randomized Strategies.

2.7 Conclusion.


3 New Approaches to Parallel Join Utilizing Page Connectivity Information.

3.1 Introduction.

3.2 The Environment and a Motivating Example.

3.3 The Methodology.

3.3.1 Definition of Parameters.

3.3.2 The Balancing Algorithm.

3.3.3 Schedules for Reading Join Components and Data Pages.

3.4 Performance Analysis.

3.4.1 The Evaluation Method.

3.4.2 Evaluation Results.

3.5 Concluding Remarks and Future Work.


4 A Performance Evaluation Tool for Parallel Database Systems.

4.1 Introduction.

4.2 Performance Evaluation Methods.

4.2.1 Analytical Modeling.

4.2.2 Benchmarks.

4.2.3 Observations.

4.3 The Software Testpilot.

4.3.1 The Experiment Specification.

4.3.2 The Performance Assessment Cycle.

4.3.3 The System Interface.

4.4 The Software Testpilot and Oracle/Ncube.

4.4.1 Database System Performance Assessment.

4.4.2 The Oracle/Ncube Interface.

4.5 Preliminary Results.

4.6 Conclusion.


5 Load Placement in Distributed High-Performance Database Systems.

5.1 Introduction.

5.2 Investigated System.

5.2.1 System Architecture.

5.2.2 Load Scenarios.

5.2.3 Trace Analysis.

5.2.4 Load Setup.

5.3 Load Placement Strategies Investigated.

5.4 Scheduling Strategies for Transactions.

5.5 Simulation Results.

5.5.1 Influence of Scheduling.

5.5.2 Evaluation of the Load Placement Strategies.

5.5.3 Lessons Learned.

5.5.4 Decision Parameters Used.

5.6 Conclusion and Open Issues.


II: Parallel Machine Architecture.

6 Modeling Recovery in Client-Server Database Systems.

6.1 Introduction.

6.2 Uniprocessor Recovery and Formal Approach to Modeling Recovery.

6.2.1 Basic Formal Concepts.

6.2.2 Logging Mechanisms.

6.2.3 Runtime Policies for Ensuring Correctness.

6.2.4 Data Structures Maintained for Efficient Recovery.

6.2.5 Restart Recovery--The ARIES Approach.

6.3 LSN Sequencing Techniques for Multinode Systems.

6.4 Recovery in Client-Server Database Systems.

6.4.1 Client-Server EXODUS (ESM-CS).

6.4.2 Client-Server ARIES (ARIES/CSA).

6.4.3 Shared Nothing Clients with Disks (CD).

6.4.4 Summary of Recovery Approaches in Client-Server Architectures.

6.5 Conclusion.


7 Parallel Strategies for a Petabyte Multimedia Database Computer.

7.1 Introduction.

7.2 Multimedia Data Warehouse, Databases, and Applications.

7.2.1 Three Waves of Multimedia Database Development.

7.2.2 National Medical Practice Knowledge Bank Application.

7.3 Massively Parallel Architecture, Infrastructure, and Technology.

7.3.1 Parallelism.

7.4 Teradata-MM Architecture, Framework, and New Concepts.

7.4.1 Teradata-MM Architecture.

7.4.2 Key New Concepts.

7.4.3 SQL3.

7.4.4 Federated Coordinator.

7.4.5 Teradata Multimedia Object Server.

7.5 Parallel UDF Execution Analysis.

7.5.1 UDF Optimizations.

7.5.2 PRAGMA Facility.

7.5.3 UDF Value Persistence Facility.

7.5.4 Spatial Indices for Content-Based Querying.

7.6 Conclusion.


8 The MEDUSA Project.

8.1 Introduction.

8.2 Indexing and Data Partitioning.

8.2.1 Standard Systems.

8.2.2 Grid Files.

8.3 Dynamic Load Balancing.

8.3.1 Data Access Frequency.

8.3.2 Data Distribution.

8.3.3 Query Partitioning.

8.4 The MEDUSA Project.

8.4.1 The MEDUSA Architecture.

8.4.2 Software.

8.4.3 Grid File Implementation.

8.4.4 Load Balancing Strategy.

8.5 MEDUSA Performance Results.

8.5.1 Test Configuration.

8.5.2 Transaction Throughput.

8.5.3 Speedup.

8.5.4 Load Balancing Test Results.

8.6 Conclusions.


III: Partitioned Data Store.

9 System Software of the Super Database Computer SDC-II.

9.1 Introduction.

9.2 Architectural Overview of the SDC-II.

9.3 Design and Organization of the SDC-II System Software.

9.3.1 Parallel Execution Model.

9.3.2 I/O Model and Buffer Management Strategy for Bulk Data Transfer.

9.3.3 Process Model and Efficient Flow Control Mechanism.

9.3.4 Structure of the System Software Components.

9.4 Evaluation of the SDC-II System.

9.4.1 Details of a Sample Query Processing.

9.4.2 Comparison with Commercial Systems.

9.5 Conclusion.


10 Data Placement in Parallel Database Systems.

10.1 Introduction.

10.2 Overview of Data Placement Strategies.

10.2.1 Declustering and Redistribution.

10.2.2 Placement.

10.3 Effects of Data Placement.

10.3.1 STEADY and TPC-C.

10.3.2 Dependence on Number of Processing Elements.

10.3.3 Dependence on Database Size.

10.4 Conclusions.