Data Mining with Microsoft SQL Server 2000 Technical Reference

Claude Seidman

  • 出版商: MicroSoft
  • 出版日期: 2001-06-09
  • 售價: $2,000
  • 貴賓價: 9.5$1,900
  • 語言: 英文
  • 頁數: 400
  • 裝訂: Hardcover
  • ISBN: 0735612714
  • ISBN-13: 9780735612716
  • 相關分類: MSSQLSQLData-mining
  • 已過版

買這商品的人也買了...

商品描述

Description:

Learn how to turn mountains of raw data into useful information with this guide.

The amount of information stored in corporate databases is exploding exponentially. Data mining—finding meaningful patterns in all that data—can give any organization a competitive advantage. This book is the in-depth reference from Microsoft® for anyone who wants to take full advantage of the powerful data-mining features in SQL Server™ 2000. It examines the SQL Server 2000 Analysis Services architecture and shows how data mining fits into its complete suite of information-extraction technologies. Then it demonstrates how to structure and mine large databases with the algorithms included with SQL Server 2000 to find nuggets of useful information. It even shows how to create a practice data-mining model using data downloaded from a database. Coverage includes:

• INTRODUCTION TO DATA MINING: What data mining is and isn’t, plus important principles and definitions behind data-mining methodologies, including the role of data-mining models, statistics, and algorithms
• SQL SERVER 2000 ARCHITECTURE: How data mining fits into the SQL Server 2000 Analysis Services architecture and how it builds on the SQL Server 2000 relational database and its embedded online analytical processing (OLAP) engine
• DATA-MINING METHODS: How to choose the best data-mining method for the job—decision trees or clustering
• EASE OF USE FEATURES: How to use the Mining Model Wizard and the OLAP Mining Model Editor to simplify creating, training, and processing a model
• PROGRAMMING THE DATA-MINING SERVICES: How to use data-mining models and Data Transformation Services, PivotTable® Services, decision-support objects (DSO), PERL, Visual Basic®, Scripting Edition, XML, and other tools and languages to work with the data-mining engine

 

Table of Contents:

Acknowledgments  xi xi
Introduction  xiii xiii
PART I INTRODUCING DATA MINING  
1 Understanding Data Mining   3
    What Is Data Mining? 3
    Why Use Data Mining? 4
    How Data Mining Is Currently Used 6
    Defining the Terms 7
    Data Mining Methodology 9
        Analyzing the Problem 10
        Extracting and Cleansing the Data 10
        Validating the Data 10
        Creating and Training the Model 10
        Querying the Data Mining Model Data 10
        Maintaining the Validity of the Data-Mining Model 10
    Overview of Microsoft Data Mining 11
        Data Mining vs. OLAP 11
        Data-Mining Models 11
        Data-Mining Algorithms 12
        Using SQL Server Syntax to Data Mine 14
    Summary 14
2 Microsoft SQL Server Analysis Services Architecture 15
    Introduction to OLAP 16
        MOLAP 18
        ROLAP 18
        HOLAP 19
    Server Architecture 20
        Data Mining Services Within Analysis Services 20
    Client Architecture 21
        PivotTable Service 22
        OLE DB 23
        Decision Support Objects (DSO) 24
        Multidimensional Expressions (MDX) 25
        Prediction Joins 25
    Summary 26
3 Data Storage Models   27
    Why Data Mining Needs a Data Warehouse 27
        Maintaining Data Integrity 28
    Reporting Against OLTP Data Can Be Hazardous to Your Performance 31
    Data Warehousing Architecture for Data Mining 33
        Creating the Warehouse from OLTP Data 33
        Optimizing Data for Mining 36
        Physical Data Mining Structure 42
        Three-Tier Architecture 43
    Relational Data Warehouse 43
        Advantages of Relational Data Storage 44
        Building Supporting Tables for Data Mining 45
    OLAP cubes 46
        How Data Mining Uses OLAP Structures 46
        Advantages of OLAP Storage 47
        When OLAP Is Not Appropriate for Data Mining 49
    Summary 49
4 Approaches to Data Mining   51
    Directed Data Mining 51
    Undirected Data Mining 52
        Data Mining vs. Statistics 52
        Learning from Historical Data 57
        Predicting the Future 59
    Training Data-Mining Models 61
        Evaluating the Models and Avoiding Errors 62
    Summary 65
PART II DATA-MINING METHODS  
5 Microsoft Decision Trees   69
    Creating the Model 69
        Analysis Manager 70
    Visualizing the Model 87
        Dependency Network Browser 94
        Inside the Decision Tree Algorithm 97
        How Predictions Are Derived 109
        Navigating the Tree 109
        Navigation vs. Rules 112
        When to Use Decision Trees 113
    Summary 114
6 Creating Decision Trees with OLAP   115
    Creating the Model 115
        Select Source Type 116
        Select Source Cube and Data-Mining Technique 116
        Select Case 118
        Select Predicted Entity 119
        Select Training Data 121
        Select Dimension and Virtual Cube 121
        Completing the Data-Mining Model 123
    OLAP Mining Model Editor 125
        Content Detail Pane 126
        Structure Panel 126
        Prediction Tree List 126
    Analyzing Data with the OLAP Data-Mining Model 126
        Using the Generated Virtual Cube 128
        Using the Generated Dimension 129
    Summary 133
7 Microsoft Clustering   135
    The Search for Order 136
    Looking for Ways to Understand Data 136
    Clustering as an Undirected Data-Mining Technique 137
    How Clustering Works 138
        Overview of the Algorithm 138
        The K-Means Method Clustering Algorithm 138
        What Is Being Measured Exactly? 142
        Clustering Factors 142
        Measuring "Closeness" 143
    When to Use Clustering 146
        Visualize Relationships 146
        Highlight Anomalies 146
        Create Samples for Other Data-Mining Efforts 148
        Weaknesses of Clustering 148
    Creating a Data-Mining Model Using Clustering 149
        Select Source Type 150
        Select the Table or Tables for Your Mining Model 150
        Select the Data-Mining Technique 151
        Edit Joins 152
        Select the Case Key Column for Your Mining Model 152
        Select the Input and Predictable Columns 152
    Viewing the Model 154
        Organization of the Cluster Nodes 154
        Order of the Cluster Nodes 156
    Analyzing the Data 156
    Summary 158
PART III CREATING DATA–MINING APPLICATIONS WITH CODE  
8 Using Microsoft Data Transformation Services (DTS) 161
    What Is DTS? 162
    DTS Tasks 162
        Transform 162
        Bulk Insert 163
        Data Driven Query 163
        Execute Package 164
    Connections 167
        Sources 167
        Configuring a Connection 168
    DTS Package Workflow 169
        DTS Package Steps 169
        Precedence Constraints 170
    DTS Designer 171
        Opening the DTS Designer 171
        Saving a DTS Package 172
    dtsrun Utility 174
    Using DTS to Create a Data-Mining Model 177
        Preparing the SQL Server Environment 178
        Creating the Package 182
    Summary 208
9 Using Decision Support Objects (DSO)   209
    Scripting vs. Visual Basic 210
        The Server Object 211
        The Database Object 219
    Creating the Relational Data-Mining Model Using DSO 221
    Creating the OLAP Data-Mining Model Using DSO 230
        The DataSource Object 232
        Data-Mining Model (Decision Support Objects) 233
    Adding a New Data Source 233
    Analysis Server Roles 234
        Data-Mining Model Roles 235
    Summary 236
10 Understanding Data-Mining Structures 237
    The Structure of the Data-Mining Model Case 237
        Data-Mining Models Look Like Tables 237
    Using Code to Browse Data-Mining Models 238
    Using the Schema Rowsets 243
        MINING_MODELS Schema Rowset 243
        MINING_COLUMNS Schema Rowset 249
        MINING_MODEL_CONTENT Schema Rowset 259
        MINING_SERVICES Schema Rowset 262
        SERVICE_PARAMETERS Schema Rowset 266
        MODEL_CONTENT_PMML Schema Rowset 268
    Summary 269
11 Data Mining Using PivotTable Service 271
    Redistributing Components 272
    Installing and Registering Components 273
        File Locations 274
        Installation Registry Settings 275
        Redistribution Setup Programs 275
    Connecting to the PivotTable Service 276
        Connect to Analysis Services Using PivotTable Service 276
        Connect to Analysis Services Using HTTP 280
    Building a Local Data-Mining Model 280
        Storage of Local Mining Models 284
        SELECT INTO Statement 286
        INSERT INTO Statement 286
        OPENROWSET Syntax 287
        Nested Tables and the SHAPE Statement 289
    Using XML in Data Mining 290
        The PMML Standard 290
    Summary 296
12 Data-Mining Queries 297
    Components of a Prediction Query 297
        The Basic Prediction Query 298
        Specifying the Test Case Source 298
        Specifying Columns 300
        The PREDICTION JOIN Clause 300
        Using Functions as Columns 304
        Using Tabular Values as Columns 304
        The WHERE Clause 306
        Prediction Functions 307
        Predict 307
        PredictProbability 308
        PredictSupport 308
        PredictVariance 309
        PredictStdev 310
        PredictProbabilityVariance 310
        PredictProbabilityStdev 310
        PredictHistogram 310
        TopCount 313
        TopSum 313
        TopPercent 314
        RangeMin 314
        RangeMid 314
        RangeMax 314
        PredictScore 314
        PredictNodeId 315
    Prediction Queries with Clustering Models 315
        Cluster 315
        ClusterProbability 316
        ClusterDistance 316
    Using DTS to Run Prediction Queries 317
    Summary 322
APPENDIX    325
GLOSSARY   349
INDEX   359