pfeil_rechts

Suchen und Finden

Service

Managing and Mining Uncertain Data

von: Charu C. Aggarwal

Springer-Verlag, 2010

ISBN: 9780387096902 , 494 Seiten

Format: PDF, OL

Kopierschutz: DRM

Online-Lesen für: Windows PC,Mac OSX,Linux

Preis: 96,29 EUR

Mehr zum Inhalt

Managing and Mining Uncertain Data

Preface: 6
Contents: 8
List of Figures: 16
List of Tables: 21
Chapter 1 AN INTRODUCTION TO UNCERTAIN DATA ALGORITHMS AND APPLICATIONS: 22
1. Introduction: 22
2. Algorithms for Uncertain Data: 24
3. Conclusions: 27
Acknowledgements: 28
References: 28
Chapter 2 MODELS FOR INCOMPLETE AND PROBABILISTIC INFORMATION: 30
1. Introduction: 30
2. Incomplete Information and Representation Systems: 34
3. RA-Completeness and Finite Completeness: 35
4. Closure Under Relational Operations: 39
5. Algebraic Completion: 40
6. Probabilistic Databases and Representation Systems: 42
7. Probabilistic ?-Tables and Probabilistic Or-Set Tables: 43
8. Probabilistic c-tables: 45
9. Queries on Annotated Relations: 46
10. K-Relations: 48
11. Polynomials for Provenance: 51
12. Query Containment: 54
13. RelatedWork: 55
14. Conclusion and Further Work: 55
Acknowledgments: 56
References: 57
15. Appendix: 61
Chapter 3 RELATIONAL MODELS AND ALGEBRA FOR UNCERTAIN DATA: 64
1. Introduction: 64
2. Different Probabilistic Data Models: 67
2.1 Point-Valued Probability Measures Assigned to Each Tuple: 67
2.2 Point-Valued Probability Measures Assigned to Attributes and Attribute Sets: 71
2.3 Interval-Valued Probability Measures Assigned to Attribute-Values: 73
2.4 Interval-valued Probability Measures Assigned to Tuples: 75
3. Probabilistic Relational Algebra: 75
3.1 Basic Definitions: 75
3.2 Primary and Foreign Keys: 77
3.3 Relational Operations: 78
3.4 Relational Algebra: 81
3.5 Incomplete Distribution and Null Values: 82
4. Algebraic Implications of the Different Representations and Associated Assumptions: 86
4.1 Models Assigning Point-Valued Probability Measures at the Tuple Level: 87
4.2 Models Assigning Point-Valued Probability Measures at the Attribute Level: 88
4.3 Models Assigning Interval-Valued Probability Measures at the Attribute Level: 89
4.4 Models Assigning Interval-Valued Probability Measures to Tuples: 90
4.5 Some Observations on the Independence Assumption Across Tuples: 91
5. Concluding Remarks: 91
References: 93
Chapter 4 GRAPHICAL MODELS FOR UNCERTAIN DATA: 95
1. Introduction: 96
2. Graphical Models: Overview: 98
2.1 Directed Graphical Models: Bayesian Networks: 100
2.2 Undirected Graphical Models: Markov Networks: 102
2.3 Inference Queries: 103
3. Representing Uncertainty using Graphical Models: 105
3.1 PossibleWorld Semantics: 106
3.2 Shared Factors: 109
3.3 Representing Probabilistic Relations: 109
4. Query Evaluation over Uncertain Data: 110
4.1 Example: 112
4.2 Generating Factors during Query Evaluation: 114
4.3 Query Evaluation as Inference: 117
4.4 Optimizations: 118
5. RelatedWork and Discussion: 119
5.1 Safe Plans: 119
5.2 Representing Uncertainty using Lineage: 120
5.3 Probabilistic Relational Models: 120
5.4 Lifted Inference: 122
5.5 Scalable Inference using a Relational Database: 122
6. Conclusions: 123
Acknowledgments: 124
References: 125
Chapter 5 TRIO:ASYSTEMFORDATA,UNCERTAINTY,AND LINEAGE: 131
Introduction: 131
1. ULDBs: Uncertainty-Lineage Databases: 134
1.1 Alternatives: 135
1.2 ‘?’ (Maybe) Annotations: 135
1.3 Confidences: 136
1.4 Lineage: 137
1.5 Relational Queries: 139
2. TriQL: The Trio Query Language: 140
2.1 Operational Semantics: 140
2.2 Querying Confidences: 142
2.3 Querying Lineage: 142
2.4 Duplicate Elimination: 143
2.5 Aggregation: 144
2.6 Reorganizing Alternatives: 145
2.7 Horizontal Subqueries: 146
2.8 Query-Defined Result Confidences: 146
2.9 Other TriQL Query Constructs: 147
3. Data Modifications in Trio: 148
3.1 Inserts: 148
3.2 Deletes: 149
3.3 Updates: 149
3.4 Data Modifications and Versioning: 151
4. Confidence Computation: 151
5. Additional Trio Features: 153
6. The Trio System: 154
6.1 Encoding ULDB Data: 156
6.2 Basic Query Translation Scheme: 158
6.3 Duplicate Elimination: 159
6.4 Aggregation: 159
6.5 Reorganizing Alternatives: 160
6.6 Horizontal Subqueries: 160
6.7 Built-In Predicates and Functions: 161
6.8 Query-Defined Result Confidences: 162
6.9 Remaining Constructs: 162
References: 164
Chapter 6 MAYBMS: A SYSTEM FOR MANAGING LARGE UNCERTAIN AND PROBABILISTIC DATABASES: 166
1. Introduction: 166
2. Probabilistic Databases: 168
3. Query Language Desiderata: 169
4. The Algebra: 170
5. Representing Probabilistic Data: 176
6. Conceptual Query Evaluation, Rewritings, and Asymptotic Efficiency: 181
7. The MayBMS Query and Update Language: 187
8. The MayBMS System: 192
9. Conclusions and Outlook: 195
References: 197
Chapter 7 UNCERTAINTY IN DATA INTEGRATION: 200
1. Introduction: 200
2. Overview of the System: 202
2.1 Uncertainty in data integration: 202
2.2 System architecture: 203
2.3 Source of probabilities: 204
2.4 Outline of the chapter: 205
3. Uncertainty in Mappings: 205
3.1 Motivating probabilistic mappings: 205
3.2 Definition and Semantics: 207
3.3 Query Answering: 212
3.4 Creating P-mappings: 216
3.5 Broader Classes of Mappings: 219
3.6 Other Types of Approximate Schema Mappings: 221
4. Uncertainty in Mediated Schema: 222
4.1 P-Med-Schema Motivating Example: 222
4.2 Probabilistic Mediated Schema: 225
4.3 P-med-schema Creation: 227
4.4 Consolidation: 229
4.5 Other approaches: 231
5. Future Directions: 232
References: 233
Chapter 8 SKETCHING AGGREGATES OVER PROBABILISTIC STREAMS: 236
1. Introduction: 236
1.1 Aggregates over probabilistic streams: 238
1.2 Organization: 239
2. The Probabilistic Stream Model: 239
2.1 Problem definitions: 241
2.2 Frequency Moments and Quantiles: 242
3. Overview of techniques and summary of results: 244
4. Universal Sampling: 248
5. Frequency moments: DISTINCT and REPEAT-RATE: 249
5.1 DISTINCT: 249
5.2 REPEAT-RATE: 251
6. Heavy-Hitters, Quantiles, and MEDIAN: 253
7. A Binning Technique for MIN and MAX: 254
8. Estimating AVG using generating functions: 257
8.1 Generating functions: 257
8.2 Estimating AVG: 258
8.3 Approximating AVG by SUM/COUNT: 261
9. Discussion: 265
References: 266
Chapter 9 PROBABILISTIC JOIN QUERIES IN UNCERTAIN DATABASES: 269
1. Introduction: 269
2. Traditional Join Approaches: 270
2.1 Simple Nested-Loop Join: 271
2.2 Nested-Block-Loop Join: 272
2.3 Sort-Merge-Loop Join: 272
2.4 Other Join Methods: 273
2.5 Spatial Join Algorithms: 273
2.6 Spatial Join using a spatial index structure for both relations: 274
2.7 Spatial Join using a spatial index structure on one relation: 274
2.8 Spatial Join using no Spatial-Index Structure: 275
3. Uncertainty Models and Join Predicates: 275
3.1 The Continuous Uncertainty Model: 276
3.2 The Discrete Uncertainty Model: 278
3.3 Join Predicates and Score: 283
3.4 Probabilistic Join Query Types: 284
3.5 Example: 285
3.6 Overview of UncertaintyModels and Probabilistic Join Queries: 286
4. Approaches for Efficient Join Processing on Uncertain Data: 289
4.1 Confidence-Based Join Methods: 289
4.2 Probabilistic Similarity Joins: 292
4.3 Probabilistic Spatial Join: 300
5. Summary: 301
References: 304
Chapter 10 INDEXING UNCERTAIN DATA: 310
1. Introduction: 311
2. Data Models and Query Semantics: 313
2.1 Uncertain Attribute types: 314
3. Uncertainty Index for Continuous Domains: 314
3.1 Probability Threshold Indexing: 315
3.2 Special Case: Uniform PDFS: 318
3.3 2D mapping of intervals: 318
3.4 Join Queries: 320
3.5 Multi-dimensional Indexing: 322
4. Uncertainty Index for discrete domains: 322
4.1 Data Model and Problem Definition: 323
4.2 Probabilistic Inverted Index: 324
4.3 Probabilistic Distribution R-tree: 327
5. Indexing for Nearest Neighbor Queries: 329
References: 333
Chapter 11 RANGEANDNEARESTNEIGHBORQUERIESON UNCERTAIN SPATIOTEMPORAL DATA: 336
1. Introduction: 336
2. Range Search: 340
2.1 Query Definitions: 340
2.2 Filter and Refinement: 342
2.3 Nonfuzzy Range Search: 343
2.4 Fuzzy Range Search: 345
2.5 Indexing: 348
3. Nearest Neighbor Retrieval: 349
3.1 Query Definition: 349
3.2 Query Processing: 350
3.3 Variations of Nearest Neighbor Retrieval: 352
4. Summary: 356
References: 357
Chapter 12 PROBABILISTIC XML: 360
1. Introduction: 360
2. Sources of Uncertainty in XML Data: 361
3. Modeling Uncertainty using Tags: 362
4. Modeling Uncertainty using Semi-structured Data: 365
5. Modeling Uncertainty in XML with Independent or Mutually Exclusive Distribution in Point Probability: 367
6. Formal Model of Probabilistic Semi-structured Data with Arbitrary Probabilistic Distributions: 370
6.1 Motivating Examples: 372
6.2 Probabilistic Semi-structured Data Model: 374
6.3 Semantics: 379
6.4 PXML Algebra and Comparison with PreviousWork: 382
6.5 Probabilistic Aggregate Operations: 384
6.6 Modeling Interval Uncertainty in Semi-structured Data: 386
7. Summary: 389
Acknowledgements: 390
References: 391
Chapter 13 ON CLUSTERING ALGORITHMS FOR UNCERTAIN DATA: 394
1. Introduction: 394
2. Density Based Clustering Algorithms: 396
3. The UK-means and CK-means Algorithms: 399
4. UMicro: Streaming Algorithms for Clustering Uncertain Data: 400
4.1 The UMicro Algorithm: Overview: 402
4.2 Computing Expected Similarity: 403
4.3 Computing the Uncertain Boundary: 407
4.4 Further Enhancements: 407
5. Approximation Algorithms for Clustering UncertainData: 408
6. Conclusions and Summary: 408
Acknowledgements: 409
References: 409
Chapter 14 ON APPLICATIONS OF DENSITY TRANS FORMSFOR UNCERTAIN DATA MINING: 412
1. Introduction: 412
2. Kernel Density Estimation with Errors: 414
2.1 Scalability for Large Data Sets: 417
3. Leveraging Density Estimation for Classification: 421
4. Application of Density Based Approach to Outlier Detection: 424
4.1 Outlier Detection Approach: 425
4.2 Subspace Exploration for Outlier Detection: 426
5. Conclusions: 428
Acknowledgements: 428
References: 429
Chapter 15 FREQUENT PATTERN MINING ALGORITHMS WITH UNCERTAIN DATA: 431
1. Introduction: 432
2. Frequent Pattern Mining of Uncertain Data Sets: 433
3. Apriori-style Algorithms: 434
3.1 Pruning Methods for Apriori-Style Algorithms: 435
4. Set-Enumeration Methods: 438
5. Pattern Growth based Mining Algorithms: 438
5.1 Extending the H-mine algorithm: 439
5.2 Extending the FP-growth Algorithm: 440
5.3 Another Variation of the FP-growth Algorithm: 450
6. A Comparative Study on Challenging Cases: 450
6.1 Performance Comparison: 454
6.2 Scalability Comparison: 456
7. Generalization to the PossibleWorlds Model: 458
8. Discussion and Conclusions: 459
Acknowledgements: 460
References: 461
Chapter 16 PROBABILISTIC QUERYING AND MINING OF BIOLOGICAL IMAGES: 464
1. Introduction: 464
1.1 An Illustrative Example: 465
2. RelatedWork: 468
3. Probabilistic Image Analyses: 469
3.1 Probabilistic Segmentation: 469
3.2 Measuring Neurite Thickness: 472
3.3 Ganglion Cell Features: 473
4. Querying Probabilistic Image Data: 474
4.1 Range Queries on Uncertain Data: 475
4.2 k-NN Queries on Uncertain Data: 475
4.3 Adaptive, Piecewise-Linear Approximations: 477
4.4 Indexing the APLA: 477
4.5 Experimental Results: 478
5. Mining Probabilistic Image Data: 479
5.1 Defining Probabilistic Spatial Join (PSJ): 480
5.2 Threshold PSJ Query: 481
5.3 Top-k PSJ Query: 482
5.4 Experimental Results: 483
6. Conclusion: 484
Acknowledgements: 485
References: 486
Index: 492

AGB
Datenschutz
Impressum
Kontakt
F.A.Q
Widerruf

Alle Preise verstehen sich inklusive der gesetzlichen MwSt.

Managing and Mining Uncertain Data

von: Charu C. Aggarwal

Managing and Mining Uncertain Data

Preface

Contents

List of Figures

List of Tables

Chapter 1 AN INTRODUCTION TO UNCERTAIN DATA ALGORITHMS AND APPLICATIONS

1. Introduction

2. Algorithms for Uncertain Data

3. Conclusions

Acknowledgements

References

Chapter 2 MODELS FOR INCOMPLETE AND PROBABILISTIC INFORMATION

1. Introduction

2. Incomplete Information and Representation Systems

3. RA-Completeness and Finite Completeness

4. Closure Under Relational Operations

5. Algebraic Completion

6. Probabilistic Databases and Representation Systems

7. Probabilistic ?-Tables and Probabilistic Or-Set Tables

8. Probabilistic c-tables

9. Queries on Annotated Relations

10. K-Relations

11. Polynomials for Provenance

12. Query Containment

13. RelatedWork

14. Conclusion and Further Work

Acknowledgments

References

15. Appendix

Chapter 3 RELATIONAL MODELS AND ALGEBRA FOR UNCERTAIN DATA

1. Introduction

2. Different Probabilistic Data Models

2.1 Point-Valued Probability Measures Assigned to Each Tuple

2.2 Point-Valued Probability Measures Assigned to Attributes and Attribute Sets

2.3 Interval-Valued Probability Measures Assigned to Attribute-Values

2.4 Interval-valued Probability Measures Assigned to Tuples

3. Probabilistic Relational Algebra

3.1 Basic Definitions

3.2 Primary and Foreign Keys

3.3 Relational Operations

3.4 Relational Algebra

3.5 Incomplete Distribution and Null Values

4. Algebraic Implications of the Different Representations and Associated Assumptions

4.1 Models Assigning Point-Valued Probability Measures at the Tuple Level

4.2 Models Assigning Point-Valued Probability Measures at the Attribute Level

4.3 Models Assigning Interval-Valued Probability Measures at the Attribute Level

4.4 Models Assigning Interval-Valued Probability Measures to Tuples

4.5 Some Observations on the Independence Assumption Across Tuples

5. Concluding Remarks

References

Chapter 4 GRAPHICAL MODELS FOR UNCERTAIN DATA

1. Introduction

2. Graphical Models: Overview

2.1 Directed Graphical Models: Bayesian Networks

2.2 Undirected Graphical Models: Markov Networks

2.3 Inference Queries

3. Representing Uncertainty using Graphical Models

3.1 PossibleWorld Semantics

3.2 Shared Factors

3.3 Representing Probabilistic Relations

4. Query Evaluation over Uncertain Data

4.1 Example

4.2 Generating Factors during Query Evaluation

4.3 Query Evaluation as Inference

4.4 Optimizations

5. RelatedWork and Discussion

5.1 Safe Plans

5.2 Representing Uncertainty using Lineage

5.3 Probabilistic Relational Models

5.4 Lifted Inference

5.5 Scalable Inference using a Relational Database

6. Conclusions

Acknowledgments

References

Chapter 5 TRIO:ASYSTEMFORDATA,UNCERTAINTY,AND LINEAGE

Introduction

1. ULDBs: Uncertainty-Lineage Databases

1.1 Alternatives