Suchen und Finden
Preface
6
Contents
8
List of Figures
16
List of Tables
21
Chapter 1 AN INTRODUCTION TO UNCERTAIN DATA ALGORITHMS AND APPLICATIONS
22
1. Introduction
22
2. Algorithms for Uncertain Data
24
3. Conclusions
27
Acknowledgements
28
References
28
Chapter 2 MODELS FOR INCOMPLETE AND PROBABILISTIC INFORMATION
30
1. Introduction
30
2. Incomplete Information and Representation Systems
34
3. RA-Completeness and Finite Completeness
35
4. Closure Under Relational Operations
39
5. Algebraic Completion
40
6. Probabilistic Databases and Representation Systems
42
7. Probabilistic ?-Tables and Probabilistic Or-Set Tables
43
8. Probabilistic c-tables
45
9. Queries on Annotated Relations
46
10. K-Relations
48
11. Polynomials for Provenance
51
12. Query Containment
54
13. RelatedWork
55
14. Conclusion and Further Work
55
Acknowledgments
56
References
57
15. Appendix
61
Chapter 3 RELATIONAL MODELS AND ALGEBRA FOR UNCERTAIN DATA
64
1. Introduction
64
2. Different Probabilistic Data Models
67
2.1 Point-Valued Probability Measures Assigned to Each Tuple
67
2.2 Point-Valued Probability Measures Assigned to Attributes and Attribute Sets
71
2.3 Interval-Valued Probability Measures Assigned to Attribute-Values
73
2.4 Interval-valued Probability Measures Assigned to Tuples
75
3. Probabilistic Relational Algebra
75
3.1 Basic Definitions
75
3.2 Primary and Foreign Keys
77
3.3 Relational Operations
78
3.4 Relational Algebra
81
3.5 Incomplete Distribution and Null Values
82
4. Algebraic Implications of the Different Representations and Associated Assumptions
86
4.1 Models Assigning Point-Valued Probability Measures at the Tuple Level
87
4.2 Models Assigning Point-Valued Probability Measures at the Attribute Level
88
4.3 Models Assigning Interval-Valued Probability Measures at the Attribute Level
89
4.4 Models Assigning Interval-Valued Probability Measures to Tuples
90
4.5 Some Observations on the Independence Assumption Across Tuples
91
5. Concluding Remarks
91
References
93
Chapter 4 GRAPHICAL MODELS FOR UNCERTAIN DATA
95
1. Introduction
96
2. Graphical Models: Overview
98
2.1 Directed Graphical Models: Bayesian Networks
100
2.2 Undirected Graphical Models: Markov Networks
102
2.3 Inference Queries
103
3. Representing Uncertainty using Graphical Models
105
3.1 PossibleWorld Semantics
106
3.2 Shared Factors
109
3.3 Representing Probabilistic Relations
109
4. Query Evaluation over Uncertain Data
110
4.1 Example
112
4.2 Generating Factors during Query Evaluation
114
4.3 Query Evaluation as Inference
117
4.4 Optimizations
118
5. RelatedWork and Discussion
119
5.1 Safe Plans
119
5.2 Representing Uncertainty using Lineage
120
5.3 Probabilistic Relational Models
120
5.4 Lifted Inference
122
5.5 Scalable Inference using a Relational Database
122
6. Conclusions
123
Acknowledgments
124
References
125
Chapter 5 TRIO:ASYSTEMFORDATA,UNCERTAINTY,AND LINEAGE
131
Introduction
131
1. ULDBs: Uncertainty-Lineage Databases
134
1.1 Alternatives
135
1.2 ‘?’ (Maybe) Annotations
135
1.3 Confidences
136
1.4 Lineage
137
1.5 Relational Queries
139
2. TriQL: The Trio Query Language
140
2.1 Operational Semantics
140
2.2 Querying Confidences
142
2.3 Querying Lineage
142
2.4 Duplicate Elimination
143
2.5 Aggregation
144
2.6 Reorganizing Alternatives
145
2.7 Horizontal Subqueries
146
2.8 Query-Defined Result Confidences
146
2.9 Other TriQL Query Constructs
147
3. Data Modifications in Trio
148
3.1 Inserts
148
3.2 Deletes
149
3.3 Updates
149
3.4 Data Modifications and Versioning
151
4. Confidence Computation
151
5. Additional Trio Features
153
6. The Trio System
154
6.1 Encoding ULDB Data
156
6.2 Basic Query Translation Scheme
158
6.3 Duplicate Elimination
159
6.4 Aggregation
159
6.5 Reorganizing Alternatives
160
6.6 Horizontal Subqueries
160
6.7 Built-In Predicates and Functions
161
6.8 Query-Defined Result Confidences
162
6.9 Remaining Constructs
162
References
164
Chapter 6 MAYBMS: A SYSTEM FOR MANAGING LARGE UNCERTAIN AND PROBABILISTIC DATABASES
166
1. Introduction
166
2. Probabilistic Databases
168
3. Query Language Desiderata
169
4. The Algebra
170
5. Representing Probabilistic Data
176
6. Conceptual Query Evaluation, Rewritings, and Asymptotic Efficiency
181
7. The MayBMS Query and Update Language
187
8. The MayBMS System
192
9. Conclusions and Outlook
195
References
197
Chapter 7 UNCERTAINTY IN DATA INTEGRATION
200
1. Introduction
200
2. Overview of the System
202
2.1 Uncertainty in data integration
202
2.2 System architecture
203
2.3 Source of probabilities
204
2.4 Outline of the chapter
205
3. Uncertainty in Mappings
205
3.1 Motivating probabilistic mappings
205
3.2 Definition and Semantics
207
3.3 Query Answering
212
3.4 Creating P-mappings
216
3.5 Broader Classes of Mappings
219
3.6 Other Types of Approximate Schema Mappings
221
4. Uncertainty in Mediated Schema
222
4.1 P-Med-Schema Motivating Example
222
4.2 Probabilistic Mediated Schema
225
4.3 P-med-schema Creation
227
4.4 Consolidation
229
4.5 Other approaches
231
5. Future Directions
232
References
233
Chapter 8 SKETCHING AGGREGATES OVER PROBABILISTIC STREAMS
236
1. Introduction
236
1.1 Aggregates over probabilistic streams
238
1.2 Organization
239
2. The Probabilistic Stream Model
239
2.1 Problem definitions
241
2.2 Frequency Moments and Quantiles
242
3. Overview of techniques and summary of results
244
4. Universal Sampling
248
5. Frequency moments: DISTINCT and REPEAT-RATE
249
5.1 DISTINCT
249
5.2 REPEAT-RATE
251
6. Heavy-Hitters, Quantiles, and MEDIAN
253
7. A Binning Technique for MIN and MAX
254
8. Estimating AVG using generating functions
257
8.1 Generating functions
257
8.2 Estimating AVG
258
8.3 Approximating AVG by SUM/COUNT
261
9. Discussion
265
References
266
Chapter 9 PROBABILISTIC JOIN QUERIES IN UNCERTAIN DATABASES
269
1. Introduction
269
2. Traditional Join Approaches
270
2.1 Simple Nested-Loop Join
271
2.2 Nested-Block-Loop Join
272
2.3 Sort-Merge-Loop Join
272
2.4 Other Join Methods
273
2.5 Spatial Join Algorithms
273
2.6 Spatial Join using a spatial index structure for both relations
274
2.7 Spatial Join using a spatial index structure on one relation
274
2.8 Spatial Join using no Spatial-Index Structure
275
3. Uncertainty Models and Join Predicates
275
3.1 The Continuous Uncertainty Model
276
3.2 The Discrete Uncertainty Model
278
3.3 Join Predicates and Score
283
3.4 Probabilistic Join Query Types
284
3.5 Example
285
3.6 Overview of UncertaintyModels and Probabilistic Join Queries
286
4. Approaches for Efficient Join Processing on Uncertain Data
289
4.1 Confidence-Based Join Methods
289
4.2 Probabilistic Similarity Joins
292
4.3 Probabilistic Spatial Join
300
5. Summary
301
References
304
Chapter 10 INDEXING UNCERTAIN DATA
310
1. Introduction
311
2. Data Models and Query Semantics
313
2.1 Uncertain Attribute types
314
3. Uncertainty Index for Continuous Domains
314
3.1 Probability Threshold Indexing
315
3.2 Special Case: Uniform PDFS
318
3.3 2D mapping of intervals
318
3.4 Join Queries
320
3.5 Multi-dimensional Indexing
322
4. Uncertainty Index for discrete domains
322
4.1 Data Model and Problem Definition
323
4.2 Probabilistic Inverted Index
324
4.3 Probabilistic Distribution R-tree
327
5. Indexing for Nearest Neighbor Queries
329
References
333
Chapter 11 RANGEANDNEARESTNEIGHBORQUERIESON UNCERTAIN SPATIOTEMPORAL DATA
336
1. Introduction
336
2. Range Search
340
2.1 Query Definitions
340
2.2 Filter and Refinement
342
2.3 Nonfuzzy Range Search
343
2.4 Fuzzy Range Search
345
2.5 Indexing
348
3. Nearest Neighbor Retrieval
349
3.1 Query Definition
349
3.2 Query Processing
350
3.3 Variations of Nearest Neighbor Retrieval
352
4. Summary
356
References
357
Chapter 12 PROBABILISTIC XML
360
1. Introduction
360
2. Sources of Uncertainty in XML Data
361
3. Modeling Uncertainty using Tags
362
4. Modeling Uncertainty using Semi-structured Data
365
5. Modeling Uncertainty in XML with Independent or Mutually Exclusive Distribution in Point Probability
367
6. Formal Model of Probabilistic Semi-structured Data with Arbitrary Probabilistic Distributions
370
6.1 Motivating Examples
372
6.2 Probabilistic Semi-structured Data Model
374
6.3 Semantics
379
6.4 PXML Algebra and Comparison with PreviousWork
382
6.5 Probabilistic Aggregate Operations
384
6.6 Modeling Interval Uncertainty in Semi-structured Data
386
7. Summary
389
Acknowledgements
390
References
391
Chapter 13 ON CLUSTERING ALGORITHMS FOR UNCERTAIN DATA
394
1. Introduction
394
2. Density Based Clustering Algorithms
396
3. The UK-means and CK-means Algorithms
399
4. UMicro: Streaming Algorithms for Clustering Uncertain Data
400
4.1 The UMicro Algorithm: Overview
402
4.2 Computing Expected Similarity
403
4.3 Computing the Uncertain Boundary
407
4.4 Further Enhancements
407
5. Approximation Algorithms for Clustering UncertainData
408
6. Conclusions and Summary
408
Acknowledgements
409
References
409
Chapter 14 ON APPLICATIONS OF DENSITY TRANS FORMSFOR UNCERTAIN DATA MINING
412
1. Introduction
412
2. Kernel Density Estimation with Errors
414
2.1 Scalability for Large Data Sets
417
3. Leveraging Density Estimation for Classification
421
4. Application of Density Based Approach to Outlier Detection
424
4.1 Outlier Detection Approach
425
4.2 Subspace Exploration for Outlier Detection
426
5. Conclusions
428
Acknowledgements
428
References
429
Chapter 15 FREQUENT PATTERN MINING ALGORITHMS WITH UNCERTAIN DATA
431
1. Introduction
432
2. Frequent Pattern Mining of Uncertain Data Sets
433
3. Apriori-style Algorithms
434
3.1 Pruning Methods for Apriori-Style Algorithms
435
4. Set-Enumeration Methods
438
5. Pattern Growth based Mining Algorithms
438
5.1 Extending the H-mine algorithm
439
5.2 Extending the FP-growth Algorithm
440
5.3 Another Variation of the FP-growth Algorithm
450
6. A Comparative Study on Challenging Cases
450
6.1 Performance Comparison
454
6.2 Scalability Comparison
456
7. Generalization to the PossibleWorlds Model
458
8. Discussion and Conclusions
459
Acknowledgements
460
References
461
Chapter 16 PROBABILISTIC QUERYING AND MINING OF BIOLOGICAL IMAGES
464
1. Introduction
464
1.1 An Illustrative Example
465
2. RelatedWork
468
3. Probabilistic Image Analyses
469
3.1 Probabilistic Segmentation
469
3.2 Measuring Neurite Thickness
472
3.3 Ganglion Cell Features
473
4. Querying Probabilistic Image Data
474
4.1 Range Queries on Uncertain Data
475
4.2 k-NN Queries on Uncertain Data
475
4.3 Adaptive, Piecewise-Linear Approximations
477
4.4 Indexing the APLA
477
4.5 Experimental Results
478
5. Mining Probabilistic Image Data
479
5.1 Defining Probabilistic Spatial Join (PSJ)
480
5.2 Threshold PSJ Query
481
5.3 Top-k PSJ Query
482
5.4 Experimental Results
483
6. Conclusion
484
Acknowledgements
485
References
486
Index
492
Alle Preise verstehen sich inklusive der gesetzlichen MwSt.