Mon, Oct 18, 1999
------------------------
- Mid-semester Review Sheet
- Introduction to Query Optimization
  - Example involving Selections and Joins
  - What does a Q.O. need?
    - search space
    - cost model
    - enumeration algorithm
- Simplest cost metric: # tuples
  - Easiest to compute for projections and cartesian products
    - why?
  - Notation: V(R,a) = number of distinct values of "a" in R
  - Assumption: All values of "a" are equally likely
    - The resulting estimate still holds in the average case for all distributions (inc. Zipf)
    - What is Zipf?
      - ith most common element occurs in proportion to 1/sqrt(i)
- Selectivity-Factor Formulas for #(Selections)
  - Equality test: Use 1/V(R,a)
  - < or > test: Use 1/3
  - "Not Equal to" test: Use (V(R,a)-1)/V(R,a)
  - AND conditions: Multiply selectivity factors
  - OR conditions
    - simple sum
    - min(sum, size of original relation)
    - n(1 - (1-m1/n)(1-m2/n)) formula
      - where n = #(R) and m1, m2 = estimated sizes of the two selections
- Estimation Handout

Wed, Oct 20, 1999
------------------------
- Estimating the size of a join
  - min, max and med possible values
  - Two assumptions:
    - Containment of Value Sets
    - Preservation of Value Sets
  - Consider A(X,Y) Join B(Y,Z)
    - #(A Join B) = (#(A) #(B)) / max(V(A,Y), V(B,Y))
      - where Y is the "join" attribute(s)
    - What happens to X and Z?
      - their value sets are preserved, e.g. V(A Join B, X) = V(A,X)
- More estimation problems
- Amazing Property of Join Estimation
  - Preserves Commutativity and Associativity
- Introduction to Logical Query Plan Selection
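
The selectivity-factor rules and the join-size formula in these notes can be tried out with a small Python sketch. This is only an illustration under stated assumptions: the function names (sel_equal, and_size, or_size, join_size) and all relation statistics below are hypothetical, chosen for the example and not taken from the lecture. Exact fractions are used so the estimates are not blurred by floating-point rounding.

```python
from fractions import Fraction

def sel_equal(v):
    """Selectivity of an equality test a = c: 1/V(R,a)."""
    return Fraction(1, v)

def sel_range():
    """Selectivity of a < or > test: the rule-of-thumb 1/3."""
    return Fraction(1, 3)

def sel_not_equal(v):
    """Selectivity of a "not equal to" test: (V(R,a)-1)/V(R,a)."""
    return Fraction(v - 1, v)

def and_size(n, *factors):
    """Estimated size under AND: multiply #(R) by every selectivity factor."""
    size = Fraction(n)
    for f in factors:
        size *= f
    return size

def or_size(n, m1, m2):
    """Estimated size under OR: n(1 - (1 - m1/n)(1 - m2/n))."""
    n = Fraction(n)
    return n * (1 - (1 - m1 / n) * (1 - m2 / n))

def join_size(size_a, size_b, v_a_y, v_b_y):
    """#(A Join B) = (#(A) #(B)) / max(V(A,Y), V(B,Y))."""
    return Fraction(size_a * size_b, max(v_a_y, v_b_y))

# Hypothetical relation R(a,b): #(R) = 10000, V(R,a) = 50.
n, v_a = 10000, 50
m1 = and_size(n, sel_equal(v_a))                 # a = 10  -> 200
m2 = and_size(n, sel_range())                    # b < 20  -> 10000/3
both = and_size(n, sel_equal(v_a), sel_range())  # AND     -> 200/3
either = or_size(n, m1, m2)                      # OR      -> 10400/3
either_capped = min(m1 + m2, n)                  # simple sum, capped at #(R)

# Hypothetical relations R(a,b), S(b,c), T(c,d):
# #(R)=1000, V(R,b)=20; #(S)=2000, V(S,b)=50, V(S,c)=100; #(T)=5000, V(T,c)=500.
rs = join_size(1000, 2000, 20, 50)    # R Join S -> 40000
# Preservation of value sets: V(R Join S, c) = V(S,c) = 100.
rs_t = join_size(rs, 5000, 100, 500)  # (R Join S) Join T -> 400000
st = join_size(2000, 5000, 100, 500)  # S Join T -> 20000
# Preservation of value sets: V(S Join T, b) = V(S,b) = 50.
r_st = join_size(1000, st, 20, 50)    # R Join (S Join T) -> 400000

print(float(both), float(either), int(rs_t), int(r_st))
```

Both join orders give the same estimate (400000 here), which is the commutativity and associativity property noted above. It relies on the preservation-of-value-sets assumption to supply V for the intermediate result.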