Mon, Oct 18, 1999
------------------------

- Mid-semester Review Sheet

- Introduction to Query Optimization

- Example involving Selections and Joins

- What does a Q.O. need?
	- search space
	- cost model
	- enumeration algorithm

- Simplest cost metric: # tuples

- Easiest to compute for projections and cartesian products
	- why?
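
One way to see why: the result size of these two operators depends only on the input tuple counts, not on the data values. A minimal sketch, assuming bag semantics (a projection that eliminates duplicates would not keep every tuple); the function names are illustrative and not from the lecture:

```python
def projection_size(num_tuples: int) -> int:
    """Bag projection keeps one output tuple per input tuple."""
    return num_tuples


def product_size(num_tuples_r: int, num_tuples_s: int) -> int:
    """A Cartesian product pairs every tuple of R with every tuple of S."""
    return num_tuples_r * num_tuples_s


r_tuples, s_tuples = 1000, 50  # made-up relation sizes
print(projection_size(r_tuples))
print(product_size(r_tuples, s_tuples))
```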

- Notation: V(R,a) = number of distinct values of "a" in R

- Assumption: All values of "a" are equally likely

- Estimates based on it still hold in the average case for other distributions (incl. Zipf)
	- What is Zipf? 
		- ith most common element occurs in proportion to 1/sqrt(i)
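
The average-case claim can be checked on a small skewed column. The sketch below assumes one particular reading of "average case": the constant in an equality test is picked uniformly among the distinct values of "a". Under that assumption the mean result size is exactly #(R)/V(R,a), whatever the distribution. The helper names and the numbers are made up for illustration.

```python
from collections import Counter


def zipf_like_column(num_distinct: int, scale: int) -> list:
    """Build a column whose i-th most common value occurs about scale/sqrt(i) times."""
    column = []
    for i in range(1, num_distinct + 1):
        freq = max(1, round(scale / i ** 0.5))
        column.extend([i] * freq)
    return column


def avg_equality_result_size(column: list) -> float:
    """Mean size of 'a = c' when c is uniform over the distinct values (assumption)."""
    counts = Counter(column)
    return sum(counts.values()) / len(counts)


column = zipf_like_column(num_distinct=10, scale=100)
num_tuples = len(column)         # #(R)
distinct = len(set(column))      # V(R,a)

# Individual values are far from equally likely ...
print(Counter(column).most_common(3))
# ... yet the average over constants matches the uniform estimate #(R)/V(R,a).
print(avg_equality_result_size(column), num_tuples / distinct)
```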
	
- Selectivity-Factor Formulas for #(Selections)
	- Equality test: Use 1/V(R,a)
	- < or > test: Use 1/3
	- "Not Equal to" test: Use (V(R,a)-1)/V(R,a)
	- AND conditions: Multiply selectivity factors
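	- OR conditions: three options
		- simple sum of the two result sizes
		- min(sum, size of original relation), since the result cannot exceed the relation
		- n(1 - (1-m1/n)(1-m2/n)) formula, where n = #(R) and m1, m2 = # tuples satisfying each condition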
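
The selectivity factors above can be sketched as small functions. This is only an illustration under the uniform-values assumption: the function names and the example statistics (10,000 tuples, V(R,a) = 50) are hypothetical, and the capped OR variant is written as min(sum, n) because an estimate cannot exceed the size of the relation.

```python
import math


def sel_equality(distinct_values: int) -> float:
    """a = c: one value out of V(R,a) equally likely values."""
    return 1 / distinct_values


def sel_inequality() -> float:
    """a < c or a > c: fixed guess of 1/3."""
    return 1 / 3


def sel_not_equal(distinct_values: int) -> float:
    """a <> c: everything except one of the V(R,a) values."""
    return (distinct_values - 1) / distinct_values


def sel_and(*factors: float) -> float:
    """AND: multiply the selectivity factors (treats conditions as independent)."""
    return math.prod(factors)


def or_size_capped(n: float, m1: float, m2: float) -> float:
    """OR, simple sum capped at the relation size."""
    return min(n, m1 + m2)


def or_size_independent(n: float, m1: float, m2: float) -> float:
    """OR, n(1 - (1-m1/n)(1-m2/n)): n minus the tuples failing both conditions."""
    return n * (1 - (1 - m1 / n) * (1 - m2 / n))


n = 10_000                      # #(R), made-up
v_a = 50                        # V(R,a), made-up
m1 = n * sel_equality(v_a)      # tuples with a = c
m2 = n * sel_inequality()       # tuples with b < d
print("AND:", n * sel_and(sel_equality(v_a), sel_inequality()))
print("OR (capped sum):", or_size_capped(n, m1, m2))
print("OR (independent):", or_size_independent(n, m1, m2))
```

The independence formula never exceeds the capped sum, because it subtracts the expected overlap of the two conditions.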

- Estimation Handout	

Wed, Oct 20, 1999
------------------------

- Estimating the size of a join
	- minimum, maximum, and in-between possible sizes

- Two assumptions:
	- Containment of Value Sets
	- Preservation of Value Sets

- Consider A(X,Y) Join B(Y,Z)

- #(A Join B) = (#(A) #(B)) / max(V(A,Y), V(B,Y))
	- where Y is the "join" attribute(s)
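
The join-size formula for a single join attribute Y can be sketched as follows. The function name and the statistics are made up for illustration.

```python
def join_size(tuples_a: int, tuples_b: int, v_a_y: int, v_b_y: int) -> float:
    """Estimate #(A Join B) = #(A) * #(B) / max(V(A,Y), V(B,Y))."""
    return tuples_a * tuples_b / max(v_a_y, v_b_y)


# Hypothetical statistics: #(A) = 1000, V(A,Y) = 20, #(B) = 2000, V(B,Y) = 50.
estimate = join_size(1000, 2000, 20, 50)
print(estimate)
```

Dividing by the larger of the two distinct-value counts reflects containment of value sets: every Y value on the side with fewer values is assumed to appear on the other side, so each tuple there matches about #(other) / max(V(A,Y), V(B,Y)) tuples.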

- What happens to X and Z?
	- they are preserved: V(A Join B, X) = V(A,X) and V(A Join B, Z) = V(B,Z)

- More estimation problems

- Amazing Property of Join Estimation
	- Respects commutativity and associativity: the estimate is the same for every join order
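
The order-independence can be checked on a small three-way join. This sketch makes several assumptions that are not spelled out in the notes: statistics are kept per relation as a size plus a V(R,a) table, a join divides by max(V) once per shared attribute, a shared attribute keeps the smaller V in the result (containment of value sets), and all other attributes keep their V (preservation of value sets). The relations R(a,b), S(b,c), U(c,d) and their numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stats:
    size: float      # estimated number of tuples
    distinct: dict   # attribute name -> V(relation, attribute)


def join(left: Stats, right: Stats) -> Stats:
    """Estimate the natural join of two relations from their statistics."""
    shared = left.distinct.keys() & right.distinct.keys()
    size = left.size * right.size
    for attr in shared:
        size /= max(left.distinct[attr], right.distinct[attr])
    # Preservation of value sets for non-join attributes.
    distinct = {**left.distinct, **right.distinct}
    # Containment of value sets: the join keeps the smaller value set.
    for attr in shared:
        distinct[attr] = min(left.distinct[attr], right.distinct[attr])
    return Stats(size, distinct)


R = Stats(1000, {"a": 100, "b": 20})
S = Stats(2000, {"b": 50, "c": 100})
U = Stats(5000, {"c": 500, "d": 10})

print(join(join(R, S), U).size)   # (R Join S) Join U
print(join(R, join(S, U)).size)   # R Join (S Join U)
print(join(join(U, R), S).size)   # starts with a product, since U and R share no attribute
```

All three orders give the same estimate here, even the one that begins with a Cartesian product.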

- Introduction to Logical Query Plan Selection