The first thing to do is to figure out "linear in what?"!
A good way to analyze the complexity of such an algorithm is w.r.t. the
size of the set of FDs it processes. Since all we have to do is
to see if a given FD X->Y holds, we could try computing the closure of
X and see if Y is contained in this set. The naive way to do this
is to start with X itself (as a first approximation of its closure) and keep adding
attributes to it (using some FD) until it stops changing. This
is the algorithm presented in Fig. 6.7 in the boat book,
Fig. 15.6 in the cow book, and is reproduced here:
Algorithm Naive-Closure
  Answer = X
  Do
    Foreach FD A->B in the given set
      If A is a subset of Answer, then Answer = Answer U B
  Until Answer doesn't change (how would you determine this?)
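To make the naive loop concrete, here is a minimal Python sketch. The representation is our own assumption, not anything from the textbooks: each FD is a (lhs, rhs) pair of strings, and every attribute is a single character. The name naive_closure is likewise just for illustration. A "changed" flag is one possible answer to the question above.

```python
def naive_closure(x, fds):
    """Closure of attribute set x under fds (assumed: list of (lhs, rhs) strings)."""
    answer = set(x)                      # start from X itself
    changed = True
    while changed:                       # one iteration = one complete sweep
        changed = False
        for lhs, rhs in fds:
            # fire the FD if its whole left hand side is already in Answer
            if set(lhs) <= answer and not set(rhs) <= answer:
                answer |= set(rhs)
                changed = True           # something grew, so sweep again
    return answer


def holds(x, y, fds):
    """X->Y holds exactly when Y is contained in the closure of X."""
    return set(y) <= naive_closure(x, fds)
```

Listing the FDs in the "wrong" order (say C->D, B->C, A->B with X = A) makes each sweep fire only one new FD, which is where the repeated sweeps come from.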
It is easy to see that you might need to do as many complete "sweeps" as there are FDs
in the worst case (each sweep may fire only one new FD), leading to a quadratic time algorithm. The
intuition behind the linear time algorithm is as follows: You
take some extra space to "preprocess" the given set of FDs so that
each FD gets "fired" at exactly the right moment when you have all the
attributes on the left hand side of it. To do this efficiently,
you need to first precompute two functions, one from
attributes to the FDs
they can help "fire" and another from FDs to the number of attributes that
are needed to "fire" them (a running counter). This is to save us a lot of
overhead in book-keeping. We present these in a pseudo-C
fashion:
- FD* atof( ATTR a) is a function that takes an attribute "a" as input
and spits out a list of FDs for which "a" appears on the left hand side.
Notice that this function can be "designed" in O(nm) time, where "n"
is the number of FDs and "m" is the number of attributes. How?
- int ftoa[FD f] is an array (a function in the
mathematical sense) indexed by FDs (like an FD-id). In other words,
it takes an FD as input and returns the number of attributes on the
left hand side of the FD. This function can be "designed" in O(n) time.
Initially this array will contain the full number of attributes
needed to fire, which we will decrement as we keep adding attributes
to our closure. Neat!
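As a hint for the "How?" above, one way to build both tables is a single pass over the FDs. This is only a sketch under our own assumptions: FDs are (lhs, rhs) pairs of strings with single-character attributes, an FD-id is simply its position in the list, and build_indexes is a made-up name.

```python
from collections import defaultdict


def build_indexes(fds):
    """Return (atof, ftoa) for a list of (lhs, rhs) FDs (assumed representation)."""
    atof = defaultdict(list)   # attribute -> ids of FDs with it on the left hand side
    ftoa = []                  # FD id -> number of left hand side attributes still missing
    for fd_id, (lhs, _rhs) in enumerate(fds):
        lhs_attrs = set(lhs)   # ignore accidental duplicates on the left hand side
        ftoa.append(len(lhs_attrs))
        for a in lhs_attrs:
            atof[a].append(fd_id)
    return atof, ftoa
```

The pass touches each left hand side attribute once, so its cost is bounded by the total size of the left hand sides, which is at most nm.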
Now, the algorithm can be given as:
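Answer = {};   /* global; initialized once, before the first call */
Algorithm Closure(X)
  Foreach attribute "a" in X
    If "a" is already in Answer, skip it;   /* else ftoa would be decremented twice */
    Answer = Answer U {a};
    Foreach FD f in atof(a)
      ftoa[f] = ftoa[f] - 1; /* why? */
      /* check if it is ready to fire */
      if (ftoa[f] == 0) {
        Closure(Y) where Y is the right hand side of FD f;
      }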
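Here is a runnable Python sketch of the recursive algorithm, again under assumptions of our own: FDs are (lhs, rhs) pairs of strings, attributes are single characters, every left hand side is non-empty, and the names closure and add are hypothetical. The sketch skips attributes that are already in Answer, so that no counter is ever decremented twice for the same attribute.

```python
from collections import defaultdict


def closure(x, fds):
    """Closure of x under fds, firing each FD when its counter reaches zero."""
    # Preprocessing: build the two tables described above.
    atof = defaultdict(list)
    ftoa = []
    for fd_id, (lhs, _rhs) in enumerate(fds):
        lhs_attrs = set(lhs)
        ftoa.append(len(lhs_attrs))
        for a in lhs_attrs:
            atof[a].append(fd_id)

    answer = set()             # shared by all recursive calls

    def add(attrs):
        for a in attrs:
            if a in answer:    # already counted; do not decrement again
                continue
            answer.add(a)
            for f in atof[a]:
                ftoa[f] -= 1   # one fewer attribute missing for FD f
                if ftoa[f] == 0:
                    add(fds[f][1])   # f is ready: add its right hand side

    add(x)
    return answer
```

To test whether X->Y holds, check set(Y) <= closure(X, fds). Note that a long chain of FDs makes the recursion equally deep, which Python limits by default.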
Why does this work? If you traverse it carefully, you will see that
this algorithm is being really selective in the order of FDs that it
fires and how it adds attributes. Moreover, once fired, an FD is never
used again. It is thus linear in
the size of the FDs (which is the sum total of the attributes in each
of the FDs). We leave this proof to the reader (it is a simple
complexity analysis of a recurrence equation). Also notice the simple
nature of the recursion (it is not strictly tail-recursive, since the call sits inside two loops); it can be
removed by keeping an explicit worklist of pending attributes if this
is an issue.
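For the curious, one way to remove the recursion is to keep a worklist of attributes still waiting to be added. The sketch below is ours, not the textbook's. It assumes FDs are (lhs, rhs) pairs of strings with single-character attributes and non-empty left hand sides, and closure_iterative is a hypothetical name.

```python
from collections import defaultdict, deque


def closure_iterative(x, fds):
    """Same counter-based closure, with a queue instead of recursion."""
    atof = defaultdict(list)
    ftoa = []
    for fd_id, (lhs, _rhs) in enumerate(fds):
        lhs_attrs = set(lhs)
        ftoa.append(len(lhs_attrs))
        for a in lhs_attrs:
            atof[a].append(fd_id)

    answer = set()
    pending = deque(x)                 # attributes waiting to be added
    while pending:
        a = pending.popleft()
        if a in answer:                # each attribute is processed at most once
            continue
        answer.add(a)
        for f in atof[a]:
            ftoa[f] -= 1
            if ftoa[f] == 0:           # fires at most once per FD
                pending.extend(fds[f][1])
    return answer
```

Each FD puts its right hand side on the queue at most once, and each attribute walks its atof list at most once, so the total work stays proportional to the size of the FDs.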