CS 4604 Homework #4 Solution Sketches

(10*3 = 30 points) A breakup will be lossless if it was performed along the lines of an FD. In addition, a breakup will reduce redundancy if the FD was violating the conditions of BCNF. You will never go wrong by breaking up along an FD that holds (though you may not always gain something from it).

Assume that the broken-up relations have attributes called S and T, i.e., S and T are each a set of attributes. The breakup will be lossless only if at least one of the following two conditions holds (why?):

(S Intersection T) -> S or
(S Intersection T) -> T

For the given problems, the first decomposition is lossy because BC and AD have nothing in common, so both of the above tests fail. In other words, there is nothing to prevent everything in one relation from pairing up with everything in the other! The second decomposition is lossless because the intersection of S and T is {A}, and the FD A->ABC does hold (and so does A->AD). The third decomposition is lossless because the intersection of S and T is again {A}, and the FD A->AB holds (as does A->ACD).
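The test above can be tried out mechanically. The following is a minimal sketch, not part of the original solution. The homework's FD set is not reproduced in this sketch, so the single FD A->BCD used below is a hypothetical stand-in, chosen only because it is consistent with the statement that A->ABC and A->AD hold. The helper names `closure` and `is_lossless` are also made up for illustration.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire the FD if its left side is covered and it adds something new.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless(s, t, fds):
    """Binary test: (S Intersection T) -> S or (S Intersection T) -> T."""
    common = set(s) & set(t)
    reach = closure(common, fds)
    return set(s) <= reach or set(t) <= reach


# ASSUMPTION: hypothetical FD set standing in for the one in the problem.
fds = [(set("A"), set("BCD"))]

print(is_lossless("BC", "AD", fds))   # False: empty intersection
print(is_lossless("ABC", "AD", fds))  # True: intersection is {A}
print(is_lossless("AB", "ACD", fds))  # True: intersection is {A}
```

With an empty intersection the closure is empty too, so neither condition can hold, which is exactly the "everything pairs with everything" situation.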

The latter two decompositions, while being lossless, are not strictly necessary. This is because they do not violate the rules of BCNF and do not get us anything in the way of removing redundancy. However, in certain special situations (e.g., controlling access to particular columns in a table), we might want to break them up in this manner.
(10 points) For the 4NF violations, we need to look at the given MDs and any more MDs that we can derive from the given MDs. Here's a list of MDs that hold in R(A,B,C,D,E):
1. A->->B (given)
2. A->->CDE (pairs with 1)
3. AB->->C (given)
4. AB->->DE (pairs with 3)
5. A->->D (from the FD A->D)
6. AB->->E (from the FD AB->E)
7. A->->BCE (pairs with 5)
8. AB->->CD (pairs with 6)
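The "pairs with" entries in the list above all come from the complementation rule: X->->Y comes along with X->->(R-X-Y). Here is a small sketch that recomputes those partners for R(A,B,C,D,E). The function name `complement_md` is made up for illustration.

```python
R = set("ABCDE")


def complement_md(lhs, rhs, schema=R):
    """Right side of the MD that pairs with lhs->->rhs, i.e. R - lhs - rhs."""
    return "".join(sorted(schema - set(lhs) - set(rhs)))


# The four MDs that are given or that come from FDs (items 1, 3, 5, 6).
base = [("A", "B"), ("AB", "C"), ("A", "D"), ("AB", "E")]

for lhs, rhs in base:
    print(f"{lhs}->->{rhs} pairs with {lhs}->->{complement_md(lhs, rhs)}")
# A->->B pairs with A->->CDE
# AB->->C pairs with AB->->DE
# A->->D pairs with A->->BCE
# AB->->E pairs with AB->->CD
```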
Let us now determine the key (we need this to check for violations). It is easy to see that {A,B,C} is a key for R. This is because these attributes do not appear on the right side of any FD, so they have to be part of any key. In other words, they are necessary. They are also sufficient, because from these attributes we can get all the other attributes (A->D gives D, and AB->E gives E). All the above MDs are in violation of 4NF because their left-hand sides (A and AB) are not superkeys.
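The key argument can be double-checked by brute force. This is a sketch only, using the two FDs that appear in the list above (A->D and AB->E). The helper names `closure` and `candidate_keys` are made up for illustration.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of a set of attributes under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            cand = set(combo)
            # Skip supersets of keys already found (they are not minimal).
            if any(k <= cand for k in keys):
                continue
            if closure(cand, fds) == schema:
                keys.append(cand)
    return keys


R = set("ABCDE")
fds = [(set("A"), set("D")), (set("AB"), set("E"))]

print(["".join(sorted(k)) for k in candidate_keys(R, fds)])  # ['ABC']
print("".join(sorted(closure("A", fds))))    # AD   (A is not a superkey)
print("".join(sorted(closure("AB", fds))))   # ABDE (AB is not a superkey)
```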

We now have to break up the given relation into a collection of relations that are in 4NF. But since 4NF is a specialization of BCNF, we need to perform BCNF-style decomposition (using FDs) first, and then perform 4NF-style decompositions using MDs.

The FD A->D is in violation of BCNF, so we break up according to this FD. We get:
1. R1(A,D)
2. R2(A,B,C,E)
R1 is in BCNF so we don't cut it further. R2 is still in violation of BCNF because it has an FD AB->E whose left side is not a superkey (the key of R2 is still {A,B,C}). So, we break up R2 further, giving:
1. R1(A,D)
2. R3(A,B,E)
3. R4(A,B,C)
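The BCNF checks in the steps above can be sketched as follows. For each proper subset X of a relation's attributes, the code asks whether X determines something new inside the relation without determining all of it. This exhaustive check is a choice made for this sketch (it is practical only for small relations), and the names `closure` and `bcnf_violations` are made up for illustration.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of a set of attributes under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(rel, fds):
    """List (lhs, rhs) pairs that hold in rel with a non-superkey lhs."""
    rel = set(rel)
    found = []
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            reach = closure(x, fds)
            gained = (reach & rel) - x  # non-trivial part inside rel
            if gained and not rel <= reach:
                found.append(("".join(combo), "".join(sorted(gained))))
    return found


fds = [(set("A"), set("D")), (set("AB"), set("E"))]

print(bcnf_violations("AD", fds))    # []  R1 is in BCNF
print(bcnf_violations("ABCE", fds))  # [('AB', 'E')]  R2 is not
print(bcnf_violations("ABE", fds))   # []  R3 is in BCNF
print(bcnf_violations("ABC", fds))   # []  R4 is in BCNF
```

Note that this only covers the FD part of the story. The MD-based 4NF step that follows is not checked by this sketch.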
Both R3 and R4 are now in BCNF and so we move to MDs and 4NF. The MD A->->B holds in both R3 and R4 but the left side (A) is not a superkey in either relation. So, we break up further (R3 into R5 and R6, and R4 into R7 and R8). This gives:
1. R1(A,D)
2. R5(A,B)
3. R6(A,E)
4. R7(A,B)
5. R8(A,C)
But notice that R5 and R7 are really the same relation, so we need to retain just one of them. Cleaning up our act (and renumbering for cuteness) gives us the following four relations as the final answer:
1. R1(A,D)
2. R2(A,B)
3. R3(A,E)
4. R4(A,C)
(10 points) The difference rule can be proven along the following steps. Notice that X, Y, and Z can be any sets of attributes. You cannot assume that they are disjoint. Further, you cannot assume that they make up all the attributes of the relation R.
1. X->->Y is given. This non-trivial MD comes along with X->->(R-X-Y).
2. X->->Z is given.
3. We can combine the two MDs above (the complement from Step 1 and the MD from Step 2) to get X->->((R-X-Y) Union Z)
4. The non-trivial MD above comes along with X->->(R - ((R-X-Y) Union Z) - X)
If you stopped here, and concluded that the above expression reduces to (Y-Z), not so fast! If you work it out with a Venn diagram, you will see that the above expression actually comes down to (Y-Z) - ((X Intersect Y) - Z)! But ((X Intersect Y) - Z) is just a part of X, so we can add it back to get X->->(Y-Z).

Here's a worked out example with specific values. Let R be the relation R(A,B,C,D,E,F,G,H). Let X be {A,B,D,E}, Y be {B,C,E,F}, and Z be {D,E,F,G}. If you draw the Venn diagram, you will see that this example is without loss of generality. Notice that H is left out, as it should be (for generality).
1. X->->Y is given, so ABDE->->BCEF.
2. The complement of this is ABDE->->GH.
3. X->->Z is given, so ABDE->->DEFG.
4. We combine the two previous steps to get ABDE->->DEFGH.
5. The complement of this is ABDE->->C.
6. But what we wanted to prove was ABDE->->BC (since Y-Z is {B,C}).
7. We just tack the B onto the right side of the MD from Step 5, because B is already on the left.
Notice that we can prove that ABDE->->ABCDE, since we can tack on all attributes from the left. But the question only wants us to prove X->->(Y-Z), so we add "just the right amount of attributes" to bring the right hand side to equal this amount!
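The worked example can be replayed with plain set arithmetic. This is a sketch only, using the same R, X, Y, and Z as above. The helper name `complement` and the `step…` variable names are made up for illustration.

```python
R = set("ABCDEFGH")
X = set("ABDE")
Y = set("BCEF")
Z = set("DEFG")


def complement(lhs, rhs, schema=R):
    """Right side of the MD that comes along with lhs->->rhs."""
    return schema - lhs - rhs


step2 = complement(X, Y)      # complement of X->->Y
step4 = step2 | Z             # combined with X->->Z
step5 = complement(X, step4)  # complement of the combined MD
target = Y - Z                # what the difference rule asks for
tacked_on = target - step5    # what must be added back

print(sorted(step2))      # ['G', 'H']
print(sorted(step4))      # ['D', 'E', 'F', 'G', 'H']
print(sorted(step5))      # ['C']
print(sorted(target))     # ['B', 'C']
print(sorted(tacked_on))  # ['B']

# Everything we add back is already on the left side, so adding it is free.
print(tacked_on <= X)     # True
# Step 5 matches the Venn-diagram expression (Y-Z) - ((X Intersect Y) - Z).
print(step5 == (Y - Z) - ((X & Y) - Z))  # True
```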