CS 5614 Homework #5

Date Assigned: October 6, 1999
Date Due: October 15, 1999, in class, before class starts

This assignment will consist of choosing an application for a database system, modeling it in E/R, converting E/R to relations, normalization (if necessary), populating a database in ORACLE, and finally running queries on it. You will utilize your ORACLE account "distributed" in class for all aspects of this assignment. If you do not have your account already, please see the instructor. Wei Yu has prepared a self-explanatory ORACLE help web page that will assist you in various aspects of this assignment.

(10 points) Pick an application for this assignment. You can either choose your solution to the last question on Homework #2 or make a different choice. In either case, you should start with an E/R diagram, convert it to relations, declare any FDs/MDs for each relation in your application, ensure that it is normalized (or normalize if it is not). Which normal form are your final relations in? Why did you choose this normal form?
(10 points) Write an SQL database schema for your application, using the CREATE TABLE commands described in class. Make sure your application provides at least 4-5 different relations (do not have more than 6 relations; it will be difficult to grade these). Pick suitable datatypes for each attribute. Browse through the online ORACLE manual to determine the principal options that are available for types (your class notes may or may not cover the ones that you require). Do not forget to specify at least one key for each relation (using the PRIMARY KEY construct). Specify other choices for keys via the UNIQUE construct. Justify your choice of key(s) for each relation. If some attribute should not be NULL, use NOT NULL to specify this aspect. Also provide DEFAULT values wherever applicable. An example of a complete DDL for one relation is:
```
CREATE TABLE Students
(
    sid CHAR(6) PRIMARY KEY,
    ssn CHAR(9) UNIQUE,
    name CHAR(30), 
    birthdate DATE,
    gender CHAR(1) NOT NULL,
    address VARCHAR(255)
);
```
Execute some sample INSERT, DELETE and UPDATE commands on every one of your relations. Convince yourself that things are working just fine. Report any interesting observations that you make.
(20 points) Develop a substantial amount of data for your application and load it into your relations using the bulk loading facility provided in ORACLE. Browse through the ORACLE manual, particularly the SQL*Loader section which describes this facility. There is also an "INFILE" construct that you can use in conjunction with ORACLE DDL commands. This section of the manual also describes file formats, lengths, data types etc.

We are looking at data in the order of 40-50 tuples for each relation in your application. To create the data, either get it from some web source or information repository (making sure that it is public first, of course) or write a program in any scripting/programming language (like Perl, TCL or C) that creates large files of records in a format acceptable to the bulk loader. Either way, realize that you may need to transform data from one form to another to be acceptable for use in ORACLE. It is okay to "cookup" data, so feel free to write some "randomized algorithm" that generates tens of tuples for use in the database.
Sometimes, some of your relations might require only 4-5 tuples; examples are if you are modeling the real-estate business, there might be only a few realtors in a small town like Blacksburg, in which case it is okay to have just a few tuples in that particular relation. In such cases, where you have only a few tuples, state explicitly why you have only a few tuples.
Precautions in "Cooking up": In part (2), notice that you would have specified key(s) for each relation. Make sure that while cooking up data, your fabricated data indeed confirms to such "key restrictions". If you declare that "StudentID" is the key, then do not generate data for different students having the same ID! Also notice that relations will definitely share attributes (if there are no common attributes at all in your schema, then your project is in grave danger, see the instructor immediately!), so make sure that they "agree" on the common attributes. For example, if there are two relations that share an attribute called "name", then make sure that the names do indeed tally! Do not list the name "Mark" as "MARK" in one and "maRK" in another and "Mark" in yet another because then they would be three different names! The bottomline is be consistent. When two things are meant to be the same, make sure your data reflects this fact. If you do not ensure this at this stage, believe us, you will spend more time debugging your application than otherwise!
(30 points) Write five different queries on your database, using the SELECT/FROM/WHERE construct provided in SQL. Your five queries should preferably illustrate several different aspects of database querying, such as:
- Queries over more than one relation (by listing more than one relation in the FROM clause)
- Queries involving aggregate functions, such as SUM, COUNT, AVG etc.
- Queries involving complicated selects and joins
- Queries involving joins that have a NOT for some of the relations (like the NOT Sportscar example discussed in class)
- Queries involving GROUP BY, HAVING or other similar functions.
- Queries that require the use of the DISTINCT or ALL keyword.
Caution: Do not cookup query problems to cover each and everyone of the above aspects! You do not have to illustrate all the above aspects, just the ones that occur "naturally" and seamlessly in your application. So, try to infuse some reality into your project and think of some reasonable queries that people would want to use with your application. For example, in a movie application, writing a query such as "Find all actors whose ages are three times more than their street number" sounds ridiculous!!

You will not lose points if you do not cover all the above aspects.

ORACLE lets you save all your SQL commands in a file (with extension .sql) and allows you to just execute it from the command line prompt by typing the file name (or some such).

You can also add "rem lines" (reminiscent of BASIC!) to the beginning of your .sql file, such as:
```
rem cs5614 project
rem your_name, time etc.
```
(20 points) Give examples of two queries that you will be unable to write. You should think on one of the following two themes: (i) "Current database technology isn't advanced enough for this kind of queries" or (ii) "Even though the information exists in the database and I can answer it by looking at it, it would be impossible to express it in SQL because it lacks some constructs needed.". If it is the latter, explain what kinds of constructs are needed.

Notice that this is a "theory" question; moreover, realize that just because you were unable to figure out how to write a query does not make it an impossible one. Your answer should consist of queries that you will be unable to write, irrespective of the database system chosen. For the more enterprising among you, you can look at the SQL3 standard, which attempts to alleviate some of these concerns. That might give you a clue on where to look for the answer to this question.

This question does not address limitations of the ORACLE implementation of SQL, but limitations of SQL itself. Another common misconception is to list queries that cannot be answered because no pertinent information exists in the database! For example, we are not looking for answers like "I cannot find the students who have gpa more than 3.0 because in my project, I haven't recorded/modeled the gpa information". Notice that the information required to answer the query must exist in the database but still render you unable to write the required query.
(10 points) If you were to create some indices on some of your tables to speed up query processing, which attributes would you place these on? Why? Why not? Notice again, that this is a theory question; you do not have to experiment with creating indices on your recommended attributes (that will be the next assignment! :-)). Look through your course notes for more information on indices.

What to turnin:

A paper copy that details the following:

Your E/R diagram and a list of relation schemas derived from it. Your answer to part (1) above - ref. FDs, MDs, normal forms etc.
List of your SQL CREATE TABLE commands (with PRIMARY KEY, NOT NULL and DEFAULT constraints). Alternatively, you can run the command DESCRIBE to show the schema of the tables.
List of the number of tuples for each relation in your application.
Samples of INSERT, DELETE and UPDATE commands that you tried out and that worked.
Where you got the data from: Either say "I got it from this web address", "I got this from this reference book on apartment homes", "I imported it from this document", or "I cooked it up". Whatever your answer, write a short paragraph on how you transformed it from its native form to the form required by ORACLE. If you wrote a special purpose program or script (in Perl/TCL/something), include that script. Explain how you took special care to make sure that key restrictions and common attribute constraints are not violated. Explain why some relations have only a few tuples (if applicable).
Sample of tuples for each relation (about 6-10 each), just to give us an idea for how they look.
List of the five SQL queries that you tried out (a printout of your .sql file with helpful rems telling us what's going on would suffice) and the answers. If an answer is long (and runs to several pages), please include just enough of the answer in your final report so that we know what is going on and mention that you truncated the final output. Do not include pages and pages of answers, please!
Answers to questions 5 and 6. Answer to question 5 should be a list of "english" queries that will be difficult to express in SQL (you should also explain why).
Most Important: Your user login (which should be the same as the database login), what your database is called, and a complete path name (in case we want to go and demo it out). You should save it in some location in your home directory. If there are space restrictions, please let the instructor know.

What NOT to turnin:

Pages and pages of raw data or printouts of all the data in your database. Please, there is no way we can comb this stuff manually.

Precautions and Recommendations:

This might seem obvious but it is surprising the number of people who miss it. Please *SAVE* a copy of your database in some disk or floppy or somewhere. So do not lose all your hard-earned work, only to start from scratch again!