Monday, Aug 23, 1999
------------------------

- Introduction to CS5614

- Logistics, Recommended Texts, Web page, etc.

- Brief Historical Background of DBs
  (introduced to motivate the various aspects)

  Here's a synopsis: The first two entries are switched
  to better illustrate the sequence of research ideas

1973: Charles W. Bachman wins Turing Award
	- conceptualized that information can be stored and manip.
	- propounded the information-centric view and computer
  	  as a tool to manipulate information
	- designed networked databases, which along with hierarchical
 	  databases, were the precursors to RDBMSs
	- Architect behind the Honeywell IDS networkd DB
	- led to the formation of DBTG (Data Base Task Group)
          (the COBOL/CODASYL camp)

1970: Edgar F. Codd writes paper in CACM
	- proposes relational model
	- separate logical view from physical view
	- uses an algebra of operations to query/manipulate the DB

1975: ACM SIGMOD Meeting (The Great Debate)
	- Relational View:
		Set oriented queries and
		a good formal/mathematical model is important
	- Network View:
		Record oriented queries and
		efficiency is important; not possible
		to get efficiency with `R' 

From this point - 1980: 

	Actual Development of RDBMSs; Two implementations
	- IBM, San Jose, CA
		- System R
		- Jim Gray, K.P. Eswaran (KPE) et.al.
		- Two-phase architecture
			- RDS: Relational Data System
			- RSS: Relational Storage System
                     RSS provides support for links and images
		     to allow backward compatibility with
		     network DBs		    			
		- RDS didn't use some of the facilities for RSS
                  'coz difficult to build optimizers for images etc.
		- KPE forms ESVAL, later formed basis of HP Database
		- Larry Ellison uses published specs to found ORACLE
		- IBM later produces DB2 and SQL/DS

	- UC Berkeley
		- Ingres
		- Michael Stonebraker, Wong, Held et.al.
		- used UNIX File System Support
		- Ingres taken over by Computer Associates	
		- Rob Epstein helps build IDM and found Sybase

1981: Codd wins Turing Award

1989: Ingres group bags ACM Software Systems Award

1992: Stonebraker wins SIGMOD Innovations Award

1998: Gray wins Turing Award (Contributions to Transaction Processing)
      Gray is currently with Microsoft BARC

- Relational Model
	- Basic Idea
	- Example of SQL

- Implementation Decisions that shaped DBMSs
	- IBM's non-standard programmatic interface to SQL
	- Decision to allow duplicate tuples, leading to bag-semantics

- Aspects of Studying DBMSs
	- Design
	- Programming
	- Implementation

- CS5614 will cover all aspects!

- Organizational Diagram of a DB: 
	- (0) Data and Metadata
	- (i) Storage Management, 
	- (ii) Query Optimizer, 
	- (iii) Transcation Processor

- Kinds of Memory Storage

- Emphasis on Secondary Storage 
	- cost
	- non-volatility
	- no addresability probs

- Started on Query Optimizer

-----------------------------------------------------
Acknowledgements and References:
	- Readings in DBs, Chapter 1 (The Roots), Stonebraker
	  and Hellerstein, Eds.
	- Dr. Melli Annamalai, Personal Communication, ORACLE Corp.

Wed, Aug 25, 1999
------------------------

- Review of Relational Model, SQL
	- allows separation of logical organization from
	  physical storage structure
	- enables declarative queries
	  e.g. SELECT Name
	       FROM Students
               WHERE age=30

- 3-Tiered Organization of DBs

- Design of a Disk
	- Platters; magnetic material coated on surfaces
	- Spindle
	- Cylinders
	- Tracks
	- Sectors/Blocks (Indivisible Unit of Read/Write)

- Access Time is a sum of three components
	- Seek Time: Time to locate the right cylinder and position
	  the magnetic arm
	- Rotational Latency: Time taken to rotate the platter 
          to position appropriate sector
	- Transfer Time: Actual time for operation
	- Total Time: Order of Milliseconds

- Buffer Manager	
	- Manages Main Memory and has replacement policies

- Index (Indices)
	- A data strucuture that takes a property of a record
          as input and returns records that satisfy that property
          "quickly"
	- (to some extent) its choice dictates the actual physical
          data layout

- Examples of Indices
	- Pointer (Cursor) Techniques (Originally used extensively in NDBs)
	- B-Trees
	- Hash Tables

- Simple Example of Pointer Based Indices
	- Dense Indices
	- Sparse Indices

- Definition of Sparse
	- At most one pointer from the data structure to each page/block/sector

- Definition of Dense
	- One pointer to each record in the table

- For indexing data that is sorted on disk
	- both dense and sparse are appropriate
	- referred to as primary index

- Advantages of Dense
	- allows certain kinds of queries to be efficiently processed
	  (range queries and pre-computing unions, intersections etc.)

- Advantages of Sparse
	- more efficient use of blocks

- For indexing data that is not already sorted on disk
	- only dense will do
	- referred to as secondary index

- A primary index is one that enforces a particular clustering/layout of data

- A secondary index is one that doesn't put 
  any constraint on the data organization

- Commercial Examples of Primary Indices
	- Sybase: Sparse
	- DB2: Dense
	- Oracle: Choice of Both

- B-Trees
	- Generalization of Binary Search Tree
	- Balanced; All leaves at the same level
	- More than one child per node

- Typically, one block (4096 bytes) can have as many as 340 pointers for
  integer valued keys (4 bytes) and pointers (8 bytes) 	

- A typical B-tree does not have to extend beyond three levels
- A typical B-tree does not have to extend beyond two levels

- B-Tree allows efficient operations:
	- Complexity is typically measured in disk accesses, not just time
	- Insertion: O(h)
	- Deletion: O(h)
	- Splitting a Node: O(h) (why would we want to do this?)
	- Lookup: Theta(h) (i.e. it is both upper-bounded 
           and lower-bounded by h)
	- h is the height of the tree
	- Time Complexity: Multiply the above figures by 
           the number of keys/block
          (assuming linear search within a block)
	- Can be spedup even more if we use binary search within each block

- Two ways to organize B-trees
	- Put data only at the lowest level (provides higher fanout)
	- Put data from the beginning (restricts fanout)

- Moral of the Story from Storage Manager
	- "Algorithms and data structures for secondary storage are
	   qualitatively different from those that operate on main memory"

- Moral of the Story from Query Processor
	- "It is possible to be both efficient and declarative"
	- other corners of this spectrum: PROLOG, Network DBs

Fri, Aug 27, 1999
------------------------

- Review of ACID Properties

- Techniques used to achieve each of the properties:
	A - Commitment
	C - Integrity constraints, active elements etc.
	I - Locks
	D - Logging
	
- Locking
	- choice of record level locks vs. relation (page) level locks
	- tradeoffs, advantages, etc.
	- Examples of systems that use various levels of locks
	- 2PL: most commonly used technique of locking

- Logging
	- performed on nonvolatile storage
	- WAL (Write Ahead Logging)

- Active Database Elements
	- overcome passive nature of RDBMSs
	- Constraints
	- Triggers

- Limitations of Active Elements
	- do not adequately support collaborative and workflow systems
          (that require richer domain semantics and multilevel elements)
	- inapplicable for networked, loosely coupled systems
          (with large periods of disconnect)
	- semantically very limited in expressiveness

- Data Modeling
	- Use a design language: e.g. E/R Modeling

- Introduction to E/R

- Entity Sets and Relationships

- Diagrammatic Notation: Rectangles, Ellipses, Diamonds and Arrows

- Students, Courses, and a many-many relationship

- many-one relationship

- one-one relationship

- multiway relationship

- many-one, one-one as specializations of many-many