1
Design Theory for Relational Databases
Functional DependenciesDecompositionsNormal Forms
2
Redundancy
Dependencies between attributes cause redundancy Ex. All addresses in the same town
have the same zip code
SSN Name Town Zip …….1234 Joe Stony Brook 11733 ……..4321 Mary Stony Brook 117335454 Tom Stony Brook 11733…………………. redundant
3
Redundancy and Other Problems
Set valued attributes result in multiple rows in corresponding table
Example: Person (SSN, Name, Address, Hobbies) A person entity with multiple hobbies yields multiple rows
in table Person• Hence, Name, Address stored redundantly
SSN should be the key, but instead (SSN, Hobby) is key of corresponding relation
• Person can’t describe people without hobbies!
SSN Name Address Hobby1111 Joe 123 Main biking1111 Joe 123 Main hiking …………….
4
Anomalies
Redundancy leads to anomalies: Update anomaly: A change in
Address must be made in several places
Deletion anomaly: Suppose a person gives up all hobbies. Do we:• Set Hobby attribute to null (no, since
Hobby is part of key)• Delete the entire row (no, since we lose
other information in the row) Insertion anomaly: Hobby value must
be supplied for any inserted row (since Hobby is part of key)
5
Decomposition
Solution: use two relations to store Person information Person1 (SSN, Name, Address) Hobbies (SSN, Hobby)
The decomposition is more general: people with hobbies can now be described
No update anomalies: Name and address stored once A hobby can be separately supplied
or deleted
6
Normalization Theory
Anomalies: update anomaly occurs if changing the value
of an attribute leads to an inconsistent database state.
insertion anomaly occurs if we cannot insert a tuple due to some design flaw.
deletion anomaly occurs if deleting a tuple results in unexpected loss of information.
Normalization is the systematic process for removing all such anomalies in database design, based on functional dependencies
7
Functional Dependencies
X ->Y is an assertion about a relation R that whenever two tuples of R agree on all the attributes of X, then they must also agree on all attributes in set Y. Say “X ->Y holds in R.” Convention: …, X, Y, Z represent sets of
attributes; A, B, C,… represent single attributes.
Convention: no set formers in sets of attributes, just ABC, rather than {A,B,C }.
8
Splitting Right Sides of FD’s
X->A1A2…An holds for R exactly when each of X->A1
X->A2,…, X->An hold for R.
Example: A->BC is equivalent to A->B and A->C.
There is no splitting rule for left sides. We’ll generally express FD’s with
singleton right sides.
9
Functional Dependencies Dependencies
for this relation: A B A D B,C E,F
Do they all hold in this instance of the relation R?
R A B C D E F a1 b1 c1 d1 e1 f1 a1 b1 c2 d1 e2 f3 a2 b1 c2 d3 e2 f3 a3 b2 c3 d4 e3 f2 a2 b1 c3 d3 e4 f4 a4 b1 c1 d5 e1 f1
• Functional dependencies are specified by the database programmer based on the intended meaning of the attributes.
10
Example: FD’s
Drinkers(name, addr, beersLiked, manf, favBeer)
Reasonable FD’s to assert:1. name -> addr favBeer
Note this FD is the same as name -> addr and name -> favBeer.
2. beersLiked -> manf
11
Example: Possible Data
name addr beersLiked manf favBeerJaneway Voyager Bud A.B. WickedAleJaneway Voyager WickedAle Pete’s WickedAleSpock Enterprise Bud A.B. BudSpock Enterprise Bud-Lite A.B. Bud
Because name -> addr Because name -> favBeer
Because beersLiked -> manf
12
Keys of Relations
K is a superkey for relation R if K functionally determines all of R.
K is a key for R if K is a superkey, but no proper subset of K is a superkey (or K is minimal superkey).
13
Example: Superkey
Drinkers(name, addr, beersLiked, manf, favBeer)
{name, beersLiked} is a superkey because together these attributes determine all the other attributes. name -> addr favBeer beersLiked -> manf
14
Example: Key
{name, beersLiked} is a key because neither {name} nor {beersLiked} is a superkey. name doesn’t -> manf; beersLiked
doesn’t -> addr. There are no other keys, but lots of
superkeys. Any superset of {name, beersLiked}.
15
Where Do Keys Come From?
1. Just assert a key K. The only FD’s are K -> A for all
attributes A.
2. Assert FD’s and deduce the keys by systematic exploration.
16
More FD’s From Derivation
Example: “no two courses can meet in the same room at the same time” tells us: hour room -> course.
17
Inferring FD’s We are given FD’s
F={X1 -> A1, X2 -> A2,…, Xn -> An }, and we want to know whether an FD Y -> B must hold in any relation that satisfies the given FD’s.
What if we what all possible FD’s that follow from F? Example: If A -> B and B -> C hold,
surely A -> C holds, even if we don’t say so.
Important for design of good relation schemas.
18
Closure of FDs If F is a set of FDs on schema R and f
is another FD on R, then F entails f if every instance r of R that satisfies F also satisfies f Ex: F = {A B, B C} and f is A C
If Streetaddr Town and Town Zip then Streetaddr Zip
The closure of F, denoted F+, is the set of all FDs entailed by F
F and G are equivalent if F entails G and G entails F
19
Computing the Closure F+: Armstrong’s Axioms of FDs Reflexivity: If Y X then X Y
(trivial FD) Name, Address Name
Augmentation: If X Y then X Z YZ If Town Zip then
Town, Name Zip, Name Transitivity: If X Y and Y Z then
X Z
20
Other derived rules
Union: If X Y and X Z, then X YZ Derivation using only Armstrong’s Axioms:
• X YX (augment), YX YZ (augment)• thus X YZ (transitive)
Decomposition: If X YZ, then X Y and X Z Derivation using only Armstrong’s Axioms:
• YZ Y (reflexive), thus X Y (transitive)• YZ Z (reflexive), thus X Z (transitive)
F
AB C AB BCD A D AB BD AB BCDE AB CDE
D E BCD BCDE
21
Generating F+
Thus, AB BD, AB BCD, AB BCDE, and AB CDE are all elements of F+
unionaug trans
aug
decomp
22
Computing the Closure F+ via
Attribute Closure(Y+) Calculating attribute closure is a more efficient way of checking entailment
The attribute closure of a set of attributes Y with respect to a set of functional dependencies, F, (denoted YF
+ or just Y+) is the set of all attributes,
A, such that F entails Y A Checking entailment: Given a set of
FDs, F, then Y A if and only if A Y+
23
Closure Test
F={X1 -> A1, X2 -> A2,…, Xn -> An } Test to compute the closure of Y,
denoted Y +. Basis: Y + = Y. Induction: Look for an FD’s left side
Xi that is a subset of the current Y +. If the FD is Xi -> Ai, add Ai to Y +.
24
Computation of Attribute Closure
AB C (a) A D (b) D E (c) AC B (d)
AB C (a) A D (b) D E (c) AC B (d)
Problem: Compute the attribute closure of Y=AB With respect to the set of FDs:
Initially Y+ = Y = {AB}Solution:
25
Algorithm for Computing F+ via Attribute Closure
1. For each subset of attributes X, compute X +.2. Add non-trivial FDs: X ->A for all A in X + - X.3. Drop non-minimal FDs: XY ->A can be
dropped whenever we discover X ->A4. Some Tricks:
1.No need to compute the closure of the empty set or of the set of all attributes.
2.If we find X + = all attributes, so is the closure of any superset of X.
26
Example – Computing F+ via Attribute Closure
F = {AB C, A D, D E, AC B}
F = {AB C, A D, D E, AC B}
Y Y+ New FDs
A {A,D,E} A->EB {B} (trivial)C {C} (trivial)D {D,E} (original FD)E {E} (trivial) AB {A,B,C,D,E} (AB is a key) AB->CDEAC {A,B,C,D,E} (AC is a key) AC->BDEAD {A,D,E} (not minimal)AE {A,D,E} (not minimal)BC {B,C} (trivial)BD {B,D,E} BD->EBE {B,E} (trivial)CD {C,D,E} CD->ECE {C,E} (trivial)DE {D,E} (trivial)… (and so on)
Y Y+ New FDs
A {A,D,E} A->EB {B} (trivial)C {C} (trivial)D {D,E} (original FD)E {E} (trivial) AB {A,B,C,D,E} (AB is a key) AB->CDEAC {A,B,C,D,E} (AC is a key) AC->BDEAD {A,D,E} (not minimal)AE {A,D,E} (not minimal)BC {B,C} (trivial)BD {B,D,E} BD->EBE {B,E} (trivial)CD {C,D,E} CD->ECE {C,E} (trivial)DE {D,E} (trivial)… (and so on)
27
Finding All Implied FD’s in a Projected Schema
Motivation: “normalization,” the process where we break a relation schema into two or more schemas.
Example: ABCD with FD’s AB ->C, C ->D, and D ->A. Decompose into ABC, AD What FD’s hold in ABC ?
• Not only AB ->C, but also C ->A !
Same as previous algorithm, but restrict to those FD’s that involve only attributes of the projected schema
28
Example: Projecting FD’s
ABC with FD’s A ->B and B ->C. Project onto AC. A +=ABC ; yields A ->B, A ->C.
• We do not need to compute AB + or AC +. B +=BC ; yields B ->C. C +=C ; yields nothing. BC +=BC ; yields nothing.
Resulting FD’s: A ->B, A ->C, andB ->C.
Projection onto AC : A ->C. Only FD that involves a subset of {A,C }.
29
Normal Forms & Relation Decomposition
30
Relational Schema Design
Goal of relational schema design is to avoid anomalies and redundancy. Update anomaly : one occurrence of a fact
is changed, but not all occurrences. Deletion anomaly : valid fact is lost when a
tuple is deleted Insertion anomaly : cannot insert a valid
fact when some information is unknown (i.e., extraneous attributes in primary key)
31
Example of Bad Design
Drinkers(name, addr, beersLiked, manf, favBeer)
name addr beersLiked manf favBeerJaneway Voyager Bud A.B. WickedAleJaneway ??? WickedAle Pete’s ???Spock Enterprise Bud ??? Bud
Data is redundant, because each of the ???’s can be figuredout by using the FD’s name -> addr favBeer andbeersLiked -> manf.
32
This Bad Design AlsoExhibits Anomalies
name addr beersLiked manf favBeerJaneway Voyager Bud A.B. WickedAleJaneway Voyager WickedAle Pete’s WickedAleSpock Enterprise Bud A.B. Bud
• Update anomaly: if Janeway is transferred to Intrepid, will we remember to change each of her tuples?• Deletion anomaly: If nobody likes Bud, we lose track of the fact that Anheuser-Busch manufactures Bud.• Insertion anomaly: Cannot insert Picard if we do not know what beers he likes
33
Decompositions Do we need to decompose a relation?
Several normal forms for relations. If schema in these normal forms certain problems don’t arise
The two commonly used normal forms are third normal form (3NF) and Boyce-Codd normal form (BCNF)
What problems does decomposition cause? Lossless-join property: get original relation by
joining the resulting relations Dependency-preservation property: enforce
constraints on original relation by enforcing some constraints on resulting relations
Queries may require a join of decomposed relations!
34
Boyce-Codd Normal Form
We say a relation R is in BCNF if whenever X ->Y is a nontrivial FD that holds in R, X is a superkey. Remember: nontrivial means Y is not
contained in X. Remember, a superkey is any
superset of a key (not necessarily a proper superset).
35
Example
Drinkers(name, addr, beersLiked, manf, favBeer)FD’s: name->addr favBeer, beersLiked->manf
Only key is {name, beersLiked}. In each FD, the left side is not a
superkey. Any one of these FD’s shows
Drinkers is not in BCNF
36
Another Example
Beers(name, manf, manfAddr)FD’s: name->manf, manf->manfAddr Only key is {name} . name->manf does not violate BCNF,
but manf->manfAddr does.
37
Decomposition into BCNF
Given: relation R with FD’s F. Look among the given FD’s for a BCNF
violation X ->Y. If any FD following from F violates BCNF,
then there will surely be an FD in F itself that violates BCNF.
Compute X +. Not all attributes, or else X is a superkey.
38
Decompose R Using X -> Y
Replace R by relations with schemas:
1. R1 = X +.
2. R2 = X + (R – X +)
1. Project given FD’s F onto the two new relations.
39
Example: BCNF Decomposition
Drinkers(name, addr, beersLiked, manf, favBeer)
F = name->addr, name -> favBeer,beersLiked->manf
Pick BCNF violation name->addr. Close the left side: {name}+ = {name,
addr, favBeer}. Decomposed relations:
Drinkers1(name, addr, favBeer) Drinkers2(name, beersLiked, manf)
40
Example -- Continued
We are not done; we need to check Drinkers1 and Drinkers2 for BCNF.
Projecting FD’s is easy here. For Drinkers1(name, addr, favBeer),
relevant FD’s are name->addr and name->favBeer. Thus, {name} is the only key and
Drinkers1 is in BCNF.
41
Example -- Continued
For Drinkers2(name, beersLiked, manf), the only FD is beersLiked->manf, and the only key is {name, beersLiked}.
Violation of BCNF. beersLiked+ = {beersLiked, manf},
so we decompose Drinkers2 into: Drinkers3(beersLiked, manf) Drinkers4(name, beersLiked)
42
Example -- Concluded
The resulting decomposition of Drinkers :1. Drinkers1(name, addr, favBeer)2. Drinkers3(beersLiked, manf)3. Drinkers4(name, beersLiked)
Notice: Drinkers1 tells us about drinkers, Drinkers3 tells us about beers, and Drinkers4 tells us the relationship between drinkers and the beers they like.
Top Related