Database Management


What is Normalization and how does it affect databases? Normalization is a series of steps, called normal forms, used to ensure that data anomalies are not created when information is inserted, updated or deleted. Without normalizing, changes made to the database to reflect changes in reality will create inconsistencies in the data, affectively making the data useless.

As I stated earlier, Normalization is a series of steps or normal forms. First is zero normal form, often written as 0NF. At this point the table has multi-valued cells. An example of a multi-valued cell is when you try to store 2 different majors for a student in the same cell. To fix this problem the determinant of the multi-valued cell should be stored as a separate attribute and linked to the primary key to create a composite primary key.

Where most tables start out at, however, is in first normal form. The next step in normalization is to move into second normal form or 2NF. To be in 2NF all partial functional dependencies must be removed. Partial functional dependencies occur when there is a composite primary key and only one part of the key is needed to determine one or more other attributes. There are two ways to fix partial functional dependencies. One is to create a table with an atomic primary key. By definition a table with an atomic primary key is already in second normal form. The second way is to project out the partial functional dependency into its own table, effectively creating two tables from one.

The next step is to move into third normal form or 3NF. In order to reach 3NF all transitive functional dependencies must be removed. Similar to partial functional dependencies, transitive functional dependencies occur when a non-key attribute can determine another non-key attribute. A very common occurrence of this is when you have both the state and the zip code listed in a table. Neither are keys but knowing the zip code can determine the state. Removing transitive functional dependencies is done by projecting out the transitive dependency to another table while leaving the determinant as a foreign key in the original table.

Many businesses will be satisfied with their database being in 3NF and stop there. Others may require you go into what is called Boyce-Codd Normal Form or BCNF. In order to violate the rules for BCNF three situations must exist.

1.Two or more candidate keys must exist.

2.One or more key attributes must overlap.

3.At least one key must be composite.

To fix BCNF problems the overlapping composite key must be projected out to a new table.

At this point the database will accurately represent reality provided your data is accurate. There is however 3 more stages of normalization that are so remote they almost never exist. 4NF is removing any multi-valued dependencies. 5NF is removing any join dependencies and finally Domain Key Normal Form or DKNF states that every constant is a consequence of domain and key constraints. DKNF is theoretically the holy grail of normalization where insertion, deletion, or updates can never exist. The process for reaching these last there normal forms is difficult and largely academic. In most cases it is enough just to know that they exist.

As an aside, many database designers may not remove all violations of all normal form rules. For example, state and zip codes, though a transitive dependency, will rarely be projected out because it is largely unnecessary to do so. Creating additional tables requires more CPU at runtime to join tables back together and the likelihood of a new state being added is remote.