Database Normalization

Normalization is the process of organizing a database to remove redundant, duplicated data, and to group related items together to allow efficient storage, retrieval and modification of data.

We need to build a data model for our demo application, then convert the model into a schema of relational database tables and associated objects.


The demo application is based on Oracle Education's training database, known as the Scott/Tiger schema. The schema contains tables used to hold sales order and employee information. The demo application schema has some additional tables used to log errors, import CSV data, and determine which days in a period are business working days.

Install Schema DDL View the schema creation DDL here    


The following example illustrates the process of converting non-relational order data stored in a spreadsheet into the normalized relational database tales used by our demo application.



Order Spreadsheet
Order ID Order Date Cust ID Name Item ID Product ID Description Price Qty Total
601 01-MAY-86 106 Shape Up 1 200376 SB ENERGY BAR-6 PACK 2.40 1 2.40
604 15-JUN-86 106 Shape Up 1 100890 ACE TENNIS NET 58.00 3 174.00
2 100861 ACE TENNIS RACKET II 42.00 2 84.00
3 100860 ACE TENNIS RACKET I 44.00 10 440.00
610 07-JAN-87 101 TKB Sport Shop 1 100860 TENNIS RACKET I 35.00 1 35.00
2 100870 TENNIS BALLS-3 PACK 2.80 3 8.40
3 100890 ACE TENNIS NET 58.00 1 58.00

Note:

  • An order can have one or more associated products
  • The columns containing product information are repeated
  • The product columns have been wrapped to fit on the page, but imagine them being all on a single row for each order



First Normal Form


A relation is in first normal form if it conforms to the following rule:

  • Each row, or record, in your database table must be uniquely identified by a Primary Key, with no repeating groups of fields.


In our example, the Product ID, Description, Price, Qty and Total columns are repeated several times for each order.


There are several problems with this organization of the data:

  • A customer may order 1,2,10 or more products per order. Your database table would need multiple sets of columns to store the product information for each order
  • You will be restricted as to how many products a customer can order at one time depending on how many columns you created in your table
  • You could potentially waste a lot of space in your database if you provide columns for 10 products per order, but customers typically order one or two items each time


To make the database comply with First Normal Form we must split the repeating product information into separate rows, each uniquely identified by a key comprising one or more columns.



Table in First Normal Form
ORDID (P. Key #1) ORDERDATE CUSTID NAME ITEMID (P. Key #2) PRODID DESCRIPTION PRICE QTY TOTAL
601 01-MAY-86 106 Shape Up 1 200376 SB ENERGY BAR-6 PACK 2.40 1 2.40
604 15-JUN-86 106 Shape Up 1 100890 ACE TENNIS NET 58.00 3 174.00
604 15-JUN-86 106 Shape Up 2 100861 ACE TENNIS RACKET II 42.00 2 84.00
604 15-JUN-86 106 Shape Up 3 100860 ACE TENNIS RACKET I 44.00 10 440.00
610 07-JAN-87 101 TKB Sport Shop 1 100860 TENNIS RACKET I 35.00 1 35.00
610 07-JAN-87 101 TKB Sport Shop 2 100870 TENNIS BALLS-3 PACK 2.80 3 8.40
610 07-JAN-87 101 TKB Sport Shop 3 100890 ACE TENNIS NET 58.00 1 58.00

Each row can be identified by a unique Primary Key comprising the ORDID plus ITEMID.



Second Normal Form


A relation is in second normal form if it conforms to the following rules:

  • It is in first normal form
  • Each column, or field in the record must depend on the whole Primary Key and not just part of it


The columns ORDERDATE, CUSTID and NAME are duplicated in several rows, as they are dependent on only part of the Primary Key, ORDID. To resolve this problem, we need to create a new table ORD that will hold a single row for each unique order.


ORD
ORDID (P. Key) ORDERDATE CUSTID NAME
601 01-MAY-86 106 Shape Up
604 15-JUN-86 106 Shape Up
610 07-JAN-87 101 TKB Sport Shop

There are still a number of problems with this new table:

  • The customers' names are repeated on multiple rows
  • If a customer's name changes, and not all rows are updated, there will be inconsistencies in the database


The remaining columns will be placed in a new table called ITEM.


ITEM
ORDID (P. Key #1) ITEMID (P. Key #2) PRODID DESCRIPTION ACTUALPRICE QTY TOTAL
601 1 200376 SB ENERGY BAR-6 PACK 2.40 1 2.40
604 1 100890 ACE TENNIS NET 58.00 3 174.00
604 2 100861 ACE TENNIS RACKET II 42.00 2 84.00
604 3 100860 ACE TENNIS RACKET I 44.00 10 440.00
610 1 100860 TENNIS RACKET I 35.00 1 35.00
610 2 100870 TENNIS BALLS-3 PACK 2.80 3 8.40
610 3 100890 ACE TENNIS NET 58.00 1 58.00


Third Normal Form


A relation is in third normal form if it conforms to the following rules:

  • The database must conform to the second normal form rules
  • No column must depend on any other column except the Primary Key


If a column is uniquely identified through one or more other columns in addition to the Primary Key, this is known as transitive dependence, and breaks the rules of third normal form.

The name of the customer associated with each order depends directly on the CUSTID, not on the Primary Key, ORDID. We need to create a new table to hold the customer information, which will eliminate the duplication of customer names.


CUSTOMER
CUSTID (P. Key) NAME
101 TKB Sport Shop
106 Shape Up

The ORD table now contains the following:

ORD
ORDID (P. Key) ORDERDATE CUSTID
601 01-MAY-86 106
604 15-JUN-86 106
610 07-JAN-87 101

CUSTID on the ORD table is said to be a Foreign Key, linking to CUSTID on the CUSTOMER table.

The product information that is not wholly related to the Primary Key of the Item must be moved to a new table named PRODUCT, to eliminate the duplicates of description.


PRODUCT
PRODID (P. Key) DESCRIPTION
100860 ACE TENNIS RACKET I
100861 ACE TENNIS RACKET II
100870 ACE TENNIS BALLS-3 PACK
100890 ACE TENNIS NET
200376 SB ENERGY BAR-6 PACK

The ITEM table now contains the following columns:


ITEM
ORDID (P. Key #1) ITEMID (P. Key #2) PRODID ACTUALPRICE QTY
601 1 200376 2.40 1
604 1 100890 58.00 3
604 2 100861 42.00 2
604 3 100860 44.00 10
610 1 100860 35.00 1
610 2 100870 2.80 3
610 3 100890 58.00 1

The column PRODID on ITEM is a Foreign Key, relating to the Primary Key of the Product table.


Calculated Values


This is not a normalization problem, but you can remove the TOTAL column from the ITEM table to reduce the risk of data inconsistencies.

The TOTAL column in the ITEM table is derived by multiplying QTY by ACTUALPRICE. If you included a total column on the Item table, you would need to re-calculate and store its value every time a price or quantity was changed.



Normalized Database Tables

The following diagram shows the database tables in Third Normal Form, with their respective joins, primary and foreign keys. Columns have been added to illustrate how the database can be developed further.

Entity Relationship Diagram

You can see the completed sales order data model here. Sales Data Model



In the next blog post, we will create a useful procedure that our demo application can use to log error messages.

Previous Blog Next Blog