In our earlier article on database normalization, we discussed how normalization is put into action by following a certain set of rules – called forms.
Here we will go over an example of the most basic form – first normal form, or 1NF for short.
The rules for First Normal Form:
In order to be in First Normal Form, each and every table in a database must follow these rules – we will also explain further what they mean with actual examples:
1. Each column in a given table can contain only ONE value of a given type. This is also described as being atomic or indivisible. 2. There are no repeating columns containing related data.
Examples of tables violating First Normal Form
To further clarify what we mean when we say that each column can contain only ONE value of a given type, let’s go through a simple example. Suppose we have a table called People, where each entry is a different person. And, for each person, we have an ethnicity column. Some people can have more than one ethnicity – for example someone can be half Caucasian and half Asian. This means for that particular person the entry would be “Caucasian, Asian”.
Having multiple values for ethnicity violates the rule of 1st normal form which states that a column can contain only one value of a given type – where the type in this case is “ethnicity”.
Example of a non-atomic column
If a column is atomic it means that it can not be further subdivided. Clearly the ethnicity column in our example is NOT atomic – an entry like “Caucasian, Asian” could be divided into two separate entries – “Asian” and “Caucasian”.
So, how can we get the People table to satisfy first normal form? Well, we could just create a new table called Person Ethnicity. And we could have two columns in that table – one named Person ID and the other Ethnicity. The Person ID could simply be a foreign key linking to a primary key in the Person table. And Ethnicity would be a column holding only one ethnicity at a time. So, for someone named Martha who is half Asian and half Caucasian, she would have TWO separate entries in the Person_Ethnicity table. One would be for Asian and the other for Caucasian, and this keeps the ethnicity atomic.
What is meant by no repeating columns containing related data
To further clarify what we mean by the second rule of first normal form – that there are no repeating columns containing related data – let’s go through another simple example. Suppose we have a table called Movies that holds the data for different movies and also has columns for each actor in the movie – so something like actor1, actor2, actor3, etc.
Here is what the Movies table we described would look like:
|Movie_name||Actor 1||Actor 2|
|Jerry Maguire||Tom Cruise||Cuba Gooding Jr.|
|Jurassic Park||Sam Neill||Laura Dern|
|Shawshank Redemption||Morgan Freeman||Tim Robbins|
|Top Gun||Tom Cruise||Kelly McGillis|
But, there is a problem with this Movies table. Clearly, these movies had more than just two actors. So, if we need to add actors, we will need to add more columns for each actor – something like actor3, actor4, and so on. This is obviously a bad idea because some movies might have hundreds of actors and some might have only 10 actors. This means that there could be a lot of wasted space and empty entries for the movies with fewer actors. So, this is the problem that the second rule – that there are no repeating columns containing related data allowed – in first normal form looks to solve.
How to convert tables to First Normal Form
We’ve identified the problem, but now how to solve it? Well, it’s pretty simple, we can just create a separate table for all the actors – call it Actors. Then we can have an entry for a movie ID that would act as a foreign key to link each actor to a movie. So, it could look like this:
|MovieID||Actor First_Name||Actor Last_Name|
And, we would change the Movies table to something like this:
Note that Movie ID is a foreign key pointing back to the Movies table. And with those changes, we are now in First Normal Form!