Ideally, normalize it so hard that an
understanding of abnormal is abolished. Unfortunately, we do not live in an
ideal world. The database administrator desires normalization while the
developer craves denormalization. Good design is simple to represent
denormalized.
C.J. Date said that database design is common sense formalized. Expounding upon
that, you do not need to know relational theory to be capable of good design.
Hopefully, you enjoy what you are doing enough to take an interest, but it is
by no means mandatory. Critical thinking will often get you farther than theory
alone.
With that said, normalization is great theory. The naysayer will whine about
performance costs. You may be reading more tables, but you are doing it more
efficiently which just as often results in… wait for it… less I/O and/or
quicker retrieval times. The naysayer is generally considering only one side of
performance: reading. A well normalized model yields more joins than a flatter
model, and joins are expensive. However, more and smaller tables may improve
write operations, overall. Additionally, a denormalized model often requires
more supporting indices. This further degrades write performance, may increase
fragmentation, requires more disk space and--getting overly
nitpicky--complicates the optimizer’s job by forcing consideration of
additional plans. I am not saying that denormalization does not have a place;
just that it is typically in an analytical/reporting environment. Nor am I
saying that normalized design does result in poor read performance. Poor
implementation results in poor performance; don’t blame normalization (you can
blame DBMSs for some of this (SQL should better
provide for relational algebra operations)). If someone is using performance as
an excuse not to normalize their OLTP model, there is a strong chance that they
are an idiot.
There is no need to take everything to the fifth normal form; a good DBA knows
when to be a bit forgiving—but you best have some sound reasoning backing you
up. For most applications, the third normal form is sufficient. Start with your
ideal model and adjust it around your implementation. Your concessions should
be small to none for OLTP models. You can present it to your application
developers however they want it; don’t let them break your rules unless you
cannot accommodate their needs.
A well normalized model results in a database containing tables with an average
of six to eight fields. Outside of averages, trust your intuition. If something
looks good and feels correct (you kept it simple, right?), it probably is—move
on.
There are standard structures that are much easier not to normalize. Names and
addresses, for instance. Most of the time, we turn a blind eye and throw some
columns into a table because it is much easier than normalizing. This does not
mean that names and addresses should be in one table; obviously, if the entity
associated with the address can have multiple addresses, that needs to be factored into your design.
Consider the selectivity of your data. If it is low, maybe normalizing saves
some space. If not, maybe we can let this slide (like addresses and suite numbers).
Also consider your methodology. If you are using an integer as your surrogate
key, what will normalizing area codes do? If you store the area code as an
integer, it actually adds the domain times two and creates an additional join.
If the area code is three characters, that is smaller than the key. Don’t get
carried away with normalizing your design, but remain vigilant.
I once implemented a project where normalizing the names and migrating
surrogate keys saved a lot of space and did not sacrifice performance (quite
the opposite). Look at the Social Security
Administrations Death Master File; millions of names with only a couple
hundred thousand unique first names and surnames. This is an anomaly (not the
names, normalizing the names). The important thing is being aware of all the
factors and accounting for them in your design.