Working as a data migration and integration specialist, I usually have to work with systems that have been designed and built by others. I have little or no control over the names that have been used for things such as database tables, columns, and a myriad of other types of object.
Sometimes, this is not a problem; however, it is all to often the case that, in their haste to put together a system on time and to budget, everything that the designers have learned about standards, appears to have been cast aside. There is usually a cost for these omissions further down the line.
We’ll talk, here, in the context of a relational database; however these basic rules should apply, whatever the nature of your system and storage.
In this article, we’ll look at nomenclature – the devising or choosing of names for things.
As with most things, you need standards, and this is probably the best place to start. Without giving something a name, it is difficult to do much else with it.
What’s the Problem?
Here’s a part-definition of a couple of simple example tables that you may find in a typical system.
Customer CustomerId Integer primary key, CustomerName Varchar(80), AddressLine1 Varchar(40), AddressLine2 Varchar(40), ... Supplier SupplierId Integer primary key, SupplierName Varchar(80), AddressLine1 Varchar(40), AddressLine2 Varchar(40), ...
Now let’s take a look where things are not so elegant. This may, at first glance, seem a little exaggerated; however, for many readers, this will be an all too familiar sight. I’ll leave the reader to spot the problems.
Customer CustomerId Integer primary key, CustomerName Varchar(80), AddressLine1 Varchar(40), AddressLine2 Varchar(40), ... Suppliers Id Integer primary key, Name Varchar(80), Address_Line_1 Varchar(40), Address_Line_2 Varchar(30), ...
Let’s now take a look at what’s going wrong in this second example.
Often, vendors place limitations on how you may specify names when using their product. These limitations may limit you to the length of the name or the characters that you may use. This invariably means that one standard may not fit all; however, you can usually align your standards across your vendors so that you end up with something that is both pragmatic and easy to understand. Some vendors will, of course, make their own recommendations and you should take these in to account.
In the early days of computing, when resources were limited, name lengths were usually limited in length meaning that you often had to deal with names that were cryptic rather than descriptive. This is, however, not the case today.
Oracle, for example, has historically placed a 30 character limitation on table, column and other object names. Although this may seem restrictive when compared to some other products, it is important to get the right balance so that your names are descriptive rather than unwieldy.
Consistency is our watchword. Whatever decisions you take, just be consistent.
Characters and Case
Make a decision on the characters and case that you are going to use. Here are some of the common choices. Choose one and stick with it.
- All upper case
- All lower case
- Underscores between words
Singular vs. Plural
Is it Customer or Customers? Decide if you’re using singular names or plural names. If it’s Customer, then it’s not Suppliers.
My preference is for always using singular names.
There is argument for mixing plural and singular. Consider tables and rows. Tables are sets whereas rows are singletons. It therefore makes sense to some that their table names are plural and their column names are singular.
For me it seems more straightforward to use the uninflected noun, which in English, is in the singular. This removes the need to worry about irregular plurals, words that do not have a plural at all, and long discussions on semantics.
Consider the following two SQL statements. Which reads better?
select Customer.CustomerName from Customer where Customer.AddressLine1 = '742 Evergreen Terrace'; select Customers.CustomerName from Customers where Customers.AddressLine1 = '742 Evergreen Terrace';
There are often some lively debates on the subject of singular vs. plural, so I’ll bow out now, only to say that this is the way that I do it rather than this necessarily being the only way.
Do not use redundant prefixes (or suffixes). There is no need to prefix your table names with “TBL_” or a need to prefix column names with their table names. If it’s a table, the data dictionary will tell us that it is a table. If the column belongs to the Customer table, it will tell us that too.
There are times when you will want prefixes (see Disambiguation); just use them wisely.
Sometimes, you need to disambiguate and this is often achieved using prefixes (see Redundant Prefixes).
In my good table-definition examples above, I have prefixed both the Id and Name columns with the table name. I have disambiguated Id as it is likely to be used as a foreign key, so it is almost certainly going to have a name-clash if referred to in other tables.
Although it may seem obvious that in our Customer table, the column called Name will refer to the Customer’s name, it does have the feeling of vagueness and there may be other types of name that could be used in the context of a customer, for example, the name of a contact (person) within an organisation. There are also some words that, depending on your software vendor, may be reserved words. Name has the feeling of one of those words. Although some vendors provide methods for you to use reserved words for your names, I always steer clear of these as you usually get tripped-up at some point.
In our simple Customer table, we have started to define a postal address, starting with AddressLine1 and AddressLine2. Setting aside discussions on relational database normalisation, this is a good starting point for naming the components that will make up a postal address. It is likely that we will have multiple postal addresses for a single entity, so we’ll achieve this using our disambiguation rules. Our revised Customer table might look something like this: –
Customer CustomerId Integer primary key, CustomerName Varchar(80), ShippingAddressLine1 Varchar(40), ShippingAddressLine2 Varchar(40), ... BillingAddressLine1 Varchar(40), BillingAddressLine2 Varchar(40), ...
Don’t abbreviate unnecessarily and if you are abbreviating, try to use recognised abbreviations rather than making up your own.
There’s nothing worse than a Customer table that is called CSTMR. There is no need for it.
Try to use abbreviations only where they really are universally recognised, or you really do need to reduce the length of a name. Try to consider these in advance so that you do not end up with a system that sometimes uses the full word and other times the abbreviated word. If you really do need to abbreviate Invoice to Inv, try to do it in a consistent manner.
Too Many Cooks
One of the reasons that your naming standards get out of control is that there is no gate keeper. Often, the creation of new database tables and columns are left to the developers, and they often add them as and when they need to use them. They all have their own standards (or not) and mayhem invariably prevails.
Maintain a data dictionary.
Once you’ve defined your perfect definition of AddressLine1, maintain it in a dictionary together with your disambiguated names. This should be your reference for creating any new tables or columns.
AddressLine1 Varchar(40) ShippingAddressLine1 BillingAddressLine1 ...
If you’re using a relational database, this has already been done for you. Use it.
Stand on the Shoulders of Giants
Remember that you do not need to invent all of this stuff yourself. Others may have already invested their time in determining the perfect structure for a postal address. This type of information is usually freely available and backed-up by a recognised standard.
If you need to add a column to support Email addresses, the column should be called EmailAddress (if using CamelCase), not Email. I’ve no idea if a column named Email is intended to hold an Email Address or a Customer’s Email preferences. Have you?
And, of course, the correct maximum length for an Email Address is 254 characters. Not 40, 80, 100 or 255.