Typical Legacy Data Problems
The table below was originally in an article entitled "Challenges With Legacy Data" by Scott Ambler on IBM's DeveloperWorks website.
The original article concentrates largely on integration of existing legacy databases into an object-orientated environment rather than data migration, but the typical problems found when working with legacy data are the same and the information remains relevant.
|
Problem |
Example |
Potential impact |
|
A single column being used for several purposes |
Additional information for an inventory item is stored in the Notes column. Additional information will be one or more of: a lengthy description of the item, storage requirements, or safety requirements when handling the item. |
|
|
The purpose of a column is determined by the value of one or more other columns |
If the value of DateType
is 17, then PersonDate represents the date
of birth of the person. If the value is 84, then PersonDate
is the person's date of graduation from high school. If the value is between
35 and 48, then it is the date the person entered high school. |
|
|
Incorrect data values |
The AgeInYears column for a person contains the value -3. Or the AgeInYears column contains 7 although the BirthDate is August 14 1967 and the current date is October 10 2001. |
|
|
Inconsistent/incorrect data formatting |
The name of a person is stored in one table in the format
"Firstname Surname" and in another table
in the format "Surname, Firstname". |
|
|
Missing data |
The date of birth of a person has not been recorded
in some records. |
|
|
Missing columns |
You need a middle name of a person but a column for it does not exist. |
|
|
Additional columns |
The Social Security number for a person is stored in the database and you don't need it. |
|
|
Multiple sources for the same data |
Customer information is stored in three separate legacy databases. |
|
|
Important entities, attributes, and relationships
hidden and floating in text fields |
A notes text field contains the information ("Clark and Lois Kent, Daily Planet Publications"). |
|
|
Data values that stray from their field descriptions and business rules |
The maiden name column is being used to store a person's fabric preference for clothing. |
|
|
Various key strategies for the same type of entity |
One table stores customer information using the Social Security number as the key, another uses the ClientID as the key, and another uses a surrogate key. |
|
|
Unrealized relationships between data records |
A customer has a summer home. Both pieces of data are recorded in your database, but there is no relationship stored in the database regarding this fact. |
|
|
One attribute is stored in several fields |
The Person class requires a single name field
whereas it is stored in the columns FirstName
and Surname in your database. |
|
|
Inconsistent use of special characters |
|
|
|
Different data types for similar columns |
A customer ID is stored as a number in one table and a string in another. |
|
|
Different levels of detail |
An object requires the total sales for the month but
your database stores individual totals for each order, or an object requires
the weight of individual components of an item, such as the doors and engine
of a car, but your database only records the aggregate weight. |
|
|
Different modes of operation |
Some data is a read-only snapshot of information whereas other data is read write. |
|
|
Varying timeliness of data |
The Customer data is current, Address
data is one day out of date, and the data pertaining to countries and states
is accurate to the end of the previous quarter because you purchase that
information from an external source. |
|
|
Varying default values |
Your object uses a default of Green for a given value yet another application has been using Yellow, resulting in a preponderance (in the opinion of your users) of Yellow values stored in the database. |
|
|
Various representations |
The day of the week is stored as T, Tues, 2, and Tuesday in four separate columns. |
|
Scott W. Ambler is President of Ronin International, a consulting firm specializing in object-oriented software process mentoring, architectural modeling, and Enterprise JavaBeans (EJB) development. He has authored or co-authored several books about object-oriented development, including the recently released The Object Primer 2nd Edition. He can be reached at scott.ambler@ronin-intl.com and at his Web site at www.ambysoft.com.