Slowly
Changing Dimensions
Dimensions that change over time are called Slowly
Changing Dimensions. For instance, a product price changes over time; People
change their names for some reason; Country and State names may change over
time. These are a few examples of Slowly Changing Dimensions since some changes
are happening to them over a period of time.
Slowly
Changing Dimensions are often categorized into three types namely Type1, Type2
and Type3. The following section deals with how to capture and handling these
changes over time.
The
"Product" table mentioned below contains a product named, Product1
with Product ID being the primary key. In the year 2004, the price of Product1
was $150 and over the time, Product1's price changes from $150 to $350. With
this information, let us explain the three types of Slowly Changing Dimensions.
Product
Price in 2004:
Product
ID(PK) Year Product Name Product Price
1 2004
Product1 $150
1.SCD TYPE1(Slowly Changing Dimension) : contains current
data.
2.SCD TYPE2(Slowly Changing Dimension) : contains current
data + complete historical data.
3.SCD TYPE3(Slowly Changing Dimension) : contains current
data + one type historical data.
Type 1: Overwriting the old values.
In the
year 2005, if the price of the product changes to $250, then the old values of
the columns "Year" and "Product Price" have to be updated
and replaced with the new values. In this Type 1, there is no way to find out
the old value of the product "Product1" in year 2004 since the table
now contains only the new price and year information.
Product
Product
ID(PK) Year Product Name Product Price
1 2005
Product1 $250
Type 2: Creating an additional
record.
In this
Type 2, the old values will not be replaced but a new row containing the new
values will be added to the product table. So at any point of time, the
difference between the old values and new values can be retrieved and easily be
compared. This would be very useful for reporting purposes.
Product
Product
ID(PK) Year Product Name Product Price
1 2004
Product1 $150
1 2005
Product1 $250
The
problem with the above mentioned data structure is "Product ID"
cannot store duplicate values of "Product1" since "Product
ID" is the primary key. Also, the current data structure doesn't clearly
specify the effective date and expiry date of Product1 like when the change to
its price happened. So, it would be better to change the current data structure
to overcome the above primary key violation.
Product
Product
ID(PK) Effective
DateTime(PK)
Year Product Name Product Price Expiry
DateTime
1
01-01-2004 12.00AM 2004 Product1 $150 12-31-2004 11.59PM
1
01-01-2005 12.00AM 2005 Product1 $250
In the
changed Product table's Data structure, "Product ID" and "Effective
DateTime" are composite primary keys. So there would be no violation of
primary key constraint. Addition of new columns, "Effective DateTime"
and "Expiry DateTime" provides the information about the product's
effective date and expiry date which adds more clarity and enhances the scope
of this table. Type2 approach may need additional space in the data base, since
for every changed record, an additional row has to be stored. Since dimensions
are not that big in the real world, additional space is negligible.
Type 3: Creating new fields.
In this
Type 3, the latest update to the changed values can be seen. Example mentioned
below illustrates how to add new columns and keep track of the changes. From
that, we are able to see the current price and the previous price of the
product, Product1.
Product
Product
ID(PK) Current
Year
Product
Name
Current
Product
Price Old Product
Price Old
Year
1 2005
Product1 $250 $150 2004
The
problem with the Type 3 approach, is over years, if the product price continuously
changes, then the complete history may not be stored, only the latest change
will be stored. For example, in year 2006, if the product1's price changes to
$350, then we would not be able to see the complete history of 2004 prices,
since the old values would have been updated with 2005 product information.
Product
Product
ID(PK) Year Product
Name
Product
Price Old
Product
Price Old
Year
1 2006
Product1 $350 $250 2005
Example: In order to store data, over the years, many
application designers in each branch have made their individual decisions as to
how an application and database should be built. So source systems will be
different in naming conventions, variable measurements, encoding structures,
and physical attributes of data. Consider a bank that has got several branches
in several countries, has millions of customers and the lines of business of
the enterprise are savings, and loans. The following example explains how the
data is integrated from source systems to target systems.
Example
of Source Data
System
Name Attribute Name Column Name Datatype Values
Source
System 1 Customer Application Date CUSTOMER_APPLICATION_DATE NUMERIC(8,0)
11012005
Source
System 2 Customer Application Date CUST_APPLICATION_DATE DATE 11012005
Source
System 3 Application Date APPLICATION_DATE DATE 01NOV2005
In the
aforementioned example, attribute name, column name, datatype and values are
entirely different from one source system to another. This inconsistency in
data can be avoided by integrating the data into a data warehouse with good
standards.
Example
of Target Data(Data Warehouse)
Target
System Attribute Name Column Name Datatype Values
Record #1
Customer Application Date CUSTOMER_APPLICATION_DATE DATE 01112005
Record #2
Customer Application Date CUSTOMER_APPLICATION_DATE DATE 01112005
Record #3
Customer Application Date CUSTOMER_APPLICATION_DATE DATE 01112005
In the
above example of target data, attribute names, column names, and datatypes are
consistent throughout the target system. This is how data from various source
systems is integrated and accurately stored into the data warehouse.
Comments
Post a Comment