Slowly Changing Dimensions (SCD) Types

Slowly Changing Dimensions in Data Warehouse is an important concept that is used to enable the historic aspect of data in an analytical system. As you know, the data warehouse is used to analyze historical data, it is essential to store the different states of data.

In data warehousing, we have fact and dimension tables to store the data. Dimensional tables are used to analyze the measures in the fact tables. In a data environment, data is initiated at operational databases and data will be extracted-transformed-loaded (ETL) to the data warehouse to suit the analytical environment.

Customer, Product are examples for Dimensional tables. These dimension attributes are modified over time and in the data warehouse, we need to maintain the history. In operational systems, we may overwrite the modified attributes as we may not need the historical aspects of data. Since our primary target in data warehousing is to analyze data with the perspective of history, we may not be able to simply overwrite the data and we need to implement special techniques to maintain the history considering analytical and volume aspects of the data warehouse. This implementation is done using Slowly Changing Dimensions in Data Warehouse.

What are the types of SCD?

Very simply, there are 6 types of Slowly Changing Dimension that are commonly used, they are as follows:

SCD TypeSummary
Type 0Fixed Dimension
Type 1No History
Update record directly, there is no record of historical values, only current state is maintained.
Type 2Row Versioning
Track changes as version records with current flag & active dates and other metadata
Type 3Previous Value column
Track change to a specific attribute, add a column to show the previous value, which is updated as further changes occur
Type 4A new dimension will be added.
Show current value in dimension table but track all changes in separate table
Type 6Hybrid SCD
Utilize techniques from SCD Types 1, 2 and 3 to track change
  • In reality, only types 0, 1 and 2 are widely used, with the others reserved for very specific requirements.

Examples

We have a very simple ‘customer’ dimension, with just 2 attributes – Customer Name and Country:

simple customer dimension

However, Bob has just informed us that he has now moved to the US and we want to update our dimension record to reflect this. We can see how the different SCD types will handle this change and the pro/cons of each method.

Type 0

Our table remains the same. This means our existing reports will continue to show the same figures, maybe it is a business requirement that each customer is always allocated to the country they signed up from.

All future transactions associated to Bob will also be allocated to the ‘United Kingdom’ country.

Type 1

The table is updated to reflect Bob’s new country:

customer dimension table

All fact records associated with Bob will now be associated with the ‘United States’ country, regardless of when they occurred.

We often just want to see the current value of a dimension attribute – it could be that the only dimension changes that occur are corrections to mistakes, maybe there is no requirement for historical reporting.

Type 2

In order to support type 2 changes, we need to add four columns to our table:

· Surrogate Key – the original ID will no longer be sufficient to identify the specific record we require, we therefore need to create a new ID that the fact records can join to specifically.

· Current Flag – A quick method of returning only the current version of each record

· Start Date – The date from which the specific historical version is active

· End Date – The date to which the specific historical version record is active

With these elements in place, our table will now look like:

customer dimension table with surrogate key

This method is very powerful – you maintain the history for the entire record and can easily perform change-over-time analysis. However, it also comes with more maintenance overhead, increased storage requirement and potential performance impacts if used on very large dimensions.

Type 2 is the most common method of tracking change in data warehouses.

Type 3

Here, we add a new column called “Previous Country” to track what the last value for our attribute was.

add new columns to customer dimension

Note how this will only provide a single historical value for Country. If the customer changes his name, we will not be able to track it without adding a new column. Likewise, if Bob moved country again, we would either need to add further “Previous Previous Country” columns or lose the fact that he once lived in the United Kingdom.

Type 4

There is no change to our existing table here, we simply update the record as if a Type 1 change had occurred. However, we simultaneously maintain a history table to keep track of these changes:

Our Dimension table reads:

tracking dimension table

Whilst our Type 4 historical table is created as:

Historical data in customer dimension table

Depending on your requirements, you may place both ID and Surrogate Key onto the fact record so that you can optimise performance whilst maintaining functionality.

Separating the historical data makes your dimensions smaller and therefore reduces complexity and improves performance if the majority of uses only need the current value.

However, if you do require historical values, this structure adds complexity and data redundancy overheads. It is generally assumed that the system will use Type 1 or Type 2 rather than Type 4.

Type 6

The ‘Hybrid’ method simply takes SCD types 1, 2 and 3 and applies all techniques. We would maintain a history of all changes whilst simultaneously updating a “current value” column on all records.

maintaining history of changes

This gives you the ability to provide an element of change comparison without additional calculation, whilst still maintaining a full, detailed history of all changes in the system.

Conclusion

Slowly Changing Dimensions in Data Warehouse are used to perform different analyses. This article provides details of how to implement Different types of Slowly Changing Dimensions such as Type 0, Type 1, Type 2, Type 3, Type 4 and Type 6. Type 2 and Type 6 are the most commonly used dimension in a data warehouse.

Leave a comment