Category Archives: SSAS

SSAS Terminology

SSAS (SQL Server Analysis Services) is a Microsoft BI (Business Intelligence) technology used for online analytical processing (OLAP) and data mining. Here are some important SSAS terminology:

  1. Cube: A cube is a multidimensional structure that contains measures and dimensions. It organizes data into a structure that is optimized for querying and analysis.
  2. Dimension: Dimensions are the categories by which data is analyzed. They provide the context for measures in a cube. Examples of dimensions include Time, Geography, Product, etc.
  3. Measure: Measures are the numeric values that are being analyzed. Examples of measures include Sales Amount, Quantity Sold, Profit Margin, etc.
  4. Hierarchy: Hierarchies are logical structures that organize dimension attributes into levels of detail. For example, a Time dimension might have a hierarchy with levels such as Year, Quarter, Month, and Day.
  5. Attribute: Attributes are the individual members within a dimension hierarchy. For example, within a Time dimension, attributes might include Year, Quarter, Month, and Day.
  6. Fact Table: A fact table is a central table in a star schema or snowflake schema that contains the quantitative data for analysis. Measures in a cube are derived from the columns in the fact table.
  7. MDX (Multidimensional Expressions): MDX is the query language used to query multidimensional data stored in SSAS cubes. It is similar in syntax to SQL but is specifically designed for querying OLAP data.
  8. Aggregation: Aggregations are pre-calculated summary values stored within the cube to improve query performance. Aggregations are created based on the measures and dimensions defined in the cube.
  9. Partition: A partition is a physical storage container within a cube that holds a subset of data from the underlying data source. Partitions allow for parallel processing and efficient querying.
  10. Data Source View (DSV): A data source view defines the schema and relationships between the tables and views used as the data source for an SSAS project.
  11. KPI (Key Performance Indicator): KPIs are business metrics used to evaluate the performance of an organization. SSAS allows for the definition and analysis of KPIs within cubes.
  12. Mining Model: In SSAS, data mining models are used to identify patterns and relationships in data. These models can be created using algorithms such as Decision Trees, Clustering, Neural Networks, etc.

Types of Dimensions in Data warehouse

What is Dimension?

Dimension table contains the data about the business. The primary keys of the dimension tables are used in Fact tables with Foreign key relationship. And the remaining columns in the dimension is normal data which is the information about the Objects related to the business.Eg: Product,Customer,Orders,Company,Date etc.

  1. Slowly Changing Dimensions (SCD) : 

Slowly changing dimensions refer to how data in your data warehouse changes over time. Slowly changing dimensions have the same natural key but other data columns that may or may not change over time depending on the type of dimensions that it is. 

Slowly changing dimensions are important in data analytics to track how a record is changing over time. The way the database is designed directly reflects whether historical attributes can be tracked or not, determining different metrics available for the business to use. 

For example, if data is constantly being overwritten for one natural key, the business will never be able to see how changes in that row’s attributes affect key performance indicators. If a company continually iterates on a product and its different features, but doesn’t track how those features have changed, it will have no idea how customer retention, revenue, customer acquisition cost, or other marketing analytics were directly impacted by those changes. 

Different Types of SCD:

SCD TypeSummary
Type 0Ignore any changes and audit the changes.
Type 1Overwrite the changes
Type 2History will be added as a new row.
Type 3History will be added as a new column.
Type 4A new dimension will be added

I will post detailed article on SCD types separately.

2. Rapidly Changing Dimensions: 

A dimension is a fast changing or rapidly changing dimension if one or more of its attributes in the table changes very fast and in many rows. Handling rapidly changing dimension in data warehouse is very difficult because of many performance implications.

Example: Patient table with Age, BMI and Weight

3. Conformed Dimension

A conformed dimension is the dimension that is shared across multiple data mart or subject area. Company may use the same dimension table across different projects without making any changes to the dimension tables.

Conformed dimension example would be Customer dimension, i.e. both marketing and sales department can use Customer dimension for their reporting purpose.

4. Junk Dimension

A junk dimension is a grouping of typically low cardinality attributes, so you can remove them from main dimension.

For example, attributes such as flags, weights, BMI (body mass index) etc

5. Degenerated Dimension

A degenerated dimension is a dimension that is derived from fact table and does not have its own dimension table.

For example, receipt number does not have dimension table associated with it. Such details are just for information purpose.

5. Role Playing Dimension

Dimensions which are often used for multiple purposes within the same database are called role-playing dimensions.

For example, you can use a date dimension for “date of sale”, as well as “date of delivery”, or “date of hire”.

Star Schema vs Snowflake Schema: differences

What is a star schema?

A star schema is a database schema used to store data in a star format. This schema consists of a central table, called the “fact table,” and a number of directly connected other tables, called “dimension tables.”

The fact table contains information about metrics or measures, while the dimension tables contain information about descriptive attributes. The star schema is very simple and easy to understand, making it ideal for cloud data warehousing and business intelligence applications.

What is a snowflake schema?

A snowflake schema is a type of database schema that is used to store data in a more complex format than the star schema. The snowflake schema consists of a central table, which is called the “fact table,” and a number of other tables, which are called “dimension tables.” As with other schemas, the fact table contains information about events or facts, while the dimension tables contain information about the dimensions of those events or facts. 

Key differences between star schema and snowflake schema:

  1. A star schema has denormalized dimension tables, while a snowflake schema has normalized dimension tables.
  2. A star schema is easier to design and implement than a snowflake schema.
  3. A star schema can be more efficient to query than a snowflake schema, because there are fewer JOINs between tables.
  4. A star schema can require more storage space than a snowflake schema, because of the denormalized data.


How to choose between star and snowflake schemas?

Deciding which schema is best for your data warehouse design is contingent on various factors, such as your data characteristics, query needs, storage constraints, and performance expectations. Generally speaking, a star schema is recommended if you have a few dimensions with low cardinality and limited levels of hierarchy. Additionally, if you need fast and simple queries with aggregated data and have ample storage space to tolerate some data redundancy, then this schema is ideal. On the other hand, a snowflake schema is best suited when you have many dimensions with high cardinality and multiple levels of hierarchy. This type of schema is also helpful when you need complex and detailed queries with granular data and want to avoid data inconsistency. Moreover, a hybrid approach that combines both schemas can be used depending on the specific data warehouse scenarios and trade-offs. For instance, a star schema can be used for simpler and more stable dimensions while a snowflake schema can be used for more complex and dynamic ones. Finally, you can also use a constellation schema that has multiple fact tables sharing some common dimension tables to support different business processes or analytical needs.

SSAS STORAGE MODES

In SQL Server Analysis Services 2008, we have three storage mode options available to us: Relational Online Analytical Processing (ROLAP), Multidimensional Online Analytical Processing (MOLAP) and Hybrid Online Analytical Processing (HOLAP). There are advantages and disadvantages to each, so I figured I’d take a few minutes to give a quick overview describing the storage modes and laying out some of the pros and cons of each.

Relational Online Analytical Processing (ROLAP)

The ROLAP storage mode allows the detail data and aggregations to be stored in the relational database. If you plan on using ROLAP, you need to make sure that your database is carefully designed or you’ll run into some bad performance issues.

Pros:

  • Since the data is kept in the relational database instead of on the OLAP server, you can view the data in almost real time.
  • Also, since the data is kept in the relational database, it allows for much larger amounts of data, which can mean better scalability.
  • Low latency.

Cons:

  • With all the data being stored in the relational database, query performance is going to be much slower than MOLAP.
  • You must maintain a permanent connection with the relational database to use ROLAP.

Multidimensional Online Analytical Processing (MOLAP)

MOLAP is the default and thus most frequently used storage mode. With MOLAP storage, the data and aggregations are stored in a multidimensional format, compressed and optimized for performance. This is both good and bad. When a cube with MOLAP storage is processed, the data is pulled from the relational database, the aggregations are performed, and the data is stored in the AS database.

Pros:

  • Since the data is stored on the OLAP server in optimized format, queries (even complex calculations) are faster than ROLAP.
  • The data is compressed so it takes up less space.
  • And because the data is stored on the OLAP server, you don’t need to keep the connection to the relational database.
  • Cube browsing is fastest using MOLAP.

Cons:

  • Because you don’t have a real time connection to the relational database, you need to frequently process the cube to update your data.
  • If there’s a large amount of data, processing is going to take longer.
  • There’s also an additional amount of storage since a copy of the relational database is kept on the OLAP server.
  • High latency.

Hybrid Online Analytical Processing (HOLAP)

HOLAP is a combination of MOLAP and ROLAP. HOLAP stores the detail data in the relational database but stores the aggregations in multidimensional format. Because of this, the aggregations will need to be processed when changes are occur. With HOLAP you kind of have medium query performance: not as slow as ROLAP, but not as fast as MOLAP. If, however, you were only querying aggregated data or using a cached query, query performance would be similar to MOLAP. But when you need to get that detail data, performance is closer to ROLAP.

Pros:

  • HOLAP is best used when large amounts of aggregations are queried often with little detail data, offering high performance and lower storage requirements.
  • Cubes are smaller than MOLAP since the detail data is kept in the relational database.
  • Processing time is less than MOLAP since only aggregations are stored in multidimensional format.
  • Low latency since processing takes place when changes occur and detail data is kept in the relational database.

Cons:

  • Query performance can head downhill fast when more detail data is queried from the relational database.