Dimensional Structures: A Look Into the Outliers that Often Get Overlooked When Designing Data Structures

 

Think Star Schema is the only way to go with your EDW? Think again…StarFlake is a Best in Breed Approach that Might Make BI on top of the EDW that much more capable and robust.  Read on… I will cover the following concepts:

 

  • Alternative dimensional structures: snowflake schemas and starflake schemas
  • Slowly changing dimensions
  • Mini-dimensions
  • Heterogeneous star schemas (dimensional subtypes)
  • Dealing with non–hierarchically structured data in the underlying ER model:
    – Many-to-many relationships
    – Recursive relationships
    – Subtypes and supertypes

    Figure 1: Snowflake (the anti-lord of all things good in the data world)

    Kimball (1996) argues that “snowflaking” is undesirable because it adds unnecessary complexity, reduces query performance, and doesn’t substantially reduce storage space.

    However, performance impact of snowflaking depends on the DBMS / OLAP structures in place.  Advantages of the snowflake : explicitly shows the hierarchical structure of each dimension, which can help in understanding how the data can be analyzed.

    Thus, whether a snowflake or a star schema is used at the physical level, views should be used to enable the user to see the structure of each dimension as required.

    Being a star schema girl myself, I am quite interested in the power of the starflake model and thus being a starflake girl <she is reminded of the Tori Amos song ‘Cornflake Girl’ with this reference>.

     

    Figure 2: Starflake model

    A starflake schema is a star schema in which shared hierarchical segments are separated out into sub-dimension tables. These represent “highest common factors” between dimensions. A starflake schema is formed by collapsing classification entities from the top of each hierarchy until they reach either a branch entity or a component entity. If a branch entity is reached, a subdimension table is formed. Collapsing then begins again after the branch entity. When a component entity is reached, a dimension table is formed.

     

     

    Dimensional Design Trade-Offs

    The alternative dimensional structures considered here represent different trade-offs between complexity and redundancy:

    • The star schema is the simplest structure, as contains the least number of tables—eight tables in the example. However, it also has the highest level of data redundancy, as the dimension tables all violate third normal form (3NF). This maximizes understanding and simplifies queries to pairwise joins between tables.
    • The snowflake schema has the most complex structure and consists of more than five times many tables as the star schema representation (41 in the example). This will require multitable joins to satisfy queries.
    • The starflake schema has a slightly more complex structure than the star schema—nine tables in the example. However, while it has redundancy within each table (the dimension tables and sub-dimension tables all violate 3NF), redundancy between dimensions is eliminated, thus reducing the possibility of inconsistency between them. All of these structures are semantically equivalent: they all contain the same data and support the same set of queries. As a result, views may be used to construct any of these structures from any other.
  •  

    Advertisements

    3 thoughts on “Dimensional Structures: A Look Into the Outliers that Often Get Overlooked When Designing Data Structures

    Leave a Reply

    Fill in your details below or click an icon to log in:

    WordPress.com Logo

    You are commenting using your WordPress.com account. Log Out / Change )

    Twitter picture

    You are commenting using your Twitter account. Log Out / Change )

    Facebook photo

    You are commenting using your Facebook account. Log Out / Change )

    Google+ photo

    You are commenting using your Google+ account. Log Out / Change )

    Connecting to %s