Objectives
In this practice, you define the basic elements of the data warehouse database design, starting with the fundamentals, such as facts and dimensions, and fact data within a star model.
Determining Table Content
When identifying the data for the warehouse, you determine table contents what goes into them based on the user requirements. You also need to consider the context of the data that is to be used. You must:
The Grain of the Fact Table
The granularity, or grain, is the level of detail or summarization in the warehouse. A low level of granularity means greater detail; a high level of granularity mean less detail.
Maintaining a low level of grain in the warehouse is expensive and requires more disk space and more processing in access operations. Since the data may even exist at the transaction level, there is less demand during the ETT process than a higher level of granularity.
A high level of granularity requires less space and less resources for access but may prevent users from accessing the level of detail that they require to answer their business questions.
You must consider the following factors as you determine the grain of the warehouse:
The Time Dimension
Time is a critical element in the data warehouse. The question is, where should the time be stored? To answer the question you must consider the following:
A row whose key is an SQL date, may be populated. with additional time qualifiers needed to perform business analysis, such as: workday, fiscal period, and special events.
Time key on the fact table (usually in SQL DATE format) could be enough if queries limit themselves to day, month, or year-type analyses.
Building the Star Model
The star model can reduce table joins at the cost of heavy denormalization. There are some considerations regarding this denormalized model. Denormalization causes data duplication. Applying changes to duplicated data requires additional work.
When denormalizing data, you must consider the cost of additional storage required.
Note: The normalized model is most suitable for operational applications.
The following elements need to be defined to develop the star model and are the answers to the questions listed:
Star Model Advantages
The model allows complex queries to be expressed by end users, because the data is arranged in a way that is easy to understand and the relationships between entities are very clear.
The model restricts the numerical measurements of the business to the fact table, and only to the fact table.
Star Model Disadvantages
Snowflake Model
The snowflake model is a variation of the star model, where a flat dimension table is decomposed or normalized into a tree structure, with potentially many nesting levels or hierarchies. The secondary dimension table is often referred to as an outrigger table.
Snowflake Model Advantages
Snowflake Model Disadvantages
Constellation Configuration
The constellation configuration consists of a central star surrounded by other stars. The central star comprises base level or atomic data. The surrounding stars are typically summary data. The surrounding stars can share dimension attributes with the atomic star.
Constellation Configuration Advantages
Constellation Configuration Disadvantages
Requires the scheduling of updates and maintenance of the summary stars to remain consistent with the central star.
Instructions
In this practice you:
1. Using the metadata for the source system, which is available in Appendix A, develop a fully attributed physical model from the logical model.
Note: This includes adding all keys columns to both the dimensions and fact table.
2. Identify what indexes are needed on which columns of the model.
Topic 8 Solutions
1. Using the metadata for the source system, which is available in Appendix A, develop a fully attributed physical model from the logical model.
2. Identify what indexes are needed on which columns of the model.
The indexes you create
are:
Table |
Index |
Reason |
Products |
Unique on item_id |
Primary key |
Channels |
Unique on channel_id |
Primary key |
Customers |
Concatenated unique on ship_to_id and customer_id |
Primary keys |
Time |
Unique on time_id |
Primary key |
Sales |
Concatenated unique on channel_id, time_id, customer_id, product_id, ship_to_id |
Primary keys |
You may decide to create other indexes on other non-key columns, depending upon the users query criteria requirements.
Objectives
The star model is an ideal starting point to familiarize you with the database design concepts of the data warehouse. However, most subject area data is not effectively stored or accessed through the star. Alternative models must be employed to deliver the appropriate performance levels, data requirements, and data management techniques such as maintaining history.
You discover as you evaluate the data requirements for your warehouse, that data relationships are not limited to facts and dimension table relationships. There are additional modeling techniques and table definitions to store this data.
In this practice you consider other models for the data warehouse beyond the star model. The simple star is not recommended for all enterprise warehouses; more typically a snowflake or constellation model is implemented. The advantages of other models are discussed along with peculiar instances of fact and dimension tables.
Changing Dimension Attributes
A data warehouse stores a historical perspective of a business. In designing the warehouse the history that should be stored in the warehouse should be decided. The business users drive the definition of time in the warehouse, the availability and reliability of data determines actual implementation.
How to keep track of data that changes over time is an integral part of the database design process. The loading and refreshing processes of the data warehouse must include housekeeping routines to manage changes to already stored dimension attributes. For example, a person may exist as a customer record in the Customer dimension. How do you reflect changes to the customer information?
Several techniques are available to manage changing data:
Note: In some data warehousing references you find this subject referred to as Slowly Changing Dimensions.
One Dimension Compared to Several Dimensions
How do you arrange attributes with many-to-many relationship? Your choices are to put all the attributes in one dimension table, or split the attributes into multiple dimension tables. ultimately the meaning and relevance of the data drives the design, not the size of the tables.
One Dimension
Using a single dimension structure:
Multiple Dimensions
Using multiple dimensions:
Bracketed Dimensions
To enhance performance and analytical capabilities, it may be helpful to create bracketed dimensions for categorical attributes. Creating groups of values for attributes with many unique values, such as income, reduces the cardinality and creates ranges that may be more meaningful to the end-user.
Bracketed dimensions are typically used in support of very complex analytical models because they:
Models for Hierarchical Data
Flat Model with Single Hierarchy
The flat model is also known as the horizontal model.
Flat Model with Multiple Hierarchies
Multiple hierarchies can be stored in a single dimension using the flat model approach. The definition of the relationship between the data in this model must be well maintained and stored in the metadata.
All the levels of the hierarchy have been collapsed into one table.
Recursive Model
The recursive model is also known as a vertical model. In this model, the dimension key contains all possible values of the entire hierarchical structure.
Instructions
In this practice, you:
A. Expanding the Star to a Snowflake.
1. Using the star model created earlier for the Sales and Marketing application, identify which dimensions could be normalized.
2. Evaluate the merits of normalizing each dimension, determining whether or not it
should be changed. Justify your design decision for each dimension.3. Outline the revised physical model, adding dimensions where needed.
B. Developing a Model for Promotion Analysis.
Your company plans to develop a warehouse to support the promotion tracking needs of Global Computing Company.
1. List five questions typically posed when performing promotion analysis.
2. Identify the subjects that are needed to answer these questions.
3. Are there any facts for this model?
4. Draw the proposed logical model.
Topic 9 Solutions
A. Expanding the Star to a Snowflake
1. Using the star model created earlier for the Sales and Marketing application, identify which dimensions could be normalized.
2. Evaluate the merits of normalizing each dimension, determining whether or not it
should be changed. Justify your design decision for each dimension.Although all dimensions are relatively small, the following dimensions can be normalized
- Customer: Account, Segment
- Time: Month
- Product: Family, Class, Package
3. Outline the revised physical model, adding dimensions where needed.
B. Developing a Model for Promotion Analysis
Your company plans to develop a warehouse to support the promotion tracking needs of Global Computing Company.
1. List five questions typically posed when performing promotion analysis.
The following is a sample of questions:
- What products sold on deal?
- Who bought the promoted items?
- What is the highest price that the market will pay?
- Are promotions seasonal?
- Which promotions were not successful?
- Which products are bought only on deal?
- Which products are never bought on deal?
- Where are promotions effective?
2. Identify the subjects that are needed to answer these questions.
- Promoted Price
- Customer
- Product
- Time
- Regular Price
- Location
3. Are there any facts for this model?
No