Topic 8: Data Warehousing Models
Practice

Objectives

 

In this practice, you define the basic elements of the data warehouse database design, starting with the fundamentals, such as facts and dimensions, and fact data within a star model.

 

Determining Table Content

 

When identifying the data for the warehouse, you determine table contents what goes into them based on the user requirements. You also need to consider the context of the data that is to be used. You must:

 

The Grain of the Fact Table

 

The granularity, or grain, is the level of detail or summarization in the warehouse. A low level of granularity means greater detail; a high level of granularity mean less detail.

 

Maintaining a low level of grain in the warehouse is expensive and requires more disk space and more processing in access operations. Since the data may even exist at the transaction level, there is less demand during the ETT process than a higher level of granularity.

 

A high level of granularity requires less space and less resources for access but may prevent users from accessing the level of detail that they require to answer their business questions.

 

You must consider the following factors as you determine the grain of the warehouse:

 

The Time Dimension

 

Time is a critical element in the data warehouse. The question is, where should the time be stored? To answer the question you must consider the following:

 

A row whose key is an SQL date, may be populated. with additional time qualifiers needed to perform business analysis, such as: workday, fiscal period, and special events.

 

Time key on the fact table (usually in SQL DATE format) could be enough if queries limit themselves to day, month, or year-type analyses.

 

Building the Star Model

 

The star model can reduce table joins at the cost of heavy denormalization. There are some considerations regarding this denormalized model. Denormalization causes data duplication. Applying changes to duplicated data requires additional work.

 

When denormalizing data, you must consider the cost of additional storage required.

 

Note: The normalized model is most suitable for operational applications.

 

The following elements need to be defined to develop the star model and are the answers to the questions listed:

 

Star Model Advantages

 

The model allows complex queries to be expressed by end users, because the data is arranged in a way that is easy to understand and the relationships between entities are very clear.

The model restricts the numerical measurements of the business to the fact table, and only to the fact table.

 

Star Model Disadvantages

 

 

Snowflake Model

 

The snowflake model is a variation of the star model, where a flat dimension table is decomposed or normalized into a tree structure, with potentially many nesting levels or hierarchies. The secondary dimension table is often referred to as an outrigger table.

 

Snowflake Model Advantages

 

 

Snowflake Model Disadvantages

 

 

Constellation Configuration

 

The constellation configuration consists of a central star surrounded by other stars. The central star comprises base level or atomic data. The surrounding stars are typically summary data. The surrounding stars can share dimension attributes with the atomic star.

 

Constellation Configuration Advantages

 

Constellation Configuration Disadvantages

 

Requires the scheduling of updates and maintenance of the summary stars to remain consistent with the central star.

 

 

Instructions

 

In this practice you:

 

1. Using the metadata for the source system, which is available in Appendix A, develop a fully attributed physical model from the logical model.

Note: This includes adding all keys columns to both the dimensions and fact table.

 

 

2. Identify what indexes are needed on which columns of the model.

 

 

 

 

Topic 8 Solutions

 

1. Using the metadata for the source system, which is available in Appendix A, develop a fully attributed physical model from the logical model.

 

 


 

2. Identify what indexes are needed on which columns of the model.

 

 

The indexes you create are:

Table

Index

Reason

Products

Unique on item_id

Primary key

Channels

Unique on channel_id

Primary key

Customers

Concatenated unique on ship_to_id and customer_id

Primary keys

Time

Unique on time_id

Primary key

Sales

Concatenated unique on channel_id, time_id, customer_id, product_id, ship_to_id

Primary keys

You may decide to create other indexes on other non-key columns, depending upon the users query criteria requirements.

 

Topic 9 : Further Considerations for Physical Models
Practice

Objectives

 

The star model is an ideal starting point to familiarize you with the database design concepts of the data warehouse. However, most subject area data is not effectively stored or accessed through the star. Alternative models must be employed to deliver the appropriate performance levels, data requirements, and data management techniques such as maintaining history.

 

You discover as you evaluate the data requirements for your warehouse, that data relationships are not limited to facts and dimension table relationships. There are additional modeling techniques and table definitions to store this data.

 

In this practice you consider other models for the data warehouse beyond the star model. The simple star is not recommended for all enterprise warehouses; more typically a snowflake or constellation model is implemented. The advantages of other models are discussed along with peculiar instances of fact and dimension tables.

 

Changing Dimension Attributes

 

A data warehouse stores a historical perspective of a business. In designing the warehouse the history that should be stored in the warehouse should be decided. The business users drive the definition of time in the warehouse, the availability and reliability of data determines actual implementation.

 

How to keep track of data that changes over time is an integral part of the database design process. The loading and refreshing processes of the data warehouse must include housekeeping routines to manage changes to already stored dimension attributes. For example, a person may exist as a customer record in the Customer dimension. How do you reflect changes to the customer information?

 

Several techniques are available to manage changing data:

 

Note: In some data warehousing references you find this subject referred to as Slowly Changing Dimensions.

 

One Dimension Compared to Several Dimensions

 

How do you arrange attributes with many-to-many relationship? Your choices are to put all the attributes in one dimension table, or split the attributes into multiple dimension tables. ultimately the meaning and relevance of the data drives the design, not the size of the tables.

 

One Dimension

 

Using a single dimension structure:

 

Multiple Dimensions

 

Using multiple dimensions:

 

Bracketed Dimensions

 

To enhance performance and analytical capabilities, it may be helpful to create bracketed dimensions for categorical attributes. Creating groups of values for attributes with many unique values, such as income, reduces the cardinality and creates ranges that may be more meaningful to the end-user.

 

Bracketed dimensions are typically used in support of very complex analytical models because they:

 

 

Models for Hierarchical Data

 

Flat Model with Single Hierarchy

 

The flat model is also known as the horizontal model.

 

Flat Model with Multiple Hierarchies

 

Multiple hierarchies can be stored in a single dimension using the flat model approach. The definition of the relationship between the data in this model must be well maintained and stored in the metadata.

All the levels of the hierarchy have been collapsed into one table.

 

Recursive Model

 

The recursive model is also known as a vertical model. In this model, the dimension key contains all possible values of the entire hierarchical structure.

 

 

 

Instructions

 

In this practice, you:

 

A. Expanding the Star to a Snowflake.

 

1. Using the star model created earlier for the Sales and Marketing application, identify which dimensions could be normalized.

2. Evaluate the merits of normalizing each dimension, determining whether or not it
should be changed. Justify your design decision for each dimension.

3. Outline the revised physical model, adding dimensions where needed.

 

 

B. Developing a Model for Promotion Analysis.

 

Your company plans to develop a warehouse to support the promotion tracking needs of Global Computing Company.

1. List five questions typically posed when performing promotion analysis.

2. Identify the subjects that are needed to answer these questions.

3. Are there any facts for this model?

4. Draw the proposed logical model.

 

 

 

 

Topic 9 Solutions

 

A. Expanding the Star to a Snowflake

 

1. Using the star model created earlier for the Sales and Marketing application, identify which dimensions could be normalized.

 

 

 

2. Evaluate the merits of normalizing each dimension, determining whether or not it
should be changed. Justify your design decision for each dimension.

Although all dimensions are relatively small, the following dimensions can be normalized

3. Outline the revised physical model, adding dimensions where needed.


B. Developing a Model for Promotion Analysis

 

Your company plans to develop a warehouse to support the promotion tracking needs of Global Computing Company.

1. List five questions typically posed when performing promotion analysis.

The following is a sample of questions:

2. Identify the subjects that are needed to answer these questions.

3. Are there any facts for this model?

No

1