Topic 5: Metadata Management
Practice

Objectives

 

In this practice, the process of developing and managing metadata is explored. The concept of metadata and the role of the different types are defined.

 

The nature of metadata is primarily two-fold:

 

Metadata Contents

 

Business

 

Primarily used by business users, business metadata describes the relationships between data elements and transformations of the data fields. These elements include:

 

Technical

 

For the technical users, technical metadata contains the complete description of all of the data required to implement and distribute the use of the warehouse. Technical metadata describes the characteristics of the data from all technical standpoints. Each software tool in the warehousing environment requires its own metadata. These elements include:

 

Metadata Sources

 

Where does metadata come from? Typically, much of the information you need to get started on the design and development of metadata for your warehouse already exists within your enterprise legacy system. Metadata may reside in:

 

Metadata Ownership and Stewardship

 

Data steward responsibilities are:

 

Instructions

 

In this practice, you consider your own warehouse environment, and propose a high level strategy for metadata management.

 

Part I - Metadata Specification

 

1. Specify the elements and the sources that are to be included in the business (user) metadata.

2. Specify the elements for the technical metadata, indicating the ‘use by’ function.

 

Part II - Metadata Management

 

1. Specify the software tools that you may use to develop and maintain the following functions in your warehouse.

 

2. Propose a stewardship program for metadata considering:

a) What department or organization would own the responsibility?

b) What roles are appropriate in your environment?Discuss ways to maintain the business (user) metadata considering software
repositories and elements.

c) Who could fill what role?

d) What are the barriers or challenges to implementing this strategy?

 

 

Topic 5 Solutions

 

Part I - Metadata Specification

 

1. Specify the software tools that you may use to develop and maintain the following functions in your warehouse.

 

(Hint: Consider carefully what a user might want to know about data, such as names, content, meaning, and derivations.)

 

An example list would include:

 

2. Specify the elements for the technical metadata, indicating the use by function.

 

(Hint: Remember technical metadata is used for ETT, management, and monitoring.)

 

An example list would include:

 

 

Part II - Metadata Management

1. Specify the software tools that you may use to develop and maintain your warehouse.

 

Note: Oracle has partnerships with many third party vendors who supply products that support all areas of data warehousing. Particularly relevant to components such as data mining and ETT. Some of these companies are Sagent, Informatica, PowerBuilder, and Prism.

 

 

2. Propose a stewardship program for metadata considering:

a) What department or organization would own the responsibility?

b) What roles are appropriate in your environment?Discuss ways to maintain the business (user) metadata considering software
repositories and elements.

c) Who could fill what role?

d) What are the barriers or challenges to implementing this strategy?

 

As you consider your response to these questions, think about your organization and what group is managing the metadata at the current time. Typically it is the database administrator who looks after the metadata for the entire environment, or a group of business analysts, or even a group of application developers.

In order to make the right decision and propose a program that is going to work for your organization you must consider:

The current organization responsibilities and how additional responsibilities are going to affect the organization and management.

The current internal communication lines and allegiances that are going to be affected.

Providing the right access to those who need the metadata, when they need it and in a format they need.

The implications of additional responsibilities for existing headcount or the possibility of hiring new personnel.

How you are going to justify the costs of this to the sponsors of the warehouse.

 

The Role of Business Processes in Data Warehouse Design

 

Practice Objective:

 

The goal of this lesson is to show how specific business needs can influence the data warehouse design. You also learn how to translate business issues into subject areas.

 

INSTRUCTIONS

 

In this practice you are given several business scenarios to examine. For each you:

·         Evaluate different business activities.

·         Recommend the database type to support the activity.

 

1.       Specify the type of application (transaction processing or decision support), needed to
support the following activities:

 

a.   Telecommunications

–        Telephone call detail recording

–        Customer Service such as ordering service and disconnecting lines

–        Studies of minutes of call use by customer group

–        Segmentation of customers by minutes of call use

–        Product bundling analysis

–        Customer payment processing

 

b.   Retail Banking

–        Depositing money

–        Recording checks cleared

–        Longitudinal analysis of average daily balance

–        Telephone banking recording

–        Cost of providing walk-up service

–        Analyzing customer balances

–        Valuing credit portfolio

 

c.   Airlines

–        Reserving a flight

–        Assigning seats

–        Planning meals

–        Evaluating meals consumed

–        Providing customer lists per flight

–        Purchasing new routes

–        Identifying cost effective routes

–        Householding flight preferences

 

2.   Identify the types of data needed to support the following activities:

 

a.   Telecommunications

–        Determining which products should be provided in which area based on expected customer demand

 

b.   Retail Banking

–        Evaluating the long term customer and household value

 

c.   Airlines

–        Optimizing capacity and pricing based on customer demand        

 

 

Topic 3 Solutions

 

1.       Specify the type of application (transaction processing or decision support), needed to
support the following activities:

 

a.   Telecommunications

–        Telephone call detail recording

–        Transaction Processing

–        Customer Service such as ordering service and disconnecting lines

–        Transaction Processing

–        Studies of minutes of call use by customer group

–        Decision Support

–        Segmentation of customers by minutes of call use

–        Decision Support

–        Product bundling analysis

–        Decision Support

–        Customer payment processing

–        Transaction Processing

 

b.   Retail Banking

–        Depositing money

–        Transaction Processing

–        Recording checks cleared

–        Transaction Processing

–        Longitudinal analysis of average daily balance

–        Decision Support

–        Telephone banking recording

–        Transaction Processing

–        Cost of providing walk-up service

–        Decision Support

–        Analyzing customer balances

–        Decision Support

–        Valuing credit portfolio

–        Decision Support

 

c.   Airlines

–        Reserving a flight

–        Transaction Processing

–        Assigning seats

–        Transaction Processing

–        Planning meals

–        Decision Support and Transaction Processing

–        Evaluating meals consumed

–        Decision Support

–        Providing customer lists per flight

–        Transaction Processing

–        Purchasing new routes

–        Decision Support

–        Identifying cost effective routes

–        Decision Support

–        Householding flight preferences

–        Decision Support

 

 

2.   Identify the types of data needed to support the following activities:

 

a.   Telecommunications

–        Determining which products should be provided in which area based on expected customer demand

 

·         Products

·         Customers

·         Dialing Codes, Country Codes, Area Code

·         Time

·         Purchases

·         Length of time using a product

·         Time between establishment of line and purchasing products

 

 

b.   Retail Banking

–        Evaluating the long term customer and household value

 

·         Customer

·         Customer related to the Household

·         External householding profiles

·         Bank accounts

·         Bank balances

·         Promotions and purchases

·         Length of time with bank

·         Average balance over time

·         Number of transactions

·         Products and Product types

 

c.   Airlines

–        Optimizing capacity and pricing based on customer demand        

 

·         Routes

·         Aircraft

·         Number of seats per aircraft type

·         Customers

·         Flight cost

·         Time

 

Topic 6 : Identifying the Data
Practice

This practice draws on the following case study.

 

GLOBAL COMPUTING COMPANY

 

Background

 

 

The development and design of a data warehouse is inextricably linked to the business needs of an enterprise. Global Computing Company (GCC) is a fictitious company that has a need for an analytical data warehouse.

 

Global Computing Company has been in business since 1990. Like every hardware and software distributor, the company is searching for new business in an already saturated market.

 

Distribution channels have been traditionally through direct sales and catalog orders. Recently, GCC started offering products through the Internet.

 

You explore GCC’s business needs more fully as you progress through the design process.

 

Throughout the remaining practices in this course, you refer to GCC’s issues and requirements to help you make critical design decisions.

 

Business Situation

 

It is January 2001 and the situation at Global Computing requires some changes. Sales have been lower than they have been in the past two years. The flagship product, the Sentinel Desktop, is dated and is not selling well.

 

The vice president of sales and marketing cannot understand what is driving the business. Sales show unexpected spikes. Lots of sales are being generated through the Internet, but it is unclear whether this channel is introducing new clients to GCC or whether it is acquiring other channels.

 

The president is concerned because the aggregate margins are shrinking and therefore profits are declining.

 

Technology is changing and domestic suppliers and manufacturers are more difficult to find. Components are available but assembly drives the cost of the end product thereby adding to the shrinking margins.

 

Business Objectives

 

In strategic discussions, managers articulate the objectives and goals for the company, for the supply side–manufacturing and operations–and for the demand side–sales and marketing. GCC has chosen to focus on the demand side for the initial warehouse implementation.

 

Information Requirements

 

Through interviews and the requirements-building process, you have identified the following information needs:

 

These provide a guideline to your design both in the data that should be included and the arrangement of that data to support this analysis.

 

Global Computing Company Value Proposition

 

After much discussion and evaluation, the management of Global Computing Company identified its greatest business opportunities that could be addressed through a data warehouse.

The proof-of-concept deliverable for GCC addresses the following issues:

 

Objective

 

In this practice you learn how to identify source data for your data warehouse. While discovering the source databases, you adjust your warehouse to schema to accommodate the limitations and constraints of the data that is available.

 

In this practice, you:

 

Next you will design the physical model for the data warehouse for Global Computing Company.

 

Review

 

Creating the logical database design involves:

 

Note: Although indexes are not strictly part of logical design, in reality they are usually identified at this stage.

 

Begin your design by identifying the subject areas of your business to be implemented in the first (current) increment. Consider only the business view of the information requirements. Your perspective needs to be independent of any implementation constraints or concerns at this stage.

Subject areas are building blocks that are uniquely identifiable and usable across the business. These subjects need to be founded on long-term, stable business concepts.

 

Note: The subject areas become the dimensions of the star model.

 

The logical design may consider implementation consequences. Changes need to be made to the logical model for technical reasons, such as:

 

GCC Subject Areas

 

From the Global Computing Company business case, the following subject areas have been identified for the conceptual model:

 

Eliminating the subject areas that do not pertain to the first increment of the warehouse, results in the following entities:

 

Although it may be tempting to create the entities for the warehouse from existing systems, you need to define the entities from the subject areas identified by your specific analysis of business requirements. These entities become the dimensions of your warehouse model.

 

The next step in creating your conceptual model is to identify the relationship within each entity and between entities. The relationships are important in this model, as your starting point.

 

Identifying the Attributes

 

Now that you have identified the entities and relationships between them from the outlined business requirements you can now determine in your what sort of conceptual model, information about each entity is of interest. The attributes are the descriptors in the entity. Attributes may also become primary and foreign key values.

 

The data warehouse may contain:

 

Key values are determined in your next step.

 

Note: The attributes of entities ultimately become the table columns in the logical model.

 

Identifying the Critical Business Measures

 

The critical measures of the business are the facts by which the business is evaluated. Consider different ways the business might need to be viewed as you define these measures. For example consider:

 

The key performance measures represent the direct results of a business event and not an analysis about what the event means.

 

Note: These measures typically become the attributes of the fact table.

 

Defining the Keys

 

Your entity relationship model now contains the:

 

Map Source System to Subject Area

 

As you consider which system can provide the warehouse with data, conduct a life cycle mapping of sources to the subject areas. For example, looking at the Order Entry System for Global Computing Company, you see that the Customer subject area is created, updated, and read through this system.

 

You should assess each source system for its viability to act as an accurate, continuing source of data by evaluating the source system’s:

 

These measures indicate the long term stability of the system as a data provider for the warehouse.

 

 

Instructions

 

Use the schema provided in Appendix A to:

 

1. Identify the subject areas for the GCC data warehouse.

 

(Hint: Consider all the information needs of GCC in your entity relationship diagram.)

 

2. Select the subjects and relationships that satisfy the needs of the value proposition.

 

3. Draw a conceptual model of the GCC enterprise warehouse.

 

4. Outline the fact table and all dimensions of the logical model.

 

Note: At this time you are not concerned with identifying the attributes on this model.

Note: Make sure that you have included a Time dimension in your model.

 

 

5. Identify the attributes for the fact table and each dimension table in your data
warehouse model.

 

6. Identify the source data available in the OLTP model.

 

 

 

Topic 6 Solutions

 

1. Identify the subject areas for the GCC data warehouse.

 

(Hint: Consider all the information needs of GCC in your entity relationshipdiagram.)

 

2. Select the subjects and relationships that satisfy the needs of the value proposition,

 

 

3. Draw a conceptual model of the GCC enterprise warehouse.

 

One conceptual model of the GCC warehouse application is as follows:

 

 

Note: You may produce a slightly different version of this model, which is not a problem as long as the main subject areas are identified.

 

 

4. Outline the fact table and all dimensions of the logical model.

 

Note: At this time you are not concerned with identifying the attributes on this model. Make sure that you have included a Time dimension in your model.

 

 


 

5. Identify the attributes for the fact table and each dimension table in your data warehouse model.

 

Note: Bold indicates a key column.

 


 

7. Identify the source data available in the OLTP model.

 

 

 

 

 

1