Objectives
In this practice, the process of developing and managing metadata is explored. The concept of metadata and the role of the different types are defined.
The nature of metadata is primarily two-fold:
Metadata Contents
Business
Primarily used by business users, business metadata describes the relationships between data elements and transformations of the data fields. These elements include:
Technical
For the technical users, technical metadata contains the complete description of all of the data required to implement and distribute the use of the warehouse. Technical metadata describes the characteristics of the data from all technical standpoints. Each software tool in the warehousing environment requires its own metadata. These elements include:
Metadata Sources
Where does metadata come from? Typically, much of the information you need to get started on the design and development of metadata for your warehouse already exists within your enterprise legacy system. Metadata may reside in:
Metadata Ownership and Stewardship
Data steward responsibilities are:
Instructions
In this practice, you consider your own warehouse environment, and propose a high level strategy for metadata management.
Part I - Metadata Specification
1. Specify the elements and the sources that are to be included in the business (user) metadata.
2. Specify the elements for the technical metadata, indicating the use by function.
Part II - Metadata Management
1. Specify the software tools that you may use to develop and maintain the following functions in your warehouse.
2. Propose a stewardship program for metadata considering:
a) What department or organization would own the responsibility?
b) What roles are appropriate in your environment?Discuss ways to maintain the business (user) metadata considering software
repositories and elements.c) Who could fill what role?
d) What are the barriers or challenges to implementing this strategy?
Topic 5 Solutions
Part I - Metadata Specification
1. Specify the software tools that you may use to develop and maintain the following functions in your warehouse.
(Hint: Consider carefully what a user might want to know about data, such as names, content, meaning, and derivations.)
An example list would include:
- Name of the data field
- Business definition
- Use
- Access frequency
- Person or department who requested it
- Dependencies
- Update frequency
- Source process
2. Specify the elements for the technical metadata, indicating the use by function.
(Hint: Remember technical metadata is used for ETT, management, and monitoring.)
An example list would include:
- Field names
- Size
- Format
- Update frequency
- Formula for calculation
- Cardinality
- Source field or fields
- Process that transforms or calculates the data
- Frequency of access
- Relationships
- Valid values
Part II - Metadata
Management
1. Specify the software tools that you may use to develop and maintain your warehouse.
Note: Oracle has partnerships with many third party vendors who supply products that support all areas of data warehousing. Particularly relevant to components such as data mining and ETT. Some of these companies are Sagent, Informatica, PowerBuilder, and Prism.
2. Propose a stewardship program for metadata considering:
a) What department or organization would own the responsibility?
b) What roles are appropriate in your environment?Discuss ways to maintain the business (user) metadata considering software
repositories and elements.c) Who could fill what role?
d) What are the barriers or challenges to implementing this strategy?
As you consider your response to these questions, think about your organization and what group is managing the metadata at the current time. Typically it is the database administrator who looks after the metadata for the entire environment, or a group of business analysts, or even a group of application developers.
In order to make the right decision and propose a program that is going to work for your organization you must consider:
The current organization responsibilities and how additional responsibilities are going to affect the organization and management.
The current internal communication lines and allegiances that are going to be affected.
Providing the right access to those who need the metadata, when they need it and in a format they need.
The implications of additional responsibilities for existing headcount or the possibility of hiring new personnel.
How you are going to justify the costs of this to the sponsors of the warehouse.
Practice
Objective:
The
goal of this lesson is to show how specific business needs can influence the
data warehouse design. You also learn how to translate business issues into
subject areas.
In
this practice you are given several business scenarios to examine. For each you:
·
Evaluate
different business activities.
·
Recommend
the database type to support the activity.
1.
Specify the type of application
(transaction processing or decision support), needed to
support the following activities:
a.
Telecommunications
Telephone
call detail recording
Customer
Service such as ordering service and disconnecting lines
Studies
of minutes of call use by customer group
Segmentation
of customers by minutes of call use
Product
bundling analysis
Customer
payment processing
b.
Retail Banking
Depositing
money
Recording
checks cleared
Longitudinal
analysis of average daily balance
Telephone
banking recording
Cost
of providing walk-up service
Analyzing
customer balances
Valuing
credit portfolio
c.
Airlines
Reserving
a flight
Assigning
seats
Planning
meals
Evaluating
meals consumed
Providing
customer lists per flight
Purchasing
new routes
Identifying
cost effective routes
Householding
flight preferences
2.
Identify
the types of data needed to support
the following activities:
a.
Telecommunications
Determining
which products should be provided in which area based on expected customer
demand
b.
Retail
Banking
Evaluating
the long term customer and household value
c.
Airlines
Optimizing
capacity and pricing based on customer demand
Topic
3 Solutions
1.
Specify the type of application
(transaction processing or decision support), needed to
support the following activities:
a.
Telecommunications
Telephone
call detail recording
Transaction
Processing
Customer
Service such as ordering service and disconnecting lines
Transaction
Processing
Studies
of minutes of call use by customer group
Decision
Support
Segmentation
of customers by minutes of call use
Decision
Support
Product
bundling analysis
Decision
Support
Customer
payment processing
Transaction
Processing
b.
Retail Banking
Depositing
money
Transaction
Processing
Recording
checks cleared
Transaction
Processing
Longitudinal
analysis of average daily balance
Decision
Support
Telephone
banking recording
Transaction
Processing
Cost
of providing walk-up service
Decision
Support
Analyzing
customer balances
Decision
Support
Valuing
credit portfolio
Decision
Support
c.
Airlines
Reserving
a flight
Transaction
Processing
Assigning
seats
Transaction
Processing
Planning
meals
Decision
Support and Transaction Processing
Evaluating
meals consumed
Decision
Support
Providing
customer lists per flight
Transaction
Processing
Purchasing
new routes
Decision
Support
Identifying
cost effective routes
Decision
Support
Householding
flight preferences
Decision
Support
2.
Identify
the types of data needed to support
the following activities:
a.
Telecommunications
Determining
which products should be provided in which area based on expected customer
demand
·
Products
·
Customers
·
Dialing
Codes, Country Codes, Area Code
·
Time
·
Purchases
·
Length
of time using a product
·
Time
between establishment of line and purchasing products
b.
Retail
Banking
Evaluating
the long term customer and household value
·
Customer
·
Customer
related to the Household
·
External
householding profiles
·
Bank
accounts
·
Bank
balances
·
Promotions
and purchases
·
Length
of time with bank
·
Average
balance over time
·
Number
of transactions
·
Products
and Product types
c.
Airlines
Optimizing
capacity and pricing based on customer demand
·
Routes
·
Aircraft
·
Number
of seats per aircraft type
·
Customers
·
Flight
cost
·
Time
This practice draws on the following case study.
GLOBAL COMPUTING COMPANY
Background
The development and design of a data warehouse is inextricably linked to the business needs of an enterprise. Global Computing Company (GCC) is a fictitious company that has a need for an analytical data warehouse.
Global Computing Company has been in business since 1990. Like every hardware and software distributor, the company is searching for new business in an already saturated market.
Distribution channels have been traditionally through direct sales and catalog orders. Recently, GCC started offering products through the Internet.
You explore GCCs business needs more fully as you progress through the design process.
Throughout the remaining practices in this course, you refer to GCCs issues and requirements to help you make critical design decisions.
Business Situation
It is January 2001 and the situation at Global Computing requires some changes. Sales have been lower than they have been in the past two years. The flagship product, the Sentinel Desktop, is dated and is not selling well.
The vice president of sales and marketing cannot understand what is driving the business. Sales show unexpected spikes. Lots of sales are being generated through the Internet, but it is unclear whether this channel is introducing new clients to GCC or whether it is acquiring other channels.
The president is concerned because the aggregate margins are shrinking and therefore profits are declining.
Technology is changing and domestic suppliers and manufacturers are more difficult to find. Components are available but assembly drives the cost of the end product thereby adding to the shrinking margins.
Business Objectives
In strategic discussions, managers articulate the objectives and goals for the company, for the supply sidemanufacturing and operationsand for the demand sidesales and marketing. GCC has chosen to focus on the demand side for the initial warehouse implementation.
Information Requirements
Through interviews and the requirements-building process, you have identified the following information needs:
These provide a guideline to your design both in the data that should be included and the arrangement of that data to support this analysis.
Global Computing Company Value Proposition
After much discussion and evaluation, the management of Global Computing Company identified its greatest business opportunities that could be addressed through a data warehouse.
The proof-of-concept deliverable for GCC addresses the following issues:
Objective
In this practice you learn how to identify source data for your data warehouse. While discovering the source databases, you adjust your warehouse to schema to accommodate the limitations and constraints of the data that is available.
In this practice, you:
Next you will design the physical model for the data warehouse for Global Computing Company.
Review
Creating the logical database design involves:
Note: Although indexes are not strictly part of logical design, in reality they are usually identified at this stage.
Begin your design by identifying the subject areas of your business to be implemented in the first (current) increment. Consider only the business view of the information requirements. Your perspective needs to be independent of any implementation constraints or concerns at this stage.
Subject areas are building blocks that are uniquely identifiable and usable across the business. These subjects need to be founded on long-term, stable business concepts.
Note: The subject areas become the dimensions of the star model.
The logical design may consider implementation consequences. Changes need to be made to the logical model for technical reasons, such as:
GCC Subject Areas
From the Global Computing Company business case, the following subject areas have been identified for the conceptual model:
Eliminating the subject areas that do not pertain to the first increment of the warehouse, results in the following entities:
Although it may be tempting to create the entities for the warehouse from existing systems, you need to define the entities from the subject areas identified by your specific analysis of business requirements. These entities become the dimensions of your warehouse model.
The next step in creating your conceptual model is to identify the relationship within each entity and between entities. The relationships are important in this model, as your starting point.
Identifying the Attributes
Now that you have identified the entities and relationships between them from the outlined business requirements you can now determine in your what sort of conceptual model, information about each entity is of interest. The attributes are the descriptors in the entity. Attributes may also become primary and foreign key values.
The data warehouse may contain:
Key values are determined in your next step.
Note: The attributes of entities ultimately become the table columns in the logical model.
Identifying the Critical Business Measures
The critical measures of the business are the facts by which the business is evaluated. Consider different ways the business might need to be viewed as you define these measures. For example consider:
The key performance measures represent the direct results of a business event and not an analysis about what the event means.
Note: These measures typically become the attributes of the fact table.
Defining the Keys
Your entity relationship model now contains the:
Map Source System to Subject Area
As you consider which system can provide the warehouse with data, conduct a life cycle mapping of sources to the subject areas. For example, looking at the Order Entry System for Global Computing Company, you see that the Customer subject area is created, updated, and read through this system.
You should assess each source system for its viability to act as an accurate, continuing source of data by evaluating the source systems:
These measures indicate the long term stability of the system as a data provider for the warehouse.
Instructions
Use the schema provided in Appendix A to:
1. Identify the subject areas for the GCC data warehouse.
(Hint: Consider all the information needs of GCC in your entity relationship diagram.)
2. Select the subjects and relationships that satisfy the needs of the value proposition.
3. Draw a conceptual model of the GCC enterprise warehouse.
4. Outline the fact table and all dimensions of the logical model.
Note: At this time you are not concerned with identifying the attributes on this model.
Note: Make sure that you have included a Time dimension in your model.
5. Identify the attributes
for the fact table and each dimension table in your data
warehouse model.
6. Identify the source data available in the OLTP model.
Topic 6 Solutions
1. Identify the subject areas for the GCC data warehouse.
(Hint: Consider all the information needs of GCC in your entity relationshipdiagram.)
2. Select the subjects and relationships that satisfy the needs of the value proposition,
3. Draw a conceptual model of the GCC enterprise warehouse.
One conceptual model of the GCC warehouse application is as follows:
Note: You may produce a slightly different version of this model, which is not a problem as long as the main subject areas are identified.
4. Outline the fact table and all dimensions of the logical model.
Note: At this time you are not concerned with identifying the attributes on this model. Make sure that you have included a Time dimension in your model.
5. Identify the attributes for the fact table and each dimension table in your data warehouse model.
Note: Bold indicates a key column.
7. Identify the source data available in the OLTP model.