Henry Lau
Microsoft Corporation
October 1998

Summary: Provides database administrators and developers with valuable information on Microsoft SQL Server 7.0 index selection, disk I/O subsystem tuning, and SQL Server 7.0 performance tuning tools. (47 printed pages)
Audience
Principles of Performance Tuning SQL Server
Performance Items for SQL Server Configuration
Components That Consume CPU and Disk I/O Resources
Disk I/O Performance
SQL Server Indexes
SQL Server Performance Tuning Tools
Miscellaneous Performance Topics
Finding More Information
This
performance tuning guide is designed to help database administrators configure
Microsoft® SQL Server™ for maximum performance and
help determine the cause of poor performance in a SQL Server environment. It
also provides SQL Server application developers guidance on how to use SQL
Server indexes and SQL Server tools for analyzing the input/output (I/O)
performance efficiency of SQL queries.
SQL Server 6.x vs. 7.0 Performance Tuning: Comparative Information

The comparison below pairs each SQL Server 6.x consideration with the corresponding SQL Server 7.0 enhancements and philosophies.

SQL Server 6.x consideration: There were many configuration options to consider during performance tuning.

SQL Server 7.0 enhancement:
One
major enhancement to SQL Server 7.0 is that the database engine has become
largely self-configuring, self-tuning, and self-managing. LazyWriter
and Read-Ahead Manager are self-tuning. Max Async
I/O is likely the only sp_configure option that
will need to be initially configured when dealing with servers with larger
amounts of storage. This reduction in tuning requirements saves valuable
administrative time that can be applied to other tasks. Readers familiar with
the earlier version of this document, "Microsoft SQL Server 6.5: I/O
Performance Tuning Quick Reference," will notice that there are not
nearly as many sp_configure options that need to be
manually adjusted by the database administrator in order to achieve good SQL
Server performance. While it is still possible to manually configure and
adjust many of the sp_configure options that were
available in previous versions of SQL Server, it is recommended that database
administrators allow SQL Server to automatically configure and tune all sp_configure options that SQL Server provides defaults
for. This allows SQL Server to automatically adjust the configuration of the
database server as factors affecting the database server change. (Examples of
such factors include RAM and CPU utilization for SQL Server and other
applications running on the database server.)

SQL Server 6.x consideration: Manual tuning of LazyWriter was sometimes necessary.

SQL Server 7.0 enhancement:
SQL
Server 7.0 automatically configures and tunes the LazyWriter.
There is no longer the need to manually tune "free buffer" and
"max lazywriter io."
Free buffer and LazyWriter I/O activity can still
be monitored with the new SQL Server 7.0 Performance Monitor objects
described later.

SQL Server 6.x consideration: Manual tuning of Checkpoint was sometimes necessary.

SQL Server 7.0 enhancement:
In
versions of SQL Server prior to the 7.0 version, "recovery
interval" was also used to tune the checkpoint process. SQL Server 7.0
automatically tunes the recovery interval option. The SQL Server 7.0 default
of 0 for recovery interval indicates that SQL Server will take responsibility
for automatically monitoring and tuning recovery interval. This default
setting will maintain recovery times less than one minute for all databases
as long as there are no exceptionally long-running transactions present on
the system. For more information, search on the strings "Checkpoints and
the Active Portion of the Log" and "troubleshooting recovery
interval" in SQL Server Books Online. (See the section "Finding More Information" at the end of this
document for information on installing SQL Server Books Online.)

SQL Server 6.x consideration: SQL Server 6.x log pages shared RAM cache with data pages. Manual tuning of the Log Manager was sometimes necessary.

SQL Server 7.0 enhancement:
SQL
Server Log Manager has changed significantly in SQL Server 7.0 from previous
versions of SQL Server. SQL Server 7.0 Log Manager manages its own log cache.
There is no longer the dependency on the SYSLOGS table that existed in
earlier versions of SQL Server. This separation of log file management
from the data cache management brings enhanced performance for both
components. SQL Server Log Manager is also capable of performing disk I/O in
larger byte sizes than before. The larger I/O size combined with the
sequential nature of SQL Server logging helps to make disk I/O performance
very good for the Log Manager. SQL Server 7.0 automatically tunes the
performance of SQL Server Log Manager. There is no longer the need to
manually tune the sp_configure option logwrite sleep. This option has been removed from SQL
Server 7.0. For more information, search in SQL Server Books Online for the
string "logwrite sleep."
· Let SQL Server take care of most of the tuning work.
SQL
Server 7.0 has been dramatically enhanced in order to create a largely
auto-configuring and self-tuning database server. Take advantage of SQL
Server's auto-tuning settings. This helps SQL Server run at peak performance
even as user load and queries change over time.
· RAM is a limited resource.
A
major part of any database server environment is the management of random
access memory (RAM) buffer cache. Access to data in RAM cache is much faster
than access to the same information from disk. But RAM is a limited resource.
If database I/O (input/output operations to the physical disk subsystem) can be
reduced to the minimal required set of data and index pages, these pages will
stay in RAM longer. Too much unneeded data and index information flowing into
buffer cache will quickly push out valuable pages. The driving focus of
performance tuning is to reduce I/O so that buffer cache is best utilized.
· Create and maintain good indexes.
A
key factor in maintaining minimum I/O for all database queries is to ensure
that good indexes are created and maintained.
· Monitor disk I/O subsystem performance.
The
physical disk subsystem must provide a database server with sufficient I/O
processing power in order for the database server to run without disk queuing. Disk queuing results in bad performance. This document
describes how to detect disk I/O problems and how to resolve them.
· Application and query tuning.
This
becomes especially important when a database server will be servicing requests
from hundreds or thousands of connections through a given application. Because
applications typically determine the SQL queries that will be executed on a
database server, it is very important for application developers to understand
SQL Server architectural basics and how to take full advantage of SQL Server
indexes to minimize I/O.
· Take advantage of the powerful combination of SQL Server Profiler and Index Tuning Wizard.
SQL
Server Profiler can be used to monitor and log a SQL Server's workload. This
logged workload can then be submitted to SQL Server Index Tuning Wizard so that
index changes can be made to help performance if necessary. Regular use of SQL
Profiler and Index Tuning Wizard helps SQL Server perform well as the overall
query workload changes over time.
· Take advantage of SQL Server Performance Monitor to detect bottlenecks.
SQL
Server 7.0 provides a revised set of Performance Monitor objects and counters,
which are designed to provide helpful information for monitoring and analyzing
the operations of SQL Server. This document describes key Performance Monitor
counters to watch.
· Take advantage of SQL Server Query Analyzer and Graphical ShowPlan.
SQL
Server 7.0 introduces Graphical ShowPlan, an easy
method to analyze problematic SQL queries. Statistics I/O is another important
aspect of Query Analyzer that this document will describe.
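As a brief preview, Statistics I/O can be turned on from Query Analyzer before running a query. A minimal sketch, using a hypothetical table and column name:

set statistics io on
go
select * from orders where customer_id = 1000
go

The output reports the logical reads, physical reads, and read-ahead reads that the query incurred, which ties directly to the I/O reduction goals described above.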
The max async I/O default
of 32 is sufficient for lower-end disk subsystems. With a higher-end RAID
(Redundant Array of Inexpensive Disks) storage subsystem attached to a database
server that is capable of very high disk I/O transfer rates, the setting of 32
may do a disservice to the RAID subsystem because the RAID subsystem is capable
of completing many more simultaneous disk transfer requests than 32. If, in
addition to this, SQL Server write activity dictates that more disk transfer
capability is needed, max async I/O should be set
higher.
Note On
Microsoft Windows® 95/98 platforms, asynchronous I/O is not supported, so this
option is not applicable.
A good
value for max async I/O is one that allows Checkpoint
to be "fast enough." The goal is to make Checkpoint fast enough to
finish before another checkpoint is needed (based upon desired recovery
characteristics), but not so fast that the system is seriously perturbed by the
event (disk queuing, which will be discussed in further detail later in this
document).
A rule of
thumb for setting max async I/O for SQL Servers
running on larger disk subsystems is to multiply the number of physical drives
available to do simultaneous I/O by 2 or 3. Then watch Performance Monitor for
signs of disk activity or queuing issues. The negative impact of setting this
configuration option too high is that it may cause Checkpoint to monopolize
disk subsystem bandwidth that is required by other SQL Server I/O operations,
such as reads.
To set this
value, execute the following command in SQL Server Query Analyzer: "sp_configure 'max async io', <value>", where <value> is expressed
as the number of simultaneous disk I/O requests that the SQL Server system will
be able to submit to Windows during a checkpoint operation, which in turn
submit the requests to the physical disk subsystem. (See the section "Disk I/O Performance" later in this document for
more information.) This configuration option is dynamic (that is, it does not
require a stop and restart of SQL Server to take effect).
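For example, a minimal sketch applying the rule of thumb above to a hypothetical server with 12 physical drives available for simultaneous I/O (12 drives x 3 = 36); max async I/O is an advanced option, so show advanced options may need to be enabled first:

sp_configure 'show advanced options', 1
go
reconfigure with override
go
sp_configure 'max async io', 36
go
reconfigure with override
go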
For more
information, search in SQL Server Books Online for the strings "I/O
architecture" and "max async I/O
option."
SQL Server
maintains a pool of Windows threads that are used to service batches of SQL
Server commands being submitted to the database server. The total number of
these threads (referred to in SQL Server terminology as "worker
threads") available to service all incoming command batches is dictated by
the setting for the sp_configure option max worker
threads. If the number of connections actively submitting batches is greater
than the number specified for max worker threads, worker threads will be shared
among connections actively submitting batches. The default of 255 will work
well for many installations. Note that the majority of connections spend most
of their time waiting for batches to be received from the client.
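To review or adjust this setting, a minimal sketch from Query Analyzer, assuming show advanced options has been enabled as in the earlier max async I/O example (255 simply reapplies the default; any other value is an assumption to be validated against the actual connection load):

sp_configure 'max worker threads'
go
sp_configure 'max worker threads', 255
go
reconfigure with override
go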
Worker
threads take on most of the responsibility of writing out dirty 8-kilobyte (KB)
pages from the SQL Server buffer cache. Worker threads schedule their I/O
operations asynchronously for maximum performance.
For more
information, search on the strings "max worker threads option,"
"sql server threads," "Optimizing
Server Performance Using Memory Configuration Options," "sql server memory pool," "Transaction
Recovery," "write ahead transaction log," and "Freeing and
Writing Buffer Pages" in SQL Server Books Online.
SQL Server LazyWriter helps to produce free buffers, which are 8-KB
data cache pages without any data contained in them. As LazyWriter
flushes each 8-KB cache buffer out to disk, it needs to initialize the cache
page's identity so that other data may be written into the free buffer. LazyWriter aims to produce free buffers during periods of
low disk I/O, such that disk I/O resources are readily available for use and
there will be minimal impact on other SQL Server operations.
SQL Server
7.0 automatically configures and manages the level of free buffers. Monitor
"SQL Server: Buffer Manager—Free Buffers" to see if this value dips.
Optimally, LazyWriter keeps this counter level
throughout SQL Server operations, which means that LazyWriter
is keeping up with the user demand for free buffers. It is bad for the
Performance Monitor object "SQL Server: Buffer Manager—Free Buffers"
to hit zero, indicating there were times when the user load demanded a higher
level of free buffers than the SQL Server's LazyWriter
was able to provide.
If LazyWriter is having problems keeping the free buffer level steady, or at least above zero, it could mean that the disk
subsystem is not able to provide LazyWriter with the
disk I/O performance that LazyWriter needs. (Compare
drops in free buffer level to any disk queuing to see if this is true.) The
solution is to add more physical disk drives (also commonly referred to as
"spindles") to the database server disk subsystem in order to provide
more disk I/O processing power. "SQL Server: Buffer Manager—Lazy
Writes/sec" indicates the number of 8-KB pages being physically written
out to disk.
Monitor the
current level of disk queuing in Performance Monitor by looking at the counters
for (Logical or Physical) Disk: Average Disk Queue Length or Current Disk Queue Length, and
ensure the disk queue is less than 2 for each physical drive associated with
any SQL Server activity. For database servers that employ hardware RAID
controllers and disk arrays, remember to divide the number reported by
Logical/Physical Disk counters by the number of actual hard drives associated
with that logical drive letter or physical hard drive number (as reported by
Disk Administrator), because Windows and SQL Server are unaware of the actual
number of physical hard drives attached to a RAID controller. It is very
important to be aware of the number of drives associated with the RAID array
controller in order to properly interpret the disk queue numbers that
Performance Monitor is reporting.
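For example (hypothetical numbers), if Performance Monitor reports an Average Disk Queue Length of 12 for a logical drive that is backed by a hardware RAID array of 10 physical drives:

(12 reported queue length) / (10 physical drives) = 1.2 queued I/Os per physical drive

Because 1.2 is less than 2, this array would not be considered to have a disk queuing problem.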
Adjust LazyWriter disk I/O request behavior with the use of max async I/O. The sp_configure
option max async I/O controls the number of 8-KB disk
write requests (including requests coming in from LazyWriter,
Checkpoint, and the worker threads) that SQL Server
can simultaneously submit to Windows and, in turn, to the disk I/O subsystem.
If disk queuing occurs at unacceptable levels, decrease max async
I/O. If it is imperative that SQL Server maintains its currently configured
level of max async I/O, add more disks to the disk
subsystem until disk queuing comes down to acceptable levels.
For more
information, search for the strings "freeing and writing buffer
pages" and "write-ahead transaction log" in SQL Server Books
Online.
Checkpoint
writes dirty pages out to the SQL Server data files. Dirty pages are any buffer
cache pages that have been modified since being brought into the buffer cache.
A buffer written to disk by Checkpoint still contains the page and users can
read or update it without rereading it from disk, which is not the case for
free buffers created by LazyWriter.
Checkpoint
aims to let worker threads and LazyWriter do the
majority of the work writing out dirty pages. Checkpoint does this by trying an
extra checkpoint wait before writing out a dirty page if possible. This
provides the worker threads and LazyWriter more time
with which to write out the dirty pages. The conditions
under which this extra wait time for a dirty page occurs are detailed in SQL
Server Books Online under the section "Checkpoints and the Active Portion
of the Log." The main idea to remember is that Checkpoint aims to even out
SQL Server disk I/O activity over a longer time period with this extra
checkpoint wait.
To make
Checkpoint more efficient when there are a large number of pages to flush out
of cache, SQL Server will sort the data pages to be flushed in the order the
pages appear on disk. This will help minimize disk arm movement during cache
flush and potentially take advantage of sequential disk I/O. Checkpoint also
submits 8-KB disk I/O requests asynchronously to the disk subsystem. This
allows SQL Server to finish submitting required disk I/O requests faster
because Checkpoint doesn't wait for the disk subsystem to report back that the
data has been actually written to disk.
It is
important to watch disk queuing on hard drives associated with SQL Server data
files to notice if SQL Server is sending down more disk I/O requests than the
disk(s) can handle, and if this is true, more disk I/O capacity must be added
to the disk subsystem so that it can handle the load.
Adjust
Checkpoint's dirty page flushing behavior with the use of max async I/O. The sp_configure
option max async I/O controls the number of 8-KB
cache flushes that Checkpoint can simultaneously submit to Windows (and, in
turn, to the disk I/O subsystem). If disk queuing occurs at unacceptable
levels, decrease max async I/O. If it is imperative
that SQL Server maintains its currently configured level of max async I/O, add more disks to the disk subsystem until disk
queuing comes down to acceptable levels.
If, on the
other hand, it is necessary to increase the speed with which SQL Server
executes Checkpoint and the disk subsystem is already powerful enough to handle
the increased disk I/O while avoiding disk queuing, increase max async I/O to allow SQL Server to send more disk I/O
requests at the same time, potentially improving I/O performance. Watch the
disk queuing counters carefully after changing max async
I/O. Be sure to watch disk read queuing in addition to disk write queuing. If
max async I/O is set too high for a given disk
subsystem, Checkpoint may tend to queue up a large number of disk write I/O
requests and this can cause SQL Server read activity to be blocked. DBCC
PERFMON provides Reads Outstanding as one of its outputs, which is a good
indicator of queued SQL Server reads. Physical
Disk and Logical Disk objects in Performance Monitor provide the Average Disk
Read Queue Length counter, which can also be used to monitor queued disk read
I/O requests. If disk read queuing is being caused by Checkpoint, the choices
are to either decrease max async I/O or to add more
hard drives so that the checkpoint and read requests can be simultaneously
handled.
Like all
other major relational database management system (RDBMS) products, SQL Server
ensures that all write activity (insert, update, and delete) performed on the
database will not be lost if something were to interrupt SQL Server's online
status (such as power failure, disk drive failure, fire in the data center, and
so on). One thing that helps guarantee recoverability is the SQL Server logging
process. Before any implicit transaction (a single SQL query) or explicit transaction
(a defined transaction bracketed by a BEGIN TRAN command and a COMMIT or ROLLBACK
command) can be completed, SQL Server's Log Manager must receive a signal from
the disk subsystem that all data changes associated with that transaction have
been written successfully to the associated log file. This rule guarantees that
if the SQL Server is abruptly shut down for whatever reason and the
transactions written into the data cache are not yet flushed to the data files
(remember that flushing data buffers is Checkpoint's or LazyWriter's
responsibility), the transaction log can be read and reapplied in SQL Server
upon turning on the SQL Server. Reading the transaction log and applying the
transactions to SQL Server after a server stoppage is referred to as recovery.
Because SQL
Server must wait for the disk subsystem to complete I/O to SQL Server log files
as each transaction is completed, it is important that the disks containing SQL
Server log files have sufficient disk I/O handling capacity for the anticipated
transaction load.
The method
of watching out for disk queuing associated with SQL Server log files is
different from SQL Server database files. Use the Performance Monitor counters
"SQL Server: Databases <database instance>: Log Flush Waits
Times" and "SQL Server: Databases <database instance>: Log
Flush Waits/sec" to see if there are log writer requests waiting on the disk
subsystem for completion.
It is okay
to use a caching controller for SQL Server log files (in fact, it's a necessity
for the highest performance) if the controller guarantees that data entrusted
to it will be written to disk eventually, even if the power fails, for example.
For more information on caching controllers, refer to the section later in this
document titled "Effect of on-board cache of hardware RAID
controllers."
For more
information, search in SQL Server Books Online for the strings
"Transaction Recovery," "SQL Server: Log Manager Object,"
and "optimizing transaction log performance."
SQL Server
7.0 Read-Ahead Manager is completely self-configuring and self-tuning.
Read-Ahead Manager is tightly integrated with the operations of SQL Server
Query Processor. Query Processor communicates situations that would benefit
from read-ahead scans to Read-Ahead Manager. Large table scans, large index
range scans, and probes into clustered and nonclustered
index B-trees are situations that would benefit from a
read-ahead. This is because read-aheads occur with
64-KB I/Os, which provide higher disk throughput potential
for the disk subsystem than do 8-KB I/Os. When it is
necessary to retrieve a large amount of data from SQL Server, read-ahead is the
best way to do it.
Read-Ahead
Manager benefits from the simpler and more efficient Index Allocation Map (IAM)
storage structure. The IAM is SQL Server 7.0's new method of recording the
location of extents (8 pages of SQL Server data or index information for a
total of 64 KB of information per extent). The IAM is an 8-KB page that tightly
packs information (through a bitmap) about which extents within the range of
extents covered by the IAM contain required data. The compact nature of IAM
pages makes them fast to read and tends to keep regularly used IAM pages in
buffer cache.
Combining
the query information from Query Processor and quickly retrieving the location
of all extents that need to be read from the IAM page(s), Read-Ahead Manager can
construct multiple sequential read requests. Sequential 64-KB disk reads
provide extremely good disk I/O performance.
"SQL
Server: Buffer Manager—Read-Ahead Pages" is the important Performance
Monitor counter to watch with respect to SQL Server Read-Ahead Manager. More
information can be found by executing the command DBCC PERFMON (IOSTATS). Among
the information is "RA Pages Found in Cache" and "RA Pages
Placed in Cache." If the page is already hashed (that is, the application
read it in first and read-ahead wasted a read), it counts as a "page found
in cache." If the page is not already hashed (that is, a successful read-ahead),
it counts as a "page placed in cache."
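For example, from Query Analyzer (the read-ahead counters described above appear in the IOSTATS section of the output):

dbcc perfmon (iostats)
go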
One caveat
about Read-Ahead Manager is that too much read-ahead can be detrimental overall
to performance because it can fill cache with pages that were not needed,
requiring additional I/O and CPU that could have been used for other purposes.
The solution is a general performance tuning goal, and that is to make sure
that all SQL queries are tuned such that a minimal number of pages are brought
into buffer cache. This would include making sure you use the right index for
the right job. Save clustered indexes for efficient range scans and define nonclustered indexes to help quickly locate single rows or
smaller rowsets. Note that if you only plan to have
one index in a table and that index is for the purposes of fetching single rows
or smaller rowsets, you should make the index
clustered because clustered indexes will be faster than nonclustered
indexes (but not by the same dramatic scale as is the case for range scans).
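As a minimal sketch of this guidance (hypothetical table and column names):

-- Clustered index reserved for efficient range scans on order date.
create clustered index ci_orders_orderdate on orders(order_date)
-- Nonclustered index to quickly locate single rows by customer.
create nonclustered index nci_orders_customer on orders(customer_id)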
For more
information, search in SQL Server Books Online for the strings "reading
pages," "DBCC PERFMON," "Table and Index
Architecture," "Heap Structures," and "read-ahead
pages."
When
configuring a SQL Server that will only contain a few gigabytes (GB) of data
and not sustain heavy read nor write activity, it is not as important to be
concerned with the subject of disk I/O and balancing of SQL Server I/O activity
across hard drives for maximum performance. But to build larger SQL Server
databases that will contain hundreds of gigabytes of data and/or will sustain
heavy read and/or write activity, it is necessary to drive configuration around
maximizing SQL Server disk I/O performance by load-balancing across multiple
hard drives.
One of the
most important aspects of database performance tuning is I/O performance
tuning. SQL Server is certainly no exception to this philosophy. Unless SQL
Server is running on a machine with enough RAM to hold the entire database, I/O
performance will be dictated by how fast reads and writes of SQL Server data
can be processed by the disk I/O subsystem.
A good rule
of thumb to remember is that the typical Wide Ultra SCSI-3 hard drive is
capable of providing Windows and SQL Server with about 75 nonsequential
(random) and 150 sequential I/O operations per second. Advertised transfer
rates in terms of megabytes (MB) for these hard drives range around 40
MB/second. Keep in mind that it is much more likely for a database server to be
constrained by the 75/150 I/O transfers per second than the 40 MB/second
transfer rate. This is illustrated by the following calculations:
(75 random
I/O operations per second) X (8-KB transfer) = 600 KB per second
The
preceding calculation indicates by doing strictly random read or write SQL
Server operations on a given hard drive (single page reads and writes), it is
reasonable to expect at most 600 kilobytes (0.6 megabytes) per second I/O
processing capability from that hard drive. This is much lower than the
advertised 40 MB per second I/O handling capacity of the drive. SQL Server
worker threads, Checkpoint, and LazyWriter perform
I/O in 8-KB transfer sizes.
(150
sequential I/O operations per second) X (8-KB transfer) = 1,200 KB per second
The
preceding calculation indicates by doing strictly sequential read or write SQL
Server operations on a given hard drive (single page reads and writes), it is
reasonable to expect at most 1,200 kilobytes (1.2 megabytes) per second I/O
processing capability from that hard drive.
(75 random
I/O operations per second) X (64-KB transfer) = 4,800 KB (4.8 MB) per second
The
preceding calculation illustrates a worst-case scenario for read-aheads, assuming all random I/O. Note that even in the
completely random situation, the 64-KB transfer size still provides much better
disk I/O transfer rate from disk (4.8 MB/second) than the single page transfer
rates (0.6 and 1.2 MB/second):
(150
sequential I/O operations per second) X (64-KB transfer) = 9,600 KB (9.6 MB)
per second
The
preceding calculation indicates that by doing strictly
sequential read or write SQL Server operations on a given hard drive, it
is reasonable to expect at most 9.6 MB per second I/O processing capability
from that hard drive. This is much better than the random I/O case. SQL Server
Read-Ahead Manager performs disk I/O in 64-KB transfer sizes and will
attempt to arrange its reads such that read-ahead scans are done sequentially
(often referred to as "serially" or "in disk order"). While
Read-Ahead Manager aims to perform I/O operations sequentially, page splitting
will tend to cause extents to be read nonsequentially
versus sequentially. This is one reason why it is important to eliminate and
prevent page splitting.
Log Manager
will write sequentially to the log files in sizes ranging up to 32 kilobytes.
The terms
sequential and nonsequential (random) have been used
quite a bit to refer to hard disk operations. It is worthwhile to take a moment
to explain what these terms basically mean in relation to a disk drive. A
single hard drive consists of a set of drive platters. Each of these drive
platters provides services for read/write operations with a set of arms with
read/write heads that can move across the platters and read information from
the drive platter or write data onto the platters. With respect to SQL Server,
there are two important points to remember about hard drives:
· The read/write heads and associated disk arms need to move in order to find and
operate on the location of the hard drive platter that SQL Server and Windows
requested. If the data is located on nonsequential
locations on the hard drive platter, it takes significantly more time for the
hard drive to move the disk arm and read/write head to all of the necessary
hard drive platter locations. This contrasts with the sequential case, in which
all of the required data is located on one continuous physical section of the
hard drive platter, so that the disk arm and read/write heads move a minimal
amount to perform the necessary disk I/O operations. The time difference
between the nonsequential versus sequential case is
significant, about 50 milliseconds per nonsequential
seek versus approximately 2-3 milliseconds for sequential seeks. Note that
these times are rough estimations and will vary based upon how far apart the nonsequential data is spread around on the disk, how fast
the hard disk platters can spin (RPM), and other physical attributes of the
hard drive. The main point to remember is that sequential I/O is good for SQL
Server performance.
· It has been mentioned in this document that a typical hard drive supports about 75
nonsequential and 150 sequential I/Os per second. It is important to remember that it takes
almost as much time to read or write 8 kilobytes as it does to read or write 64
kilobytes. Within the range of 8 KB to about 64 KB it remains true that disk
arm + read/write head movement accounts for the majority of the time spent for
a single disk I/O transfer operation. So, mathematically speaking, it is
beneficial to try to perform 64-KB disk transfers as much as possible when more
than 64 KB of SQL data needs to be transferred, because a 64-KB transfer is
essentially as fast as an 8-KB transfer and eight times the amount of SQL
Server data is processed for each transfer. Remember that Read-Ahead Manager
does its disk operations in 64-KB chunks (referred to as a SQL Server extent).
Log Manager performs sequential writes in larger I/O sizes as well. The main
points to remember are that making good use of Read-Ahead Manager and
separating SQL Server log files from other nonsequentially
accessed files are good things for SQL Server performance.
Readers
interested in more detail about physical hard drives should refer to the Compaq
white paper "Disk Subsystem Performance and Scalability," the
location of which is mentioned in the "Finding More Information" section at the end of
this document.
A typical
hard disk provides a maximum transfer rate of about 40 megabytes per second or
75 nonsequential/150 sequential disk transfers per
second. Typical RAID controllers have an advertised transfer rate of about 40
megabytes per second or (very approximately) 2,000 disk transfers per second.
Peripheral Component Interconnect (PCI) buses have an advertised transfer rate
of about 133 megabytes per second and higher. The actual transfer rates
achievable for a device will differ from the advertised rate, but that is not
important for our discussion here. What is important to understand is how to
use these transfer rates as a rough starting point for determining the number
of hard drives to associate with each RAID controller and, in turn, how many
drives + RAID controllers can be attached to a PCI bus without I/O bottleneck
problems.
In the
earlier section titled "How Advertised Disk Transfer Rates Relate to SQL
Server," it was calculated that the maximum amount of SQL Server data
that can be read from or written to a hard drive in one second is 9.6 MB.
Assuming a RAID controller can handle 40 MB per second, it is possible to
roughly calculate the number of hard drives that should be associated with one
RAID controller by dividing 40 by 9.6 to get approximately 4. This means that
at most 4 drives should be associated with that one controller when SQL Server
is doing nothing but sequential I/O of 64 KB. Similarly, it was previously
calculated that with all nonsequential I/O of 64 KB,
the maximum data sent up from the hard drive to the controller would be 4.8
MB/second. Dividing 40 MB/second by 4.8 MB/second gives us the result of about
8. This means that at most 8 hard drives should be associated with the single
controller in the nonsequential 64-KB scenario. The
random 8-KB data transfer scenario requires the most drives. Divide 40 by 0.6
to determine that about 66 drives are needed to saturate a RAID controller
doing 100 percent random 8-KB reads and writes. This is not a realistic
scenario because read-ahead and log writing will use transfer sizes greater
than 8 KB and it is very unlikely that a SQL Server will perform 100 percent
random I/O.
Another way
to figure out how many drives should be associated with a RAID controller is to
look at disk transfers per second instead of looking at megabytes per second.
If a hard drive is capable of 75 nonsequential
(random) I/Os per second, it follows that about 26
hard drives working together could theoretically produce 2,000 nonsequential I/Os per second,
enough to hit the maximum I/O handling capacity of a single RAID controller. On
the other hand, it would only take about 13 hard drives working together to
produce 2,000 sequential I/Os per second and keep the
RAID controller running at maximum throughput, because a single hard drive can
sustain 150 sequential I/Os per second.
Moving on to the PCI bus.
Note that RAID controller and PCI bus bottlenecks are not nearly as common as
I/O bottlenecks related to hard drives. But for the sake of illustration, let's
assume that it is possible to keep a set of hard drives associated with a RAID
controller busy enough to push 40 MB/second of throughput through the
controller. The next question would be "How many RAID controllers can be
safely attached to the PCI bus without risking a PCI bus I/O bottleneck?"
To make a rough estimation, divide the I/O processing capacity of the PCI bus
by the I/O processing capacity of the RAID controller: 133 MB/second divided by
40 MB/second provides the result that approximately three RAID controllers can
be attached to a single PCI bus. Note that most large servers come with more
than one PCI bus, so this would increase the number of RAID controllers that
could be installed in a single server.
These
calculations help illustrate the relationship of the transfer rates of the
various components that comprise a disk I/O subsystem (hard drives, RAID
controllers, and PCI bus) and are not meant to be taken literally. This is
because the calculations assume all sequential or all nonsequential
data access, which is not likely to ever be the case in a production database
server environment. In reality, a mixture of sequential, nonsequential,
8-KB, and 64-KB I/O will occur. And other factors will make it difficult to
estimate exactly how many I/O operations can be pushed through a set of hard
drives at one time. On-board read/write caching available for RAID controllers
increases the amount of I/O that a set of drives can effectively produce. How
much more is hard to estimate for the same reason that it is hard to place an
exact number on the number of 8-KB versus 64-KB I/O a SQL Server environment
will need.
But
hopefully this section has helped to foster some insight into what advertised
transfer rates really mean to SQL Server.
When
scaling databases past a few gigabytes it is important to have at least a basic
understanding of RAID (Redundant Array of Inexpensive Disks) technology and how
it relates to database performance.
The
benefits of RAID are:
· Performance. Hardware RAID controllers divide
read/writes of all data from Windows and applications (like SQL Server) into
slices (usually 16–128 KB) that are spread across all disks participating in
the RAID array. Splitting data across physical drives like this has the effect
of distributing the read/write I/O workload evenly across all physical hard
drives participating in the RAID array. This increases disk I/O performance
because the hard disks participating in the RAID array as a whole are kept
equally busy, instead of some disks becoming a bottleneck due to uneven
distribution of the I/O requests.
· Fault tolerance. RAID provides protection from hard disk
failure and accompanying data loss with two methods: mirroring and parity.
Mirroring is implemented by
writing information onto two sets of drives, one on each side of the mirrored
pairs of drives. If there is a drive loss with mirroring in place, the data for
the lost drive can be rebuilt by replacing the failed drive and rebuilding the
data from the failed drive's matching drive on the other side of the mirrorset. Most RAID controllers provide the ability to do this
failed drive replacement and rebuilding from the other side of the mirrored
pair while Windows and SQL Server are online (commonly referred to as "Hot
Plug" capable drives). One advantage of mirroring is that it is the
best-performing RAID option when fault tolerance is required. Each SQL Server
write in the mirroring situation costs two disk I/O operations, one to each
side of the mirrorset. The other advantage is that
mirroring provides more fault tolerance than parity RAID implementations.
Mirroring can sustain at least one failed drive and may be able to survive
failure of up to half of the drives in the mirrorset
without forcing the system administrator to shut down the server and recover
from file backup. The disadvantage of mirroring is cost. The disk cost of
mirroring is one drive for each drive worth of data. RAID 1 and its hybrid,
RAID 0+1 (sometimes referred to as RAID 10 or 0/1) are implemented through
mirroring.
Parity is implemented by
calculating recovery information about data written to disk and writing this
parity information on the other drives that form the RAID array. If a drive
should fail, a new drive is inserted into the RAID array and the data on that
failed drive is recovered by taking the recovery information (parity) written on
the other drives and using this information to regenerate the data from the
failed drive. RAID 5 and its hybrids are implemented through parity. The
advantage of parity is cost. To protect any number of drives with RAID 5, only
one additional drive is required. Parity information is evenly distributed
among all drives participating in the RAID 5 array. The disadvantages of parity
are performance and fault tolerance. Due to the additional costs associated
with calculating and writing parity, RAID 5 requires four disk I/O operations
for each Windows NT/SQL Server write as compared to two disk I/O operations for
mirroring. Read I/O operation costs are the same for mirroring and parity.
Also, RAID 5 can sustain only one failed drive before the array must be taken
offline and recovery from backup media must be performed to restore data.
General
Rule of Thumb: Be sure to stripe across as many disks as necessary to achieve
solid disk I/O performance. Performance Monitor will indicate if there is a
disk I/O bottleneck on a particular RAID array. Be ready to add disks and
redistribute data across RAID arrays and/or small computer system interface
(SCSI) channels as necessary to balance disk I/O and maximize performance.
Many
hardware RAID controllers have some form of read and/or write caching. Take
advantage of this available caching with SQL Server because it can
significantly enhance the effective I/O handling capacity of the disk
subsystem. The principle of these controller-based caching mechanisms is to
gather smaller and potentially nonsequential I/O
requests coming in from the host server (hence, SQL Server) and try to batch
them together with other I/O requests for a few milliseconds so that the
batched I/Os can form larger (32–128 KB) and maybe
sequential I/O requests to send to the hard drives. This, in keeping with the
principle that sequential and larger I/O is good for performance, helps produce
more disk I/O throughput given the fixed number of I/Os
that hard disks are able to provide to the RAID controller. It is not that the
RAID controller caching magically allows the hard disks to process more I/Os per second, the RAID controller cache is just using some
organization to arrange incoming I/O requests to make best possible use of the
underlying hard disks' fixed amount of I/O processing ability.
These RAID
controllers usually protect their caching mechanism with some form of backup
power. The backup power can help preserve the data written in cache for some
period of time (perhaps days) in case of a power outage. In production
environments, also protect the database server with an adequate
uninterruptible power supply (UPS); this gives the RAID controller additional
battery backup time, which it can utilize to flush data to disk in case power
to the server is disrupted.
RAID 1 and
RAID 0+1 offer the best data protection and best performance among RAID
levels but cost more in terms of disks required. When the cost of hard disks
is not a limiting factor, RAID 1 or RAID 0+1 are the
best RAID choices in terms of both performance and fault tolerance.
RAID 5
provides fault tolerance at the best cost but has half the write performance of
RAID 1 and 0+1 because of the additional I/O that RAID 5 has to do reading and
writing parity information onto disk. RAID 5 is not as fault tolerant as RAID 1
and 0+1.
Best disk
I/O performance is achieved with RAID 0 (disk striping with no fault tolerance
protection). But because there is no fault tolerance with RAID 0, this RAID level
can typically only be used for development database servers or other testing
environments.
Many RAID
array controllers provide the option of RAID 0+1 (also referred to as RAID 1/0
and RAID 10) over physical hard drives. RAID 0+1 is a hybrid RAID solution. On
the lower level, it mirrors all data just like normal RAID 1. On the upper
level, the controller stripes data across all of the drives (like RAID 0).
Thus, RAID 0+1 provides maximum protection (mirroring) with high performance
(striping). These striping and mirroring operations are transparent to Windows
and SQL Server because they are managed by the RAID controller. The difference
between RAID 1 and RAID 0+1 is on the hardware controller level. RAID 1 and
RAID 0+1 require the same number of drives for a given amount of storage. For
more specifics on RAID 0+1 implementation of specific RAID controllers, contact
the hardware vendor that produced the controller.
Figure 1
illustrates the difference between RAID 0, RAID 1, RAID 5, and RAID 0+1. Note
that in order to hold four disks' worth of data, RAID 1
(and RAID 0+1) needs eight disks, whereas RAID 5 needs five disks. Be sure to
involve the appropriate hardware vendors to learn more about RAID
implementation specific to the hardware running the database server.
Figure 1. Common RAID levels
Online RAID expansion is a
very handy feature that allows disks to be added dynamically to a physical RAID
array while SQL Server is online, as long as there are hot plug slots
available. Many hardware vendors offer hardware RAID controllers that are
capable of providing this functionality. Data is automatically re-striped
across all drives evenly, including the newly added drive, and there is no need
to shut down SQL Server or Windows. It is good to take advantage of this
functionality by leaving hot plug hard drive slots free in the disk array
cages. Thus, if SQL Server is regularly overtaxing a RAID array with I/O
requests (this will be indicated by Disk Queue Length for the Windows logical
drive letter associated with that RAID array), it is possible to install one or
more new hard drives into the hot plug slot while SQL Server is still running.
The RAID controller will redistribute some existing SQL data to these new
drives so that SQL data is evenly distributed across all drives in the RAID array.
Then, the I/O processing capacity of the new drives (75 nonsequential/150
sequential I/Os per second, per drive) is added to
the overall I/O processing capacity of the RAID array.
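For example (hypothetical numbers), installing two additional drives into the hot plug slots of an array adds:

(2 new drives) X (75 nonsequential I/O operations per second) = 150 additional nonsequential I/Os per second
(2 new drives) X (150 sequential I/O operations per second) = 300 additional sequential I/Os per second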
In
Performance Monitor, Logical and Physical Disk Objects provide effectively the
same information. The difference is that Logical Disks in Performance Monitor
are associated with what Windows sees as a logical drive letter. Physical Disks
in Performance Monitor are associated with what Windows sees as a single physical
hard disk.
To enable Performance Monitor disk counters, use the command diskperf.exe from
the command line of a Windows command prompt window. Use "diskperf -y" so that
Performance Monitor will report Logical and Physical disk counters. This works
when using hard drives or sets of hard drives and RAID controllers, without
the use of Windows NT software RAID.

When utilizing Windows NT software RAID, use "diskperf -ye" so that
Performance Monitor will report Physical counters across the Windows NT
stripesets correctly. When "diskperf -ye" is used in conjunction with Windows
NT stripesets, Logical counters will not report correct information and need
to be disregarded. If Logical disk counter information is required in
conjunction with Windows NT stripesets, use "diskperf -y" instead. With
"diskperf -y" and Windows NT stripesets, Logical disk counters will be
reported correctly but Physical disk counters will not, and the Physical
counters need to be disregarded.
Note that
the effects of the diskperf command do not occur
until Windows NT has been restarted.
Also note that hardware RAID controllers present the multiple physical hard
drives that compose a single RAID mirrorset or stripeset to Windows as one
physical disk. Disk Administrator is used to associate logical drive letters
with that physical disk, without any need to be concerned with how many hard
disks are actually behind the single physical disk that the RAID controller
has presented.
But from a
performance tuning perspective, it is very important to be aware of how many
physical hard drives are associated with a RAID array because this information
will be needed when determining the number of disk I/O requests that Windows
and SQL Server are sending to each physical hard drive. Divide the number of
disk I/O requests that Performance Monitor reports as being associated with a
hard drive by the number of actual physical hard drives known to be in that
RAID array.
In order to
get a rough estimate of I/O activity per hard drive in a RAID array, it is also
important to multiply the number of disk write I/Os
reported by Performance Monitor by either 2 (RAID 1 and 0+1) or 4 (RAID 5).
This will give a more accurate account of the number of actual I/O requests
being sent to the physical hard drives, because it is at this physical level
that the I/O capacity numbers for hard drives apply (75 nonsequential
and 150 sequential per drive). But don't expect to be able to calculate exactly
how much I/O is hitting the hard drives this way, when the hardware RAID
controller is using caching, because caching can significantly change the
amount of I/O that is actually hitting the hard drives for the reasons just
explained.
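For example (hypothetical numbers), suppose Performance Monitor reports 100 disk write I/Os per second to a logical drive on a RAID 5 array of 8 physical drives:

(100 reported write I/Os) X (4 physical I/Os per RAID 5 write) = 400 physical write I/Os per second
(400 physical write I/Os per second) / (8 drives) = 50 write I/Os per second per drive

Fifty write I/Os per second per drive is within the 75 nonsequential I/Os per second that a typical drive can sustain; remember that controller caching makes these numbers estimates only.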
It is best
to concentrate on disk queuing versus actual I/O per disk because, after all,
why worry about I/O if it is not causing a problem? Windows can't see the
number of physical drives in a RAID array, so to assess disk queuing per
physical disk accurately, it is important to divide the Disk Queue Length by
the number of physical drives participating in the hardware RAID disk array
that contains the logical drive being observed. Keep this
number under 2 for hard drives containing SQL Server files.
For more
information about SQL Server and RAID, search in SQL Server Books Online for
the strings "RAID Levels and SQL Server," "Comparing Different
Implementations of RAID Levels," "Monitoring Disk Activity,"
"Performance Monitoring Example: Identifying Bottlenecks,"
"About Hardware-based Solutions," and "RAID."
Windows NT
provides fault tolerance to hard disk failure by providing mirrorsets
and stripesets (with or without fault tolerance)
through the Windows NT operating system instead of a hardware RAID controller.
Windows NT Disk Administrator is used to define either mirrorsets
(RAID 1) or stripesets with parity (RAID 5). Windows
NT Disk Administrator also allows the definition of stripesets
with no fault tolerance (RAID 0).
Software
RAID will utilize more CPU resources, because Windows NT is the component
managing the RAID operations versus the hardware RAID controller. Thus,
performance with the same number of disk drives and Windows NT software RAID
may be a few percent less than the hardware RAID solution if the system
processors are near 100 percent utilized. But Windows NT software RAID will
generally help a set of drives service SQL Server I/O better overall than those
drives would have been able to separately. This reduces the potential for an I/O
bottleneck, which in turn allows higher CPU utilization by SQL Server and better
throughput. And software RAID can provide a lower-cost solution for providing
fault tolerance to a set of hard drives.
For more
information about configuring Windows NT software RAID, refer to Chapter 4,
"Planning a Reliable Configuration," in Windows NT Server Online
Help. Also search in SQL Server Books Online for the strings "About
Windows NT-based Disk Mirroring and Duplexing" and "About Windows
NT-based Disk Striping and Striping with Parity."
When
dealing with smaller SQL Server databases located on a few disk drives, disk
I/O parallelism will likely not come into play. But when dealing with large SQL
Server databases stored on many disk drives, performance will be enhanced by
using disk I/O parallelism to make optimal use of the I/O processing power of
the disk subsystem.
The
simplest technique for creating disk I/O parallelism is to create a single
"pool of drives" that serves all SQL Server database files, excluding
transaction log files. The pool may be a single RAID array that is represented
in Windows NT as a single physical drive. Or a larger pool may be set up using
multiple RAID arrays and SQL Server files/filegroups.
A SQL Server file can be associated with each RAID array and the files can be
combined into a SQL Server filegroup. Then a database
can be built on the filegroup so that the data will
be spread evenly across all of the drives and RAID controllers. The "drive
pool" methodology depends on RAID to divide data across all physical
drives to help ensure parallel access to that data during database server
operations.
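A minimal sketch of this multiple-array approach, using hypothetical names and drive letters (e:, f:, and g: each representing a separate RAID array, with the log placed on its own array at h:):

create database salesdb
on primary
(name = 'salesdb_sys', filename = 'e:\mssql7\salesdb_sys.mdf'),
filegroup salesgroup
(name = 'salesdb_data1', filename = 'f:\mssql7\salesdb_data1.ndf'),
(name = 'salesdb_data2', filename = 'g:\mssql7\salesdb_data2.ndf')
log on
(name = 'salesdb_log', filename = 'h:\mssql7\salesdb_log.ldf')

Objects created on the salesgroup filegroup will have their data spread evenly across the two arrays.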
This pool
methodology simplifies SQL Server I/O performance tuning because database
administrators know that there is only one physical location to create database
objects. The single pool of drives can be watched for disk queuing and, if
necessary, more hard drives can be added to the pool to prevent disk queuing.
This technique helps optimize for the common case, where it is not known which
parts of databases may see the most usage. It is better not to have a portion
of the total available I/O capacity segregated away on some other disk
partition just because 5 percent of the time SQL Server might be doing I/O to
it. The "single pool of drives" methodology may help make all
available I/O capacity always available for SQL Server operations.
Note that
SQL Server log files should always be physically separated onto different hard
drives from all other SQL Server database files. For SQL Servers with very busy
databases, transaction log files should be physically separated from each
other. Transaction logging is primarily sequential write I/O. There is a lot of
I/O performance benefit associated with separating transaction logging activity
from other nonsequential disk I/O activity. That
allows the hard drives containing the log files to concentrate on sequential
I/O. Note that there are times when the transaction log will need to be read as
part of SQL Server operations such as replication, rollbacks, and deferred
updates. SQL Servers that participate in replication should pay particular
attention to making sure that all transaction log files have sufficient disk
I/O processing power because of the reads that need to occur.
There is
additional administration involved with physically separating SQL Server
objects from the rest of their associated database through SQL Server files and
filegroups. For the purposes of investigating very
active tables and indexes, this may be very worthwhile. By separating table or
index away from all other database objects, accurate assessments can be made of
the I/O requirements of that object. This is not as easy to do when all
database objects are placed within one big drive pool. This type of physical
I/O separation may be appropriate during database development and benchmarking
so that database I/O information can be gathered and applied to capacity
planning for the production database server environment.
Here are
the areas of SQL Server activity that can be separated across different hard
drives, RAID controllers, PCI channels (or combinations of the three):
· Transaction log files
· Tempdb
· Database files
· Tables associated with a lot of query or write activity
· Nonclustered indexes associated with a lot of query or write activity
The
physical separation of SQL Server I/O activities is quite convenient with the
use of hardware RAID controllers, RAID hot plug drives, and online RAID
expansion. The approach that provides the most flexibility is arranging the
RAID controllers so that a separate RAID SCSI channel is provided for each of
the separate SQL activities mentioned above. Each RAID SCSI channel should be
attached to a separate RAID hot plug cabinet to take full advantage of online
RAID expansion (if available through the RAID controller). Windows logical
drive letters are associated to each RAID array and SQL Server files may be
separated between distinct RAID arrays based on known I/O usage patterns.
With this
configuration it is possible to relate disk queuing back to a distinct RAID
SCSI channel and its drive cabinet as Performance Monitor reports the queuing
behavior during load testing or heavy production loads. If a RAID controller
and drive array cabinet support online RAID expansion and slots for hot plug
hard drives are available in the cabinet, disk queuing on that RAID array is
resolved by simply adding more drives to that RAID array until Performance
Monitor reports that disk queuing for that RAID array has reached acceptable
levels (less than 2 for SQL Server files). This can be done while SQL Server is
online.
Tempdb is a database
created by SQL Server to be used as a shared working area for a variety of activities,
including temporary tables, sorting, subqueries, and aggregates with GROUP BY
or ORDER BY, queries using DISTINCT (temporary worktables have to be created to
remove duplicate rows), cursors, and hash joins. It is good to enable tempdb I/O operations to occur in parallel to the I/O
operations of related transactions. Because tempdb is
a scratch area and very update intensive, RAID 5 would not be as good a choice
for tempdb as RAID 1 or 0+1. Because tempdb is rebuilt every time the database server is
restarted, RAID 0 is a possibility for tempdb on
production SQL Server machines. RAID 0 provides best RAID performance for tempdb with the least number of physical drives. The main
concern with using RAID 0 for tempdb in a production
environment is that SQL Server would need to be stopped and restarted if any
physical drive failure were to occur in the RAID 0 array, and this would not
necessarily be the case if tempdb were placed on a
RAID 1 or 0+1 array.
To move the
tempdb database, use the ALTER DATABASE command to
change the physical file location of the SQL Server logical file name
associated with tempdb. For example, to move tempdb and its associated log to the new file locations
e:\mssql7 and c:\temp, use the following commands:
alter database tempdb modify file (name='tempdev', filename=
'e:\mssql7\tempnew_location.mdf')
alter database tempdb modify file (name='templog', filename=
'c:\temp\tempnew_loglocation.ldf')
The master, msdb, and model databases are not used much during
production compared to user databases, so it is typically not necessary to
consider them in I/O performance tuning considerations. The master database is
normally used just for adding new logins, databases, devices, and other system
objects.
Nonclustered indexes reside in
B-tree structures, which can be separated from their related database tables
with the ALTER DATABASE command. In the example below, the first ALTER DATABASE
creates a filegroup. The second ALTER DATABASE creates
a file with a separate physical location associated with the filegroup. At this point, indexes can be created on the filegroup as illustrated in the following code with the
creation of the index called index1. SP_HELPFILE reports the files and filegroups present for a given database. SP_HELP <tablename> has a section in its output that provides
information on a table's indexes and their filegroup
relationships. For more information, search in SQL Server Books Online for the
strings "ALTER DATABASE" and "sp_helpfile."
alter database testdb add filegroup testgroup1
alter database testdb add file (name = 'testfile',
filename = 'e:\mssql7\test1.ndf') to filegroup testgroup1
create table test1(col1 char(8))
create index index1 on test1(col1) on testgroup1
sp_helpfile
sp_help test1
For more
information, search in SQL Server Books Online for the strings "Files and Filegroups," "Placing Indexes on Filegroups," "Monitoring Disk Activity,"
"Physical Database Files and Filegroups,"
and "Adding and Deleting Data and Transaction Log Files."
I/O characteristics
of the hardware devices on the server have been discussed. Now the discussion
will move to how SQL Server data and index structures are physically placed on
disk drives. These structures are described in just enough detail to relate
them to disk I/O performance.
SQL Server data and
index pages are both 8 kilobytes in size. SQL Server data pages contain all of
the data associated with the rows of a table, except
text and image data. In the case of text and image data, the SQL Server data
page that contains the row associated with the text/image column will contain a
pointer to a binary tree (or B-tree) structure of one or more 8-KB pages that
contain the text/image data.
SQL Server index
pages contain only the data from columns that comprise a particular index. This
means that index pages effectively compress information associated with many
more rows into an 8-KB page than an 8-KB data page does. An important I/O
performance concept to visualize is that the I/O performance benefit of indexes
comes from this information compression. This holds as long as the columns
picked to be part of the index form a relatively low percentage of the row size of the table. When a SQL query asks for a set of
rows from a table in which columns in the query match certain values in the
rows, SQL Server can save I/O operations and time by reading the index pages to
look for the values and then access only the rows in the table required to
satisfy the query instead of having to perform I/O operations to scan all rows
in the table to locate the required rows. This is true if the indexes defined
are selected well.
There are two types
of SQL Server indexes, and both are built upon B-tree structures formed out of
8-KB index pages. The difference is at the bottom of the B-tree structures,
which are referred to as the leaf level in SQL Server documentation. The upper
parts of index B-tree structures are referred to as nonleaf
levels of the index. A B-tree structure is built for every index defined
on a SQL Server table.
Figure 2 illustrates
the structural difference between nonclustered and
clustered indexes. Key points to remember are that in the nonclustered
index case the leaf level nodes contain only the data that participates in the
index, along with pointers to quickly locate the remaining row data on the
associated data page. In the worst-case scenario, each row access from the nonclustered index will require an additional nonsequential disk I/O to retrieve the row data. In a
best-case scenario, many of the required rows will be on the same data page and
thus allow retrieval of several required rows with each data page fetched. In
the clustered index case, the leaf level nodes of the index are the actual data
rows for the table. Therefore, no pointer jumps are required for retrieval of
table data. Range scans based on clustered indexes will perform well because
the leaf level of the clustered index (hence, all rows of that table) is
physically ordered on disk by the columns that comprise the clustered index
and, due to this fact, will perform I/O in 64-KB extents. And hopefully, if
there is not a lot of page splitting on the clustered index B-tree (nonleaf and leaf levels), these 64-KB I/Os
will be physically sequential. The dotted lines indicate that there are other
8-KB pages present in the B-tree structures but not shown.
Figure
2.
Clustered and nonclustered index B-tree structures
There can only be
one clustered index per table. There is a simple physical reason for this.
While the upper parts (commonly referred to in SQL Server documentation as nonleaf levels) of the clustered index B-tree structure are
organized just like the nonclustered index B-tree
structures, the bottom level of the clustered index B-tree consists of the actual 8-KB
data pages associated with the table. There are two performance implications
here:
1.
Retrieval
of SQL data based on key search with a clustered index requires no pointer jump
(with a likely nonsequential change of location on
the hard disk) to get to the associated data page because the leaf level of the
clustered index is already the associated data page.
2.
The
leaf level of the clustered index is sorted by the columns that comprise the
clustered index. Because the leaf level of the clustered index contains the
actual 8-KB data pages of the table, this means the row data of the entire
table is physically arranged on the disk drive in the order determined by the
clustered index. This provides a potential I/O performance advantage when
fetching a significant number of rows from this table (at least greater than 64
KB) based on the value of the clustered index, because sequential disk I/O is
being used (unless page splitting is occurring on this
table, which will be discussed later in the section titled "Importance of FILLFACTOR and PAD_INDEX"). That is
why it is important to pick the clustered index on a table based on a column
that will be used to perform range scans to retrieve a large number of rows.
Nonclustered indexes are most
useful for fetching few rows with good selectivity from large SQL Server tables
based on a key value. As mentioned before, nonclustered
indexes are binary trees formed out of 8-KB index pages. The bottom, or leaf
level, of the binary tree of index pages contains all the data from the columns
that comprise that index. When a nonclustered index
is used to retrieve information from a table based on a match with the key
value, the index B-tree is traversed until a key match is found at the leaf
level of the index. A pointer jump is made if columns from the table are needed
that did not form part of the index. This pointer jump will likely require a nonsequential I/O operation on the disk. It might even
require the data to be read from another disk, if the table and its
accompanying index B-tree(s) are large in size. If multiple pointers lead to
the same 8-KB data page, less of an I/O performance penalty will be paid
because it is only necessary to read the page into data cache once. For each
row returned for a SQL query that involves searching with a nonclustered
index, one pointer jump is required. These pointer jumps are the reason that nonclustered indexes are better suited for SQL queries that
return only one or a few rows from the table. Queries that require a lot of
rows to be returned are better served with a clustered index.
For more
information, search in SQL Server Books Online for the string "nonclustered index."
A special situation
with nonclustered indexes is the covering index. The
definition of a covering index is a nonclustered
index that is built upon all of the columns required to satisfy an SQL query,
both in the selection criteria and the WHERE predicate. Covering indexes can
save a huge amount of I/O, and hence bring a lot of performance to a query. But
it is necessary to balance the costs of creating a new index (with its
associated B-tree index structure maintenance) against the I/O performance
gain the covering index will bring. If a covering index will greatly benefit a
query or set of queries that will be run very often on SQL Server, the creation
of that covering index may be worth it.
For example, consider the following query and the covering index built for it:
1.
Select col1,col3 from table1 where col2 = 'value'.
2.
Create index
indexname1 on table1(col2,col1,col3).
–or–
Use Create Index
Wizard in SQL Server Enterprise Manager to create the index. Select Tools/Wizards
from the SQL Server Enterprise Manager menu bar, left-click the + icon
beside Database to expose the database wizards, and then double-click Create
Index Wizard to start the wizard.
The index created,
called "indexname1" in this example, is a covering index because it
includes all columns from the SELECT statement and the WHERE predicate. This
means that during the execution of this query, SQL Server does not need to
access the data pages associated with table1. SQL Server can obtain all of the
information required to satisfy the query by using the index called indexname1.
Once SQL Server has traversed the B-tree associated with indexname1 and found
the range of index keys where col2 is equal to "value," SQL Server
knows that it can fetch all required data (col1,col2,col3) from the leaf level
(bottom level) of the covering index. This provides I/O performance in two
ways:
·
SQL
Server is obtaining all required data from an index page, not a data page, so
the data is more compressed and SQL Server saves disk I/O operations.
·
The
covering index has organized all of the required data by col2 physically on the
disk. This allows the hard drives to return all of the
index rows associated with the where predicate (col2 = "value") in
sequential order. This gives us better I/O performance. In essence, a covering
index, from a disk I/O standpoint, becomes a clustered index for this query and
any other query that can be completely satisfied by the columns in the covering
index.
In general, if the
covering index is small in terms of the number of bytes from all the columns in
the index compared to the number of bytes in a single row of that table and it
is certain that the query taking advantage of the covered index will be
executed frequently, it may make sense to use a covering index. But before
building a lot of covered indexes, consider the next section, which describes how
SQL Server 7.0 can intelligently and automatically create covered indexes for
queries on the fly.
SQL Server 7.0's new
query processor provides index intersection. Index intersection allows the
query processor to consider multiple indexes from a given table, build a hash
table based on those multiple indexes, and utilize the hash table to reduce I/O
for a given query. The hash table that results from index intersection
becomes, in essence, a covering index and provides the same I/O performance
benefits that covering indexes do. Index intersection provides greater
flexibility for database user environments in which it is difficult to
predetermine all of the queries that will be run against the database. A good
strategy to follow in this case would be to define single-column, nonclustered indexes on all columns that will be frequently
queried and let index intersection handle situations where a covered index is
needed.
For more
information, search in SQL Server Books Online for the strings "query
tuning recommendations" and "designing an index."
For example, consider the following query and the pair of single-column indexes that support it:
1.
Select col3 from
table1 where col2 = 'value'
2.
Create index
indexname1 on table1(col2)
Create index indexname2
on table1(col3)
–or–
Use Create Index
Wizard in SQL Server Enterprise Manager to create the indexes. Select Tools/Wizards
from the SQL Server Enterprise Manager menu bar, left-click the + icon
beside Database to expose the database wizards, and then double-click Create
Index Wizard to start the wizard.
In the preceding example, "indexname1" and "indexname2" are nonclustered, single-column indexes created on the SQL
Server table called "table1." When the query is executed, the query
processor recognizes a situation where index intersection using the two indexes
would be advantageous. The query optimizer will automatically hash the two
indexes together to save I/O in executing the query. No query hints were
required for this to happen. Queries that are handled by covering indexes
(whether by explicitly declared covering indexes or index intersection) are
referred to as "covered queries."
How indexes are
chosen significantly affects the amount of disk I/O generated and,
subsequently, performance. The previous sections described why nonclustered indexes are good for retrieval of a small
number of rows and clustered indexes are good for range scans. Here is some
additional information:
·
Try
to keep indexes as compact (fewest number of columns and
bytes) as possible. This is especially true for clustered indexes
because nonclustered indexes use the clustered
index as their method for locating row data. For more information, search in SQL
Server Books Online for the strings "using clustered indexes,"
"Index Tuning Recommendations," and "Design an Index."
·
In
the case of nonclustered indexes, selectivity is
important, because if a nonclustered index is created
on a large table with only a few unique values, usage of that nonclustered index will not save I/O during data retrieval.
In fact, using the index would likely cause much more I/O than a sequential
table scan of the table. Some examples of good candidates for a nonclustered index are invoice numbers, unique customer
numbers, social security numbers, and telephone numbers.
·
Clustered
indexes are much better than nonclustered indexes for
queries that match columns or search for ranges of columns that don't have a
lot of unique values because the clustered index physically orders the table
data, allowing for sequential 64-KB I/O on the key values. Some examples of
possible candidates for a clustered index include states, company branches, date of sale, zip codes, and customer district. It tends
to be a waste to define a clustered index on a column that has only
unique values unless typical queries on the system fetch large sequential
ranges of those values. The key question to ask when trying to pick the
best column on each table to create the clustered index on is, "Will there
be a lot of queries that need to fetch a large number of rows based on the
order of this column?" The answer is very specific to each user
environment. One company may do a lot of queries based on ranges of dates,
whereas another company may do a lot of queries based on ranges of bank
branches.
Samples of WHERE
predicates that benefit from clustered indexes:
WHERE <column_name> > some_value
WHERE <column_name> BETWEEN some_value
AND some_value
WHERE <column_name> < some_value
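To make this concrete, here is a minimal sketch of a clustered index that supports the range predicates above; the invoices table and invoice_date column are hypothetical:
create clustered index ci_invoice_date on invoices(invoice_date)
A query such as "select * from invoices where invoice_date between '19980101' and '19980107'" can then retrieve the qualifying rows with sequential 64-KB I/O instead of scanning the entire table.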
Clustered index
selection really involves two major decisions: First, determining which column
of the table will benefit most from the clustered index in terms of providing
sequential I/O for range scans. Second, using the clustered
index to affect the physical placement of table data while avoiding hot spots.
A hot spot occurs when data is placed on hard drives such that many queries are
trying to read or write data in the same area of the disk(s) at the same time.
That creates a disk I/O bottleneck, because more concurrent disk I/O requests
are being received by that hard disk than it can handle. Solutions are to
either stop fetching as much data from this disk or to spread the data across
multiple disks to support the I/O demand. This type of consideration for the
physical placement of data can be critical for good concurrent access to data
among hundreds or thousands of SQL Server users.
These two decisions
often conflict with each other and the best overall decision will have to
balance the two. In high user load environments, improved concurrency (by
avoiding hot spots) may often be more valuable than the performance benefit of
placing the clustered index on that column.
In previous versions
of SQL Server, for tables without a clustered index (tables like this are
referred to as heaps) inserted rows would always be placed at the physical end
of the table on disk. This created the possibility of a hot spot at the end of
a very busy table. SQL Server 7.0's new storage management algorithms provide
free space management, which removes this behavior. Now when rows are inserted
in heaps, SQL Server makes use of the PFS pages to quickly locate available
free space somewhere in the table in which the row can be inserted. PFS (Page Free Space) pages
indicate free space throughout the table. This recovers deleted space and avoids
insertion hot spots because inserts will be spread through the physical disk
space throughout the table. Free space management affects clustered index
selection. Because clustered indexes affect physical data placement, hot spots
may occur when a clustered index physically sequences based on a column where
many concurrent inserts occur at the highest column value, which will be
located on the same physical disk location. For columns with monotonically
increasing values, be aware that a clustered index on such a column will
sequentially order the data rows on disk by that column. Placing the clustered
index on another column, or defining no clustered index at all, shifts this
sequential data placement to the other column or removes it entirely.
Here is a common
scenario to help illustrate clustered index selection. Suppose a table contains
an invoice date column, a unique invoice number column, and other data. Suppose
that about 10,000 new records are inserted into this table every day and that
SQL queries often need to search this table for all records for one week's
worth of data and many users need concurrent access to this table. The invoice
number would not be a good candidate for the clustered index for several
reasons. First, invoice number is unique and users don't tend to search on
ranges of invoice numbers, so placing invoice numbers physically in sequential
order on disk is not likely to be helpful because range scans on invoice
numbers will likely not happen. Second, the values for invoice number likely
increase monotonically (1001, 1002, 1003, and so on). If the clustered index is
placed on invoice number, inserts of new rows into this table will all happen
at the end of the table (beside the highest invoice number) and, therefore, on
the same physical disk location, creating a hot spot.
Next, consider the
invoice date column. To maximize sequential I/O, invoice date would be a good
candidate for a clustered index because users often are searching for one
week's worth of data (about 70,000 rows). But from the concurrency perspective,
invoice date may not be a good candidate for the clustered index. If the
clustered index is placed on an invoice date, all data will tend to be inserted
at the end of the table, given the nature of dates, and a hot spot may occur on
the hard disk that holds the end of the table. Note that the fact that the
insertions happened at the end of the table is somewhat offset by the fact that
10,000 rows are inserted for the same date, so invoice date would be much less
likely to create a hot spot than invoice number. Also, a hardware RAID controller
would help spread out the 10,000 rows across multiple disks, which would also
help minimize the possibility of an insertion hot spot.
There is no perfect
answer to the above scenario. It may be necessary to decide that it is worth
the risk of hot spotting and choose to place the clustered index on invoice
date in order to speed up queries involving invoice date ranges. If this is the
case, carefully monitor disk queuing on the disks associated with this table
and keep in mind that the inserts may queue up behind each other trying to get
at the end of the table. My recommendation in this scenario would be to define
the clustered index on invoice date because of the benefit to range scans based
on invoice date and so that invoice numbers are not physically sequential on
disk.
Let's consider
another example that is more pleasant to work with. Suppose a table consists of
invoice number, invoice date, invoice amount, sales office where the sale
originated from, and other data. Suppose 10,000 records are inserted into this
table every day. In this case, users most often query invoice amounts based on
sales office. Thus, sales office should be the column on which the clustered
index is created, because that is the column on which range scans are based. And
because new rows being inserted will likely have a mix of different sales
offices, inserts should be spread evenly across the table and across the disks
on which the table is located.
In some cases, range
scans may not be the important issue. Suppose a very large employee table has
employee number, social security number, and other data. As rows are inserted,
employee number is incremented. Let's assume that there are 100,000 retrievals
from this table every day and that each retrieval is a
single record fetch based on social security number. A nonclustered
index created on social security number would provide excellent query
performance in this scenario. A clustered index on social security number would
provide slightly better query performance than the nonclustered
index but may be a bit of overkill because range scans are not involved. The
question in this case would be whether to bother defining a clustered index on
this table. In previous versions of SQL Server, it was important to always
define a clustered index on a table even if not required for queries because it
helped with deleted row space recovery. This is not an issue with SQL Server
7.0's new space allocation algorithms and storage structures. The
recommendation in this example would be to create the clustered index on social
security number. The reason is that it is good to find a column on the table
whose data does not follow the sequential pattern of employee number, and
social security numbers tend to have a fairly even distribution. If a
clustered index is created on this evenly distributed
column data, the employee records will be evenly distributed on disk. This
distribution, in conjunction with FILLFACTOR and PAD_INDEX, which will be
discussed in the next section, will provide open data page areas throughout the
table to insert data. Assuming that newly inserted employee records will have
an even distribution of social security numbers, the employee table will fill
evenly and page splitting will be avoided. If no column with an even
distribution exists on the table, it may be worthwhile to add an integer
column to the table, populate it with evenly distributed values, and create
the clustered index on that column (a minimal sketch appears at the end of this section). This
"filler" or "dummy" column with a clustered index defined
on it is not being used for queries, but to distribute data I/O across disk
drives evenly to improve table access concurrency and overall I/O performance.
This can be a very effective methodology with large and heavily accessed SQL
tables. Another possible solution in this example would be to not create a
clustered index on this table. In this case, SQL Server 7.0 manages all aspects
of the space management. SQL Server finds a free space to insert the row,
reuses space from deleted rows, and automatically reorganizes physical ordering
of data pages on disk when it makes sense (to allow greater amounts of
sequential I/O). The reorganization of data pages happens during database file autoshrink operations. For more information, search in SQL
Server Books Online for the strings "Managing Space Used by Objects"
and "Space Allocation and Reuse."
Another way to think
about hot spots is within the context of selects. If many users are selecting
data with key values that are very close to but not in the same actual row as
each other, a majority of disk I/O activity will tend to occur within the same
physical region of the disk I/O subsystem. This disk I/O activity can be spread
out more evenly by defining the clustered index for this table on a column that
will spread these key values evenly across the disk. If all selects are using
the same unique key value, using a clustered index will not help balance the
disk I/O activity of this table. Use of RAID (either hardware or software)
would help alleviate this problem as well by spreading the I/O across many disk
drives. The type of behavior described here can be viewed as disk access
contention. It is not locking contention.
If a SQL Server
database will be experiencing a large amount of insert activity, it is
important to plan for providing and maintaining open space on index and data
pages to prevent page splitting. Page splitting occurs when an index page or
data page can no longer hold any new rows and a row needs to be inserted into
the page because of the logical ordering of data defined in that page. When
this occurs, SQL Server needs to divide up the data on the full page and move
about half of the data to a new page so that both pages now have some open
space. This consumes some system resources and time.
When indexes are
initially built, SQL Server places the index B-tree structures on contiguous
physical pages, which allows for optimal I/O performance scanning the index
pages with sequential I/O. When page splitting occurs and new pages need to be
inserted into the logical B-tree structure of the index, SQL Server must
allocate new 8-KB index pages. This occurs somewhere else on the hard drive and
will break up the physically sequential nature of the index pages. This
switches I/O operations from sequential to nonsequential
and cuts performance in half. Excessive amounts of page splitting should be
resolved by rebuilding the index to restore the physically sequential order of
the index pages. This same behavior can be encountered on the leaf level of the
clustered index, thereby affecting the data pages of the table.
In Performance
Monitor, keep an eye on "SQL Server: Access Methods – Page Splits." Nonzero values for this counter indicate that page
splitting is occurring and that further analysis should be done with DBCC
SHOWCONTIG. For details on how to use this command, search for "DBCC
SHOWCONTIG" in SQL Server Books Online.
DBCC SHOWCONTIG is a very helpful command that can be used to reveal whether excessive page
splitting has occurred on a table. Scan Density is the key indicator that DBCC
SHOWCONTIG provides. It is good for this value to be as close to 100 percent as
possible. If this value is well below 100 percent, rebuild the clustered index
on that table using the DROP_EXISTING option to defragment the table. The
DROP_EXISTING option of the CREATE INDEX statement permits re-creation of
existing indexes and provides better index rebuild performance than dropping
and re-creating the index. For details, search in SQL Server Books Online for
the strings "Create Index" and "rebuilding an index."
The FILLFACTOR
option on the CREATE INDEX and DBCC DBREINDEX commands provides a way to specify
the percentage of open space to leave on index and data pages. The PAD_INDEX
option for CREATE INDEX applies what has been specified for FILLFACTOR to the nonleaf level index pages. Without the PAD_INDEX option,
FILLFACTOR mainly affects the leaf level index pages of the clustered index. It
is a good idea to use the PAD_INDEX option with FILLFACTOR. For more information,
search in SQL Server Books Online for "page split" and "pad_index."
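As a brief sketch using the sample table defined later in this document (the 80 percent value is only an example), the following leaves roughly 20 percent open space on the leaf pages and, because of PAD_INDEX, on the nonleaf index pages as well:
create clustered index testtable2 on testtable(ckey1) with pad_index, fillfactor = 80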
The optimal value to
specify for FILLFACTOR depends upon how much new data will be inserted within a
given time frame into an 8-KB index and data page. It is important to keep in
mind that SQL Server index pages typically contain many more rows than data
pages because index pages only contain the data for columns associated with
that index, whereas data pages hold the data for the entire row. Also bear in
mind how often there will be a maintenance window that will permit the
rebuilding of indexes to avoid page splitting. Strive toward rebuilding the
indexes only as the majority of the index and data pages have become filled
with data. Part of what allows this to happen is the proper selection of
clustered index for a given table. If the clustered index distributes data
evenly so that new row inserts into the table happen across all of the data
pages associated with the table, the data pages will fill evenly. Overall, this
will provide more time before page splitting starts to occur and it is
necessary to rebuild the clustered index. The other part of the decision is the
FILLFACTOR, which should be selected partly on the estimated number of rows
that will be inserted within the key range of an 8-KB page for a given time
frame and how often scheduled index rebuilds can occur on the system.
This is another
situation in which a judgment call must be made, based on the performance
tradeoffs between leaving a lot of open space on pages versus page splitting.
If a small percentage for FILLFACTOR is specified, it will leave large open
spaces on the index and data pages. This helps avoid page splitting but will also
negate some of the performance effect of compressing data onto a page. SQL Server
performs faster if more data is compressed on index and data pages because it
can generally fetch more data with fewer pages and I/Os
if the data is more compressed on the pages. Specifying too high a FILLFACTOR
may leave too little open space on pages and allows pages to overflow too
quickly, causing page splitting.
Before using
FILLFACTOR and PAD_INDEX, remember that reads tend to far outnumber writes,
even in an online transaction processing (OLTP) system. Using FILLFACTOR will
slow down all reads, because it spreads tables over a wider area (reduction of
data compression). Before using FILLFACTOR and PAD_INDEX, it is a good idea to
use Performance Monitor to compare SQL Server reads to SQL Server writes and to
only use these options if writes are a substantial fraction of reads (say, more
than 30 percent).
If writes are a
substantial fraction of reads, the best approach in a very busy OLTP system is
to specify as high a FILLFACTOR as feasible that will leave a minimal amount of
free space per 8-KB page but still prevent page splitting and still allow the
SQL Server to reach the next available time window for rebuilding the index.
This methodology balances I/O performance (keeping the pages as full as
possible) and page splitting avoidance (not letting pages overflow). This may
take some experimentation with rebuilding the index with varying FILLFACTOR and
then simulating load activity on the table to validate an optimal value for
FILLFACTOR. Once the optimal FILLFACTOR value has been determined, automate the
scheduled rebuilding of the index as a SQL Server task. For more information on
automating tasks, search in SQL Server Books Online for the string
"creating a task."
In the situation
where there will be no write activity into the SQL Server database, FILLFACTOR
should be set at 100 percent so that all index and data pages are filled
completely for maximum I/O performance.
For the purposes of
illustrating the use of the SQL Server performance tools, use the following
example. First, the following table is constructed:
create table testtable (nkey1 int identity,
col2 char(300) default 'abc', ckey1 char(1))
Next, the table is
loaded with 10,000 rows of test data. The data being loaded into column nkey1
lends itself to a nonclustered index. The data in
column ckey1 lends itself to a clustered index and the data in col2 is merely
filler to increase the size of each row by 300 bytes:
declare @counter int
set @counter = 1
while (@counter <= 2000)
begin
insert testtable (ckey1) values ('a')
insert testtable (ckey1) values ('b')
insert testtable (ckey1) values ('c')
insert testtable (ckey1) values ('d')
insert testtable (ckey1) values ('e')
set @counter = @counter + 1
end
The following two
queries make up the database server workload:
select ckey1,col2 from testtable where ckey1 = 'a'
select nkey1,col2 from testtable where nkey1 = 5000
SQL Server Profiler
records detailed information about activity occurring on the database server.
Profiler can be configured to watch and record one or many users executing
queries on SQL Server with a widely configurable amount of performance
information. Performance information available for recording in Profiler
includes I/O statistics, CPU statistics, locking requests, T-SQL and RPC
statistics, index and table scans, warnings and errors raised, database object
create/drop, connection connect/disconnects, stored procedure operations,
cursor operation, and more. For full information on what SQL Profiler can
record, search in SQL Server Books Online for the string "Profiler."
Profiler and Index
Tuning Wizard provide a very powerful tool combination to help database
administrators create proper indexes on tables. Profiler records resource
consumption for queries into a .trc file. This .trc file can be read by Index Tuning Wizard, which will
consider both the .trc information and the database
tables and will then provide recommendations about any indexes that should be
created. Index Tuning Wizard provides a choice of automatically creating the
proper indexes for the database, scheduling the automatic index creation for a
later time, or generating a T-SQL script that can be reviewed and executed at a
later time.
Here are the steps
for analyzing a query load:
Set up Profiler
1.
Start Profiler from
SQL Server Enterprise Manager by selecting Tools/SQL Server Profiler
from the menu.
2.
Press CTRL+N to
create a new Profiler trace.
3.
Type in a name for
the trace.
4.
Select the Capture
to File: check box and select a .trc file to output the Profiler information to.
5.
Click OK.
Run the workload
1.
Start
Query Analyzer (either from SQL Server Enterprise Manager by selecting Tools/SQL
Server Query Analyzer from the menu or from the Start menu by
selecting Start\Programs\Microsoft SQL Server 7.0\Query Analyzer).
2.
Connect
to SQL Server and set the current database to be where the table was created.
3.
Type
or copy and paste the following queries into the query window of Query
Analyzer:
select ckey1,col2 from testtable where ckey1 = 'a'
select nkey1,col2 from testtable where nkey1 = 5000
4.
Press
CTRL+E to execute the two queries.
Stop Profiler
·
Click
the red square to stop the Profiler trace.
Load the .trc file into Index Tuning Wizard
1.
Start
Index Tuning Wizard from Profiler by selecting Tools\Index Tuning Wizard… from the menu. Click Next.
2.
Select
the database to be analyzed. Click Next.
3.
Leave
the I have a saved workload file option button selected and click Next.
4.
Select
the My workload file option button,
locate the .trc file created with Profiler, and then
click Next.
5.
In
the Select Tables to Tune dialog box, select the tables that need to be
analyzed and then click Next.
6.
Index
Tuning Wizard will indicate proper indexes to create in the Index
Recommendations dialog box. Click Next.
7.
The
wizard provides the choice of creating the indexes immediately, scheduling the
index creation as an automated task for a later point in time, or creating a
T-SQL script with the commands to create the indexes.
Select the preferred option and then click Next.
8.
Click
Finish.
Here is the T-SQL script that Index Tuning Wizard generated for the example workload:
/* Created by: Index Tuning Wizard */
/* Date: 9/7/98 */
/* Time: 6:42:00 PM */
/* Server: HENRYLNT2 */
/* Database: test */
/* Workload file: E:\mssql7\Binn\profiler_load.sql */
USE [test]
BEGIN TRANSACTION
CREATE CLUSTERED INDEX [testtable2] ON [dbo].[testtable] ([ckey1])
if (@@error <> 0) rollback transaction
CREATE NONCLUSTERED INDEX [testtable1] ON [dbo].[testtable] ([nkey1])
if (@@error <> 0) rollback transaction
COMMIT TRANSACTION
The indexes
recommended by Index Tuning Wizard for the sample table and data are what we
would have expected. There are only five unique values for ckey1 and 2,000 rows
of each value. Given that one of the sample queries (select ckey1, col2 from testtable where ckey1 = 'a') requires retrieval from the
table based on one of the values in ckey1, it makes sense to create a clustered
index on the ckey1 column. The second query (select nkey1, col2 from testtable where nkey1 = 5000) fetches one row based on the
value of the column nkey1. Nkey1 is unique and there are 10,000 rows, so it
makes sense to create a nonclustered index on this
column.
The Profiler/Index
Tuning Wizard combination becomes very powerful in real database server
environments, where there are many tables and many queries involved. Use
Profiler to record a .trc file while the database
server is experiencing a representative set of queries. Then load the .trc file into Index Tuning Wizard to determine the proper
indexes to build. Follow the prompts in Index Tuning Wizard to automatically
generate and schedule index creation jobs to run at off-peak times. Run
Profiler/Index Tuning Wizard regularly (perhaps weekly) to see if queries being
executed on the database server have changed significantly, thus possibly
requiring different indexes. Regular use of Profiler/Index Tuning Wizard helps
database administrators keep SQL Server running in top form as query workloads
change and database size increases over time.
For more
information, search in SQL Server Books Online for "Index Tuning
Wizard" and "Index Tuning Recommendations."
Another option that
Profiler provides is to log information into a SQL Server table. Once
completed, the table can be queried to determine if certain queries are using
up excessive resources.
Log Profiler
information into a SQL Server table
1.
Start
Profiler from SQL Server Enterprise Manager by selecting Tools/SQL Server
Profiler from the menu.
2.
Press
CTRL+N to create a new Profiler trace.
3.
Type
in a name for the trace.
4.
Click
the Capture to Table: check box and select a SQL Server table to output
the Profiler information to.
5.
Click
OK.
6.
When
finished, stop the Profiler trace by clicking the red square.
After the
information is recorded into the SQL Server table, Query Analyzer can be used
to figure out which queries on the system are consuming the most resources.
This way, database administrators can concentrate their time on improving the
queries that need the most help. For example, the following query would be
typical of the type of analysis done on data recorded from Profiler into a SQL
Server table. The query retrieves the top three consumers of CPU resources on
the database server. Read and write I/O information, along with the duration of
the queries in milliseconds, is returned as well. If a large amount of
information is recorded with Profiler, it makes sense to create indexes on the
table to help speed analysis queries. For example, if CPU is going to be an important criterion for analyzing this table, it would be
a good idea to create a nonclustered index on the CPU
column (a sketch follows the query below):
select top 3 TextData,CPU,Reads,Writes,Duration from profiler_out_table
order by cpu desc
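A hedged sketch of such an index, assuming the Profiler output was captured to the profiler_out_table table used in the query above (the index name is hypothetical):
create index idx_profiler_cpu on profiler_out_table(CPU)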
For more
information, search in SQL Server Books Online for the strings "Viewing
and Analyzing Traces," "Troubleshooting SQL Server Profiler,"
"Tips for Using SQL Server," "Common SQL Server Profiler
Scenarios," "Starting SQL Server Profiler," and "Monitoring
with SQL Server Profiler."
Query Analyzer
provides a Show stats I/O option on the General tab of the Connections
Options dialog box. Select this check box to obtain information about how
much I/O is being consumed for the query just executed in Query Analyzer.
For example, the
query "select ckey1, col2 from testtable where
ckey1 = 'a'" returns the following I/O information in addition to the resultset when the Show stats I/O option is
selected:
Table
'testtable'. Scan count 1, logical reads 400,
physical reads 382, read-ahead reads 400.
Similarly, the query
"select nkey1, col2 from testtable where nkey1 =
5000" returns the following I/O information in addition to the resultset when the Show stats I/O option is
selected:
Table
'testtable'. Scan count 1, logical reads 400,
physical reads 282, read-ahead reads 400.
Using statistics I/O
is a great way to monitor the effect of query tuning. For example, create the
two indexes on this sample table as recommended above by Index Tuning Wizard
and then run the queries again.
In the case of the
query "select ckey1,col2 from testtable
where ckey1 = 'a'", the clustered index improved performance as indicated
below. Given that the query needs to fetch 20 percent of the table, the
performance improvement seems reasonable:
Table
'testtable'. Scan count 1, logical reads 91,
physical reads 5, read-ahead reads 32.
In the case of the
query "select nkey1,col2 from testtable
where nkey1 = 5000", the creation of the nonclustered
index had a very dramatic effect on the performance of the query. Given that
only one row of the 10,000-row table needs to be retrieved for this query, the
performance improvement with the nonclustered index
seems reasonable:
Table
'testtable'. Scan count 1, logical reads 5,
physical reads 0, read-ahead reads 0.
ShowPlan can be used to
focus attention on problematic SQL queries by displaying detailed information
on what the query optimizer is doing. SQL Server 7.0 provides both text and
graphical versions of ShowPlan. Graphical ShowPlan output can be displayed in the Results pane of
Query Analyzer by executing a SQL query with CTRL+L. Icons indicate the
operations that the query optimizer would have performed if it had executed the
query. Arrows indicate the direction of data flow for the query. Details about
each operation can be displayed by holding the mouse pointer over the operation
icon. The equivalent information can be displayed in text-based ShowPlan by executing the command, "set showplan_all on." For reduced output from text-based ShowPlan that displays the query optimizer operations but
skips the display of operation details, execute the command "set showplan_text on."
For more
information, search in SQL Server Books Online for the strings "Graphical ShowPlan," "Using ShowPlan
to Monitor a Database Query," "worktables," and
"Understanding Nested Loops Joins."
Using the example
queries defined earlier and "set showplan_text
on" executed in Query Analyzer:
Query:
select ckey1,col2 from testtable where ckey1 = 'a'
Text-based ShowPlan output:
|--Clustered Index
Seek(OBJECT:([test].[dbo].[testtable].[testtable2]),
SEEK:([testtable].[ckey1]='a') ORDERED)
The preceding query
takes advantage of the clustered index on column ckey1 as indicated by
"Clustered Index Seek."
Equivalent Graphical
ShowPlan output:
Figure
3.
Graphical ShowPlan output for query taking advantage
of a clustered index
If the clustered
index is removed from the table, the query needs to use a table scan. The
following ShowPlan output indicates the change in
behavior.
Text-based ShowPlan output:
|--Table Scan(OBJECT:([test].[dbo].[testtable]), WHERE:([testtable].[ckey1]='a'))
Equivalent Graphical
ShowPlan output:
Figure
4.
Graphical ShowPlan output for query performing a
table scan
Note that table
scans on small tables are nothing to worry about. Table scans are the most
efficient way to retrieve information from small tables. But on larger tables,
table scans indicated by ShowPlan are a warning that
the table may need better indexes or that the existing indexes need to have
their statistics updated (this can be done using the UPDATE STATISTICS
command). SQL Server 7.0 provides automatic updating of index statistics. It is a good idea to let SQL Server automatically maintain index
statistics because it helps guarantee that queries will always have good index
statistics to work with.
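For example, statistics on the sample table used in this document could be refreshed manually with the following command, which updates the distribution statistics for all indexes on the table:
update statistics testtable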
Query:
select nkey1,col2 from testtable where nkey1 = 5000
Text-based ShowPlan output:
|--Bookmark Lookup(BOOKMARK:([Bmk1000]), OBJECT:([test].[dbo].[testtable]))
|--Index
Seek(OBJECT:([test].[dbo].[testtable].[testtable1]),
SEEK:([testtable].[nkey1]=5000) ORDERED)
Equivalent Graphical
ShowPlan output:
Figure
5.
Graphical ShowPlan output for query taking advantage
of a nonclustered index, Part 1
Figure
6.
Graphical ShowPlan output for query taking advantage
of a nonclustered index, Part 2
The preceding query
uses the nonclustered index on the column nkey1. This
is indicated by the "Index Seek" operation on the column nkey1. The
"Bookmark Lookup" operation indicates that SQL Server needed to
perform a pointer jump from the index page to the data page of the table to
retrieve the requested data. The pointer jump was required because the query
asked for the column col2, which was not part of the nonclustered
index.
Query:
select nkey1 from testtable where nkey1 = 5000
Text-based ShowPlan output:
|--Index Seek(OBJECT:([test].[dbo].[testtable].[testtable1]), SEEK:([testtable].[nkey1]=[@1])
ORDERED)
Equivalent Graphical
ShowPlan output:
Figure
7.
Graphical ShowPlan output for a covered query
The preceding query
uses the nonclustered index on nkey1 as a covering
index. Note that no "Bookmark Lookup" operation was needed for this
query. This is because all of the information required for the query (both
SELECT and WHERE clauses) are provided by the nonclustered
index. This means that no pointer jumps to the data pages are required from the
nonclustered index pages. I/O is reduced in
comparison to the case where a bookmark lookup was required.
Performance Monitor
provides a wealth of information about what is happening on the database server
in terms of Windows and SQL Server operations. For SQL Server specific
counters, search in SQL Server Books Online for the string combination
"SQL Server:" and "object."
In Performance
Monitor graph mode, take note of the Max and Min values. Don't put too much
emphasis on the average, because heavily polarized data points throw this off.
Study the graph shape and compare to Min/Max to gather an accurate feel for the
behavior. Use the <BACKSPACE> key to highlight counters with a white
line.
It is possible to
use Performance Monitor to log all available Windows NT and SQL Server
performance monitor objects/counters in a log file while at the same time
looking at Performance Monitor interactively (chart mode). The setting of
sampling interval determines how quickly the logfile
grows in size. Logfiles can get pretty big very fast
(for example, 100 megabytes in one hour with all counters turned on and a
sampling interval of 15 seconds). Hopefully, on the test server there will be a
couple of gigabytes free to store these types of files. But if conserving space
is important, try running with a large log interval so that Performance Monitor
does not sample the system as often. Try 30 or 60 seconds. This way all of the
counters are resampled with reasonable frequency but a smaller logfile size is maintained.
Performance Monitor
also consumes a small amount of CPU and disk I/O resources. If a system doesn't
have a lot of disk I/O and/or CPU to spare, consider running Performance
Monitor from another machine to monitor the SQL Server over the network (graph
mode only—it tends to be more efficient to log perfmon
information locally on the SQL Server as compared to sending the information
across a local area network, or LAN) or maybe reduce the logging to only the
most critical counters.
It is a good
practice to log all counters available during performance test runs into a file
for later analysis. That way any counter can be examined further at a later
time. Configure Performance Monitor to log all counters into a logfile and at the same time monitor the most interesting
counters in one of the other modes, like graph mode. This way, all of the
information is recorded but the most interesting counters are presented in an
uncluttered Performance Monitor graph while the performance run is taking
place.
Starting the logging
feature
1.
Open
Performance Monitor.
2.
Select
View/Log from the menu.
3.
Click
the button with the + symbol on it.
4.
A
simple method to log all counters is to left-click the first object in the Add
to Log dialog box.
5.
Hold
down the SHIFT key and use the PAGE DOWN key to highlight all counters.
6.
Click
Add.
7.
Click
Done.
8.
Select
Options/Log from the menu.
9.
In
the space titled File Name: select or create a file name for the performance
information to be logged into.
10.
Click Start Log.
Stopping the logging
feature
1.
Select Options/Log
from the menu.
2.
Click Stop Log.
Load the logged
information into Performance Monitor for analysis
1.
Select
View/Log from the menu.
2.
Select
Options/Data From… from the menu.
3.
Click
the Log File: option button.
4.
Click
the button labeled "…" and use the Open Input Log File
file browsing window to locate and open the log file.
Double-click the file name.
5.
Click
OK.
6.
Click
the button with the + symbol on it.
7.
Use
the Add to Chart dialog box to add desired counters to the graphical
display. Select the object/counter combination to be added and then click Add.
This feature is very
handy for observing what the database server was doing for a given time period:
1.
Select
and display desired objects/counters in Performance Monitor chart mode using
the preceding instructions.
2.
Select
Edit/Time Window from the menu.
3.
The
Input Log File Timeframe dialog box should appear. By left-clicking and
holding down the left mouse button on the time window slide bars provided,
it is possible to adjust the start and stop of the time window of logged data
to be displayed on the Performance Monitor chart.
4.
Click
OK to reset the chart to display only data logged for the selected time window.
·
(Physical or Logical) Disk Queue > 2
This section
requires observation of several Performance Monitor disk counters. In order to
enable these counters, run the command "diskperf -y" from a Windows NT command window and restart Windows NT.
Physical hard drives
that are experiencing disk queuing will hold back disk I/O requests while they
catch up on I/O processing. SQL Server response time will be degraded for these
drives. This costs query execution time.
If using RAID, it is
necessary to know how many physical hard drives are associated with each drive
array that Windows NT sees as a single physical drive in order to calculate
disk queuing per physical drive. Ask a hardware expert to explain the SCSI
channel and physical drive distribution in order to understand how SQL Server
data is held by each physical drive and how much SQL Server data is distributed
on each SCSI channel.
There are several
choices for looking at disk queuing through Performance Monitor. Logical disk
counters are associated with the logical drive letters assigned through Disk
Administrator, whereas physical disk counters are associated with what Disk
Administrator sees as a single physical disk device. Note that what looks to
Disk Administrator like a single physical device may either be a single hard
drive or a RAID array, which consists of several hard drives. Current Disk
Queue is an instantaneous measure of disk queuing whereas Average Disk Queue
averages the disk queuing measurement over the Performance Monitor sampling
period. Take note of any counter where Logical Disk: Average Disk Queue > 2,
Physical Disk: Average Disk Queue > 2, Logical Disk: Current Disk Queue >
2, or Physical Disk: Current Disk Queue > 2.
These recommended
measurements are specified per physical hard drive. If a RAID array is
associated with a disk queue measurement, the measurement needs to be divided
by the number of physical hard drives in the RAID array to determine the disk
queuing per physical hard drive. For example, an Average Disk Queue of 8 on a
RAID array of four drives works out to a queue of 2 per physical drive, right
at the recommended limit.
Note On
physical hard drives or RAID arrays that hold SQL Server log files, disk
queuing is not a useful measure because SQL Server Log Manager does not queue
more than a single I/O request to SQL Server logfile(s).
For more
information, search in SQL Server Books Online for the string "monitoring
disk activity."
·
System: Processor Queue Length > 2 (per CPU)
This means that the
server's processors are receiving more work requests than they can handle as a
collective group. Therefore, Windows needs to place these requests in a queue.
Some processor
queuing is actually an indicator of good overall SQL Server I/O performance. If
there is no processor queuing and if CPU utilization is low, it may be an
indication that there is a performance bottleneck somewhere else in the system,
the most likely candidate being the disk subsystem. Having a reasonable amount
of work in the processor queue means that the CPUs are not idle and the rest of
the system is keeping pace with the CPUs.
A general rule of
thumb for a good processor queue number is to multiply the number of CPUs on
the database server by 2.
Processor queuing
significantly above this calculation needs to be investigated. Excessive
processor queuing costs query execution time. Several different activities
could be contributing to processor queuing. Eliminating hard and soft paging
will help save CPU resources. Other methodologies that help reduce processor
queuing include SQL query tuning, picking better SQL indexes to reduce disk I/O
(and, hence, CPU), or adding more CPUs (processors) to the system.
·
Hard Paging—Memory: Pages/sec > 0 or Memory: Page
Reads/sec > 5
Memory: Pages/sec > 0 or Memory: Page Reads/sec > 5 means that
Windows is going to disk to resolve memory references (a hard page fault). This
costs disk I/O + CPU resources. Memory: Pages/sec is a good indicator of the
amount of paging that Windows is performing and the adequacy of the database
server's current RAM configuration. A subset of the hard paging information in
Performance Monitor is the number of times per second Windows had to read from
the paging file to resolve memory references, which is represented by Memory:
Page Reads/sec. If Memory: Page Reads/sec > 5, this is bad for
performance.
Automatic SQL Server
memory tuning will do its best to adjust SQL Server memory utilization
dynamically such that paging is avoided. A small number of pages per second is
normal, but excessive paging requires corrective action.
If SQL Server is
automatically tuning memory, adding more RAM or removing other applications
from the database server are possible options to help bring Memory: Pages/sec
to a reasonable level.
If SQL Server memory
is being manually configured on the database server, it may be necessary to
reduce memory given to SQL Server, remove other applications from the database
server, or add more RAM to the database server.
Keeping
Memory: Pages/sec at or close to zero helps database server performance. It means Windows
and all its applications (this includes SQL Server) are not going to the paging
file to satisfy any data in memory requests, so the amount of RAM on the server
is sufficient. If Pages/sec is greater than zero by a small amount, this is okay, but remember that a relatively high
performance penalty (disk I/O) is paid every time data is retrieved from the
paging file versus RAM.
It is worth taking a
moment to understand the difference between "Memory: Pages Input/sec"
and "Memory: Pages Reads/sec." "Memory: Pages Input/sec"
indicates the actual number of Windows 4-KB pages being brought in from disk to
satisfy page faults. "Memory: Pages Reads/sec" indicates how many
disk I/O requests are made per second in order to satisfy page faults, which
provides a slightly different point of view of the faulting that is occurring.
So, a single page read could contain several Windows 4-KB pages. Disk I/O
performs better as the packet size of data increases (64 KB or greater), so it
may be worthwhile to consider both of these counters at the same time. It is
also important to remember that for a hard disk, completing a single read or write of 4 KB is almost as expensive in terms of time spent
as a single read or write of 64 KB. Consider the following situation: 200 page
reads consisting of eight 4-KB pages per read could conceivably finish faster
than 300 page reads consisting of a single 4-KB page each. Note that this means
1,600 4-KB pages are transferred in less time than 300 4-KB pages. The
key fact here is applicable to all disk I/O analysis: Don't just watch the
number of Disk Bytes/sec, also watch Disk Transfers/sec because both are
relevant. This will be discussed further in the disk I/O sections to follow.
It is useful to
compare "Memory: Pages Input/sec" to "Logical Disk: Disk
Reads/sec" across all drives associated with the Windows NT paging file,
and "Memory: Pages Output/sec" to "Logical Disk: Disk
Writes/sec" across all drives associated with the Windows paging file,
because they provide a measure of how much disk I/O is strictly related to
paging versus other applications (that is, SQL Server). Another easy way to
isolate paging file I/O activity is to make sure that the paging file is located
on a separate set of drives from all other SQL Server files. Separating the
paging file away from the SQL Server files can also help disk I/O performance
because it allows disk I/O associated with paging to be performed in parallel
to disk I/O associated with SQL Server.
·
Soft Paging—Memory: Page Faults/sec > 0
Memory: Page
Faults/sec > 0 indicates that Windows NT is paging but includes both hard
and soft paging within the counter. In the previous section, we discussed hard
paging. Soft paging means that there are application(s) on the database server
that are requesting memory pages still inside RAM but outside of Windows
Working Set. Memory: Page Faults/sec is helpful for deriving the amount of soft
paging that is occurring. There is no counter called Soft Faults per second.
Instead, calculate the number of soft faults happening per second through the
computation:
"Memory: Pages
Faults/sec" - "Memory: Pages Input/sec" = Soft Page Fault/sec
To determine whether SQL Server rather than another process is causing excessive paging, monitor the Process: Page Faults/sec counter for the SQL Server process (sqlservr.exe) and note whether its number of page faults per second is similar to the value of Memory: Pages/sec.
Soft faults generally are not as harmful to performance as hard faults, because they consume only CPU resources, whereas hard faults consume disk I/O resources. The best environment for performance is to have no faulting of any kind.
Note that until SQL Server actually accesses all of its data cache pages for the first time, each first access to a page will cause a soft fault. So do not be concerned by the initial soft faulting that occurs as SQL Server first starts up and the data cache is being exercised for the first time.
For more information
on memory tuning, search for the string "monitoring memory usage" in
SQL Server Books Online.
Keep all of the server's processors busy enough to maximize performance, but not so busy that processor bottlenecks occur. The performance tuning challenge is that if CPU is not the bottleneck, something else is (a primary candidate being the disk subsystem), so CPU capacity is being wasted; and because CPU is usually the hardest resource to expand (above some configuration-specific level, such as four or eight processors on many current systems), high CPU utilization, even above 95 percent, can be seen as a good sign. At the same time, the response time of transactions should be monitored to ensure it stays within reason; if it does not, CPU usage above 95 percent may simply mean that the workload is too much for the available CPU resources, and either CPU must be added or the workload must be reduced or tuned.
Look at the Performance Monitor counter "Processor: % Processor Time" to make sure all processors are consistently below 95 percent utilization. "System: Processor Queue Length" is the processor queue for all CPUs on a Windows NT system. If "System: Processor Queue Length" is greater than 2 per CPU, it indicates a CPU bottleneck. When a CPU bottleneck is detected, it is necessary either to add processors to the server or to reduce the workload on the system. Reducing the workload can be accomplished by query tuning or by improving indexes to reduce I/O and, subsequently, CPU usage.
Another Performance Monitor counter to watch when a CPU bottleneck is suspected is "System: Context Switches/sec," because it indicates the number of times per second that Windows NT and SQL Server had to change from executing on one thread to executing on another. This costs CPU resources. Context switching is a normal component of a multithreaded, multiprocessor environment, but excessive context switching will bog down a system. The approach to take is to worry about context switching only if there is processor queuing. If processor queuing is observed, use the level of context switching as a gauge when performance tuning SQL Server. Consider using the lightweight pooling option so that SQL Server switches to a fiber-based scheduling model instead of the default thread-based scheduling model. Think of fibers as lightweight threads. Use the command sp_configure 'lightweight pooling', 1 to enable fiber-based scheduling, and then watch processor queuing and context switching to monitor the effect.
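A hedged sketch of the full sequence follows (lightweight pooling is an advanced option; as an assumption here, the new scheduling model takes effect only after SQL Server is restarted):

exec sp_configure 'show advanced options', 1
reconfigure
go
exec sp_configure 'lightweight pooling', 1
reconfigure
go
-- assumption: restart SQL Server for fiber-based scheduling to take effect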
DBCC SQLPERF (THREADS) provides more information about I/O, memory, and CPU usage mapped back to spids. Execute the following SQL query to take a survey of the current top consumers of CPU time: "select * from master..sysprocesses order by cpu desc."
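A slightly more focused variant of that survey is sketched below (the column choices are illustrative; spid, loginame, cpu, and physical_io are all columns of sysprocesses):

-- list sessions ordered by cumulative CPU time consumed
select spid, loginame, cpu, physical_io
from master..sysprocesses
order by cpu desc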
"Disk Write
Bytes/sec" and "Disk Read Bytes/sec" counters provide an idea of
the data throughput in terms of bytes per second per logical drive. Weigh these
numbers carefully along with "Disk Reads/sec" and "Disk
Writes/sec." Don't let a low number of bytes per second lead you to believe that the disk I/O subsystem is not busy! Remember that a single hard drive is capable of supporting a total of 75 nonsequential or 150 sequential disk reads and writes per second.
Monitor the
"Disk Queue Length" for all drives associated with SQL Server files
and determine which files are associated with excessive disk queuing.
If Performance
Monitor indicates that some of the drives are not as busy as others, there is
the opportunity to move SQL Server files from drives that are bottlenecking to
drives that are not as busy. This will help spread disk I/O activity more
evenly across hard drives. If one large drive pool is being used for SQL Server
files, the resolution to disk queuing is to make the I/O capacity of the pool
bigger by adding more physical drives to the pool.
Disk queuing may be
a symptom that one SCSI channel is being saturated with I/O requests.
Performance Monitor cannot directly detect if this is the case. Hardware
vendors may be able to provide tools to help detect the amount of I/O being
serviced by a RAID controller and whether the controller is queuing I/O
requests. This would be more likely to occur if many disk drives (10 or more)
are attached to the SCSI channel and they are all performing I/O at full speed.
In this case, the solution would be to take half of the disk drives and connect
them to another SCSI channel or RAID controller to balance that I/O. Typically,
rebalancing drives across SCSI channels requires a rebuild of the RAID arrays
and full backup/restore of the SQL Server database files.
Figure 8 shows some typical counters that can be observed with Performance Monitor. Note
that Processor Queue Length is the current counter being observed. The
<BACKSPACE> key was pressed in order to highlight the current counter in
bright white. This helps to distinguish the current counter from other counters
being observed and can be particularly helpful when observing many counters at
the same time with Performance Monitor.
Note that the Max
value for Processor Queue Length was 22.000. Max, Min, and Average values for
the Performance Monitor Graph cover just the current time window for the graph
as indicated by Graph Time. In the default case, Graph Time covers 100 seconds.
To monitor longer periods of time and to be sure to get representative
Max, Min, and Average values for those time periods, use the logging feature of
Performance Monitor.
The shape of the Processor Queue Length graph line indicates that the Max of 22 occurred for only a short period of time. But there is a period preceding the 22 value during which Processor Queue Length is greater than 5. (This is determined by looking at the graph, knowing that 100 percent is 22, and noting that there is a period of time prior to the 22 value where the graph has values of more than 25 percent, which is approximately 5.) In this example, the database server named \\HENRYLNT2 has only one processor and should not sustain a Processor Queue Length greater than 2. Performance Monitor is therefore indicating that the processor on this machine is being overtaxed at times, and that either the load on the processor should be investigated and reduced, or more processors should be added to \\HENRYLNT2 to adequately handle these periods of higher processor workload.
Figure 8. Performance Monitor graph output
Database programmers who do SQL work through easy-to-use interfaces such as the ADO/RDO/DAO database APIs still have a responsibility to stay aware of the result sets they are building. ADO/RDO/DAO provide programmers with great database development interfaces that allow rich SQL rowset functionality without requiring a lot of SQL programming experience, but this comes at a cost. Programmers can expose themselves to performance problems if they do not carefully account for the amount of data their application returns to the client, and if they do not stay aware of where the SQL Server indexes are placed and how the SQL Server data is arranged. SQL Profiler, Index Tuning Wizard, and ShowPlan are very helpful tools for pinpointing and fixing these problem queries.
Look for opportunities to reduce the size of the result set being returned by eliminating columns in the select list that do not need to be returned, or by returning only the required rows. This helps reduce I/O and CPU consumption.
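As a small illustration against the pubs sample database that ships with SQL Server, naming only the needed columns and restricting the rows keeps the result set (and the I/O behind it) small:

-- instead of: select * from authors
select au_lname, au_fname
from pubs..authors
where state = 'CA'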
For more
information, search in SQL Server Books Online for the strings "Optimizing
Application Performance Using Efficient Data Retrieval,"
"Understanding and Avoiding Blocking," and "Application
Design."
If applications accessing SQL Server are architected so that transactions access tables in the same chronological order across all user transactions, deadlocking will be avoided. It is worthwhile to explain this concept of chronological table access to SQL application developers as early as possible during the application design process; it will help avoid deadlocking problems that would be more expensive to solve later.
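A minimal sketch of the idea, with hypothetical table names; because every transaction touches TableA before TableB, no two transactions can each end up holding a lock the other needs:

-- every transaction in the application follows this same table order
begin transaction
update TableA set col1 = col1 + 1 where keycol = 1
update TableB set col1 = col1 + 1 where keycol = 1
commit transaction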
Reduce SQL query I/O and shorten transaction time: this is a roundabout way to prevent deadlocking, and something that should be done with all queries anyway, but it helps because faster queries hold lock resources for a shorter period of time, so all locking contention (including deadlocking) is reduced. Use SQL Query Analyzer's "Show stats I/O" option to determine the number of logical page fetches associated with large queries. Consider the indexes used by selecting SQL Query Analyzer's "Show query plan" option. Consider index placement or SQL query redesign that will be more efficient and thus use less I/O.
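The same logical I/O figures can also be obtained in Transact-SQL with SET STATISTICS IO; a minimal sketch, with a placeholder query:

set statistics io on
go
select * from tableA where col1 = 'value'
go
set statistics io off
go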
For more
information, search in SQL Server Books Online for the strings "Avoiding
Deadlocks," "Troubleshooting Deadlocking," "Detecting and
Ending Deadlocks," and "Analogy to Nonserializable
Transactions."
Use of inequality operators in SQL queries forces SQL Server to use table scans to evaluate the inequalities. This generates high I/O if such queries regularly run against very large tables.
Examples:
WHERE <column_name> != some_value
WHERE <column_name> <> some_value
Any WHERE expression
with NOT in it
If these types of
queries need to be run, try to restructure the queries to get rid of the NOT
keyword.
Example:
Instead of:
select * from tableA where col1 != "value"
Try using:
select * from tableA where col1 < "value" or col1 > "value"
This lets SQL Server make use of the index (preferably a clustered index in this case) if one is built on col1, rather than resorting to a table scan.
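A sketch of the supporting index, reusing the hypothetical tableA and col1 names from the example above:

create clustered index ci_tableA_col1 on tableA (col1)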
On very hot (heavily
accessed) tables, if there are columns that a SQL application does not need
regularly, it makes sense to move them to another table. The
more columns that are eliminated, the better for reducing I/O and increasing
performance. For more information, search in SQL Server Books Online for
the strings "Logical Database Design" and "Normalization."
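A minimal sketch of such a vertical split, with hypothetical table and column names; both tables share the same key, so the rarely used columns can still be joined back when needed:

-- frequently accessed columns stay in the hot table
create table customer_hot (
    cust_id int not null primary key,
    balance money not null
)
-- rarely needed, wide columns move to a companion table
create table customer_rarely_used (
    cust_id int not null primary key,
    notes varchar(255) null
)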
SQL Server 7.0 provides for horizontal partitioning of tables through views. This provides I/O performance benefits when database users wish to maintain SQL queries that refer to one table name, but the nature of the data retrieval is that queries always address fixed subsections of the data. For example, suppose there is a very large table documenting sales for all sales departments for a year, and suppose that all retrievals from this table will be based on a single sales department. This is a scenario where a partitioned view might be employed: a sales table would be defined for each sales department, a constraint would be defined on the sales department column of each table, and then a view would be created over all of the tables to form a partitioned view. The constraint on the sales department column is used by the query optimizer. When the view is queried, all of the sales department tables that do not match the sales department value provided in the query are ignored by the query optimizer, and no I/O is performed against those base tables. This improves query performance by reducing I/O.
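A minimal sketch, with hypothetical names; the check constraint on dept_id is what lets the query optimizer skip member tables that cannot match:

create table sales_dept1 (
    dept_id int not null check (dept_id = 1),
    sale_date datetime not null,
    amount money not null
)
create table sales_dept2 (
    dept_id int not null check (dept_id = 2),
    sale_date datetime not null,
    amount money not null
)
go
create view sales as
select dept_id, sale_date, amount from sales_dept1
union all
select dept_id, sale_date, amount from sales_dept2
go

A query such as "select * from sales where dept_id = 1" performs no I/O against sales_dept2.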
For more
information, search in SQL Server Books Online for the strings "Scenarios
for Using Views," "Create View," "Using Views with
Partitioned Data," "Modifying Data Through a
View," "Copying To or From a View," and
"Partitioning."
Ensuring that the
disk I/O subsystem and CPUs are performing well will provide performance
benefits to all SQL Server operations. This definitely includes replication and
backups. Transactional replication and transaction log backups involve reading
from transaction log files. Snapshot replication and backups perform serial
scans of database files. SQL Server 7.0's new storage structures have been
enhanced to make these operations very fast and efficient, so long as there is
no queuing occurring in the database server's CPUs or disk subsystems.
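As one example of the documented techniques, a hedged sketch of striping a backup across multiple devices so that the writes proceed in parallel (the database name and file paths are hypothetical):

backup database sales
   to disk = 'f:\backup\sales_stripe1.bak',
      disk = 'g:\backup\sales_stripe2.bak'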
For more information
on performance tuning replication and backup/restores, search in SQL Server
Books Online for the strings "Replication Performance,"
"Optimizing Backup and Restore Performance," "Creating and
Restoring Differential Database Backups," "Creating and Applying
Transaction Log Backups," "Using Multiple Media or Devices,"
"Minimizing Backup and Recovery Times in Mission-Critical
Environments," "Backup/Restore Architecture," and "SQL
Server 7.0 on Large Servers."
For those implementing SQL Server database systems on EMC Symmetrix Enterprise Storage systems, there are some disk I/O balancing methods that should be kept in mind because of the unique nature of EMC Symmetrix storage; they will help avoid disk I/O bottlenecks and maximize performance.
Symmetrix storage systems contain up to 16 GB of RAM cache and internal processors within the disk array that help speed the I/O processing of data without using host server CPU resources. Within the Symmetrix box, there are four major components to be concerned with when balancing disk I/O. First is the 16-GB cache inside the Symmetrix. Second, there are up to 32 SA channels that can be used to cable up to 32 SCSI cards from Windows NT host servers into the Symmetrix; all of these SA channels can simultaneously request data from the 16-GB cache. Third, within the Symmetrix box, there are up to 32 connectors called DA controllers, which are internal SCSI controllers that connect all of the internal disk drives within the Symmetrix to the internal cache. And finally, there are the hard drives within the Symmetrix.
A note about the EMC hard drives: they are SCSI hard drives with the same I/O capability as the other SCSI drives discussed in this document (the 75/150 rule applies here). One feature commonly used with EMC technology is referred to as "hyper-volumes." A hyper-volume is a logical division of an EMC hard drive such that, to Windows NT Disk Administrator, the hyper-volume looks just like another physical drive, and it can be manipulated with Windows NT Disk Administrator just like any other disk drive. Multiple hyper-volumes can be defined on each physical drive. When conducting database performance tuning on EMC storage, it is very important to work closely with EMC field engineers to identify how hyper-volumes are defined (if there are any), because it is important to avoid overloading a physical drive with database I/O. This can happen easily if two or more hyper-volumes are believed to be separate physical drives but are in reality two or more hyper-volumes on the same physical drive.
SQL Server I/O activities should be divided evenly among distinct DA controllers, because DA controllers are assigned to a defined set of hard drives. As described earlier in this document, SCSI controller bottlenecks are unlikely; DA controllers are not likely to suffer an I/O bottleneck, but the set of hard drives associated with a DA controller may be more susceptible. Within the context of DA controllers and their associated disk drives, SQL Server disk I/O balancing is accomplished the same way as with any other vendor's disk drives and controllers.
In terms of
monitoring the I/O on a DA channel or separate physical hard drives, get help
from EMC technical support staff, because this I/O activity is occurring
beneath the EMC internal cache and is not visible to Performance Monitor. EMC
storage units have internal monitoring tools that will allow an EMC technical
support engineer to monitor I/O statistics within the Symmetrix.
Performance Monitor can only see I/O coming to and from an EMC storage unit by way of the SA channels. This is enough information to indicate that a given SA channel is queuing disk I/O requests, but it cannot tell which disk or disks are causing the disk queuing. If an SA channel is queuing, it is not necessarily the SA channel that is causing the bottleneck; it could be (and more likely is) the disk drives that are causing the problem. One way to isolate the disk I/O bottleneck between the SA channels and the DA channels plus drives is to add another SCSI card to the host server and connect it to another SA channel. If Performance Monitor indicates that I/O across both SA channels has not changed in volume and disk queuing is still occurring, it indicates that the SA channels are not causing the bottleneck. Another way to isolate the I/O bottleneck is to have an EMC engineer monitor the EMC system with EMC monitoring tools and analyze which drives or DA channels are bottlenecking.
Divide SQL Server activities evenly across as many of the disk drives as are available. If working with a smaller database that will sustain a large amount of I/O, consider carefully the size of hyper-volume to have EMC technical engineers define. Suppose the SQL Server database will be 30 GB. EMC hard drives can provide up to 23 GB of capacity, so it is possible to fit the entire database onto two drives. From a manageability and cost standpoint, this might seem appealing, but from an I/O performance standpoint, it is not. An EMC storage unit may have more than 100 internal drives to work with; involving only two drives for SQL Server may lead to I/O bottlenecks. Consider defining smaller hyper-volumes, perhaps 2 GB each, which means close to 12 hyper-volumes can be associated with a given 23-GB hard drive. Assuming 2-GB hyper-volumes, 15 hyper-volumes will be required to store the 30-GB database. Make sure that each hyper-volume is associated with a separate physical hard drive. Do not use 12 hyper-volumes from one physical drive and another 3 hyper-volumes from a second physical drive, because that is the same as using two physical drives (150 nonsequential / 300 sequential I/Os per second across the two drives). But with 15 hyper-volumes, each associated with a separate physical drive, SQL Server will be utilizing 15 physical drives for I/O (1,125 nonsequential / 2,250 sequential I/Os per second across the 15 drives).
Also consider
employing several SA channels from the host server to divide the I/O work
across controllers. This makes a lot of sense for host servers that support
more than a single PCI bus. In this case, consider using one SA channel per
host server PCI bus to divide I/O work across PCI buses as well as SA channels.
On EMC storage systems, each SA channel is associated with a specific DA channel and, hence, a specific set of physical hard drives. Because SA channels read and write their data to and from the EMC internal cache, it is unlikely that an SA channel will be a point of I/O bottleneck. In keeping with the idea that SCSI controller bottlenecks are unlikely, it is probably best to invest time in balancing SQL Server activities across physical drives rather than worrying too much about how many SA channels to utilize.
·
Microsoft
SQL Server Books Online provides information on SQL Server architecture and
database tuning along with complete documentation on command syntax and
administration. SQL Server Books Online can be installed from the SQL Server
installation media on any SQL Server client or server installation. It is
recommended that any machine that will have someone actively working on SQL
Server have SQL Server Books Online installed on the hard disk for easy access.
·
For
the latest information on Microsoft SQL Server, including other white papers on
SQL Server 7.0, visit the Microsoft SQL Server Web site at http://www.microsoft.com/sql/.
·
Compaq has updated its RAID white paper, "Configuring Compaq RAID Technology for Database Servers," which provides 50 pages of excellent information on database server performance. Note that the 3 pages of Microsoft SQL Server-specific information in this white paper pertain to version 6.5 and are not applicable to SQL Server 7.0.
·
A 30-page white paper from Compaq's Windows NT integration team, "Disk Subsystem Performance and Scalability," details the hardware performance characteristics of Compaq hard drives and physical drive behavior. The information contained in that paper is applicable to SCSI hard drives available from Compaq or other vendors.
·
Celko, Joe. SQL for Smarties. Morgan Kaufmann Publishers, ISBN 1-55860-323-9. This book contains some really helpful information, including solutions to common problems such as representing and querying hierarchical data. Chapter 28 is dedicated to optimizing SQL queries.