DIAGNOSING
ORACLE DATABASE HANGING ISSUES
This bulletin describes the methods and
tools used in diagnosing
database
hanging issues and performance issue).
These issues may be due to
tuning
problems, design problems or Oracle bugs such as latching issues.
We
determine the cause by narrowing down the problem and getting as much
information
as possible WHILE THE DATABASE IS HANGING.
The following diagnostic steps are
discussed.
1. Describe the Problems. Under all
situations go through this sections and
describe the problem.
2
Look for errors.
3. Do the requested queries.
4. Gather OS level data.
5. Get systemstate and hanganalyze dumps.
6. Get BSTAT-ESTAT OR STATPACK output.
7. Generate a core dump and run a
debugger (contact support
first).
8. If certain process get PROCESSSTATE
dumps
9. Check List for diagnostics.
Note:
It may not be necessary to shutdown the database to stop the
hanging. But if you MUST shutdown before speaking to
support, please
attempt
to get these diagnostics so that we can debug the problem.
Without
these diagnostics it may be impossible for us to determine the cause of
the
problem.
STEPS:
1.
DESCRIBE THE PROBLEM
What releases of all involved products are
you running?
- complete release like database release
8.0.5.2.
Is the "hanging" a true freeze or
is activity just very slow?
(Are logs switching in the alert log? Be
sure to check cpu, I/O and memory
utilization - see step 4).
When does the hanging start?
Does it stop? How long does it last?
Does the hanging happen suddenly or is it a
gradual performance
degradation, perhaps precipitated by
increased activity?
How many users are connected?
Has load on system increased recently?
Check for OTRACE creating files. (see
[NOTE:45482.1])
Unix NT
~~~~ ~~
cd $ORACLE_HOME/otrace/admin cd orant\otrace73\admin
ls *.dat dir *.dat
Note: If .dat files are being
created and OTRACE utility is not being used,
disable this option by setting the
init.ora parameter oracle_trace_enable = FALSE.
Are any events set in the
init<sid>.ora?
Which ones are affected?
What kind of work are they doing?
How much data are we talking about?
Are you running Oracle Parallel Server
(OPS)?
- Does the problem persist if you
shutdown all but one instance?
- Please note: Some of the solutions discussed here are useful for
OPS; many are not. Different tuning rules apply; for example, a
"smaller" buffer cache often
means better performance in OPS. Also,
the most common sources of hanging
issues in OPS are not discussed
here. These include: PCM locking
problems, pinging, space management
problems, lack of work partitioning,
internode parallel query tuning
requirements, shared disk or virtual
shared disk problems, network
problems, and DLM problems.
Are you running Multi-Threaded Server
(MTS)?
- Does the problem persist without it?
- How many users?
Are you using any Oracle applications or
tools
(financials, forms, etc.)?
Did you recently upgrade the rdbms,
application or tool,
operating system, memory or hardware?
Does the problem reproduce?
How often?
Can you reproduce it at will?
When does it NOT reproduce?
What EXACTLY is hanging? What is NOT? Get all the information
requested for each area that applies:
(a)
The entire database?
- All instances?
- All connections?
- All operations?
- All nodes?
- Can you start a session and select *
from dual?
- How often are log switches occurring?
- If there are archive-related errors
in the alert log
(see step 2), stop reading this
bulletin: Archiving
problems always freeze the
database. Review the error(s)
and resolve the problem. Example: the device may be full.
Another common cause is that the
ARCH process is stopped while
the database is in ARCHIVELOG
mode. Check this by connecting
internal and typing ARCHIVE LOG
LIST. If ARCHIVELOG mode and
automatic archival is DISABLED,
that's the problem. Either
restart the ARCH process by typing
ALTER SYSTEM ARCHIVE LOG
START (or log_archive_start = true
in init.ora) or shutdown,
startup mount exclusive and ALTER
DATABASE NOARCHIVELOG.
- If archiving problems have been
ruled out and the database is
truly frozen, complete steps 1, 2,
3, 4, 5 and 6.
- If database is not really frozen but
is suffering performance
degradation due to increased activity,
complete steps 1, 2, 3, 4 and 6.
Do not forget 1(d).
(b)
A specific SQL operation? Be sure you also do step 8.
Get a tkprof output with timed_statistics and explain plan.
See [NOTE:10585.1] QUERY AND APPLICATION
TUNING USING EXPLAIN
AND
TKPROF UTILITY
[NOTE:41634.1] TKPROF and Problem
Solving
[NOTE:1020282.6] TFTS FORMATTED
SELECT OF EXPLAIN PLAN TABLE
[NOTE:1019631.6] TFTS SCRIPT TO
SIMPLIFY THE USE OF EXPLAIN PLAN
- Select statement?
- It may need tuning. (This is
useful for more than sql syntax.)
- If a complex sql statement is
hanging, can you break successfully?
- See 1 (c)
- Parallel query?
- See [NOTE:240762.1] script to
show Explain Plan for Parallel Queries
- Could be space transaction
contention. If there are
ora-1575 errors in the alert
log, try setting
pct_increase = 0 in the
temporary tablespace to
disable SMON from coalescing
contiguous extents,
thus reducing contention with
query slaves. Also
try spreading datafiles over
several disks, increasing
sort_area_size (but don't
introduce paging) and
possibly "reducing"
degree of parallelism. Refer
to additional tuning
documentation.
- Update, insert or delete?
- possibly locking. Be sure to get v$lock
output in step 3. There are the following scripts available
to help debug locking problems.
[NOTE:1020012.6] TFTS SCRIPT TO
RETURN MEDIUM DETAIL LOCKING INFO
[NOTE:1020008.6] TFTS FULLY
DECODED LOCKING SCRIPT
[NOTE:1019527.6] TFTS CHECK FOR
FOREIGN KEY LOCKING
[NOTE:1020007.6] TFTS: DISPLAY
LOCKS & GIVE SID AND SERIAL# TO KILL
- Do you have constraints and/or
triggers on the object(s)
you're updating?
Could be a cascading locking problem.
Try putting index(es) on
relevant foreign key columns.
This modifies the locking
behavior on the parent. (See
Application Developers Guide for
more info).
See [NOTE:33453.1] REFERENTIAL
INTEGRITY AND LOCKING
- also see steps 1(c) and 1(d).
- DDL?
- possibly a data dictionary
related problem. See step 1(c).
- if "create index" it
might be a space transaction contention issue.
- Does the operation run faster the
second time around?
- Try tuning i/o. The following could be helpful:
clustering, increasing
db_block_size, distributing
i/o (i.e. striping, separating
indexes and data,
parallelizing). Also try setting pre_page_sga = true
in the init.ora.
- Is the operation always slower
immediately after a large
update? This is delayed block
cleanout.
(c)
Specific object?
- Can you do ANY operations on the
object?
- Try select count(*)
- If updates only are the problem,
could be locking.
Scripts are available to help
resolve these issues see:
- Be sure to do get lock
information
1(b) and 1(d).
There are the following scripts
available
to help debug locking problems.
[NOTE:1020012.6] TFTS SCRIPT TO
RETURN MEDIUM DETAIL LOCKING INFO
[NOTE:1020008.6] TFTS FULLY
DECODED LOCKING SCRIPT
[NOTE:1019527.6] TFTS CHECK FOR
FOREIGN KEY LOCKING
[NOTE:1020007.6] TFTS: DISPLAY
LOCKS & GIVE SID AND SERIAL# TO KILL
- Did you preallocate extents to the
object? This will bump
up the high water mark and cause
full table scans to appear
to "hang". Full table
scans always scan up to the high
water mark, even when the table contains
only a few rows. The
solution is to avoid preallocating
extents except immediately
before a large parallel procedural
insert or conventional load.
Don't preallocate extents for a
direct load.
- Try ANALYZE TABLE <tablename>
VALIDATE STRUCTURE CASCADE;
- any errors? Errors signify table or index corruption.
Call support.
- If no errors after ANALYZE, run
these queries and send the output and
trace files to support:
SPOOL analyze.log;
SELECT *
FROM sys.dba dba_tables
WHERE table_name =
'<TABLENAME>';
- block level space utilization
(i.e. initrans, maxtrans,
pctfree), a high chain count, etc., could be part of
the problem.
SELECT i.*
FROM sys.index_stats i,
sys.dba_indexes d
WHERE i.name = d.index_name
AND d.table_name =
'<TABLENAME>';
- an unwieldy index could be part
of the
problem. Do you do a lot of updates/deletes?
SPOOL OFF;
- Get tkprof output as in 1(b).
- If a view, do the above for the
underlying tables.
- To identify the underlying tables:
SELECT text
FROM sys.dba_views
WHERE view_name =
'<VIEWNAME>';
- you may need to SET
ARRAYSIZE 1 / SET LONG 2000 to
see all the text.
- definitely get tkprof output
as in 1(b).
- Data dictionary objects might also
be affected by the above.
(d)
Large scale update operation (i.e. perhaps using sqlloader,
import or batch job)?
- Do you have index(es) on the table?
- Does the update run during periods
of peak activity?
- Do you see "checkpoint not
complete" messages in the alert.log?
- This signifies that the redo logs
are too small; they
are having trouble keeping up.
- Is the tablespace wherein the table
resides in hot backup mode?
(see v$backup)
- A tablespace in hot backup mode
generates redo "records"
rather than "vectors".
During a large update this
can cause severe contention and
performance degradation.
- Are any of the following init.ora
parameters set to
non-default values:
db_block_checkpoint_batch
db_block_checksum
db_block_lru_statistics
db_block_lru_extended_statistics
row_locking
serializable
timed_statistics
- If using sqlloader, are you loading
with conventional
path? Using REPLACE option?
(Try using TRUNCATE instead).
Any sql functions in the control
file? Have you tried tuning
with READBUFFERS, BINDSIZE, ROWS,
PARALLEL?
- If using import, are you using commit=y? indexes=y?
constraints = y? Have you tried
increasing BUFFER? Any
info in LOG?
- Try increasing the number of system
buffers and tuning i/o
(see 1(b)).
- If many users are active during the
update, resource
contention could be contributing to
system slowdown.
Rollback segments, redo latches, i/o
and db_block_buffers
are some possible areas of
contention. V$session_wait
and Bstat Estat output are good
places to start researching
contention issues. See section on
estat bstat.
-Do steps 1(b), 1(c), 3, 4 and 5.
(e)
Specific package, procedure or PRO*C application?
- What does it do?
- What SQL statement is it stuck on?
see step. Can you
remove the statement and get it to
run successfully in sqlplus?
- Get tkprof output on the
application, procedure or isolated
sql statement with timed_statistics
and explain plan.
- Optional: if a procedure, utilize
DBMS_ALERT to pinpoint where
the hanging starts.
- If a PRO*C program, and tkprof
identifies "parsing" as the
bottleneck of the embedded sql
statement, tune this with
precompiler parameters hold_cursor
and release_cursor. See
bulletin on precompiler features.
- If a package, can you execute any of
the individual procedures
successfully?
- Is the package/procedure swapping in
and out of the shared pool?
Pinning might be advisable (a
package would have to be created
for a standalone procedure).
SPOOL objcache.log
SELECT *
FROM v$db_object_cache
WHERE name = '<NAME>';
SPOOL OFF;
- Resource contention could be
contributing to system
slowdown. For example, library cache contention
(affected by shared_pool_size) is
possible in addition
to the short list of contention
areas mentioned in 1(d).
Refer to additional tuning
documentation.
- Be sure to do step 3.
- What is the cpu / io utilization?
(get this from
v$sesstat above or see step 4)
- Is the process swapping in and out
of memory? (see step 4)
- Did you change the process priority
or "nice" it?
- Steps 3 and 4 recommended.
(f)
remote access only?
- Can you do select * from
dual@db_link?
- Can you connect to the remote
machine and perform the operation
locally? Can you select * from dual locally?
- If you can do these tests and the
hanging reproduces, you
should answer all these questions for the REMOTE machine.
- Are you doing a distributed update?
- What is distributed_lock_timeout set
to in the init.ora.
- Are you refreshing snapshots? using symmetric replication?
- Is it a remote select statement
that's hanging?
- Get a tkprof output with
timed_statistics and explain
plan. On the explain plan,
you'll see that certain
statements just say
"REMOTE"; the actual statements sent
across the network are recorded
in the OTHERS column
of the PLAN_TABLE - spool a
select of that column.
- If joining two tables on a remote
node, try putting a
'joining view' on that node and
then selecting against the
view.
See [NOTE:41634.1] TKPROF and
Problem Solving
[NOTE:10585.1] QUERY AND
APPLICATION TUNING USING EXPLAIN
AND TKPROF UTILITY
- The following may reduce any network
overhead that's contributing
to the problem: setting ARRAYSIZE in
sql operations, using
anonymous pl/sql blocks rather than
independent sql statements,
using explicit rather than implicit
cursors. Refer to additional
tuning documentation.
(g)
Operations using 3rd party application (Power Builder, etc).
- Can you get the problem to reproduce
in sqlplus? (If you can
do this test but can't reproduce the
problem, you need to
call the 3rd party vendor).
(h)
Database shutdown/startup?
- How was it shutdown? normal?
immediate? abort? did it crash?
- If startup hangs and the
database went down abnormally (crash
or shutdown abort)and there are
no errors in the alert log,
this could be normal instance
recovery, an internal matter
(if errors),system problems (see
system logs), inadequate cleanup
at shutdown (try shutdown abort
/ startup), unsupported
software/ os combination or
(less likely) data corruption.
Wait 30 minutes then call
support if it doesn't open.
During normal instance recovery
an example of the messages
that will be written to the
alert log are listed below:
Starting ORACLE instance
(normal)
Starting up ORACLE RDBMS
Version: 8.1.6.3.0.
System parameters with
non-default values:
processes = 50
...
Crash recovery completed
successfully
Tue May 22 16:15:26 2001
Thread 1 advanced to log sequence 7139
Thread 1 opened at log
sequence 7139
Current log# 2 seq# 7139
mem# 0: /u02/oradata/V817/redo02.log
Successful open of redo
thread 1.
Tue May 22 16:15:26 2001
SMON: enabling cache
recovery
SMON: enabling tx recovery
Tue May 22 16:15:51 2001
Completed: alter database
open
- If shutdown normal hangs, this
means Oracle is waiting
for active sessions to log off.
- If shutdown immediate hangs,
this could be for
the same reasons as a hang
at startup.
- Be sure to check for errors in step
2. Tracing the process
may also be useful: see step 4.
V$session_wait may be
useful: see step 3.
- Note: if using a system debugger to
trace a hanging startup
or shutdown operation, be sure to
get the relevant svrmgr or
sqlplus "shadow" (child)
pid. On unix boxes:
ps -ef | grep svrmgrl
usupport 26615 19866 ...etc... svrmgrl
ps -ef | grep 26615
usupport 52472 26615 ...etc... oracleV732 (DESCR...
Pid 52472 is what you want to
trace.
2. LOOK FOR ERRORS
(a) Check the tail of the
alert_<sid>.log (or portion of the alert
corresponding to the hang). Check for ANY messages other than
log switches ("advanced to log
sequence...")
Note: to find the alert.log directory,
check the
background_dump_dest parameter in the
init<sid>.ora or
config.ora. Or do the following
command.
$ sqlplus "/as sysdba"
show parameter dest
(b) Check the same directory for trace
files generated around the time
of the hang. Look in them for error messages. Don't search
whole files; relevant messages appear
towards the beginning.
On unix ls -ltr will show you files
that have been updated lately last.
(c) If having remote access problems, check
the sql*net trace directory
for traces as well.
(d) Check system error logs for messages.
On many unix systems
cd /var/adm
ls -ltr
Check for messages files recently
updated.
(e) Be prepared to send a portion of the
alert<sid>.log at least 4 hours
prior to the problem,
init<sid>.ora and any traces to support.
For fastest service:
- Move trace files to oracle-ftp
cd server/incoming
mkdir <tar number>
cd <tar number>
place traces files, alert log and any other
files here
Or
- each file should go into a separate
email message
- put the tar number and file
description in the subject line
- don't uuencode or attach the files
if you don't have to
(i.e. unless they're too big) -
just read them into the
message as ascii.
If you can't comply with all of the
above, just do your best.
Faxing is also possible but may not be
practical.
At least, please have the information
READY to communicate to
support before you call.
3. SPOOL OUT RELEVANT V$ VIEWS
WHILE the database is hanging, run the
following queries:
SPOOL v_views.log;
SELECT *
FROM v$parameter;
SELECT class, value, name
FROM v$sysstat;
SELECT sid, id1, id2, type, lmode, request
FROM v$lock;
SELECT l.latch#, n.name, h.pid, l.gets,
l.misses,
l.immediate_gets, l.immediate_misses,
l.sleeps
FROM v$latchname n, v$latchholder h,
v$latch l
WHERE l.latch# = n.latch#
AND l.addr = h.laddr(+);
SELECT *
FROM v$session_wait
ORDER BY sid;
/* repeat last query 3 times - we want to
see who's repeatedly waiting*/
SPOOL OFF;
If a specific session hangs do these
queries:
SPOOL session.log;
- Get the sid from the pid or vice versa:
SELECT s.sid, p.spid
FROM v$session s, v$process p
WHERE s.paddr = p.addr
AND ... < p.spid = <os
pid> or perhaps
s.sid = <sid from v$session> >
- What SQL statement is being executed?
SELECT s.sid, s.status, q.sql_text
FROM v$session s, v$sqltext q
WHERE s.sql_hash_value =
q.hash_value
AND s.sql_address = q.address
AND s.sid = <sid>
order by q.piece;
-Get output from V$SESSION_WAIT
column sid format 990
column seq# format 99990
column wait_time heading 'WTime'
format 99990
column event format a30
column p1 format 9999999990
column p2 format 9999999990
column p3 format 9990
select
sid,event,seq#,p1,p2,p3,wait_time from V$session_wait
where sid=<SID>
order by sid;
*** It is needed to run this statement
a few times to see if the wait events
actually change
- What is LOCKWAIT column set to in
V$SESSION?
SELECT lockwait
FROM v$session
WHERE sid = <sid>;
- If not null, who's holding what lock on
which object?
col Username format A15
col Sid format 9990 heading SID
col Type format A4
col Lmode format 990 heading
'HELD'
col Request format 990 heading
'REQ'
col Id1 format 9999990
col Id2 format 9999990
select SN.Username, M.Sid, M.Type,
DECODE(M.Lmode, 0, 'None', 1,
'Null', 2, 'Row Share', 3, 'Row
Excl.', 4, 'Share', 5, 'S/Row
Excl.', 6, 'Exclusive',
LTRIM(TO_CHAR(Lmode,'990')))
Lmode,
DECODE(M.Request, 0, 'None', 1,
'Null', 2, 'Row Share', 3, 'Row
Excl.', 4, 'Share', 5, 'S/Row
Excl.', 6, 'Exclusive',
LTRIM(TO_CHAR(M.Request, '990')))
Request,
M.Id1, M.Id2 from V$SESSION SN,
V$LOCK M
WHERE (SN.Sid = M.Sid and
M.Request ! = 0)
or (SN.Sid = M.Sid and M.Request = 0 and Lmode != 4 and (id1, id2)
in (select S.Id1, S.Id2 from
V$LOCK S where Request != 0 and S.Id1
= M.Id1 and S.Id2 = M.Id2) )
order by Id1, Id2, M.Request;
- What is LATCHWAIT column set to in
V$PROCESS?
SELECT latchwait
FROM v$process
WHERE spid = <pid>;
- If not null, who's holding what
latch?
column name format a32 heading
'LATCH NAME'
column pid heading 'HOLDER PID'
select
c.name,a.addr,a.gets,a.misses,a.sleeps,
a.immediate_gets,a.immediate_misses,b.pid
from v$latch a, v$latchholder b,
v$latchname c
where a.addr = b.laddr(+) and a.latch# = c.latch#
and c.name like '&latch_name%'
order by a.latch#;
- Did the lock holder or latch holder
shut off his/her
terminal without logging off? This can leave a zombie shadow
process holding the resource. Kill it with "alter system
kill session '<sid, serial# from
v$session>'". If on sqlnet 2.1
or above, try setting
sqlnet.expire_time in the sqlnet.ora
on the server side. (For more info, see page A-16 of
"Understanding SQL*NET"
v2.1).
- If session is not frozen but slow,
get session-specific
statistics:
SELECT s.sid, s.value, t.name
FROM v$sesstat s, v$statname t
WHERE s.statistic# = t.statistic#
AND s.sid = <sid>;
- If session is extremely slow or
truly frozen, get session-
specific wait info:
SELECT *
FROM v$session_wait
where sid = <sid>;
- repeat above query 3 times.
SPOOL OFF;
-
If this is distributed transaction run the following statements from EVERY
NODE involved in the transaction:
SPOOL 2pc.log;
SELECT * FROM dba_2pc_pending;
SELECT * FROM pending_sessions$;
SELECT * FROM
pending_sub_sessions$;
SELECT * FROM dba_2pc_neighbors;
SPOOL OFF;
The following spool will help determine the
cause in MTS.
SPOOL mts.log
- How busy the dispatchers are:
select name,network,status,
(busy /(busy + idle)) * 100 "% of
time busy" from v$dispatchers
- query the shared servers view
V$SHARED_SERVERS:
select name,status,requests, (busy /(busy + idle)) * 100 "% of time
busy"
from v$shared_servers
Spool off
4. GATHER OS-LEVEL INFO
a. Briefly describe your architecture.
Include number of
CPUs / disks. Are you using raw devices, NFS mounted
files, shared disks, RAM disks...Are
you mirroring? etc.
b. Measure unusual OS level activity:
excessive CPU or i/o,
paging, swapping, etc. There are other
monitor tools such
as TOP. Sun has utilities on their
webpage that can be downloaded
for monitoring the system.
If you're not comfortable with the
utilities listed below, be sure to do
step 3 and,
if you have isolated a troubled process,
the v$sesstat
output in step.
-
On unix: use SAR, VMSTAT and NETSTAT (man sar)
- You can use truss, trace (man
truss) or oradbx
(call support) for tracing specific
processes.
- You can also unix debuggers such as
dbx, xdb, adb or truss
to dump process activity. Example:
dbx -a <shadow pid> /*attach to hanging process*/
where /*dump Oracle stack */
detach /*detach from process*/
truss -o /tmp/mytruss -fae -p
<pid of process>
-
On vms: use MONITOR (help monitor)
- you can use analyze/process (help
analyze) or orambx
(call support) for tracing specific
processes.
-
On novl: use LOAD MONITOR (gui - no online documentation)
-
On Windows NT: use Performance Monitor, Event Monitor,
Dr. Watson (see Resource kit for
documentation).
- use pview for monitoring specific
processes
(resource kit).
c. Check the system log for additional
information. On many platforms
it is at /var/adm the files are the
message files.
5. GET SYSTEMSTATE and HANGANALYZE DUMPS.
This creates a large trace file in the
user_dump_dest (30M or more is not
unusual).
Note: the init<sid>.ora parameter
MAX_DUMP_FILE_SIZE controls
the maximum trace file size. Using Oradebug and setting unlimit will
allow a complete dump which we will need.
Do this step for sure if the entire
database is frozen or nearly
frozen and if this condition came on
suddenly and there are no archive
errors in the alert log. Please note: As systemstate dumps are instance
specific, they tend to be inconclusive with
hanging problems involving
Oracle Parallel Server (OPS) unless you
get them from each node. You will
need 3 system state dumps from each node
for OPS.
NOTE: Do systemstate dump 3 times in a
row, so we can determine whether
the processes are hung or active.
For Oracle 8.0.5.x to 8.1.7.x
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
$ svrmgrl
svrmgr> connect internal
svrmgr>ALTER SESSION SET EVENTS
'IMMEDIATE TRACE NAME HANGANALYZE LEVEL 3';
wait 90 seconds
svrmgr>ALTER SESSION SET EVENTS
'IMMEDIATE TRACE NAME HANGANALYZE LEVEL 3';
EXIT ... then reconnect
svrmgr>ALTER SESSION SET
MAX_DUMP_FILE_SIZE=UNLIMITED;
svrmgr>ALTER SESSION SET EVENTS
'IMMEDIATE TRACE NAME SYSTEMSTATE LEVEL 10';
wait 90 seconds
svrmgr>ALTER SESSION SET EVENTS
'IMMEDIATE TRACE NAME SYSTEMSTATE LEVEL 10';
wait 90 seconds
svrmgr>ALTER SESSION SET EVENTS
'IMMEDIATE TRACE NAME SYSTEMSTATE LEVEL 10';
For Oracle 9.2.0.1 or higher
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
$ sqlplus /nolog
connect / as sysdba
oradebug setmypid
oradebug unlimit
oradebug dump systemstate 10
6. GET BSTAT-ESTAT OR (for 8.1.6 or higher)
STATSPACK output
If the entire database is suffering
performance degradation .
See [NOTE:62161.1] System wide Tuning using UTLESTAT Reports in
Oracle7/8
See [NOTE:149113.1] Installing and
Configuring StatsPack Package
See [NOTE:149124.1] Creating a StatsPack
performance report
Send this report to support along with the
other files requested.
7. GENERATE A CORE DUMP AND RUN A DEBUGGER.
This step is a last resort. It is included to advise
you of an option when others are exhausted
(e.g. when
systemstate dump hangs). You should be in contact with
support before proceeding. Do not send a
core to Oracle unless requested.
Core: Debugger: Documentation:
Unix: kill -11
dbx, adb man
VMS: N/A
analyze/system help
Windows NT: pview Dr.
Watson resource kit
Novl: vdb3 or vdb4
vdb3 or vdb4 Appx of Ora7
Guide
for Netware
8. If certain processes get PROCESSSTATE dump.
-
Trace the process from the os level (see step 4).
Get processstate dumps. Do 3 times.
This generates a trace file in your
user_dump_dest
(from sqlplus: show parameter
user_dump_dest).
$
sqlplus "/as sysdba"
oradebug setospid <process ID>
oradebug unlimit
oradebug dump processstate 10
Get errorstacks from the process. Do 3
times. This generates a trace
file in your user_dump_dest
(from sqlplus: show parameter
user_dump_dest).
$
sqlplus "/as sysdba"
oradebug setospid <process ID>
oradebug unlimit
oradebug dump errorstack 3
9. Check List for diagnostics.
Go
over the information you have gathered and then place it on Oracle
External
FTP
site.
Please
include all configuration information releases and options used.
Please
include any changes you made to the system recently hardware or software.
Please
include results from locking scripts if this may be a locking issue.
If
you did an analyze please include results.
Did
you run an analyze.log if so include it?
Did
you include init parameters and config files?
Did
you include the statements that are having the problem?
If
this is a tool such as import, export, sqlloader did you include the exact
statements
used to run and the parameter files?
Did
you answer the questions and include the answers?
Did
you get the explain plan and tkprof output?
If
you have a small reproducible test case did you include it?
n
Statements that are being run.
n
All table, index, view definitions.
n
All constraints. In short everything it
would take for us to reproduce it.
Did
you trace the processes with a debugger? If so did you include the trace
from
the debugger?
Have
you included all trace files?
Have
you included the alert<sid>.log?
If
core dumps where generated did you translate them and send the stack?
(Do
not send core files.)
Have
you sent the sql*net trace files?
Did
you include v_views.log?
Have
you included session.log;
Did
you spool session-specific wait info:
If
this is distributed transaction have you included 2PC.log?
If
MTS have you included mts.log?
Did
you gather os-level information on CPU,
Memory and I/O?
Did
you GET SYSTEMSTATE DUMPS and palce
these on ftp site?
Did
you run BSTAT-ESTAT and include the report?
Did
you include PROCESSSTATE and ERRORSTACK
dumps.
APPENDIX
A: RECOMMENDED BULLETINS
[NOTE:41634.1] TKPROF and Problem Solving
[NOTE:1020282.6] TFTS FORMATTED SELECT OF
EXPLAIN PLAN