Vladimir Grigorian, DIAGNOSING ORACLE DATABASE HANGING ISSUES

DIAGNOSING ORACLE DATABASE HANGING ISSUES

This bulletin describes the methods and tools used in diagnosing

database hanging issues and performance issue). These issues may be due to

tuning problems, design problems or Oracle bugs such as latching issues.

We determine the cause by narrowing down the problem and getting as much

information as possible WHILE THE DATABASE IS HANGING.

The following diagnostic steps are discussed.

1. Describe the Problems. Under all situations go through this sections and

describe the problem.

2 Look for errors.

3. Do the requested queries.

4. Gather OS level data.

5. Get systemstate and hanganalyze dumps.

6. Get BSTAT-ESTAT OR STATPACK output.

7. Generate a core dump and run a debugger (contact support

first).

8. If certain process get PROCESSSTATE dumps

9. Check List for diagnostics.

Note: It may not be necessary to shutdown the database to stop the

hanging. But if you MUST shutdown before speaking to support, please

attempt to get these diagnostics so that we can debug the problem.

Without these diagnostics it may be impossible for us to determine the cause of

the problem.

STEPS:

1. DESCRIBE THE PROBLEM

What releases of all involved products are you running?

- complete release like database release 8.0.5.2.

Is the "hanging" a true freeze or is activity just very slow?

(Are logs switching in the alert log? Be sure to check cpu, I/O and memory

utilization - see step 4).

When does the hanging start?

Does it stop? How long does it last?

Does the hanging happen suddenly or is it a gradual performance

degradation, perhaps precipitated by increased activity?

How many users are connected?

Has load on system increased recently?

Check for OTRACE creating files. (see [NOTE:45482.1])

Unix NT

~~~~ ~~

cd $ORACLE_HOME/otrace/admin cd orant\otrace73\admin

ls *.dat dir *.dat

Note: If .dat files are being created and OTRACE utility is not being used,

disable this option by setting the init.ora parameter oracle_trace_enable = FALSE.

Are any events set in the init<sid>.ora?

Which ones are affected?

What kind of work are they doing?

How much data are we talking about?

Are you running Oracle Parallel Server (OPS)?

- Does the problem persist if you shutdown all but one instance?

- Please note: Some of the solutions discussed here are useful for

OPS; many are not. Different tuning rules apply; for example, a

"smaller" buffer cache often means better performance in OPS. Also,

the most common sources of hanging issues in OPS are not discussed

here. These include: PCM locking problems, pinging, space management

problems, lack of work partitioning, internode parallel query tuning

requirements, shared disk or virtual shared disk problems, network

problems, and DLM problems.

Are you running Multi-Threaded Server (MTS)?

- Does the problem persist without it?

- How many users?

Are you using any Oracle applications or tools

(financials, forms, etc.)?

Did you recently upgrade the rdbms, application or tool,

operating system, memory or hardware?

Does the problem reproduce?

How often?

Can you reproduce it at will?

When does it NOT reproduce?

What EXACTLY is hanging? What is NOT? Get all the information

requested for each area that applies:

(a) The entire database?

- All instances?

- All connections?

- All operations?

- All nodes?

- Can you start a session and select * from dual?

- How often are log switches occurring?

- If there are archive-related errors in the alert log

(see step 2), stop reading this bulletin: Archiving

problems always freeze the database. Review the error(s)

and resolve the problem. Example: the device may be full.

Another common cause is that the ARCH process is stopped while

the database is in ARCHIVELOG mode. Check this by connecting

internal and typing ARCHIVE LOG LIST. If ARCHIVELOG mode and

automatic archival is DISABLED, that's the problem. Either

restart the ARCH process by typing ALTER SYSTEM ARCHIVE LOG

START (or log_archive_start = true in init.ora) or shutdown,

startup mount exclusive and ALTER DATABASE NOARCHIVELOG.

- If archiving problems have been ruled out and the database is

truly frozen, complete steps 1, 2, 3, 4, 5 and 6.

- If database is not really frozen but is suffering performance

degradation due to increased activity, complete steps 1, 2, 3, 4 and 6.

Do not forget 1(d).

(b) A specific SQL operation? Be sure you also do step 8.

Get a tkprof output with timed_statistics and explain plan.

See [NOTE:10585.1] QUERY AND APPLICATION TUNING USING EXPLAIN

AND TKPROF UTILITY

[NOTE:41634.1] TKPROF and Problem Solving

[NOTE:1020282.6] TFTS FORMATTED SELECT OF EXPLAIN PLAN TABLE

[NOTE:1019631.6] TFTS SCRIPT TO SIMPLIFY THE USE OF EXPLAIN PLAN

- Select statement?

- It may need tuning. (This is useful for more than sql syntax.)

- If a complex sql statement is hanging, can you break successfully?

- See 1 (c)

- Parallel query?

- See [NOTE:240762.1] script to show Explain Plan for Parallel Queries

- Could be space transaction contention. If there are

ora-1575 errors in the alert log, try setting

pct_increase = 0 in the temporary tablespace to

disable SMON from coalescing contiguous extents,

thus reducing contention with query slaves. Also

try spreading datafiles over several disks, increasing

sort_area_size (but don't introduce paging) and

possibly "reducing" degree of parallelism. Refer

to additional tuning documentation.

- Update, insert or delete?

- possibly locking. Be sure to get v$lock

output in step 3. There are the following scripts available

to help debug locking problems.

[NOTE:1020012.6] TFTS SCRIPT TO RETURN MEDIUM DETAIL LOCKING INFO

[NOTE:1020008.6] TFTS FULLY DECODED LOCKING SCRIPT

[NOTE:1019527.6] TFTS CHECK FOR FOREIGN KEY LOCKING

[NOTE:1020007.6] TFTS: DISPLAY LOCKS & GIVE SID AND SERIAL# TO KILL

- Do you have constraints and/or triggers on the object(s)

you're updating? Could be a cascading locking problem.

Try putting index(es) on relevant foreign key columns.

This modifies the locking behavior on the parent. (See

Application Developers Guide for more info).

See [NOTE:33453.1] REFERENTIAL INTEGRITY AND LOCKING

- also see steps 1(c) and 1(d).

- DDL?

- possibly a data dictionary related problem. See step 1(c).

- if "create index" it might be a space transaction contention issue.

- Does the operation run faster the second time around?

- Try tuning i/o. The following could be helpful:

clustering, increasing db_block_size, distributing

i/o (i.e. striping, separating indexes and data,

parallelizing). Also try setting pre_page_sga = true

in the init.ora.

- Is the operation always slower immediately after a large

update? This is delayed block cleanout.

- Can you do ANY operations on the object?

- Try select count(*)

- If updates only are the problem, could be locking.

Scripts are available to help resolve these issues see:

- Be sure to do get lock information

1(b) and 1(d).

There are the following scripts available

to help debug locking problems.

[NOTE:1020012.6] TFTS SCRIPT TO RETURN MEDIUM DETAIL LOCKING INFO

[NOTE:1020008.6] TFTS FULLY DECODED LOCKING SCRIPT

[NOTE:1019527.6] TFTS CHECK FOR FOREIGN KEY LOCKING

[NOTE:1020007.6] TFTS: DISPLAY LOCKS & GIVE SID AND SERIAL# TO KILL

- Did you preallocate extents to the object? This will bump

up the high water mark and cause full table scans to appear

to "hang". Full table scans always scan up to the high

water mark, even when the table contains only a few rows. The

solution is to avoid preallocating extents except immediately

before a large parallel procedural insert or conventional load.

Don't preallocate extents for a direct load.

- Try ANALYZE TABLE <tablename> VALIDATE STRUCTURE CASCADE;

- any errors? Errors signify table or index corruption.

Call support.

- If no errors after ANALYZE, run these queries and send the output and

trace files to support:

SPOOL analyze.log;

SELECT *

FROM sys.dba dba_tables

WHERE table_name = '<TABLENAME>';

- block level space utilization (i.e. initrans, maxtrans,

pctfree), a high chain count, etc., could be part of

the problem.

SELECT i.*

FROM sys.index_stats i, sys.dba_indexes d

WHERE i.name = d.index_name

AND d.table_name = '<TABLENAME>';

- an unwieldy index could be part of the

problem. Do you do a lot of updates/deletes?

SPOOL OFF;

- Get tkprof output as in 1(b).

- If a view, do the above for the underlying tables.

- To identify the underlying tables:

SELECT text

FROM sys.dba_views

WHERE view_name = '<VIEWNAME>';

- you may need to SET ARRAYSIZE 1 / SET LONG 2000 to

see all the text.

- definitely get tkprof output as in 1(b).

- Data dictionary objects might also be affected by the above.

(d) Large scale update operation (i.e. perhaps using sqlloader,

import or batch job)?

- Do you have index(es) on the table?

- Does the update run during periods of peak activity?

- Do you see "checkpoint not complete" messages in the alert.log?

- This signifies that the redo logs are too small; they

are having trouble keeping up.

- Is the tablespace wherein the table resides in hot backup mode?

(see v$backup)

- A tablespace in hot backup mode generates redo "records"

rather than "vectors". During a large update this

can cause severe contention and performance degradation.

- Are any of the following init.ora parameters set to

non-default values:

db_block_checkpoint_batch

db_block_checksum

db_block_lru_statistics

db_block_lru_extended_statistics

row_locking

serializable

timed_statistics

- If using sqlloader, are you loading with conventional

path? Using REPLACE option? (Try using TRUNCATE instead).

Any sql functions in the control file? Have you tried tuning

with READBUFFERS, BINDSIZE, ROWS, PARALLEL?

- If using import, are you using commit=y? indexes=y?

constraints = y? Have you tried increasing BUFFER? Any

info in LOG?

- Try increasing the number of system buffers and tuning i/o

(see 1(b)).

- If many users are active during the update, resource

contention could be contributing to system slowdown.

Rollback segments, redo latches, i/o and db_block_buffers

are some possible areas of contention. V$session_wait

and Bstat Estat output are good places to start researching

contention issues. See section on estat bstat.

-Do steps 1(b), 1(c), 3, 4 and 5.

(e) Specific package, procedure or PRO*C application?

- What does it do?

- What SQL statement is it stuck on? see step. Can you

remove the statement and get it to run successfully in sqlplus?

- Get tkprof output on the application, procedure or isolated

sql statement with timed_statistics and explain plan.

- Optional: if a procedure, utilize DBMS_ALERT to pinpoint where

the hanging starts.

- If a PRO*C program, and tkprof identifies "parsing" as the

bottleneck of the embedded sql statement, tune this with

precompiler parameters hold_cursor and release_cursor. See

bulletin on precompiler features.

- If a package, can you execute any of the individual procedures

successfully?

- Is the package/procedure swapping in and out of the shared pool?

Pinning might be advisable (a package would have to be created

for a standalone procedure).

SPOOL objcache.log

SELECT *

FROM v$db_object_cache

WHERE name = '<NAME>';

SPOOL OFF;

- Resource contention could be contributing to system

slowdown. For example, library cache contention

(affected by shared_pool_size) is possible in addition

to the short list of contention areas mentioned in 1(d).

Refer to additional tuning documentation.

- Be sure to do step 3.

- What is the cpu / io utilization? (get this from

v$sesstat above or see step 4)

- Is the process swapping in and out of memory? (see step 4)

- Did you change the process priority or "nice" it?

- Steps 3 and 4 recommended.

(f) remote access only?

- Can you do select * from dual@db_link?

- Can you connect to the remote machine and perform the operation

locally? Can you select * from dual locally?

- If you can do these tests and the hanging reproduces, you

should answer all these questions for the REMOTE machine.

- Are you doing a distributed update?

- What is distributed_lock_timeout set to in the init.ora.

- Are you refreshing snapshots? using symmetric replication?

- Is it a remote select statement that's hanging?

- Get a tkprof output with timed_statistics and explain

plan. On the explain plan, you'll see that certain

statements just say "REMOTE"; the actual statements sent

across the network are recorded in the OTHERS column

of the PLAN_TABLE - spool a select of that column.

- If joining two tables on a remote node, try putting a

'joining view' on that node and then selecting against the

view.

See [NOTE:41634.1] TKPROF and Problem Solving

[NOTE:10585.1] QUERY AND APPLICATION TUNING USING EXPLAIN

AND TKPROF UTILITY

- The following may reduce any network overhead that's contributing

to the problem: setting ARRAYSIZE in sql operations, using

anonymous pl/sql blocks rather than independent sql statements,

using explicit rather than implicit cursors. Refer to additional

tuning documentation.

(g) Operations using 3rd party application (Power Builder, etc).

- Can you get the problem to reproduce in sqlplus? (If you can

do this test but can't reproduce the problem, you need to

call the 3rd party vendor).

(h) Database shutdown/startup?

- How was it shutdown? normal? immediate? abort? did it crash?

- If startup hangs and the database went down abnormally (crash

or shutdown abort)and there are no errors in the alert log,

this could be normal instance recovery, an internal matter

(if errors),system problems (see system logs), inadequate cleanup

at shutdown (try shutdown abort / startup), unsupported

software/ os combination or (less likely) data corruption.

Wait 30 minutes then call support if it doesn't open.

During normal instance recovery an example of the messages

that will be written to the alert log are listed below:

Starting ORACLE instance (normal)

Starting up ORACLE RDBMS Version: 8.1.6.3.0.

System parameters with non-default values:

processes = 50

...

Crash recovery completed successfully

Tue May 22 16:15:26 2001

Thread 1 advanced to log sequence 7139

Thread 1 opened at log sequence 7139

Current log# 2 seq# 7139 mem# 0: /u02/oradata/V817/redo02.log

Successful open of redo thread 1.

Tue May 22 16:15:26 2001

SMON: enabling cache recovery

SMON: enabling tx recovery

Tue May 22 16:15:51 2001

Completed: alter database open

- If shutdown normal hangs, this means Oracle is waiting

for active sessions to log off.

- If shutdown immediate hangs, this could be for

the same reasons as a hang at startup.

- Be sure to check for errors in step 2. Tracing the process

may also be useful: see step 4. V$session_wait may be

useful: see step 3.

- Note: if using a system debugger to trace a hanging startup

or shutdown operation, be sure to get the relevant svrmgr or

sqlplus "shadow" (child) pid. On unix boxes:

ps -ef | grep svrmgrl

usupport 26615 19866 ...etc... svrmgrl

ps -ef | grep 26615

usupport 52472 26615 ...etc... oracleV732 (DESCR...

Pid 52472 is what you want to trace.

2. LOOK FOR ERRORS

(a) Check the tail of the alert_<sid>.log (or portion of the alert

corresponding to the hang). Check for ANY messages other than

log switches ("advanced to log sequence...")

Note: to find the alert.log directory, check the

background_dump_dest parameter in the init<sid>.ora or

config.ora. Or do the following command.

$ sqlplus "/as sysdba"

show parameter dest

(b) Check the same directory for trace files generated around the time

of the hang. Look in them for error messages. Don't search

whole files; relevant messages appear towards the beginning.

On unix ls -ltr will show you files that have been updated lately last.

for traces as well.

(d) Check system error logs for messages. On many unix systems

cd /var/adm

ls -ltr

Check for messages files recently updated.

(e) Be prepared to send a portion of the alert<sid>.log at least 4 hours

prior to the problem, init<sid>.ora and any traces to support.

For fastest service:

- Move trace files to oracle-ftp

cd server/incoming

mkdir <tar number>

cd <tar number>

place traces files, alert log and any other files here

- each file should go into a separate email message

- put the tar number and file description in the subject line

- don't uuencode or attach the files if you don't have to

(i.e. unless they're too big) - just read them into the

message as ascii.

If you can't comply with all of the above, just do your best.

Faxing is also possible but may not be practical.

At least, please have the information READY to communicate to

support before you call.

3. SPOOL OUT RELEVANT V$ VIEWS

WHILE the database is hanging, run the following queries:

SPOOL v_views.log;

SELECT *

FROM v$parameter;

SELECT class, value, name

FROM v$sysstat;

SELECT sid, id1, id2, type, lmode, request

FROM v$lock;

SELECT l.latch#, n.name, h.pid, l.gets, l.misses,

l.immediate_gets, l.immediate_misses, l.sleeps

FROM v$latchname n, v$latchholder h, v$latch l

WHERE l.latch# = n.latch#

AND l.addr = h.laddr(+);

SELECT *

FROM v$session_wait

ORDER BY sid;

/* repeat last query 3 times - we want to see who's repeatedly waiting*/

SPOOL OFF;

If a specific session hangs do these queries:

SPOOL session.log;

- Get the sid from the pid or vice versa:

SELECT s.sid, p.spid

FROM v$session s, v$process p

WHERE s.paddr = p.addr

AND ... < p.spid = <os pid> or perhaps

s.sid = <sid from v$session> >

- What SQL statement is being executed?

SELECT s.sid, s.status, q.sql_text

FROM v$session s, v$sqltext q

WHERE s.sql_hash_value = q.hash_value

AND s.sql_address = q.address

AND s.sid = <sid>

order by q.piece;

-Get output from V$SESSION_WAIT

column sid format 990

column seq# format 99990

column wait_time heading 'WTime' format 99990

column event format a30

column p1 format 9999999990

column p2 format 9999999990

column p3 format 9990

select sid,event,seq#,p1,p2,p3,wait_time from V$session_wait

where sid=<SID>

order by sid;

*** It is needed to run this statement a few times to see if the wait events

actually change

- What is LOCKWAIT column set to in V$SESSION?

SELECT lockwait

FROM v$session

WHERE sid = <sid>;

- If not null, who's holding what lock on which object?

col Username format A15

col Sid format 9990 heading SID

col Type format A4

col Lmode format 990 heading 'HELD'

col Request format 990 heading 'REQ'

col Id1 format 9999990

col Id2 format 9999990

select SN.Username, M.Sid, M.Type,

DECODE(M.Lmode, 0, 'None', 1, 'Null', 2, 'Row Share', 3, 'Row

Excl.', 4, 'Share', 5, 'S/Row Excl.', 6, 'Exclusive',

LTRIM(TO_CHAR(Lmode,'990'))) Lmode,

DECODE(M.Request, 0, 'None', 1, 'Null', 2, 'Row Share', 3, 'Row

Excl.', 4, 'Share', 5, 'S/Row Excl.', 6, 'Exclusive',

LTRIM(TO_CHAR(M.Request, '990'))) Request,

M.Id1, M.Id2 from V$SESSION SN, V$LOCK M

WHERE (SN.Sid = M.Sid and M.Request ! = 0)

or (SN.Sid = M.Sid and M.Request = 0 and Lmode != 4 and (id1, id2)

in (select S.Id1, S.Id2 from V$LOCK S where Request != 0 and S.Id1

= M.Id1 and S.Id2 = M.Id2) ) order by Id1, Id2, M.Request;

- What is LATCHWAIT column set to in V$PROCESS?

SELECT latchwait

FROM v$process

WHERE spid = <pid>;

- If not null, who's holding what latch?

column name format a32 heading 'LATCH NAME'

column pid heading 'HOLDER PID'

select c.name,a.addr,a.gets,a.misses,a.sleeps,

a.immediate_gets,a.immediate_misses,b.pid

from v$latch a, v$latchholder b, v$latchname c

where a.addr = b.laddr(+) and a.latch# = c.latch#

and c.name like '&latch_name%' order by a.latch#;

- Did the lock holder or latch holder shut off his/her

terminal without logging off? This can leave a zombie shadow

process holding the resource. Kill it with "alter system

kill session '<sid, serial# from v$session>'". If on sqlnet 2.1

or above, try setting sqlnet.expire_time in the sqlnet.ora

on the server side. (For more info, see page A-16 of

"Understanding SQL*NET" v2.1).

- If session is not frozen but slow, get session-specific

statistics:

SELECT s.sid, s.value, t.name

FROM v$sesstat s, v$statname t

WHERE s.statistic# = t.statistic#

AND s.sid = <sid>;

- If session is extremely slow or truly frozen, get session-

specific wait info:

SELECT *

FROM v$session_wait

where sid = <sid>;

- repeat above query 3 times.

SPOOL OFF;

- If this is distributed transaction run the following statements from EVERY

NODE involved in the transaction:

SPOOL 2pc.log;

SELECT * FROM dba_2pc_pending;

SELECT * FROM pending_sessions$;

SELECT * FROM pending_sub_sessions$;

SELECT * FROM dba_2pc_neighbors;

SPOOL OFF;

The following spool will help determine the cause in MTS.

SPOOL mts.log

- How busy the dispatchers are:

select name,network,status,

(busy /(busy + idle)) * 100 "% of time busy" from v$dispatchers

- query the shared servers view V$SHARED_SERVERS:

select name,status,requests, (busy /(busy + idle)) * 100 "% of time busy"

from v$shared_servers

Spool off

4. GATHER OS-LEVEL INFO

a. Briefly describe your architecture. Include number of

CPUs / disks. Are you using raw devices, NFS mounted

files, shared disks, RAM disks...Are you mirroring? etc.

b. Measure unusual OS level activity: excessive CPU or i/o,

paging, swapping, etc. There are other monitor tools such

as TOP. Sun has utilities on their webpage that can be downloaded

for monitoring the system.

If you're not comfortable with the

utilities listed below, be sure to do step 3 and,

if you have isolated a troubled process, the v$sesstat

output in step.

- On unix: use SAR, VMSTAT and NETSTAT (man sar)

- You can use truss, trace (man truss) or oradbx

(call support) for tracing specific processes.

- You can also unix debuggers such as dbx, xdb, adb or truss

to dump process activity. Example:

dbx -a <shadow pid> /*attach to hanging process*/

where /*dump Oracle stack */

detach /*detach from process*/

truss -o /tmp/mytruss -fae -p <pid of process>

- On vms: use MONITOR (help monitor)

- you can use analyze/process (help analyze) or orambx

(call support) for tracing specific processes.

- On novl: use LOAD MONITOR (gui - no online documentation)

- On Windows NT: use Performance Monitor, Event Monitor,

Dr. Watson (see Resource kit for documentation).

- use pview for monitoring specific processes

(resource kit).

c. Check the system log for additional information. On many platforms

it is at /var/adm the files are the message files.

5. GET SYSTEMSTATE and HANGANALYZE DUMPS.

This creates a large trace file in the user_dump_dest (30M or more is not

unusual).

Note: the init<sid>.ora parameter MAX_DUMP_FILE_SIZE controls

the maximum trace file size. Using Oradebug and setting unlimit will

allow a complete dump which we will need.

Do this step for sure if the entire database is frozen or nearly

frozen and if this condition came on suddenly and there are no archive

errors in the alert log. Please note: As systemstate dumps are instance

specific, they tend to be inconclusive with hanging problems involving

Oracle Parallel Server (OPS) unless you get them from each node. You will

need 3 system state dumps from each node for OPS.

NOTE: Do systemstate dump 3 times in a row, so we can determine whether

the processes are hung or active.

For Oracle 8.0.5.x to 8.1.7.x

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

$ svrmgrl

svrmgr> connect internal

svrmgr>ALTER SESSION SET EVENTS 'IMMEDIATE TRACE NAME HANGANALYZE LEVEL 3';

wait 90 seconds

svrmgr>ALTER SESSION SET EVENTS 'IMMEDIATE TRACE NAME HANGANALYZE LEVEL 3';

EXIT ... then reconnect

svrmgr>ALTER SESSION SET MAX_DUMP_FILE_SIZE=UNLIMITED;

svrmgr>ALTER SESSION SET EVENTS 'IMMEDIATE TRACE NAME SYSTEMSTATE LEVEL 10';

wait 90 seconds

svrmgr>ALTER SESSION SET EVENTS 'IMMEDIATE TRACE NAME SYSTEMSTATE LEVEL 10';

wait 90 seconds

svrmgr>ALTER SESSION SET EVENTS 'IMMEDIATE TRACE NAME SYSTEMSTATE LEVEL 10';

For Oracle 9.2.0.1 or higher

~~~~~~~~~~~~~~~~~~~~~~~~~~~~

$ sqlplus /nolog

connect / as sysdba

oradebug setmypid

oradebug unlimit

oradebug dump systemstate 10

6. GET BSTAT-ESTAT OR (for 8.1.6 or higher) STATSPACK output

If the entire database is suffering performance degradation .

See [NOTE:62161.1] System wide Tuning using UTLESTAT Reports in Oracle7/8

See [NOTE:149113.1] Installing and Configuring StatsPack Package

See [NOTE:149124.1] Creating a StatsPack performance report

Send this report to support along with the other files requested.

7. GENERATE A CORE DUMP AND RUN A DEBUGGER.

This step is a last resort. It is included to advise

you of an option when others are exhausted (e.g. when

systemstate dump hangs). You should be in contact with

support before proceeding. Do not send a core to Oracle unless requested.

Core: Debugger: Documentation:

Unix: kill -11 dbx, adb man

VMS: N/A analyze/system help

Windows NT: pview Dr. Watson resource kit

Novl: vdb3 or vdb4 vdb3 or vdb4 Appx of Ora7 Guide

for Netware

8. If certain processes get PROCESSSTATE dump.

- Trace the process from the os level (see step 4).

Get processstate dumps. Do 3 times. This generates a trace file in your

user_dump_dest

(from sqlplus: show parameter user_dump_dest).

$ sqlplus "/as sysdba"

oradebug setospid <process ID>

oradebug unlimit

oradebug dump processstate 10

Get errorstacks from the process. Do 3 times. This generates a trace

file in your user_dump_dest

(from sqlplus: show parameter user_dump_dest).

$ sqlplus "/as sysdba"

oradebug setospid <process ID>

oradebug unlimit

oradebug dump errorstack 3

9. Check List for diagnostics.

Go over the information you have gathered and then place it on Oracle External

FTP site.

Please include all configuration information releases and options used.

Please include any changes you made to the system recently hardware or software.

Please include results from locking scripts if this may be a locking issue.

If you did an analyze please include results.

Did you run an analyze.log if so include it?

Did you include init parameters and config files?

Did you include the statements that are having the problem?

If this is a tool such as import, export, sqlloader did you include the exact

statements used to run and the parameter files?

Did you answer the questions and include the answers?

Did you get the explain plan and tkprof output?

If you have a small reproducible test case did you include it?

n Statements that are being run.

n All table, index, view definitions.

n All constraints. In short everything it would take for us to reproduce it.

Did you trace the processes with a debugger? If so did you include the trace

from the debugger?

Have you included all trace files?

Have you included the alert<sid>.log?

If core dumps where generated did you translate them and send the stack?

(Do not send core files.)

Have you sent the sql*net trace files?

Did you include v_views.log?

Have you included session.log;

Did you spool session-specific wait info:

If this is distributed transaction have you included 2PC.log?

If MTS have you included mts.log?

Did you gather os-level information on CPU, Memory and I/O?

Did you GET SYSTEMSTATE DUMPS and palce these on ftp site?

Did you run BSTAT-ESTAT and include the report?

Did you include PROCESSSTATE and ERRORSTACK dumps.

APPENDIX A: RECOMMENDED BULLETINS

[NOTE:41634.1] TKPROF and Problem Solving

[NOTE:1020282.6] TFTS FORMATTED SELECT OF EXPLAIN PLAN