How to use hinted SQL to fix a bad SQL plan

As I said in a previous blog, Outline, SQL profile and SQL Plan Management (SPM), the outline is the oldest approach to fixing a SQL plan, but it is the easiest one to use, works for all Oracle versions, and does NOT need an extra license.

I’d like to elaborate on the most frequently used scenario: a SQL statement is using a bad plan. We can use SQL hints to get a GOOD plan, and we want to use an outline to fix the SQL plan to the GOOD one. In other words, we want to apply a plan generated from different SQL text (with hints) that is logically the same.

Let’s create a test scenario:
1. Create a test table


ALEX@ZENG: SQL> create table t (id number, name varchar2(100));

Table created.

ALEX@ZENG: SQL> insert into t select 1, object_name from dba_objects;

74562 rows created.

ALEX@ZENG: SQL> create index t_idx1 on t(id);

Index created.

ALEX@ZENG: SQL> exec dbms_stats.gather_table_stats('ALEX','T');

PL/SQL procedure successfully completed.

ALEX@ZENG: SQL> show parameter optimizer_features_enable

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_features_enable            string      10.2.0.4

2. Create the bad SQL plan scenario


ALEX@ZENG: SQL> update t set id=rownum;

74563 rows updated.

ALEX@ZENG: SQL> commit;

Commit complete.
ALEX@ZENG: SQL> set autotrace traceonly
ALEX@ZENG: SQL> select * from t where id=1;

1 row selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2153619298

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 74556 |  1820K|    74   (5)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    | 74556 |  1820K|    74   (5)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("ID"=1)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        319  consistent gets
          0  physical reads
          0  redo size
        593  bytes sent via SQL*Net to client
        520  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

3. Use hint to get a better plan

--it's better to use an index scan because there's only one row with id=1 now
--we can get that plan by adding a hint

ALEX@ZENG: SQL> explain plan for select /*+ index(t t_idx1) */ * from t where id=1;

Explained.

ALEX@ZENG: SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3292636276

--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        | 74556 |  1820K|   150   (2)| 00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T      | 74556 |  1820K|   150   (2)| 00:00:02 |
|*  2 |   INDEX RANGE SCAN          | T_IDX1 | 74556 |       |    55   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID"=1)

14 rows selected.

Since the INDEX RANGE SCAN is the better plan, we want to use it to replace the FULL TABLE SCAN plan.

Step 1. Get the bad SQL hash value


ALEX@ZENG: SQL> select address, hash_value, sql_text from v$sql where sql_text like 'select * from t%';

ADDRESS          HASH_VALUE SQL_TEXT
---------------- ---------- ------------------------------------------------------------
000000045F900938  716428968 select * from t where id=1
...

Step 2. Create a SQL file with hints

$ cat good.sql
select /*+ index(t t_idx1) */ * from t where id=1;
$

Step 3. Fix the SQL plan using script fix_plan.sh

#!/bin/bash

if [ $# -lt 3 ]; then
cat <<EOF
Fix given SQL plan using given sql in file
  usage:    fix_plan.sh <hash_value> <hinted_sql_file> <owner>
  examples: fix_plan.sh 716428968 good.sql ALEX
EOF
exit 1
fi

HASH_VALUE=$1
SQL_FILE=$2
OL_OWNER=$3

echo "HASH_VALUE : $HASH_VALUE"
echo "SQL_FILE   : $SQL_FILE"
echo "OL_OWNER   : $OL_OWNER"
echo ""

#Create outline by hash_value
function create_ol_from_hashvalue {
HASH_VALUE=$1
OL_NAME=$2
OL_OWNER=$3

#generate create outline sql
#I didn't use dbms_outln.create_outline because it cannot create an outline with a given name,
# and there's no hash value in DBA_OUTLINES to associate it with V$SQL,
# according to "How To Match a SQL Statement to a Stored Outline (Doc ID 743312.1)"
sqlplus -S "/ as sysdba" > /tmp/tmp_$OL_NAME.sql <<EOF
set feedback off
set serveroutput on size unlimited
declare
v_sqltext varchar2(32000);
begin
  --get sql text
  select dbms_lob.substr(SQL_FULLTEXT, 30000, 1 ) into v_sqltext from v\$sql where hash_value = $HASH_VALUE and rownum=1;

  dbms_output.put_line('alter session set current_schema = $OL_OWNER;');
  v_sqltext := 'create or replace outline $OL_NAME for category TEMP_PLAN on ' || chr(10) || v_sqltext || chr(10) ||';';
  dbms_output.put_line(v_sqltext);
  dbms_output.put_line('exit;');
end;
/
EOF

sqlplus -S "/ as sysdba" @/tmp/tmp_$OL_NAME.sql
}

#Create outline from sql file
function create_ol_from_sqlfile {
SQL_FILE=$1
OL_NAME=$2
OL_OWNER=$3

#generate create outline sql
cat > /tmp/tmp_$OL_NAME.sql <<EOF
alter session set current_schema = $OL_OWNER;
create or replace outline $OL_NAME for category TEMP_PLAN on
`cat $SQL_FILE`
exit;
EOF

sqlplus -S "/ as sysdba" @/tmp/tmp_$OL_NAME.sql

}

#Exchange outlines, so the hinted SQL's plan is stored under the original SQL text in category GOOD_PLAN
#Ref: How to Edit a Stored Outline to Use the Plan from Another Stored Outline (Doc ID 730062.1)
function exchange_outline {
OL1=$1
OL2=$2
OL_OWNER=$3

sqlplus -S "/ as sysdba" <<EOF
set feedback off
alter session set current_schema = $OL_OWNER;
create private outline OLFROM from $OL1;
create private outline OLTO from $OL2;
update ol$ set hintcount=(select hintcount from ol$ where ol_name='OLTO') where ol_name='OLFROM';
delete from ol$ where ol_name='OLTO';
update ol$ set ol_name='OLTO' where ol_name='OLFROM';
commit;
execute dbms_outln_edit.refresh_private_outline('OLTO');
create or replace outline $OL1 from private OLTO for category GOOD_PLAN;
drop outline $OL2;
exit;
EOF

}

#display outline
function display_outline {
OL_NAME=$1
OL_OWNER=$2

sqlplus -S "/ as sysdba" <<EOF
set pagesize 1000 linesize 160
set long 32000
col hint format a55
col join_pos format a45
col owner    format a12
col name     format a18
col ts       format a14
col h        format 999
col category format a12
col sql_text format a80
col used     format a6

select name, sql_text, category, used, to_char(TIMESTAMP, 'YY-mm-dd hh24:MI')
 from  dba_outlines
where name = '$OL_NAME' and OWNER = '$OL_OWNER';

select ol_name name, category, hint#, stage# stage, hint_text hint, join_pred join_pos
 from outln.ol\$hints
 where ol_name = '$OL_NAME'
 order by ol_name, hint#;
exit;
EOF
}

#main function
echo "1. Create outline OL_$HASH_VALUE for SQL $HASH_VALUE"
create_ol_from_hashvalue $HASH_VALUE OL_$HASH_VALUE $OL_OWNER

echo "2. Create outline OL_TEMP for SQL in $SQL_FILE"
create_ol_from_sqlfile $SQL_FILE OL_TEMP $OL_OWNER

echo "3. Exchange outline OL_$HASH_VALUE with OL_TEMP, and drop OL_TEMP"
exchange_outline OL_$HASH_VALUE OL_TEMP $OL_OWNER

echo "4. Display final outline for SQL $HASH_VALUE : OL_$HASH_VALUE in category GOOD_PLAN "
display_outline OL_$HASH_VALUE $OL_OWNER

Run it:

$./fix_plan.sh 716428968 good.sql ALEX
HASH_VALUE : 716428968
SQL_FILE : good.sql
OL_OWNER : ALEX

1. Create outline OL_716428968 for SQL 716428968

Session altered.

Outline created.

2. Create outline OL_TEMP for SQL in good.sql

Session altered.

Outline created.

3. Exchange outline OL_716428968 with OL_TEMP, and drop OL_TEMP
4. Display final outline for SQL 716428968 : OL_716428968 in category GOOD_PLAN

NAME               SQL_TEXT                       CATEGORY     USED     TO_CHAR(TIMEST
------------------ ------------------------------ ------------ ------   --------------
OL_716428968       select * from t where id=1     GOOD_PLAN    UNUSED   13-12-29 19:39

1 row selected.

NAME               CATEGORY     HINT# STAGE HINT                                                    JOIN_POS
------------------ ------------ ----- ----- -------------------------------------------------------
OL_716428968       GOOD_PLAN        1     1 OUTLINE_LEAF(@"SEL$1")
OL_716428968       GOOD_PLAN        2     1 IGNORE_OPTIM_EMBEDDED_HINTS
OL_716428968       GOOD_PLAN        3     1 OPTIMIZER_FEATURES_ENABLE('10.2.0.4')
OL_716428968       GOOD_PLAN        4     1 DB_VERSION('11.2.0.2')
OL_716428968       GOOD_PLAN        5     1 OPT_PARAM('_b_tree_bitmap_plans' 'false')
OL_716428968       GOOD_PLAN        6     1 OPT_PARAM('_index_join_enabled' 'false')
OL_716428968       GOOD_PLAN        7     1 OPT_PARAM('_optim_peek_user_binds' 'false')
OL_716428968       GOOD_PLAN        8     1 OPT_PARAM('_optimizer_skip_scan_enabled' 'false')
OL_716428968       GOOD_PLAN        9     1 OPT_PARAM('optimizer_index_cost_adj' 30)
OL_716428968       GOOD_PLAN       10     1 INDEX_RS_ASC(@"SEL$1" "T"@"SEL$1" ("T"."ID"))

10 rows selected.

Step 4. Flush the bad SQL plan

--if it's 10g, use grant to invalidate the SQL plan
SQL> grant select on ALEX.T to dba;

--if it's 11g, use procedure DBMS_SHARED_POOL.PURGE
SQL> select ADDRESS, HASH_VALUE from V$SQL where hash_value = 716428968;

ADDRESS          HASH_VALUE
---------------- ----------
000000045F900938  716428968

1 row selected.

SQL> exec DBMS_SHARED_POOL.PURGE ('000000045F900938, 716428968','C');

PL/SQL procedure successfully completed.
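The first argument to PURGE is the ADDRESS and HASH_VALUE concatenated with a comma and a space. As a small sketch (both values hardcoded from this example; on a real system you would query v$sql fresh, since the address changes between loads), the call can be generated instead of typed by hand:

```shell
#!/bin/sh
# Sketch: build the DBMS_SHARED_POOL.PURGE call from an ADDRESS/HASH_VALUE pair.
# Values below are taken from the v$sql output above and are illustrative only.
ADDRESS=000000045F900938
HASH_VALUE=716428968
echo "exec DBMS_SHARED_POOL.PURGE ('${ADDRESS}, ${HASH_VALUE}','C');"
# prints: exec DBMS_SHARED_POOL.PURGE ('000000045F900938, 716428968','C');
```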

Step 5. Check the new plan

--it's IMPORTANT to set use_stored_outlines to the stored outline category, if not set yet
--you can set it at session level first to test the plan if needed
SQL> alter system set use_stored_outlines = GOOD_PLAN;

System altered.

ALEX@ZENG: SQL> set autotrace traceonly
ALEX@ZENG: SQL> select * from t where id=1;

1 row selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3292636276

--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |     1 |    27 |     1   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T      |     1 |    27 |     1   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T_IDX1 |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID"=1)

Note
-----
   - outline "OL_716428968" used for this statement

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        185  consistent gets
          0  physical reads
          0  redo size
        597  bytes sent via SQL*Net to client
        520  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

There is a problem caused by the “ghost” parameter use_stored_outlines. Why “ghost”?
You cannot set it in the init file, you cannot see it via show parameter after setting it, and it loses its value after a DB restart.
So we need to create a trigger that sets it automatically when the DB starts:

SYS@ZENG: SQL> show parameter use_stored_outlines
SYS@ZENG: SQL>
SYS@ZENG: SQL> create or replace trigger enable_outlines_trig
--Ref : How to Enable USE_STORED_OUTLINES Permanently (Doc ID 560331.1)
after startup on database
begin
execute immediate('alter system set use_stored_outlines=GOOD_PLAN');
end;
/

Trigger created.

From then on, you can sleep well without worrying that the bad SQL plan will cause high DB sessions/load.

How to add hints for a SQL statement on an Oracle view

A developer asked me to tune a SQL statement recently. It turned out that the filter column is very skewed, and Oracle could not find the optimal plan. Gathering histograms on the column might or might not solve the issue, because the result is quite sensitive to the data sampled, and it may have side effects on other SQL statements. As the developer could change the SQL easily, I chose to add hints to the SQL. The SQL is based on a view, which is different from adding hints on tables. Here are the steps to add hints for a SQL statement on a view:

Step 1. Get the sub-query name of the table we want to add hint on:

SQL>  explain plan for select parent_task_id,task_id, title, status_label from bugtracking where BUGTRACKING_STATUS <5;

Explained.

ARADMIN@TRACE:prod SQL> select * from table(DBMS_XPLAN.DISPLAY(NULL, NULL, 'ALL'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2453567761

----------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       | 25202 |  2928K|  3280   (5)| 00:00:19 |
|   1 |  NESTED LOOPS OUTER|       | 25202 |  2928K|  3280   (5)| 00:00:19 |
|*  2 |   TABLE ACCESS FULL| T508  | 25202 |  2534K|  3272   (5)| 00:00:19 |
|*  3 |   INDEX UNIQUE SCAN| IB508 |     1 |    16 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$E213D7C1
   2 - SEL$E213D7C1 / T508@SEL$3  <-- this is the subquery name in SQL/view
   3 - SEL$E213D7C1 / B508@SEL$2

....

We want to add an index hint on table T508. It’s in query block SEL$3.

Step 2. Add hints at the subquery

SQL>  explain plan for       select/*+ index(@SEL$3 T508 I508_660000000_1) */ parent_task_id,task_id, title, status_label from bugtracking where BUGTRACKING_STATUS <5;

Explained.

ARADMIN@TRACE:prod SQL> select * from table(DBMS_XPLAN.DISPLAY(NULL, NULL, 'ALL'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2546157666

-------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  | 25202 |  2928K|  9024   (1)| 00:00:51 |
|   1 |  NESTED LOOPS OUTER          |                  | 25202 |  2928K|  9024   (1)| 00:00:51 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T508             | 25202 |  2534K|  9016   (1)| 00:00:51 |
|*  3 |    INDEX RANGE SCAN          | I508_660000000_1 | 25202 |       |    40   (3)| 00:00:01 |
|*  4 |   INDEX UNIQUE SCAN          | IB508            |     1 |    16 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$E213D7C1
   2 - SEL$E213D7C1 / T508@SEL$3
   3 - SEL$E213D7C1 / T508@SEL$3
   4 - SEL$E213D7C1 / B508@SEL$2

As you can see, the format of an index hint through a view is /*+ index(@SUBQUERY_NAME TABLE_NAME INDEX_NAME) */ (note the @ before the query block name).
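To make the pieces explicit, here is a trivial sketch that assembles the hinted statement from the three names the plan output gives you (query block, table and index, all taken from the example above):

```shell
#!/bin/sh
# Sketch: assemble a view index hint in the /*+ index(@QB TABLE INDEX) */ format.
# The three names come from the "Query Block Name / Object Alias" plan section.
QB='SEL$3'
TABLE='T508'
INDEX='I508_660000000_1'
echo "select /*+ index(@${QB} ${TABLE} ${INDEX}) */ * from bugtracking"
# prints: select /*+ index(@SEL$3 T508 I508_660000000_1) */ * from bugtracking
```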

Debugging a simple PK update SQL that takes 500+ seconds in Oracle

We ran into a problem recently in an Oracle database on Solaris. A simple PK update SQL took 500+ seconds to finish. It didn't happen all the time, less than 1% of the time, but it caused app timeouts when it did.

We observed more wait time on “SQL*Net message to client” in the statspack data, and no other obviously abnormal events besides this, so we suspected a network issue. The SA told us the network usage was only 20%~30%. We did a ping test from the DB host to the client; it didn't show any obvious problems.

Later, we did a test swapping the hosts of two DBs in a DB family, and the problem moved to the other host. At that point, we were sure it was not a DB problem; it had to be a host or network issue. We then found that the “TCP retransmission” rate was high on the problematic host. Finally, the SA replaced the network interface to fix the issue.

We can check the “TCP retransmission” counters using the script below:

while true
do
  echo -n "`date` : "
  netstat -s -P tcp | grep tcpRetrans
  sleep 1
done

You can reference this webpage for more about the TCP retransmission issue.
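The grep above only shows raw counters. A small follow-up sketch (counter values made up here for illustration) converts tcpRetransSegs/tcpOutDataSegs into a percentage, which is easier to eyeball over time:

```shell
#!/bin/sh
# Sketch: compute a TCP retransmission percentage from two Solaris counters.
# Values are hardcoded for illustration; on a real host read tcpOutDataSegs
# and tcpRetransSegs from `netstat -s -P tcp`.
out_segs=200000      # tcpOutDataSegs: data segments sent
retrans_segs=3000    # tcpRetransSegs: data segments retransmitted
rate=$(awk "BEGIN { printf \"%.1f\", 100 * $retrans_segs / $out_segs }")
echo "TCP retransmission rate: ${rate}%"
# prints: TCP retransmission rate: 1.5%
```

A sustained rate of a few percent is usually worth escalating to the network team.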

Enqueue Waits in Oracle Database 10g

Enqueue wait events usually cause problems in busy Oracle databases, but the Oracle online documentation only covers some of them as of 11gR2; I'm not sure why. I paste this information here for quick reference.

In Oracle Database 10g Release 1, each enqueue type is represented by its own wait event, making it much easier to understand exactly what type of enqueue the session is waiting for. You do not need to decipher the values from the P1, P2, P3, P1RAW, P2RAW, and P3RAW columns in the V$SESSION_WAIT or the V$SESSION view.

The following table lists all the enqueue waits in Oracle Database 10g Release 1 and describes what each enqueue is for. This information is available in the X$KSQST structure. The aggregated statistics for each of these enqueue types are displayed by the view V$ENQUEUE_STAT.

Enqueue Type Description
enq: AD – allocate AU Synchronizes accesses to a specific OSM (Oracle Software Manager) disk AU
enq: AD – deallocate AU Synchronizes accesses to a specific OSM disk AU
enq: AF – task serialization Serializes access to an advisor task
enq: AG – contention Synchronizes generation use of a particular workspace
enq: AO – contention Synchronizes access to objects and scalar variables
enq: AS – contention Synchronizes new service activation
enq: AT – contention Serializes alter tablespace operations
enq: AW – AW$ table lock Allows global access synchronization to the AW$ table (analytical workplace tables used in OLAP option)
enq: AW – AW generation lock Gives in-use generation state for a particular workspace
enq: AW – user access for AW Synchronizes user accesses to a particular workspace
enq: AW – AW state lock Row lock synchronization for the AW$ table
enq: BR – file shrink Lock held to prevent file from decreasing in physical size during RMAN backup
enq: BR – proxy-copy Lock held to allow cleanup from backup mode during an RMAN proxy-copy backup
enq: CF – contention Synchronizes accesses to the controlfile
enq: CI – contention Coordinates cross-instance function invocations
enq: CL – drop label Synchronizes accesses to label cache when dropping a label
enq: CL – compare labels Synchronizes accesses to label cache for label comparison
enq: CM – gate Serializes access to instance enqueue
enq: CM – instance Indicates OSM disk group is mounted
enq: CT – global space management Lock held during change tracking space management operations that affect the entire change tracking file
enq: CT – state Lock held while enabling or disabling change tracking to ensure that it is enabled or disabled by only one user at a time
enq: CT – state change gate 2 Lock held while enabling or disabling change tracking in RAC
enq: CT – reading Lock held to ensure that change tracking data remains in existence until a reader is done with it
enq: CT – CTWR process start/stop Lock held to ensure that only one CTWR (Change Tracking Writer, which tracks block changes and is initiated by the alter database enable block change tracking command) process is started in a single instance
enq: CT – state change gate 1 Lock held while enabling or disabling change tracking in RAC
enq: CT – change stream ownership Lock held by one instance while change tracking is enabled to guarantee access to thread-specific resources
enq: CT – local space management Lock held during change tracking space management operations that affect just the data for one thread
enq: CU – contention Recovers cursors in case of death while compiling
enq: DB – contention Synchronizes modification of database wide supplemental logging attributes
enq: DD – contention Synchronizes local accesses to ASM (Automatic Storage Management) disk groups
enq: DF – contention Enqueue held by foreground or DBWR when a datafile is brought online in RAC
enq: DG – contention Synchronizes accesses to ASM disk groups
enq: DL – contention Lock to prevent index DDL during direct load
enq: DM – contention Enqueue held by foreground or DBWR to synchronize database mount/open with other operations
enq: DN – contention Serializes group number generations
enq: DP – contention Synchronizes access to LDAP parameters
enq: DR – contention Serializes the active distributed recovery operation
enq: DS – contention Prevents a database suspend during LMON reconfiguration
enq: DT – contention Serializes changing the default temporary table space and user creation
enq: DV – contention Synchronizes access to lower-version Diana (PL/SQL intermediate representation)
enq: DX – contention Serializes tightly coupled distributed transaction branches
enq: FA – access file Synchronizes accesses to open ASM files
enq: FB – contention Ensures that only one process can format data blocks in auto segment space managed tablespaces
enq: FC – open an ACD thread LGWR opens an ACD thread
enq: FC – recover an ACD thread SMON recovers an ACD thread
enq: FD – Marker generation Synchronization
enq: FD – Flashback coordinator Synchronization
enq: FD – Tablespace flashback on/off Synchronization
enq: FD – Flashback on/off Synchronization
enq: FG – serialize ACD relocate Only 1 process in the cluster may do ACD relocation in a disk group
enq: FG – LGWR redo generation enq race Resolves race condition to acquire Disk Group Redo Generation Enqueue
enq: FG – FG redo generation enq race Resolves race condition to acquire Disk Group Redo Generation Enqueue
enq: FL – Flashback database log Synchronizes access to Flashback database log
enq: FL – Flashback db command Synchronizes Flashback Database and deletion of flashback logs
enq: FM – contention Synchronizes access to global file mapping state
enq: FR – contention Begins recovery of disk group
enq: FS – contention Synchronizes recovery and file operations or synchronizes dictionary check
enq: FT – allow LGWR writes Allows LGWR to generate redo in this thread
enq: FT – disable LGWR writes Prevents LGWR from generating redo in this thread
enq: FU – contention Serializes the capture of the DB feature, usage, and high watermark statistics
enq: HD – contention Serializes accesses to ASM SGA data structures
enq: HP – contention Synchronizes accesses to queue pages
enq: HQ – contention Synchronizes the creation of new queue IDs
enq: HV – contention Lock used to broker the high watermark during parallel inserts
enq: HW – contention Lock used to broker the high watermark during parallel inserts
enq: IA – contention Information not available
enq: ID – contention Lock held to prevent other processes from performing controlfile transaction while NID is running
enq: IL – contention Synchronizes accesses to internal label data structures
enq: IM – contention for blr Serializes block recovery for IMU txn
enq: IR – contention Synchronizes instance recovery
enq: IR – contention2 Synchronizes parallel instance recovery and shutdown immediate
enq: IS – contention Synchronizes instance state changes
enq: IT – contention Synchronizes accesses to a temp object’s metadata
enq: JD – contention Synchronizes dates between job queue coordinator and slave processes
enq: JI – contention Lock held during materialized view operations (such as refresh, alter) to prevent concurrent operations on the same materialized view
enq: JQ – contention Lock to prevent multiple instances from running a single job
enq: JS – contention Synchronizes accesses to the job cache
enq: JS – coord post lock Lock for coordinator posting
enq: JS – global wdw lock Lock acquired when doing wdw ddl
enq: JS – job chain evaluate lock Lock when job chain evaluated for steps to create
enq: JS – q mem clnup lck Lock obtained when cleaning up q memory
enq: JS – slave enq get lock2 Gets run info locks before slv objget
enq: JS – slave enq get lock1 Slave locks exec pre to sess strt
enq: JS – running job cnt lock3 Lock to set running job count epost
enq: JS – running job cnt lock2 Lock to set running job count epre
enq: JS – running job cnt lock Lock to get running job count
enq: JS – coord rcv lock Lock when coord receives msg
enq: JS – queue lock Lock on internal scheduler queue
enq: JS – job run lock – synchronize Lock to prevent job from running elsewhere
enq: JS – job recov lock Lock to recover jobs running on crashed RAC inst
enq: KK – context Lock held by open redo thread, used by other instances to force a log switch
enq: KM – contention Synchronizes various Resource Manager operations
enq: KP – contention Synchronizes kupp process startup
enq: KT – contention Synchronizes accesses to the current Resource Manager plan
enq: MD – contention Lock held during materialized view log DDL statements
enq: MH – contention Lock used for recovery when setting Mail Host for AQ e-mail notifications
enq: ML – contention Lock used for recovery when setting Mail Port for AQ e-mail notifications
enq: MN – contention Synchronizes updates to the LogMiner dictionary and prevents multiple instances from preparing the same LogMiner session
enq: MR – contention Lock used to coordinate media recovery with other uses of datafiles
enq: MS – contention Lock held during materialized view refresh to set up MV log
enq: MW – contention Serializes the calibration of the manageability schedules with the Maintenance Window
enq: OC – contention Synchronizes write accesses to the outline cache
enq: OL – contention Synchronizes accesses to a particular outline name
enq: OQ – xsoqhiAlloc Synchronizes access to olapi history allocation
enq: OQ – xsoqhiClose Synchronizes access to olapi history closing
enq: OQ – xsoqhistrecb Synchronizes access to olapi history globals
enq: OQ – xsoqhiFlush Synchronizes access to olapi history flushing
enq: OQ – xsoq*histrecb Synchronizes access to olapi history parameter CB
enq: PD – contention Prevents others from updating the same property
enq: PE – contention Synchronizes system parameter updates
enq: PF – contention Synchronizes accesses to the password file
enq: PG – contention Synchronizes global system parameter updates
enq: PH – contention Lock used for recovery when setting proxy for AQ HTTP notifications
enq: PI – contention Communicates remote Parallel Execution Server Process creation status
enq: PL – contention Coordinates plug-in operation of transportable tablespaces
enq: PR – contention Synchronizes process startup
enq: PS – contention Parallel Execution Server Process reservation and synchronization
enq: PT – contention Synchronizes access to ASM PST metadata
enq: PV – syncstart Synchronizes slave start_shutdown
enq: PV – syncshut Synchronizes instance shutdown_slvstart
enq: PW – prewarm status in dbw0 DBWR0 holds this enqueue indicating pre-warmed buffers present in cache
enq: PW – flush prewarm buffers Direct Load needs to flush prewarmed buffers if DBWR0 holds this enqueue
enq: RB – contention Serializes OSM rollback recovery operations
enq: RF – synch: per-SGA Broker metadata Ensures r/w atomicity of DG configuration metadata per unique SGA
enq: RF – synchronization: critical ai Synchronizes critical apply instance among primary instances
enq: RF – new AI Synchronizes selection of the new apply instance
enq: RF – synchronization: chief Anoints 1 instance’s DMON (Data Guard Broker Monitor) as chief to other instance’s DMONs
enq: RF – synchronization: HC master Anoints 1 instance’s DMON as health check master
enq: RF – synchronization: aifo master Synchronizes critical apply instance failure detection and failover operation
enq: RF – atomicity Ensures atomicity of log transport setup
enq: RN – contention Coordinates nab computations of online logs during recovery
enq: RO – contention Coordinates flushing of multiple objects
enq: RO – fast object reuse Coordinates fast object reuse
enq: RP – contention Enqueue held when resilvering is needed or when data block is repaired from mirror
enq: RS – file delete Lock held to prevent file from accessing during space reclamation
enq: RS – persist alert level Lock held to make alert level persistent
enq: RS – write alert level Lock held to write alert level
enq: RS – read alert level Lock held to read alert level
enq: RS – prevent aging list update Lock held to prevent aging list update
enq: RS – record reuse Lock held to prevent file from accessing while reusing circular record
enq: RS – prevent file delete Lock held to prevent deleting file to reclaim space
enq: RT – contention Thread locks held by LGWR, DBW0, and RVWR (Recovery Writer, used in Flashback Database operations) to indicate mounted or open status
enq: SB – contention Synchronizes logical standby metadata operations
enq: SF – contention Lock held for recovery when setting sender for AQ e-mail notifications
enq: SH – contention Enqueue always acquired in no-wait mode; should seldom see this contention
enq: SI – contention Prevents multiple streams table instantiations
enq: SK – contention Serialize shrink of a segment
enq: SQ – contention Lock to ensure that only one process can replenish the sequence cache
enq: SR – contention Coordinates replication / streams operations
enq: SS – contention Ensures that sort segments created during parallel DML operations aren’t prematurely cleaned up
enq: ST – contention Synchronizes space management activities in dictionary-managed tablespaces
enq: SU – contention Serializes access to SaveUndo Segment
enq: SW – contention Coordinates the ‘alter system suspend’ operation
enq: TA – contention Serializes operations on undo segments and undo tablespaces
enq: TB – SQL Tuning Base Cache Update Synchronizes writes to the SQL Tuning Base Existence Cache
enq: TB – SQL Tuning Base Cache Load Synchronizes writes to the SQL Tuning Base Existence Cache
enq: TC – contention Lock held to guarantee uniqueness of a tablespace checkpoint
enq: TC – contention2 Lock during setup of a unique tablespace checkpoint in null mode
enq: TD – KTF dump entries KTF dumping time/scn mappings in SMON_SCN_TIME table
enq: TE – KTF broadcast KTF broadcasting
enq: TF – contention Serializes dropping of a temporary file
enq: TL – contention Serializes threshold log table read and update
enq: TM – contention Synchronizes accesses to an object
enq: TO – contention Synchronizes DDL and DML operations on a temp object
enq: TQ – TM contention TM access to the queue table
enq: TQ – DDL contention DDL access to the queue table
enq: TQ – INI contention TM access to the queue table
enq: TS – contention Serializes accesses to temp segments
enq: TT – contention Serializes DDL operations on tablespaces
enq: TW – contention Lock held by one instance to wait for transactions on all instances to finish
enq: TX – contention Lock held by a transaction to allow other transactions to wait for it
enq: TX – row lock contention Lock held on a particular row by a transaction to prevent other transactions from modifying it
enq: TX – allocate ITL entry Allocating an ITL entry in order to begin a transaction
enq: TX – index contention Lock held on an index during a split to prevent other operations on it
enq: UL – contention Lock held used by user applications
enq: US – contention Lock held to perform DDL on the undo segment
enq: WA – contention Lock used for recovery when setting watermark for memory usage in AQ notifications
enq: WF – contention Enqueue used to serialize the flushing of snapshots
enq: WL – contention Coordinates access to redo log files and archive logs
enq: WP – contention Enqueue to handle concurrency between purging and baselines
enq: XH – contention Lock used for recovery when setting No Proxy Domains for AQ HTTP notifications
enq: XR – quiesce database Lock held during database quiesce
enq: XR – database force logging Lock held during database force logging mode
enq: XY – contention Lock used by Oracle Corporation for internal testing

Reference : Oracle Wait Interface: A Practical Guide to Performance Diagnostics & Tuning

We can get the description of p1, p2 and p3 at v$event_name. You can also get them from p1text, p2text and p3text from v$session if the event is happening right now.

I pasted the enq event here for reference:

NAME PARAMETER1 PARAMETER2 PARAMETER3
enq: AD – allocate AU name|mode group and disk number AU number
enq: AD – deallocate AU name|mode group and disk number AU number
enq: AF – task serialization name|mode task id 0
enq: AG – contention name|mode workspace # generation
enq: AM – client registration name|mode id1 id2
enq: AM – rollback COD reservation name|mode id1 id2
enq: AM – shutdown name|mode id1 id2
enq: AO – contention name|mode workspace # object #
enq: AS – modify service name|mode 0 0
enq: AS – service activation name|mode 0 0
enq: AT – contention name|mode 0 0
enq: AU – audit index file name|mode XML audit index file 0
enq: AW – AW generation lock name|mode operation workspace #
enq: AW – AW state lock name|mode operation workspace #
enq: AW – AW$ table lock name|mode operation workspace #
enq: AW – user access for AW name|mode operation workspace #
enq: BF – PMON Join Filter cleanup name|mode node#/parallelizer# bloom#
enq: BF – allocation contention name|mode node#/parallelizer# bloom#
enq: BR – file shrink name|mode operation file #
enq: BR – proxy-copy name|mode operation file #
enq: CF – contention name|mode 0 operation
enq: CI – contention name|mode opcode type
enq: CL – compare labels name|mode object # 0
enq: CL – drop label name|mode object # 0
enq: CM – gate name|mode disk group # type
enq: CM – instance name|mode disk group # type
enq: CN – race with init name|mode reg id 0
enq: CN – race with reg name|mode reg id 0
enq: CN – race with txn name|mode reg id 0
enq: CT – CTWR process start/stop name|mode operation operation parm
enq: CT – change stream ownership name|mode operation operation parm
enq: CT – global space management name|mode operation operation parm
enq: CT – local space management name|mode operation operation parm
enq: CT – reading name|mode operation operation parm
enq: CT – state name|mode operation operation parm
enq: CT – state change gate 1 name|mode operation operation parm
enq: CT – state change gate 2 name|mode operation operation parm
enq: CU – contention name|mode handle handle
enq: DB – contention name|mode EnqMode 0
enq: DD – contention name|mode disk group type
enq: DF – contention name|mode 0 file #
enq: DG – contention name|mode disk group type
enq: DL – contention name|mode object # 0
enq: DM – contention name|mode type type
enq: DN – contention name|mode 0 0
enq: DP – contention name|mode 0 0
enq: DR – contention name|mode 0 0
enq: DS – contention name|mode 0 0
enq: DT – contention name|mode 0 0
enq: DV – contention name|mode object # 0
enq: DX – contention name|mode transaction entry # 0
enq: FA – access file name|mode disk group number file number
enq: FB – contention name|mode tablespace # dba
enq: FC – open an ACD thread name|mode disk group thread
enq: FC – recover an ACD thread name|mode disk group thread
enq: FD – Flashback coordinator name|mode Internal Internal
enq: FD – Flashback on/off name|mode Internal Internal
enq: FD – Marker generation name|mode Internal Internal
enq: FD – Restore point create/drop name|mode Internal Internal
enq: FD – Tablespace flashback on/off name|mode Internal Internal
enq: FG – FG redo generation enq race name|mode disk group type
enq: FG – LGWR redo generation enq race name|mode disk group type
enq: FG – serialize ACD relocate name|mode disk group type
enq: FL – Flashback database log name|mode Log # zero
enq: FL – Flashback db command name|mode Log # zero
enq: FM – contention name|mode 0 0
enq: FP – global fob contention name|mode low file obj add high file obj add
enq: FR – contention name|mode disk group unused
enq: FS – contention name|mode 0 type
enq: FT – allow LGWR writes name|mode disk group thread
enq: FT – disable LGWR writes name|mode disk group thread
enq: FU – contention name|mode 0 0
enq: HD – contention name|mode disk group 0
enq: HP – contention name|mode tablespace # dba
enq: HQ – contention name|mode object # hash value
enq: HV – contention name|mode object # 0
enq: HW – contention name|mode table space # block
enq: IA – contention name|mode 0 0
enq: ID – contention name|mode 0 0
enq: IL – contention name|mode object # 0
enq: IM – contention for blr name|mode pool # 0
enq: IR – contention name|mode 0 0/1
enq: IR – contention2 name|mode 0 0/1
enq: IS – contention name|mode 0 type
enq: IT – contention name|mode object # 0
enq: JD – contention name|mode 0 0
enq: JI – contention name|mode view object # 0
enq: JQ – contention name|mode 0 0
enq: JS – contention name|mode service ID queue type
enq: JS – evt notify name|mode service ID queue type
enq: JS – evtsub add name|mode service ID queue type
enq: JS – evtsub drop name|mode service ID queue type
enq: JS – job recov lock name|mode service ID queue type
enq: JS – job run lock – synchronize name|mode service ID queue type
enq: JS – q mem clnup lck name|mode service ID queue type
enq: JS – queue lock name|mode service ID queue type
enq: JS – sch locl enqs name|mode service ID queue type
enq: JS – wdw op name|mode service ID queue type
enq: KK – context name|mode 0 redo thread
enq: KM – contention name|mode type type
enq: KO – fast object checkpoint name|mode 2 0
enq: KP – contention name|mode 0 0
enq: KT – contention name|mode plan # 0
enq: MD – contention name|mode master object # 0
enq: MH – contention name|mode 0 0
enq: MK – contention name|mode 0 0
enq: ML – contention name|mode 0 0
enq: MN – contention name|mode session ID 0
enq: MO – contention name|mode 0 0
enq: MR – contention name|mode 0 or file # type
enq: MS – contention name|mode master object # 0
enq: MW – contention name|mode Schedule Id 0
enq: OC – contention name|mode 1 2
enq: OL – contention name|mode hash value 0
enq: OQ – xsoq*histrecb name|mode resource id 0
enq: OQ – xsoqhiAlloc name|mode resource id 0
enq: OQ – xsoqhiClose name|mode resource id 0
enq: OQ – xsoqhiFlush name|mode resource id 0
enq: OQ – xsoqhistrecb name|mode resource id 0
enq: OW – initialization name|mode 0 0
enq: OW – termination name|mode 0 0
enq: PD – contention name|mode property name key hash
enq: PE – contention name|mode parno 0
enq: PF – contention name|mode 0 0
enq: PG – contention name|mode 0 0
enq: PH – contention name|mode 0 0
enq: PI – contention name|mode operation serial #
enq: PL – contention name|mode 0 0
enq: PR – contention name|mode 0 0
enq: PS – contention name|mode instance slave ID
enq: PT – contention name|mode disk group # type
enq: PV – syncshut name|mode 0 0
enq: PV – syncstart name|mode 0 0
enq: PW – flush prewarm buffers name|mode 0 0
enq: PW – perwarm status in dbw0 name|mode 0 0
enq: RB – contention name|mode disk group 0
enq: RF – RF – Database Automatic Disable name|mode lock operation lock value
enq: RF – RF – FSFO Observed name|mode lock operation lock value
enq: RF – RF – FSFO connectivity name|mode lock operation lock value
enq: RF – RF – FSFO state name|mode lock operation lock value
enq: RF – RF – FSFO synchronization name|mode lock operation lock value
enq: RF – RF – FSFO wait name|mode lock operation lock value
enq: RF – atomicity name|mode lock operation lock value
enq: RF – new AI name|mode lock operation lock value
enq: RF – synch: DG Broker metadata name|mode lock operation lock value
enq: RF – synchronization: HC master name|mode lock operation lock value
enq: RF – synchronization: aifo master name|mode lock operation lock value
enq: RF – synchronization: chief name|mode lock operation lock value
enq: RF – synchronization: critical ai name|mode lock operation lock value
enq: RN – contention name|mode thread number log number
enq: RO – contention name|mode 2 0
enq: RO – fast object reuse name|mode 2 0
enq: RP – contention name|mode file # 1 or block
enq: RR – contention name|mode lock# not used
enq: RS – file delete name|mode record type record id
enq: RS – persist alert level name|mode record type record id
enq: RS – prevent aging list update name|mode record type record id
enq: RS – prevent file delete name|mode record type record id
enq: RS – read alert level name|mode record type record id
enq: RS – record reuse name|mode record type record id
enq: RS – write alert level name|mode record type record id
enq: RT – contention name|mode redo thread type
enq: RU – contention name|mode 0 0
enq: RU – waiting name|mode 0 0
enq: RW – MV metadata contention name|mode table obj# 0
enq: SB – contention name|mode 0 0
enq: SE – contention name|mode Session-id Serial#
enq: SF – contention name|mode 0 0
enq: SH – contention name|mode 0 0
enq: SI – contention name|mode object # 0
enq: SK – contention name|mode tablespace # dba
enq: SQ – contention name|mode object # 0
enq: SR – contention name|mode operation sequence # / apply #
enq: SS – contention name|mode tablespace # dba
enq: ST – contention name|mode 0 0
enq: SU – contention name|mode table space # 0
enq: SW – contention name|mode 0 0
enq: TA – contention name|mode operation undo segment # / other
enq: TB – SQL Tuning Base Cache Load name|mode 1 2
enq: TB – SQL Tuning Base Cache Update name|mode 1 2
enq: TC – contention name|mode checkpoint ID 0
enq: TC – contention2 name|mode checkpoint ID 0
enq: TD – KTF dump entries name|mode 0 0
enq: TE – KTF broadcast name|mode 0 0
enq: TF – contention name|mode tablespace # relative file #
enq: TL – contention name|mode 0 0
enq: TM – contention name|mode object # table/partition
enq: TO – contention name|mode object # 1
enq: TQ – DDL contention name|mode QT_OBJ# 0
enq: TQ – INI contention name|mode QT_OBJ# 0
enq: TQ – TM contention name|mode QT_OBJ# 0
enq: TS – contention name|mode tablespace ID dba
enq: TT – contention name|mode tablespace ID operation
enq: TW – contention name|mode 0 operation
enq: TX – allocate ITL entry name|mode usn<<16 | slot sequence
enq: TX – contention name|mode usn<<16 | slot sequence
enq: TX – index contention name|mode usn<<16 | slot sequence
enq: TX – row lock contention name|mode usn<<16 | slot sequence
enq: UL – contention name|mode id 0
enq: US – contention name|mode undo segment # 0
enq: WA – contention name|mode 0 0
enq: WF – contention name|mode 0 0
enq: WL – contention name|mode log # / thread id # sequence #
enq: WP – contention name|mode 0 0
enq: WR – contention name|mode thread id # sequence #
enq: XH – contention name|mode 0 0
enq: XQ – recovery name|mode disk group # unused
enq: XQ – relocation name|mode disk group # unused
enq: XR – database force logging name|mode operation 0
enq: XR – quiesce database name|mode operation 0
enq: XY – contention name|mode id1 id2
enq: ZG – contention name|mode file group id version id

For P1 values of the form “name|mode”, we can decode the name and mode like this:

select chr(bitand(p1, -16777216)/16777215) ||
       chr(bitand(p1, 16711680)/65535) "Name",
       bitand(p1, 65535) "Mode"
from v$session
where event like 'enq:%';
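As a hedged worked example, the literal 1415053318 below is a hypothetical p1 value encoding a ‘TX’ request in mode 6; exact power-of-two divisors (16777216 = 2^24, 65536 = 2^16) are used so the divisions yield whole character codes:

```sql
-- Hypothetical p1: 'T'*2^24 + 'X'*2^16 + mode 6 = 1415053318
select chr(bitand(1415053318, 4278190080)/16777216) ||     -- high byte: first name char
       chr(bitand(1415053318, 16711680)/65536) as enq_name, -- next byte: second name char
       bitand(1415053318, 65535)               as enq_mode  -- low two bytes: requested mode
from dual;
-- In an ASCII-based character set this returns ENQ_NAME = 'TX', ENQ_MODE = 6
```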

# Expert Oracle Practices # Troubleshooting Latch Contention

I am reading Expert Oracle Practices: Oracle Database Administration from the Oak Table. I’d like to summarize and share what I learned.

This chapter, Troubleshooting Latch Contention by Riyaj Shamsudeen, is very helpful, both for understanding the theory behind latches and for practice in the real world.

Latches, what are they? How do they differ from enqueues?
Latches are a serialization mechanism to protect critical resources in the system global area (SGA) from concurrent modifications. A latch is implemented as a few bytes of memory. The value of a latch’s memory area indicates whether that latch is available or not.

Latches are handled exclusively by the Oracle RDBMS engine. Users have no ability to control latch operations directly.

Latches are very different from enqueues (also termed locks). Enqueues are used to lock objects or rows to avoid simultaneous conflicting operations on tables or rows. Enqueues have their own complex data structures, and the resources in question can be locked in one of six available modes. In contrast, latches can be acquired only in either share mode or exclusive mode. Another difference is that enqueues have queuing structures, and they are acquired in a strict serial fashion. By contrast, latches have no queuing structures.

Latch classes:
Latches can be grouped into two classes: solitaire and non-solitaire latches.
A solitaire latch is a single latch that allows only one process at a time to modify the critical resource it protects.
A non-solitaire latch has a parent latch and many child latches; the non-solitaire group can thus be further broken into parent and child latches.

The cache buffers chains latch (CBC latch) is a typical example of a non-solitaire latch. There are thousands of child latches for this parent latch.

Views about latch
Latches are externalized in dynamic views such as v$latch, v$latch_children, and v$latch_parent. The view v$latch_children maintains statistics for latches at the child level, while v$latch maintains aggregated statistics for all child latches, and v$latch_parent maintains statistics at the parent level.

Rem Example of solitaire latches
col name format A35
select name, latch# , count(*) from v$latch_children group by name, latch#
having count(*) =1 order by 1;
NAME LATCH# COUNT(*)
----------------------------------- ---------- ----------
TLCR meta context 126 1
channel handle pool latch 33 1
ges resource scan list 51 1
granule operation 42 1
redo allocation 115 1

Rem Example of non-solitaire latches
select name, latch# , count(*) from v$latch_children group by name, latch#
having count(*) >1 order by 1;
NAME LATCH# COUNT(*)
----------------------------------- ---------- ----------
KCL freelist parent latch 120 10
KCL gc element parent latch 118 250
KCL name table parent latch 119 832
KJC message pool free list 58 3
KJCT flow control latch 66 33
…

What’s spin & sleep? Why ?
Before we dive into these modes, let’s take a brief look at sleeps and spins. Algorithms presented in the following section use sleeps and spins if a latch is not available. If a latch is not immediately available, then the process might need to sleep before trying to acquire the latch again. Sleep is the process of giving up CPU cycles voluntarily and “sleeping” for a specific time-out period. Sleep results in costly context switches. Typically, latches are held for very short durations on the order of a few nanoseconds, and in many cases it is better to avoid the context switches. Thus, Oracle Database code makes a preset number of attempts to acquire a latch before going to sleep. This process is known as spinning.

Statistics of spins and sleeps
The spin_gets value indicates the number of times a process was able to acquire a latch by spinning alone, without needing to sleep. The sleeps statistic indicates the number of sleeps encountered while trying to acquire a latch. As the value of sleeps increases, performance worsens.
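These counters are externalized in v$latch; a quick sketch to see which latches sleep the most:

```sql
-- Latches ranked by sleeps; a spin_gets value close to misses means
-- spinning alone usually succeeded, while high sleeps means processes
-- frequently gave up the CPU waiting for the latch
select name, gets, misses, spin_gets, sleeps
from v$latch
where misses > 0
order by sleeps desc;
```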

What’s Immediate Mode
In the immediate mode of latch acquisition, Oracle code tries to acquire any of the available child latches (of a given latch family) without waiting, and only tries the last child latch in a willing-to-wait mode. For example, a redo copy latch must be held before copying redo records to the log buffer. There are several redo copy latches configured in an SGA, and it is sufficient to hold any one of the redo copy latch children to copy the redo records to the log buffer. Oracle Database’s internal code tries to acquire any of those child latches in a round-robin fashion. If none of the child latches are available, then the code tries to acquire the last child latch in willing-to-wait mode.

A pseudocode algorithm to illustrate immediate-mode latch acquisition

Try to acquire first child latch selected pseudo randomly.
If latch acquired
  Increment gets counter
  Exit loop and proceed to modify critical resource
Else
  Step X: Try to acquire next child latch in a round-robin fashion.
  If latch is available then
    Increment immediate_gets counter
    Exit loop and proceed to modify critical resource.
  Else
    Increment immediate_misses counter
    Go to step X until all but one child latch has been tried.
  End
  If all the child latches except one have been tried, then
    Try that last child latch in willing-to-wait mode.
    While (latch is not available)
    Loop
      if latch acquired
        increment gets counter
        exit loop and proceed to modify critical resource.
      Else
        For ( req_count=0;
        req_count < spin_count && latch not acquired;
        req_count++
        )
        Loop
          Spin and try to acquire latch;
        End loop;
        If the latch is acquired
          Increment misses counter
          Increment spin_gets counter
          exit loop and proceed to modify the critical resource
        else
          while (latch is not available)
          loop
            Sleep (sleep duration increases slowly until reaching
            _max_exponential_sleep).
            Increment sleeps counter.
          End loop;
          Increment misses counter.
          Exit loop and proceed to modify the critical resource.
        End if
      End if
    End loop;
    Increment gets counter;
    Exit loop and proceed to modify the critical resource
  End if
End if
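The immediate-mode counters incremented in the pseudocode above are externalized in v$latch. For example, for the redo copy latch family:

```sql
-- Immediate-mode statistics for the redo copy latches; high
-- immediate_misses relative to immediate_gets suggests that
-- round-robin attempts frequently found all children busy
select name, immediate_gets, immediate_misses
from v$latch
where name = 'redo copy';
```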

What’s Willing-to-Wait Mode
Willing-to-wait mode involves the practice of requesting a latch and waiting until the latch becomes available. Successful attempts to acquire the latch are termed gets. Failed attempts are termed misses.

For example, before modifying a cache buffers chain in the SGA, the child latch protecting that hash chain must be held. (Each hash chain is protected by exactly one child latch.)

Latch acquisition in willing-to-wait mode

Try to acquire the latch in willing-to-wait mode.
While (latch is not available)
Loop
  if latch acquired
    increment gets counter.
    Exit loop and proceed to modify critical resource.
  Else
    Increment misses counter.
    For ( req_count=0;
    req_count < spin_count && latch not acquired;
    req_count++
    )
    Loop
      Spin and try to acquire latch;
    End loop;
    If the latch is acquired
      Increment spin_gets counter.
      Increment misses counter.
      exit loop and proceed to modify critical resource
    else
      while (latch is not available)
      loop
        Sleep (sleep duration increases slowly until
        reaching _max_exponential_sleep).
        Increment sleeps counter.
        Try to acquire the latch.
      End loop;
      Increment misses counter.
      Exit loop and proceed to modify critical resource.
    End if
  End if
  Try to acquire the latch;
End loop;
Exit loop and proceed to modify critical resource

What’s Latch-Wait Posting Mode?
The third method of latch acquisition relies on latch wait posting. This method uses more sleeps, thus reducing spin. Excessive spin can lead to high CPU usage, and latch wait posting effectively addresses that problem. When the holder of a long wait latch is ready to release the latch, it will scan through the waiters list and post the processes waiting for the latch to indicate latch availability. The waiting processes largely sleep until the latch is available and posted.

Latches fall into two general categories:
short-wait latches, which use immediate mode or willing-to-wait mode
long-wait latches, which also use spinning, but processes go to sleep while waiting for long-wait latches, thus reducing the CPU usage associated with latch spin. (I am not sure whether this is still true in 11g.)

For example, a cache buffers chains latch is a short wait latch, meaning that most probably this latch is held for a very short time.

The library cache latch is a long-wait latch. It can take longer to search for a suitable object in a library cache hash chain, and so the library cache latch may be held for a longer time.

What’s Cache Buffers Chains (CBC) ?
Cache buffers chains (CBC) latch contention is the most common cause of latch contention in the real world.
The buffer cache is split into multiple data block buffers. Every data block buffer has an associated buffer header pointing to that buffer. These buffer headers (externalized in x$bh or in the dynamic view v$bh) hang from hash chains. Multiple buffer headers are chained together to create a hash chain, with a hash bucket at the head of the chain. The buffer header associated with a given database block can be chained only to one specific hash chain.

How a database block is read into the buffer cache by Oracle Database code?
1. Identify the database block to be read from the disk.
2. Check whether the block is already in the buffer cache: apply a hash function over the data block address (DBA) of the block to identify the hash bucket. If the block is in the buffer cache, it will be in the hash chain associated with that bucket.
3. Acquire the CBC latch protecting that hash chain. Walk the hash chain to see if that block with a specific version can be found.
4. If a block is found in the hash chain, use that block. If the block is not found, find a free buffer in the buffer cache, delink the buffer header for that buffer from its current chain, and link that buffer header to the hash chain under the protection of the CBC latch. Pin the buffer header, release the latch child, and read the block into this buffer.

Causes of CBC Latch Contention?
1. Longer cache buffers chains.
Even if the processes are trying to access completely different objects, because the same latch is protecting all buffer headers linked in that hash chain, that child latch can inadvertently cause latch contention. This problem is mostly resolved from Oracle Database version 10g, since the number of hash buckets configured by default increased dramatically in that release. The number of buffer cache hash buckets is configured by _db_block_hash_buckets. The number of CBC latches is configured by _db_block_hash_latches.
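The current values of these hidden parameters can be checked with a query like the following sketch (it reads the x$ksppi/x$ksppcv fixed tables, so it requires SYS, and the parameter names are version-dependent assumptions):

```sql
-- Hidden parameters sizing the buffer cache hash table
select i.ksppinm parameter, v.ksppstvl value
from x$ksppi i, x$ksppcv v
where i.indx = v.indx
and i.ksppinm in ('_db_block_hash_buckets', '_db_block_hash_latches');
```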

2. Heavy concurrent access of an index root block or branch blocks.
In a tight, nested-loops join based upon a unique or primary key, the associated index’s root block must be accessed very frequently.
Consider the access path shown in Listing 12-8. For each row from the outer row source at step 130, probe the inner table through a unique key column. To find a specific join key, the database engine will access the root block of the RCV_SHIPMENT_HEADERS_U1 index. Next, the engine
will access branch blocks, and then leaf blocks. If the query is executed by hundreds of concurrent processes, then each of those processes will access the root/branch/leaf blocks thousands of times per second. That heavy concurrent access can result in numerous processes trying to access the hash chain looking for the root block of the index, in turn leading to CBC latch contention.

A tight, nested-loops join
| 129 | NESTED LOOPS | | 1 |
| 130 |   TABLE ACCESS BY INDEX ROWID | RCV_TRANSACTIONS_INTERFACE |1339 |
|*131 | INDEX RANGE SCAN | CUS_RCV_TXNS_INTERFACE_C3 |1339 |
|*132 |   TABLE ACCESS BY INDEX ROWID | RCV_SHIPMENT_HEADERS | 1 |
|*133 |    INDEX UNIQUE SCAN | RCV_SHIPMENT_HEADERS_U1 | 1 |

3. Heavy concurrent access of a leaf block.
Such access typically happens if concurrent inserts are made into a nonpartitioned table with a monotonically increasing sequence of primary or unique keys. Sequence-based values will be in a strict numeric order. Since primary or unique indexes are populated with these sequence values, recently generated values tend to be populated in the same leaf block. CBC latches protecting the hash chains of those buffers will be heavily accessed, leading to CBC latch contention.

4. Heavily accessed small table.
In a nested loops join method, if the inner row source is a small table, and if it is not indexed, then for every row from the outer row source, the inner row source will be accessed using a full table scan. The result can be much higher concurrent access to the few blocks used by that table, which in turn may lead to CBC latch contention for the latches protecting that small table’s blocks.

A heavily accessed small table
| 129 | NESTED LOOPS | | 1 |
| 130 |  TABLE ACCESS BY INDEX ROWID | RCV_TRANSACTIONS_INTERFACE |1339 |
|*131 |    INDEX RANGE SCAN | CUS_RCV_TXNS_INTERFACE_T3 |1339 |
|*132 |  FULL TABLE SCAN | MTL_PARAMETERS | 1 |

5. High consistent block generation.
High consistent-read activity makes it necessary to create buffer clones consistent with the SELECT statement. Buffer clones are created by applying undo records to a copy of the buffers being cloned, creating consistent versions of the blocks, known as consistent read (CR) block generation. The process of cloning buffers can result in higher access to the undo header block and to undo blocks.

6. Many sessions performing full table scan on one or few tables
This means that many processes will compete for the same blocks or range of blocks, and that competition can in turn lead to CBC latch contention.

How-to Analyzing CBC Latch Contention?
Step 1: Review the Distribution of Gets
To analyze CBC latch contention, you need to find whether the latch contention is widespread across many child latches or is limited to a smaller set of child latches. You can use the view v$latch_children to determine which is the case.

Querying to see the distribution of child latches
Select * from (
select child#, gets, misses,sleeps,
rank() over (partition by latch# order by gets desc ) rnk_gets,
rank() over (partition by latch# order by misses desc )rnk_misses,
rank() over (partition by latch# order by sleeps desc ) rnk_sleeps
from v$latch_children where name = 'cache buffers chains'
)
where rnk_gets <=20 and rnk_misses <=20 and rnk_sleeps <=20 order by rnk_gets / 

Step 2: Identify the SQL Statements Involved
Your next step is to identify the SQL statements involved in the latch contention.
Use the SQL in Listing 12-10 to show the top SQL statements suffering from latch contention. Keep in mind that any given SQL statement identified by the query in Listing 12-10 could be merely a victim of latch contention, and not a root cause.

Listing 12-10. Querying to see Latch Free events and SQL hash values
select substr(w.event, 1, 28) event, s.sql_hash_value, count(*)
from v$session_wait w, v$session s, v$process p
where s.sid=w.sid
and p.addr = s.paddr
and s.username is not null
and w.event not like '%pipe%'
and w.event not like 'SQL*%'
group by substr(w.event, 1, 28), sql_hash_value
order by 3
/

EVENT SQL_HASH_VALUE COUNT(*)
------------------------------ -------------- ----------
enqueue 3740270 1
enqueue 747790152 1
...
latch free 622474477 3
latch free 1509082258 58

You need to drill down to identify the specific latch type at the root of the contention. Column p2 in v$session_wait identifies the latch# in the v$latch dynamic performance view.

Listing 12-11. Finding the latch type and SQL hash value
Set lines 160 pages 100
Column event format A35
Column name format A35
select x.event, x.sql_hash_value,
case when x.event like 'latch%' then
l.name
else ' '
end name,
x.cnt from (
select substr(w.event, 1, 28) event, s.sql_hash_value, w.p2,count(*) cnt
from v$session_wait w, v$session s, v$process p
where s.sid=w.sid
and p.addr = s.paddr
and s.username is not null
and w.event not like '%pipe%'
and w.event not like 'SQL*%'
group by substr(w.event, 1, 28), sql_hash_value,w.p2
) x,
v$latch l
where
x.p2 = l.latch#(+)
order by cnt
/
EVENT SQL_HASH_VALUE NAME CNT
----------------------------------- -------------- ------------------------- ----
…
Latch free 1509082258 cache buffers chains 56

From Oracle Database version 10g onward, Active Session History (ASH) provides sufficient information to identify SQL statements suffering from CBC latch contention.

Listing 12-12. Querying Active Session History to identify SQL_IDs
select event, sql_id, sql_child_number, count(*) cnt
from v$active_session_history
where event like 'latch%'
and sample_time > sysdate-(1/24)
group by event, sql_id, sql_child_number
order by 4 desc
/

It is possible that another process is holding the latches in Listing 12-12 and thus not allowing SQL from the application in question to acquire those latches. This scenario is rare but needs to be analyzed. You need to verify that the hash value acquiring or holding the latch matches the hash value from Listing 12-11. You should execute the SQL in Listing 12-13 repeatedly (at least 10 times) to verify that the hash
values match.

Listing 12-13. Querying for SQL hash values acquiring or holding a latch
select s.sql_hash_value, lh.* from
v$latchholder lh, v$session s where
lh.sid=s.sid
order by s.sql_hash_value
/
SQL_HASH_VALUE PID SID LADDR NAME GETS
-------------- ------- ---------- ---------------- ------------------- ----------
0 418 10766 0000000ABBF5E3E8 library cache pin 16775103
1509082258 418 10766 0000000ACA8A1158 cache buffers chains 461959

Step 3: Identify Objects Causing the Contention
The script in Listing 12-14 queries to find objects that are hanging from the hash chain protected by those child latches. Output is sorted in part by the tch column, placing the more suspect objects first in the output. Objects with higher tch values are problematic and most probably causing the latch contention. You may need to execute the query a few times to get consistent data while latch contention is occurring. In the example output, you can see that the inv.mtl_onhand_quantities table is causing the contention.

Listing 12-14. Querying to find objects involved in latch contention
with bh_lc as
(select
lc.addr, lc.child#, lc.gets, lc.misses, lc.immediate_gets, lc.immediate_misses,
lc.spin_gets, lc.sleeps,
bh.hladdr, bh.tch tch, bh.file#, bh.dbablk, bh.class, bh.state, bh.obj
from
v$session_wait sw,
v$latchname ld,
v$latch_children lc,
x$bh bh
where lc.addr =sw.p1raw
and sw.p2= ld.latch#
and ld.name='cache buffers chains'
and lower(sw.event) like '%latch%'
and bh.hladdr=lc.addr
)
select bh_lc.hladdr, bh_lc.tch, o.owner, o.object_name, o.object_type,
bh_lc.child#,
bh_lc.gets, bh_lc.misses, bh_lc.immediate_gets,
bh_lc.immediate_misses, spin_gets, sleeps
from
bh_lc, dba_objects o
where bh_lc.obj = o.data_object_id(+)
order by 1,2 desc
/
HLADDR TCH OWNER OBJECT_NAME OBJECT_TYP
---------------- --- ------- ------------------------------ ----------...
0000001029B81010 203 INV MTL_ONHAND_QUANTITIES_DETAIL TABLE...
0000001029B81010 81 ONT OE_ORDER_HOLDS_ALL_C3 INDEX...
...

Step 4: Review Execution Plan of Problem Statements
In step 2, SQL statements causing latch contention were identified. In step 3, objects involved in CBC latch contention were identified. Data from steps 2 and 3 can be correlated and matched by reviewing the execution plans of the SQL statements from step 2.

How-to resolve CBC Latch Contention?
1. Eliminate Full Table Scan on Small Tables

2. Eliminate Full Index Scan on Small Indices
Similar to full scans on small tables, full index scans on smaller indices also will result in many concurrent processes accessing a small set of buffers, in turn leading to CBC latch contention. Eliminate full index scans with proper indexing techniques such as careful selection of index columns and choosing proper index types (such as btree or bitmap indexes).

3. Eliminate Leaf Block Contention with Partitioning
Processes that insert rows with primary key values generated from a sequence will insert rows in the rightmost leaf block of the associated primary or unique index btree. All concurrent inserts will suffer from CBC latch contention accessing that leaf block. The root cause here is that synthetic keys are inserted in just a few leaf blocks of index.
To eliminate leaf block contention, you can hash partition the involved table on the primary key.
Also, if a table cannot be partitioned, you may be able to convert the primary key and unique key indices into hash-partitioned indices over the nonpartitioned table.
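As a sketch of both options (table names, column names, and the partition count are hypothetical; a power-of-two partition count gives the most even spread):

```sql
-- Option 1: hash partition the table itself on the primary key
create table orders
( order_id number primary key
, order_dt date
)
partition by hash (order_id) partitions 32;

-- Option 2: keep the table nonpartitioned, but hash partition the index
create table orders_np (order_id number, order_dt date);

create unique index orders_np_pk on orders_np (order_id)
  global partition by hash (order_id) partitions 32;

alter table orders_np
  add constraint orders_np_pk primary key (order_id)
  using index orders_np_pk;
```

Each index leaf block now receives only 1/32nd of the sequence-driven inserts, spreading the CBC latch activity across many blocks.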

4. Favor Hash Joins Over Tightly-Nested Loop Joins
If the number of rows from the outer row source in a nested loops (NL) join is very high, and the inner probed table is small, processes can contend for the latches that protect the small inner table. Sometimes you can resolve this contention by converting the nested loops join to a hash join.
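A minimal sketch of the hint involved (table names and aliases are hypothetical; always verify with the execution plan that the join method actually changed):

```sql
-- Ask the optimizer to build a hash table on the small inner table s
-- instead of probing it once per outer row with nested loops:
select /*+ leading(b) use_hash(s) */
       b.order_id, s.status_desc
from   big_table b, small_table s
where  b.status_id = s.status_id;
```

With a hash join, the small table is scanned once into a private workarea, so concurrent sessions stop hitting the same few buffers (and their CBC latches) once per outer row.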

5. Tune Inefficient Indices
Table blocks are accessed for each row returned from an index scan. Excessive accesses to the table blocks can cause CBC latch contention. When too many blocks are scanned with index range scans (because of index inefficiency), concurrent processes will contend for the same set of buffers and latches, and that leads to CBC latch contention.

6. Reduce CPU Usage
If CPU usage is very high in the database server, that can also lead to latch contention. Ironically, high CPU usage is also a side effect of latch contention. Knowing whether high CPU usage is the cause or the result of contention can be a challenge. If you suspect latch contention from high CPU usage, you should reduce CPU usage and then measure for latch contention again.

What’s the Shared Pool Latch?
Shared pool latches are held while allocating or releasing space from the shared pool. Essentially, shared pool child latches protect changes to the shared pool free lists.
Prior to Oracle version 9i, the shared pool was just one big heap protected by just one shared pool latch, and it was a major contributor to latch contention. But from Oracle version 9i onward, the shared pool is split into multiple subheaps, and each subheap is protected by a shared pool child latch. This change has reduced the incidence of trouble with shared pool latch contention. Still, it is important to understand the problem of shared pool latch contention.

Structures in the Shared Pool
Each of the subheaps in the shared pool is further split into smaller subheaps. The fixed table x$ksmss externalizes shared pool subheaps.

Listing 12-20. Query to list shared pool subheaps
select ksmdsidx, count(*) , sum(ksmsslen) from x$ksmss
where ksmdsidx>0
group by ksmdsidx
order by 1
/
  KSMDSIDX   COUNT(*) SUM(KSMSSLEN)
---------- ---------- -------------
         1        229     268435536
         2        237     268435520
         3        241     268435520
         4        235     268435512
         5        234     268439728
         6        247     268435616
         7        248     268435552

Listing 12-21. Query showing the number of shared pool latches
select count(*) from v$latch_children where name='shared pool'
COUNT(*)
----------
7

The number of subheaps (and so the number of shared pool latches) is dependent upon initialization factors such as shared pool size, granule size, a few underscore parameters, the Oracle Database version, and even a few OS parameters.

Shared Pool Free Lists
Oracle’s internal algorithm tries to find a chunk at least as big as the request, and the search for a free chunk is not cheap.
From Oracle Database release 9i onward, these free chunks of memory are grouped by their size and arranged as free lists hanging from hash buckets. This arrangement of free chunks grouped by size speeds up the free-space search.

Shared Pool Reserved Free Lists
A small percentage of the shared pool is set aside as the shared pool reserved area. Each subheap of the shared pool allocates a fraction of itself as part of the shared pool reserved area. And even though the shared pool reserved area has separate free lists, it shares the same latch with the non-reserved area.
If there is no free space in the main part of the shared pool, and if a requested chunk size is above a threshold determined by the _shared_pool_reserved_min_alloc initialization parameter, then the chunk is allocated from the shared pool reserved area.
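You can watch pressure on the reserved area through v$shared_pool_reserved; steadily growing REQUEST_FAILURES is a sign that large allocations are failing even in the reserved lists:

```sql
select free_space, used_space,
       request_misses, request_failures, last_failure_size
from   v$shared_pool_reserved;
```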

Causes of Shared Pool Latch Contention?
Shared pool latch contention is almost always due to shared pool fragmentation. If the shared pool is fragmented, then the shared pool latch will be held longer during this search.
1. Shared pool fragmentation
While a process is trying to allocate a chunk big enough to satisfy an allocation request, recreatable or freeable chunks can be flushed out of the shared pool, and those chunks need to be coalesced to create a bigger chunk. That coalescing might involve walking the shared pool free lists many times and flushing chunks until suitable chunks can be found or created. All of this is performed under the protection of a shared pool child latch, leading to latch contention.

2. Incorrect configuration
If a pool is incorrectly or inadequately configured, then chunks may be allocated from the shared pool instead of, say, the large pool. For example, if the large pool is not configured, then RMAN communication buffers or parallel query slave communication buffers may
be allocated from the shared pool. Allocation of these buffers can deplete and fragment shared pool free space, leading to shared pool latch contention.

3. Reduced shared pool subheaps.
Multiple shared pool subheaps are very important to performance, since it is critical to have multiple shared pool latches. Each shared pool subheap is protected by a shared pool child latch. If the number of shared pool subheaps is reduced, then the number of shared pool child latches also will be reduced.

4. High CPU usage in the server.
If CPU usage is very high in the server, then the process holding a latch might not get enough CPU cycles and go to sleep prematurely before completing the code path. That sleep can result in the process holding the latch longer than is ideal. Fortunately, this problem has been solved in many platforms through the use of facilities to prevent a process from being preempted on the CPU. Such an option is referred to as the no_preempt option. But the no_preempt option is not available in all platforms, and without such an option it becomes important to reduce CPU usage to resolve latch contention.

5. Too-frequently flushed shared pool.
Flushing the shared pool too often can lead to latch contention. This typically happens in databases upgraded from Oracle Database version 8i or earlier. Flushing the shared pool involves flushing unpinned, recreatable chunks. Latches must be held during the time in which the flushing occurs. The result is increased activity on the shared pool latches.

How-to Analyze Shared Pool Latch Contention?
Step 1: Review Distribution Among Child Latches
Check whether the distribution among various child latches of the shared pool is uniform. If the distribution of gets and misses is skewed, meaning that a few child latches have much more demand than others in the shared pool, it may mean that those few subheaps are undergoing intense pressure.

Listing 12-24 shows a situation in which just a few shared-pool child latches are acquired much
more frequently than other child latches. Such a situation suggests higher activity in a few shared pool
subheaps.

select child#, gets , misses, immediate_gets, immediate_misses
from v$latch_children where name='shared pool';
    CHILD#       GETS     MISSES IMMEDIATE_GETS IMMEDIATE_MISSES
---------- ---------- ---------- -------------- ----------------
         7        280          0              0                0
         6        280          0              0                0
         5        280          0              0                0
         4        280          0              0                0
         3   12907260    1035751              0                0
         2   15153322    1276786              0                0
         1   12826219    1154988              0                0

7 rows selected.

If the gets among children are skewed, then it is important to understand the type of chunks allocated in those subheaps.

You can delve into the problem in at least two ways. Statspack and AWR reports have a section that will show the increase/decrease in various shared pool area sizes.

Listing 12-25. Statspack output showing SGA differences
SGA breakdown difference for DB: APDB Instance: APDB3 Snaps: 242425 -242426
Pool   Name                            Begin value       End value  % Diff
------ ------------------------------ ------------ --------------- -------
shared library cache                     44,448,832      44,463,992    0.03
shared messages                           2,080,000       2,080,000    0.00
shared miscellaneous                     40,601,264      38,964,144   -4.03
shared parameters                           188,648         188,648    0.00
shared partitioning d                       228,600         228,600    0.00
shared pl/sql source                          3,288           3,288    0.00
shared processes                         13,760,000      13,760,000    0.00
shared qmps connections                   4,842,200       4,842,200    0.00

Another method of researching the problem is to query the fixed table x$ksmss to see which area has grown bigger. The SQL in Listing 12-26 prints the top 20 areas that are consuming space in the shared pool.
You may need to run the query or report multiple times. In some cases, it may be necessary to monitor shared pool areas with a custom script to detect abnormal growth.

Listing 12-26. Query against x$ksmss to see which area is bigger
select ksmdsidx, ksmssnam, size_area from (
  select ksmdsidx, ksmssnam, sum(ksmsslen) size_area
  from   x$ksmss
  where  ksmssnam != 'free memory'
  group by ksmdsidx, ksmssnam
  order by 3 desc
)
where rownum < 21;

Step 2: Inspect Shared Pool Fragmentation
In most cases, shared pool latch contention is caused by shared pool free list fragmentation.
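One way to get a rough picture of fragmentation is to summarize free chunks by class and size from x$ksmsp. Be careful: scanning x$ksmsp itself takes shared pool latches, so on a busy production system run this sparingly:

```sql
-- Many small free chunks combined with a small "largest" value
-- suggest free-list fragmentation.
select ksmchcls class, count(*) chunks,
       sum(ksmchsiz) total_bytes, max(ksmchsiz) largest_chunk
from   x$ksmsp
where  ksmchcls in ('free', 'R-free')
group by ksmchcls;
```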

Step 3: Review Objects Causing Flushing
The fixed table x$ksmlru keeps track of the most recent flushes of the shared pool. If there is no chunk big enough to accommodate an allocation request, then chunks that can be recreated or freed might need to be flushed. This object flushing also will increase the activity on the shared pool latches. In Listing 12-30, column KSMLRNUM indicates the number of items flushed to accommodate an allocation request. KSMLRCOM shows the allocation comment for the allocation request. Reviewing that column will provide valuable clues about why there is higher activity on shared pool latches.

 Listing 12-30. Query to identify objects that are flushing the shared pool
set lines 160 pages 100
spool ksmlru.lst
select * from x$ksmlru order by ksmlrnum;
spool off

Interestingly, querying x$ksmlru will remove rows from that fixed table. Thus, it is better to spool the output to a file.

Step 4: Identify SQL Statements Using Literal Values
Another common reason (probably the most common) for shared pool fragmentation is repeated hard parsing of SQL statements due to the use of literal values in those statements. However, most of these SQL statements will have the same execution plan, and hence the same plan hash value. The SQL in Listing 12-31 uses that fact to identify SQL statements that do not use bind variables.

Listing 12-31. Query to see the top 20 SQL statements that use literal values
select * from (
select plan_hash_value, count(distinct(hash_value)), sum(executions),
sum(parse_calls)
from v$sql
group by plan_hash_value
having count(distinct(hash_value)) > 10
order by 2 desc
) where rownum<21;
PLAN_HASH_VALUE COUNT(DISTINCT(HASH_VALUE)) SUM(EXECUTIONS) SUM(PARSE_CALLS)
--------------- --------------------------- --------------- ----------------
      511709263                        4939            4939             4939
     1438036477                        4907            4907             4907
     3532438726                        4814            4814             4814
...

How-to Resolve Shared Pool Latch Contention?
1. Avoid Unnecessary Heaps
Configure the large pool (and any other appropriate pools) so that allocations such as RMAN and parallel query communication buffers do not come out of the shared pool.

2. Avoid and Reduce Fragmentation by Sharing SQL Statements
SQL statements that are executed very frequently with literal values must be converted to use bind variables.
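For example (a sketch with a hypothetical ORDERS table): the first statement below creates a new parent cursor for every literal value, while the second is shared across executions. If the application cannot be changed quickly, cursor_sharing=force is a stopgap with trade-offs of its own:

```sql
-- Hard parsed once per literal value: fragments the shared pool
select * from orders where order_id = 12345;

-- Shared: one cursor, value bound at execution time
select * from orders where order_id = :order_id;

-- Stopgap when the code cannot be fixed (test side effects first)
alter system set cursor_sharing = force;
```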

3. Avoid Setting _kghdsidx_count to 1
If you set the _kghdsidx_count parameter to 1, Oracle allocates just one shared pool heap with just one shared pool latch. The shared pool latch then becomes a solitaire latch, and this can lead to shared pool latch contention.

4. Avoid Flushing the Shared Pool
You can flush the shared pool with the alter system flush shared_pool command. Flushing a shared pool throws away recreatable and freeable chunks. These chunks must be added to shared pool free lists under the protection of a shared pool child latch. Flushing the shared pool will increase activity against shared pool latches artificially and can lead to shared pool latch contention.
Sometimes flushing the shared pool can improve performance. However, if you find that flushing the shared pool improves performance, the correct action is not to continue to flush, but to find the root cause of the fragmentation and resolve that.

5. Avoid Shared Pool Reserved Free List Fragmentation
The shared pool reserved free list is searched only if there is not enough contiguous space in the main free list, and if an allocation request is larger than the value specified by the _shared_pool_reserved_min_alloc parameter. Up until version 11g, this parameter defaults to 4200. The
default value is sufficient in many cases. Setting the value of the parameter lower can lead to fragmentation in the reserved pool, in turn leading to latch contention.

What’s the Library Cache Latch?
The library cache holds the parsed representations of SQL statements and the execution heaps of procedures, functions, and packages, along with many other items such as table definitions.

The library cache is organized as an array of hash buckets. Hash chains are hanging from these hash buckets. Library cache objects are attached to these hash chains, and these objects point to various heaps in the shared pool. Changes and inspections to these library cache hash chains are protected by library cache latches. The library cache latch is a non-solitaire latch, and there are many child latches protecting changes to the hash chains.

The parsing step involves too many operations to discuss in detail, but following is a subset of what happens:
1. First, a unique hash value is created from the SQL text, or from the text of the library cache object.
2. Using that hash value, a library cache bucket number is derived.
3. That library cache bucket is protected by a library cache child latch. The parsing process holds that latch while searching the hash chain for a match on the hash value (together with a few other criteria).
4. If a match is found, then the parsed representation of the SQL statement (or other object) exists, and no additional hard parsing is needed.
5. If there is no match for a hash value, then an additional hard parsing step must be completed. A new library cache object must be added to the hash chain with pointers to various heaps for the object being parsed. Any change to library cache hash chains is done under the protection of a library cache child latch. Hard parsing can lead to contention for those child latches.

Causes of Library Cache Latch Contention?
Library cache latch contention is almost always caused by excessive parsing or by nonsharable child cursors. Use of literal values in SQL is the most common cause of library cache and shared pool latch contention.
1. Excessive parsing:
Caused by literal values in SQL
2. Side effects of hard parsing:
If there is no free space available to accommodate an incoming allocation request, existing cursor heaps might need to be flushed from the shared pool/library cache. This increases activity against shared pool and library cache latches.
3. Creation of excessive child cursors:
That lack of sharing results in longer hash chains, with many objects hanging from any one chain. Processes searching through these longer hash chains increase activity against library cache child latches.

How-to Analyze Library Cache Latch Contention?
The process for analyzing library cache latch contention is almost identical to analyzing shared pool latch contention.

How-to Resolve Library Cache Latch Contention
1. Use Bind Variables
It’s especially important to use bind variables for statements that have a high number of executions.

2. Avoid Flushing the Shared Pool
Unpinned recreatable chunks from the library cache are thrown away during a shared pool flush operation. Flushing artificially induces a higher number of hard parse calls.

3. Adjust the session_cached_cursors Parameter
If a cursor is found in the session cursor cache, then the parsing step is cheaper than otherwise. Increase of this parameter comes at a cost, since cursors that are in the session cursor cache are pinned in the shared pool, and this can result in higher space usage in the shared pool.
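A quick way (sketch) to judge the current effectiveness of the session cursor cache, and to raise the limit for a session:

```sql
-- Compare soft parses satisfied from the session cursor cache
-- against overall parse calls:
select name, value
from   v$sysstat
where  name in ('parse count (total)', 'session cursor cache hits');

-- The parameter is session modifiable; it can also be set instance-wide:
alter session set session_cached_cursors = 100;
```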

4. Adjust the cursor_space_for_time Parameter to True
If there is a cursor open pointing to a cursor heap, then that cursor heap is less likely to be flushed out. Retaining the cursor heap improves the performance of subsequent executions of that SQL statement at the cost of increased shared pool usage.
As of Oracle Database version 10.2.0.5 and 11.1.0.7, this parameter is deprecated.

5. Control and Limit Histograms
If your application is collecting histograms on all columns, then rethink that practice and narrow the scope of collection to only those columns that matter. That avoids unnecessary plan variations for the same SQL.
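A sketch of that narrowing with dbms_stats (the ORDERS table and its skewed STATUS column are hypothetical):

```sql
begin
  -- Base statistics with no histograms anywhere
  dbms_stats.gather_table_stats(user, 'ORDERS',
      method_opt => 'for all columns size 1');
  -- Histogram only on the one column known to be skewed
  dbms_stats.gather_table_stats(user, 'ORDERS',
      method_opt => 'for columns status size 254');
end;
/
```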

Library Cache Latches and Mutexes
From Oracle version 10g onward, most library cache latches have been replaced by mutexes.
Library cache objects are locked and pinned under the protection of a mutex associated with the parent cursor. Still, library cache hash chains are protected by library cache latches, and having longer hash chains can result in library cache latch contention.
Version 11g improved this area further. Each bucket in the library cache has its own mutex, and scanning through the library cache hash chain is also performed under the protection of a mutex governing that specific bucket.

What’s Enqueue Hash Chains Latch?
Enqueue hash chains latches protect the structures handling enqueues, commonly known as locks. Higher amounts of enqueue activity can lead to enqueue hash chains latch contention.

For example, if a process is trying to acquire a table level lock (lock type TM), then the process allocates a resource structure having
the combination of that table’s object_id and lock type as a resource key.

Following is a summary of the operational details of the enqueue latching mechanism:
1. A hashing function is applied on a string combining the lock type and the resource identifiers. For example, if the table scott.emp is to be locked, that table has an object ID of 6509, and the lock type is TM for a table lock, then the hashing function is applied over the string “(TM, 6509)”.
2. The result of the hashing function uniquely identifies a hash bucket to which the resource will always be chained. Changes to or inspection of the hash chain are protected by the enqueue hash chains child latches. So, while holding an enqueue hash chains child latch, that hash chain is searched to see if that resource is already allocated.
3. If the resource exists in the hash chain, then the process tries to see whether the resource is locked in a compatible mode with the lock request mode (assuming that there are no other waiters). If the resource is available or if the resource is locked in compatible mode, then a locking structure is added to the holders’ queue.
4. If there is another process waiting to lock the resource, then a locking structure is added to the waiters’ queue, and the process waits until the resource is available.
5. If the resource does not exist in the hash chain at all, then a new resource structure is added to the enqueue hash chain. A locking structure is also added to the holders’ queue.
6. Changes to the enqueue hash chain are protected by enqueue hash chains latches.

If there is an enormous amount of locking activity, typically in the case of high-end OLTP applications, it is conceivable that there will be a higher amount of activity against enqueue structures and enqueue hash chains child latches. This leads to enqueue hash chains latch contention.

Causes of Enqueue Hash Chains Latch Contention?
1. Excessive locking activity:
OLTP applications tend to have thousands of short transactions per second. Thousands of such short transactions will result in excessive activity against resource structures, lock structures, and enqueue hash chains latches.

2. Deadlock detection algorithm:
Oracle’s deadlock detection algorithm is invoked periodically, and it holds a parent enqueue hash chains latch (thereby holding all child latches) while searching for deadlocks between processes. If there are numerous processes, locks, and resource combinations in the database concurrently, then the deadlock detection algorithm can take long enough to result in contention for enqueue hash chains latches.

How-to Analyze Enqueue Hash Chains Latch Contention?
Step 1: Review Distribution Among Child Latches.

Listing 12-35. Skew among enqueue hash chains child latches
select * from (
select child#, gets, misses, sleeps
from v$latch_children where name like 'enqueue hash%'
order by gets desc )
where rownum <=20;
    CHILD#       GETS     MISSES     SLEEPS
---------- ---------- ---------- ----------
        28   12297672     211726          0
        20     284429       2151          1
...

Step 2: Identify the Enqueue Causing Latch Contention
You can also get this information from the wait event columns in v$session; for a TM enqueue, p2 is the object_id. Another appropriate action is to enable SQL Trace at level 8 (with waits) on a process suffering from latch contention.
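For instance, a quick look at sessions currently waiting on enqueues (in 10g the events are named 'enq: %'; in 9i the single event is 'enqueue'):

```sql
-- For a TM enqueue, p2 is the object_id of the locked table
select sid, event, p1, p2, p3
from   v$session
where  event like 'enq%';
```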

Listing 12-36. An example of SQL trace output lines (version 9.2.0.8)
WAIT #1: nam='enqueue' ela= 301 p1=1414332420 p2=9086 p3=0
WAIT #1: nam='latch free' ela= 1 p1=-1437405580 p2=19 p3=0
WAIT #1: nam='latch free' ela= 1 p1=-1437405580 p2=19 p3=1
WAIT #1: nam='latch free' ela= 1 p1=-1437405580 p2=19 p3=2

From the analysis in this section we can safely conclude that latches protecting TM enqueues are causing performance issues. We also know that the object_id causing trouble is 9086. Querying dba_objects for that ID will reveal the object_name and owner. You need to repeat this exercise several times to confirm your analysis.
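For example, translating the object_id seen in the trace:

```sql
select owner, object_name, object_type
from   dba_objects
where  object_id = 9086;
```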

Step 3: Identify Lock Types with High Gets

Listing 12-39. Top 20 lock types by activity
-- v$enqueue_statistics top 20 only..
select * from (
select eq_name, eq_type, total_req#, total_wait#, succ_req#, failed_req#
from v$enqueue_statistics
order by total_Req# desc)
where rownum<21
/
EQ_NAME                   EQ TOTAL_REQ# TOTAL_WAIT#  SUCC_REQ# FAILED_REQ#
------------------------- -- ---------- ----------- ---------- -----------
DML                       TM      18554       17630      18155         400
Session Migration         SE      17709           0      17725           0
...

How-to Resolve Enqueue Hash Chains Latch Contention
1. Avoid Excessive Short Transactions
If possible, modify the code to avoid or reduce this type of activity, and try grouping small transactions into bigger ones.
Note that worrying about short transactions only matters if the lock type behind the contention is TM or TX.

2. Disable Table Level Locks
In some cases, it may be beneficial to avoid table-level locks.

Listing 12-40. Disabling table level lock
alter table emp disable table lock;

alter table emp add (n2 number)
*
ERROR at line 1:
ORA-00069: cannot acquire lock -- table locks disabled for EMP

alter table emp enable table lock;

Disabling table level locks means that there can be no DDL statements on the target table. But in a production OLTP database suffering from latch contention, not being able to issue DDLs against a table is a very small price to pay to resolve the problem. Of course, you can always enable locks during maintenance on that table.

3. Reduce or Avoid Activity Against the Lock Type Causing Contention
Reduce or avoid activity against the lock type underlying the contention. For example, if the lock type is DX, try to reduce distributed transactions by redesigning the application architecture.

Advanced Topics for Latch Contention Problems
There are a few exceptions and bugs that can cause excessive latch contention. This section tries to list them, but for informational and debugging purposes only.

v$latch_parent
The v$latch_parent view maintains statistics at the parent latch level. In some cases, if there are many child latches, then the RDBMS code will need to acquire all the child latches. For example, deadlock detection code holds the parent latch of enqueue hash chains during execution of the deadlock detection algorithm, and no child latch can be acquired by any other process until that parent latch is released.

spin_count
The parameter spin_count determines the amount of time a process will spin on the CPU before going to sleep, and it affects the behavior of all latches. Adjusting spin_count is rarely a good solution; there are only a few situations in which it might need to be adjusted.

_latch_classes and _latch_class_N
If it is not possible to resolve the root cause of the latch contention, then from Oracle Database version 9iR2 onward the _latch_classes and _latch_class_n parameters can be used to modify the spin count for just a few latches. For example, if you want to increase the spin count for library cache latches, you can use the method demonstrated in Listing 12-41.
In the listing, which is from Oracle Database version 10.2.0.4, the spin_count for _latch_class_1 is set to 18000, and the _latch_classes parameter specifies that latch number 214 belongs to latch_class_1. Latch number 214 is for the library cache latch.

Listing 12-41. _latch_classes and _latch_class_1 parameter changes for library cache latches
select latch#, name from v$latch where name='library cache';
LATCH# NAME
---------- --------------------------------------------------
214 library cache
*._latch_class_1=18000
*._latch_classes='214:1'

_latch_wait_posting and _enable_reliable_latch_waits
The parameter _latch_wait_posting (obsolete as of 11g) controls how long-wait latches, such as library cache latches, are handled. By default, this parameter is set to 1: if a latch is not available after a certain period, the process sleeps until posted by the latch holder.
If _enable_reliable_latch_waits is set to true, then all latch posts are considered reliable. A process waiting for a latch will go to sleep and not wake up until posted. The result is reduced CPU usage, tending to reduce the symptoms of latch contention.

Summary
As always, it is best to understand the root cause of a problem, in this case a latch contention problem, and to resolve that root cause. It is
always better to resolve a root cause than to apply a band-aid.

I read this chapter at least three times. I believe it’s very helpful in understanding what’s going on inside Oracle, even though we may not always be able to solve the problem from the DB side. This is the last chapter I will write about # Expert Oracle Practices #.

# Expert Oracle Practices # Statistics

I am reading Expert Oracle Practices: Oracle Database Administration from the Oak Table. I’d like to summarize and share what I learned.

This chapter, Statistics by Jonathan Lewis, guides us on how to create a suitable gather statistics strategy.

Sometimes the Oracle optimizer just doesn’t know what percentage of the data will be returned, no matter how many statistics it has, just as we don’t before we run the SQL. Take an example:

How many people in your company earn more than the average salary?
If everyone earns about the same amount, and there are no particularly high-paid or low-paid workers, it’s quite possible that the answer matches the default human response of “about half.”
But if you have even a small number of outliers, the figure could be a long way from 50 percent. Include a handful of highly paid executives, and they drag the average (mean) wage up a little bit, and suddenly nearly everyone is earning less than average.
Include a group of low-paid office cleaners, and they drag the average (mean) wage down a little bit, and suddenly nearly everyone is earning more than average.

So what does Oracle do? It “guesses” that the answer is 5 percent of the employees (for more than > or less than <). For equal (=), it guesses 1 percent.

According to Oracle, then, 5 percent earn more than the average, 1 percent earn exactly the average, and 5 percent earn less than the average, so the optimizer has just lost 89 percent of the people somewhere (100 - 5 - 1 - 5 = 89)!

Oracle does, indeed, give us a couple of possible strategies to address this type of problem.

  1. dynamic sampling, since 9i
  2. extended statistics, since 11g
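Sketches of both strategies (the table and the expression are illustrative):

```sql
-- 1. Dynamic sampling, requested per statement with a hint:
select /*+ dynamic_sampling(e 4) */ count(*)
from   emp e
where  sal > (select avg(sal) from emp);

-- 2. 11g extended statistics on an expression (or a column group),
--    so the optimizer can estimate selectivity for it:
select dbms_stats.create_extended_stats(user, 'EMP', '(lower(job))')
from   dual;
```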

Problems with Statistics

When problems appear, the DBA should be able to go to the developer (or the designer if there is one) and ask questions such as the following:

  1. How many trades do we do each day?
  2. How long does it take the status to change from new to closed?
  3. For deals that are not closed, how many should we assume are in each status?
  4. How many deals does the typical customer do per month?
  5. Are there any products that are traded far more than others?

Creating Statistics
The key ideas we have to hit with the strategy are as follows:

  1. We don’t want to spend all our time creating and maintaining the stats-gathering code.
  2. We don’t want the demand for machine resources to affect user activity.
  3. We would like to have a known upper limit on the work done collecting stats.
  4. Some statistics have to be created at specific points in time.
  5. Some statistics have to be carefully constructed lies.

A few guidelines that may be helpful in designing the strategy are as follows:

  1. Start by looking at the ideas Oracle Corp. has embedded in their automatic stats-collection job.
  2. Remember to exclude system-generated objects from any driving SQL.
  3. Make sure you don’t exclude objects by accident.
  4. Some tables don’t need to have their statistics changed very often.
  5. Some tables need statistics changed more than once every 24 hours, perhaps as part of a loading process.
  6. You probably have time to use compute to gather stats on small tables—where small is a fairly arbitrary limit that you decide for your system.
  7. You can often gather reasonable stats on large tables (or individual partitions) with a very small sample size.
  8. At a global level, partitioned tables probably need special treatment (custom code).
  9. A few indexes will need special treatment to adjust their clustering_factor.
  10. You may want to adjust the number of distinct values for a few columns.
  11. A few columns will need histograms, and it’s probably best to write code to construct them.

We need to design a mechanism that is easy to manage but allows special handling of exceptions (critical objects at a detailed level) through a table-driven approach.

In a nutshell, the mechanisms to create are as follows:

  1. A default mechanism that generates statistics for objects that need them (for example, Oracle’s gather stale)
  2. A mechanism to identify any objects that dropped out of sight of the default mechanism
  3. A mechanism for unlocking and locking stats on objects that must not be touched by the default mechanism
  4. A follow-up that checks whether any of the gathered stats need to be “fixed,” or whether any special cases need handling
  5. A custom mechanism for dealing with each partitioned object

When implementing such a system, remember three key points:

  1. Keep it simple—don’t let special cases proliferate.
  2. Think about how the system can be self-correcting.
  3. Keep an eye on how Oracle does the difficult bits. Don’t get left behind as Oracle changes strategies.

Finally, if there’s one thing you should remember above all others after reading this chapter, it’s this: the task of maintaining appropriate statistics should be part of the application code. Leaving it as a
DBA (or automatic) task is an abdication of responsibility that’s begging for performance problems to appear.

Others in this chapter
dbms_stats.set_column_stats()

dbms_stats.get_column_stats() — into pl/sql variables
adjust pl/sql variables — typically high-value, and num_distinct
dbms_stats.set_column_stats()

in 11g with the copy_table_stats()
dbms_stats.lock_table_stats()

Where I mention gathering stats, I generally use dbms_stats.gather_table_stats() with cascade set to true to collect index stats, method_opt set to for all columns size 1 to avoid histograms, and estimate_percent set to 100 because the tests usually involve small tables. I also tend to disable CPU costing (also known as system statistics) simply to ensure that my test cases are more likely to be repeatable.
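That call looks roughly like this (a sketch with a hypothetical table name T):

```sql
begin
  dbms_stats.gather_table_stats(
    ownname          => user,
    tabname          => 'T',
    cascade          => true,                      -- index stats as well
    method_opt       => 'for all columns size 1',  -- no histograms
    estimate_percent => 100);                      -- compute; tables are small
end;
/
```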

# Expert Oracle Practices # Managing the Very Large Database

I am reading Expert Oracle Practices: Oracle Database Administration from the Oak Table. I’d like to summarize and share what I learned.

This chapter, Managing the Very Large Database by Tim Gorman, covers the key points of managing a VLDB. The feature that makes VLDB possible in Oracle is partitioning; the important concept is still divide and conquer. Awareness of the power of the EXCHANGE PARTITION operation is the key to optimizing data manipulation, and ultimately this operation facilitates a very large database. Let’s highlight some contents:

Deleting or Updating Millions of Rows
Compare the two approaches:

  1. A parallel UPDATE statement generates insane volumes of rollback/undo as part of normal processing, generates equally insane volumes of redo as part of normal processing, requires searches for the rows to manipulate prior to manipulating them, and performs all work within the Oracle buffer cache, requiring each parallel worker process to latch/lock buffers after they’re retrieved, before manipulating them.
  2. A parallel unrecoverable/nologging CREATE TABLE … AS SELECT operation, using parallel direct-path sessions, has no contention with other sessions in the database nor between its own parallel worker processes. It does not generate rollback/undo or redo, except for the Data Definition Language (DDL) commands involved with extent allocation.

The basic lesson from this is that the fastest mass update is actually an insert. When you have a partitioned table, you can use an insert command to perform an update operation.
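A hedged sketch of the idea, using a made-up ORDERS table partitioned by quarter and a STATUS column we want to "update" for one partition:

```sql
-- Instead of: update orders set status = ... where ...  (against partition p_2010q1)
-- build the updated data with a parallel, direct-path, nologging CTAS...
create table orders_tmp nologging parallel 4 as
select order_id,
       order_date,
       case when order_date < date '2010-01-01'
            then 'ARCHIVED'
            else status
       end as status
from   orders partition (p_2010q1);

-- ...then swap it into place: the "update" becomes a dictionary operation
alter table orders exchange partition p_2010q1 with table orders_tmp;
```

The heavy lifting happens in the CTAS, which generates almost no undo or redo; the exchange itself only updates the data dictionary.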

Loading Millions of Rows
The scenario described in the previous section shows the basic five-step method of loading data into a partitioned table using the exchange partition:
1. Create a “temporary” table that will later be switched with a target partition in the partitioned target table.
2. Load the temporary table using the fastest methods appropriate for the situation (for example, direct-path insert).
3. Gather cost-based optimizer statistics on the just-loaded temporary table.
4. Create indexes on the just-loaded temporary table to match the LOCAL partitioned indexes on the partitioned target table, if any.
5. Exchange the target partition in the target table with the just-loaded temporary table.
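The five steps might look like this sketch (table SALES, staging table STAGING_SALES, and the partition and index names are all assumptions):

```sql
-- 1. create a temporary table shaped like the target partition
create table sales_tmp as select * from sales where 1 = 0;

-- 2. load it with a direct-path insert
insert /*+ append */ into sales_tmp select * from staging_sales;
commit;

-- 3. gather stats on the just-loaded table
exec dbms_stats.gather_table_stats(user, 'SALES_TMP');

-- 4. build indexes matching the LOCAL partitioned indexes on SALES
create index sales_tmp_idx1 on sales_tmp (sale_date) nologging;

-- 5. exchange the loaded table with the target partition
alter table sales exchange partition p_201001 with table sales_tmp
  including indexes without validation;
```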

Partition Pruning Pitfalls

  1. The partition key column is wrapped in a function or expression: a very common problem arising from programming style when dealing with the time component of the DATE datatype.
  2. Comparing data of different datatypes causes implicit datatype conversion.
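Both pitfalls can be illustrated with assumed tables ORDERS (range-partitioned on ORDER_DATE) and CUSTOMERS (with a VARCHAR2 column REGION_CODE):

```sql
-- Pitfall 1: wrapping the partition key in a function defeats pruning
select * from orders
where  trunc(order_date) = date '2010-01-15';      -- no pruning

-- rewrite as a range predicate on the bare column, which can prune
select * from orders
where  order_date >= date '2010-01-15'
and    order_date <  date '2010-01-16';

-- Pitfall 2: if region_code is VARCHAR2, this predicate is silently
-- rewritten as to_number(region_code) = 42, again defeating pruning
select * from customers where region_code = 42;

-- compare like with like instead
select * from customers where region_code = '42';
```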

Method of partitioning

  1. Range partitioning uses a finite to infinite list of data values, searched using equivalence operators (that is, =) and range operators (>, >=, <, <=, BETWEEN, LIKE, and so forth).
  2. List partitioning uses a finite list of identified data values plus DEFAULT for everything else, searched using equivalence operators only.
  3. Hash partitioning uses an infinite list of data values, searched using equivalence operators only.
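Hedged DDL sketches of the three methods (all table, column, and partition names are invented for illustration):

```sql
-- range partitioning
create table orders (order_id number, order_date date)
partition by range (order_date)
( partition p_2010q1 values less than (date '2010-04-01'),
  partition p_2010q2 values less than (date '2010-07-01') );

-- list partitioning, with DEFAULT for everything else
create table customers (cust_id number, region varchar2(10))
partition by list (region)
( partition p_emea  values ('UK', 'DE', 'FR'),
  partition p_other values (default) );

-- hash partitioning
create table events (event_id number, payload varchar2(100))
partition by hash (event_id) partitions 8;
```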

Information Life Cycle Management
Let’s consider creating tablespaces to hold a month’s worth of ORDER_LINES partitions and their index partitions.

Now, with time-variant tablespaces, we can create a new tablespace on tier 1 storage, initially in read-write status. After two months, we change the tablespace from read-write to read-only status. After six months, we move that tablespace from tier 1 to tier 2 storage. Then, after 12 months, we move that tablespace from tier 2 to tier 3 storage, and after seven years, we drop that tablespace and all the partitions with it. The key here is that the tablespace must be in read-only status before we can consider moving it.
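The lifecycle boundaries map to simple DDL (the tablespace name ORD_2010_01 is an assumption):

```sql
-- two months after creation: stop writing to the tablespace
alter tablespace ord_2010_01 read only;

-- if it ever needs to accept writes again (e.g. for a late correction)
-- alter tablespace ord_2010_01 read write;

-- end of the seven-year retention: drop the tablespace and every
-- partition stored in it
drop tablespace ord_2010_01 including contents and datafiles;
```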

Two major methods to migrate tablespaces from one tier of storage to another:

  1. Using RMAN to first copy the data files in the tablespace to the new location, and then switch the identity of the data files from the old location to the new location
  2. Creating the new tablespace in the new location, using CREATE TABLE … AS SELECT to copy the data from the existing partitions to the new tables, and then using the exchange partition technique to switch the old partitions with the new tables in the new tablespace

Each of these two techniques has its advantages and its disadvantages. The advantage of the RMAN technique is simplicity and no need to take another backup of the moved tablespaces. The advantage of the copy/exchange technique is the opportunity to compress a previously uncompressed partition during the copy phase.
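A sketch of the RMAN technique, for an assumed read-only tablespace ORD_2010_01 being moved from tier 1 to tier 2 storage:

```
-- copy the data files of the read-only tablespace to the new location
RMAN> backup as copy tablespace ord_2010_01
        format '/tier2/oradata/%U';

-- point the control file at the copies in the new location
RMAN> switch tablespace ord_2010_01 to copy;
```

Because the tablespace is read-only, the copies are identical to the originals, so no fresh backup of the moved tablespace is needed.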

One more method to add: transporting tablespaces from an OLTP database to an archive database.

Backup Optimization and Guaranteed Recovery
Key point: the dividing line between read-write and read-only data

The majority of the volume of a database configured for ILM is read-only and needs to be backed up only infrequently, perhaps once or twice per year, if not less often, depending on how much you trust your backup media. It is only the read-write portion of the database, along with the archived redo log files, that needs to be backed up frequently, on a daily or weekly basis.

Luckily, this planning and design coincides precisely with the goals of ILM, and Oracle provides the toolkit for this in the features of partitioning, read-only tablespaces, and incremental backups by Oracle RMAN.
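In RMAN terms, the combination might be configured like this (a sketch, not a complete backup strategy):

```
-- skip files whose identical contents are already backed up,
-- e.g. data files of read-only tablespaces
RMAN> configure backup optimization on;

-- back up only blocks changed since the last level 0
RMAN> backup incremental level 1 database;
```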

Breaking large objects into smaller ones, just for the sake of doing so, does not provide much benefit. However, the way in which those smaller objects are managed, how they are stored, according to a life cycle, makes all the difference in the world.

Further Notes on Storage
The question: which should be used, file systems, “raw” devices, or Automatic Storage Management (ASM)?

Lots of people are suspicious of Oracle ASM because they feel it is too new, unfamiliar, and buggy. But I have worked with Oracle ASM in some of the most extreme database environments I have ever experienced, and it nicely complements the best features of the Oracle database without limitations.
In the world of very large databases, removing limitations and enabling all of the features available is vital. Try ASM, if you haven’t already. There is a learning curve, but you’ll be glad you invested the effort.

Limits of Which to Be Aware
Database Block Size: there are numerous limits based on the database block size, so it is worthwhile to choose wisely:

  1. Maximum size of an extent in a segment (4 billion database blocks for dictionary-managed tablespaces, 2 billion database blocks for locally managed tablespaces)
  2. Maximum control file size (20,000 database blocks)
  3. Maximum data file size (4 million database blocks for SMALLFILE tablespaces, 4 billion database blocks for BIGFILE tablespaces)
  4. Maximum database size (65,533 times the maximum data file sizes cited earlier)
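As a worked example of the limits above, assuming an 8KB block size (the limits are quoted as "4 million" and "4 billion" blocks, taken here as 2^22 and 2^32):

```python
BLOCK = 8 * 1024                 # assumed 8 KB database block size

SMALLFILE_BLOCKS = 4 * 2**20     # "4 million" blocks (2^22) per SMALLFILE data file
BIGFILE_BLOCKS   = 4 * 2**30     # "4 billion" blocks (2^32) per BIGFILE data file
MAX_FILES        = 65_533        # maximum number of data files in a database

smallfile_max = SMALLFILE_BLOCKS * BLOCK   # bytes per SMALLFILE data file
bigfile_max   = BIGFILE_BLOCKS * BLOCK     # bytes per BIGFILE data file

print(smallfile_max // 2**30, "GiB per SMALLFILE data file")       # 32 GiB
print(bigfile_max // 2**40, "TiB per BIGFILE data file")           # 32 TiB
print(MAX_FILES * smallfile_max // 2**40, "TiB max SMALLFILE db")  # 2047 TiB, ~2 PiB
print(MAX_FILES * bigfile_max // 2**50, "PiB max BIGFILE db")      # 2047 PiB, ~2 EiB
```

So with 8KB blocks, BIGFILE tablespaces raise the ceiling from roughly 2 petabytes to roughly 2 exabytes.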

Number of Files in a Database
As recently as Oracle 11g R2, the limit on the number of data files in a database is 65,533.

So, here are two bits of advice for the planning and designing of the very large database:

  1. Be sure to take advantage of data file autoextension.
  2. Consider using BIGFILE tablespaces.

Starting with Oracle 10g, Oracle provides the concept of BIGFILE tablespaces, which are tablespaces that are permitted one and only one data file. This single data file is capable of growing to 4 billion database blocks, whereas the data files of traditional (SMALLFILE) tablespaces can grow to only 4 million database blocks.
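For example (the tablespace name, file path, and sizes are illustrative):

```sql
-- a BIGFILE tablespace: exactly one data file, which can grow very large
create bigfile tablespace ord_2011
  datafile '/u01/oradata/ord_2011.dbf'
  size 10g autoextend on next 1g maxsize unlimited;
```

One BIGFILE tablespace consumes only one of the 65,533 data file slots, no matter how large it grows.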

Storage That Can Migrate
One such concept is hierarchical storage management (HSM). All of the volumes in the HSM are available at all times, but if a volume is not used for a prespecified period of time, most of the volume (except for a small header element) is migrated transparently off to lower-cost storage. HSM volumes are often ideal as tier 3, or archival-class, storage for infrequently used data.

Tablespaces created on (or migrated to) HSM volumes had better be dictionary managed, so as not to run into the file-open latency problem every time the DBA_EXTENTS and DBA_FREE_SPACE views are queried.

Parameter READ_ONLY_OPEN_DELAYED
By default, Oracle touches every data file when the database is opened. The parameter READ_ONLY_OPEN_DELAYED changes this behavior: data files belonging to read-only tablespaces are touched only when they are first accessed, not when the database is opened. So, as the number of data files grows larger, and as the number of read-only tablespaces increases, you can keep the time needed to restart the database instance to a minimum by setting this parameter to TRUE, especially if you have read-only tablespaces on migratable HSM storage.
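Setting it is straightforward; it is a static parameter, so the change needs scope=spfile and takes effect at the next instance restart:

```sql
alter system set read_only_open_delayed = true scope = spfile;
```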