A script to format pseudo code

To format the pseudo code in previous blog, I wrote a small perl script to do that:

#!/usr/bin/perl -w

#2/3/2013  Alex Zeng  format code with if, else, loop

my $ifile = shift;
my $blank = 0;
my $indent = 2;
open(IFILE, "$ifile") or die "Cannot open $ifile\n";
while(my $line = <IFILE>) {
  if($line =~ /^\s*else/i
     or $line =~ /^\s*end/i
    ) {  #reduce indent before xxx
    $blank = $blank - 2;
  }
  if($blank > 0) {
    print " " x $blank;  #print several blanks
    print "$line";
  } else {
    print "$line";
    $blank = 0;
  }
  if($line =~ /^\s*if/i
     or $line =~ /^\s*else/i
     or $line =~ /^\s*loop/i
    ) {  #increase indent after xxx
    $blank = $blank + 2;
  }
}

As always, it’s not perfect, but a start. Maybe sometimes it’s helpful for you.

# Expert Oracle Practices # Troubleshooting Latch Contention

I am reading Expert Oracle Practices by Oracle Database Administration from the Oak Table. I’d like to summary and share what I learned.

This chapter, Troubleshooting Latch Contention by Riyaj Shamsudeen, is very helpful, for both understanding the theory behind and practice in read world.

Latches, what is it? Enqueues?
Latches are a serialization mechanism to protect critical resources in the system global area (SGA) from concurrent modifications. A latch is implemented as a few bytes of memory.The value of a latch’s memory area indicates whether that latch is available or not.

Latches are handled exclusively by the Oracle RDBMS engine. Users have no ability to control latch operations directly.

Latches are very different from enqueues (also termed locks). Enqueues are used to lock objects or rows to avoid simultaneous conflicting operations on tables or rows. Enqueues have their own complex data structures, and the resources in question can be locked in one of six available modes. In contrast, latches can be acquired only in either share mode or exclusive mode. Another difference is that enqueues have queuing structures, and they are acquired in a strict serial fashion. By contrast, latches have no queuing structures.

Latches classes:
Latches can be grouped into two classes: solitaire and non-solitaire latches.
Solitaire latche, only one latch, a single latch that allows only one process to modify the critical resource it
protects.
Non-solitaire latches, has a Parent latch, and many Child Latches. The non-solitaire group can further be broken into parent and child latches.

The cache buffers chains latch (CBC latch) is a typical example of a non-solitaire latch. There are thousands of child latches for this parent latch.

Views about latch
Latches are externalized in dynamic views such as v$latch, v$latch_children, and v$latch_parent. The view v$latch_children maintains statistics for latches at the child level, while v$latch maintains aggregated statistics for all child latches, and v$latch_parent maintains statistics at the parent level.

Rem Example of solitaire latches
col name format A35
select name, latch# , count(*) from v$latch_children group by name, latch#
having count(*) =1 order by 1;
NAME LATCH# COUNT(*)
----------------------------------- ---------- ----------
TLCR meta context 126 1
channel handle pool latch 33 1
ges resource scan list 51 1
granule operation 42 1
redo allocation 115 1

Rem Example of non-solitaire latches
select name, latch# , count(*) from v$latch_children group by name, latch#
having count(*) >1 order by 1;
NAME LATCH# COUNT(*)
----------------------------------- ---------- ----------
KCL freelist parent latch 120 10
KCL gc element parent latch 118 250
KCL name table parent latch 119 832
KJC message pool free list 58 3
KJCT flow control latch 66 33
…

What’s spin & sleep? Why ?
Before we dive into these modes, let’s take a brief look at sleeps and spins. Algorithms presented in the following section use sleeps and spins if a latch is not available. If a latch is not immediately available, then the process might need to sleep before trying to acquire the latch again. Sleep is the process of giving up CPU cycles voluntarily and “sleeping” for a specific time-out period. Sleep results in costly context switches. Typically, latches are held for very short durations on the order of a few nanoseconds, and in many cases it is better to avoid the context switches. Thus, Oracle Database code makes a preset number of attempts to acquire a latch before going to sleep. This process is known as spinning.

Statistics of spins and sleeps
The spin_gets value indicates the number of times a process was able to acquire a latch with just spinning alone, without the need to sleep. The sleep_gets statistic indicates the number of sleeps encountered trying to acquire a latch. As the value of sleep_gets increases, performance worsens.

What’s Immediate Mode
In the immediate mode of latch acquisition, Oracle code tries to acquire any of the available child latches (of a given latch family) without waiting, and it tries to acquire the last child latch in a willing-to-wait mode. For example, redo copy latches must be held before copying redo records to the log buffer. There are several Redo copy latches configured in an SGA, and it is sufficient to hold any one of the Redo copy latch children to copy the redo records to the log buffer. Oracle Database’s internal code tries to acquire any of those child latches s in a round-robin fashion. If none of the child latches are available, then the code tries to acquire the last child latch in willing-to-wait mode.

A pseudocode algorithm to illustrate immediate-mode latch acquisition

Try to acquire first child latch selected pseudo randomly.
If latch acquired
  Increment gets counter
  Exit loop and proceed to modify critical resource
Else
  Step X: Try to acquire next child latch in a round-robin fashion.
  If latch is available then
    Increment immediate_gets counter
    Exit loop and proceed to modify critical resource.
  Else
    Increment immediate_misses counter
    Go to step X until all but one child latches tried.
  End
  If all the child latches except one has been tried, then
    Try that last child latch in willing-to-wait mode.
    While (latch is not available)
    Loop
      if latch acquired
        increment gets counter
        exit loop and proceed to modify critical resource.
      Else
        For ( req_count=0;
        req_count < spin_count && latch not acquired;
        req_count++
        )
        Loop
          Spin and try to acquire latch;
        End loop;
        If the latch is acquired
          Increment misses counter
          Increment spin_gets counter
          exit loop and proceed to modify the critical resource
        else
          while (latch is not available)
          loop
            Sleep (sleep duration increases slowly until reaching
            _max_exponential_sleep).
            Increment sleeps counter.
          End loop;
          Increment misses counter.
          Exit loop and proceed to modify the critical resource.
        End if
      End if
    End loop;
    Increment gets counter;
    Exit loop and proceed to modify the critical resource
  End if
End if

What’s Willing-to-Wait Mode
Willing-to-wait mode involves the practice of requesting a latch and waiting until the latch becomes available. Successful attempts to acquire the latch are termed gets. Failed attempts are termed misses.

For example, before modifying a cache buffers chains in the SGA, the child latch protecting that hash chain must be held. (a solitaire latch, the resource being protected is protected by only that latch)

Latch acquisition in willing-to-wait mode

Try to acquire the latch in willing-to-wait mode.
While (latch is not available)
Loop
  if latch acquired
    increment gets counter.
    Exit loop and proceed to modify critical resource.
  Else
    Increment misses counter.
    For ( req_count=0;
    req_count < spin_count && latch not acquired;
    req_count++
    )
    Loop
      Spin and try to acquire latch;
    End loop;
    If the latch is acquired
      Increment spin_gets counter.
      Increment misses counter.
      exit loop and proceed to modify critical resource
    else
      while (latch is not available)
      loop
        Sleep (sleep duration increases slowly until
        reaching _max_exponential_sleep).
        Increment sleeps counter.
        Try to acquire the latch.
      End loop;
      Increment misses counter.
      Exit loop and proceed to modify critical resource.
    End if
  End if
  Try to acquire the latch;
End loop;
Exit loop and proceed to modify critical resource

What’s Latch-Wait Posting Mode?
The third method of latch acquisition relies on latch wait posting. This method uses more sleeps, thus reducing spin. Excessive spin can lead to high CPU usage, and latch wait posting effectively addresses that problem. When the holder of a long wait latch is ready to release the latch, it will scan through the waiters list and post the processes waiting for the latch to indicate latch availability. The waiting processes largely sleep until the latch is available and posted.

Two latches general categories:
short-wait, using immediate mode or willing-to-wait mode
long-wait latches, using spinning also, but processes go to sleep while waiting for long wait latches, thus reducing the CPU usage associated with latch spin.(I am not sure whether it’s still true in 11g)

For example, a cache buffers chains latch is a short wait latch, meaning that most probably this latch is held for a very short time.

The library cache latch is a long-wait latch. It can take longer to search for a suitable object in library cache hash chain, and so the library cache latch may be held for longer time.

What’s Cache Buffers Chains (CBC) ?
Cache buffers chains (CBC) latch contention is the most common cause of latch contention in the real world.
The Buffer cache is split into multiple data block buffers. Every data block buffer has an associated buffer header pointing to that buffer. These buffer headers (externalized in x$bh or in the dynamic view v$bh) are hanging from a hash chain. Multiple buffer headers are chained together to create a hash chain, and a hash bucket is in the head of this chain. The buffer header associated with a database block can only be chained to a specific hash chain.

How a database block is read into the buffer cache by Oracle Database code?
1. Identify the database block to be read from the disk.
2. Check to see if the block is already in the buffer cache: Apply a hash function over the DBA of the block to identify the hash bucket. If that buffer is in the buffer cache, it will be in the hash chain associated with the bucket.
3. Acquire the CBC latch protecting that hash chain. Walk the hash chain to see if that block with a specific version can be found.
4. If a block is found in the hash chain, use that block. If the block is not found, find a free buffer in the buffer cache, delink the buffer header for that buffer from its current chain, and link that buffer header to the hash chain under the protection of the CBC latch. Pin the buffer header, release the latch child, and read the block into this buffer.

Causes of CBC Latch Contention?
1. Longer cache buffers chains.
Even if the processes are trying to access completely different objects, because the same latch is protecting all buffer headers linked in that hash chain, that child latch can inadvertently cause latch contention. This problem is mostly resolved from Oracle Database version 10g, since the number of hash buckets2 configured by default increased dramatically in that release.The number of buffer cache hash buckets is configured by _db_block_hash_buckets. The number of CBC latches is configured by _db_block_hash_latches.

2. Heavy concurrent access of an index root block or branch blocks.
In a tight, nested-loops join based upon a unique or primary key, the associated index’s root block must be accessed very frequently.
Consider the access path shown in Listing 12-8. For each row from the outer row source at step 130, probe the inner table through a unique key column. To find a specific join key, the database engine will access the root block of the RCV_SHIPMENT_HEADERS_U1 index. Next, the engine
will access branch blocks, and then leaf blocks. If the query is executed by hundreds of concurrent processes, then each of those processes will access the root/branch/leaf blocks thousands of times per second. That heavy concurrent access can result in numerous processes trying to access the hash chain looking for the root block of the index, in turn leading to CBC latch contention.

A tight, nested-loops join
| 129 | NESTED LOOPS | | 1 |
| 130 |   TABLE ACCESS BY INDEX ROWID | RCV_TRANSACTIONS_INTERFACE |1339 |
|*131 | INDEX RANGE SCAN | CUS_RCV_TXNS_INTERFACE_C3 |1339 |
|*132 |   TABLE ACCESS BY INDEX ROWID | RCV_SHIPMENT_HEADERS | 1 |
|*133 |    INDEX UNIQUE SCAN | RCV_SHIPMENT_HEADERS_U1 | 1 |

3. Heavy concurrent access of a leaf block.
Such access typically happens if concurrent inserts are made into a nonpartitioned table with a monotonically increasing sequence of primary or unique keys. Sequence-based values will be in a strict numeric order. Since primary or unique indexes are populated with these sequence values, recently generated values tend to be populated in the same leaf block. CBC latches protecting the hash chains of those buffers will be heavily accessed, leading to CBC latch contention.

4. Heavily accessed small table.
In a nested loops join method, if the inner row source is a small table, and if it is not indexed, then for every row from the outer row source, the inner row source will be accessed using a full table scan. The result can be much higher concurrent access to the few blocks used by that table, which in turn may lead to CBC latch contention for the latches protecting that small table’s blocks.

A heavily accessed small table
| 129 | NESTED LOOPS | | 1 |
| 130 |  TABLE ACCESS BY INDEX ROWID | RCV_TRANSACTIONS_INTERFACE |1339 |
|*131 |    INDEX RANGE SCAN | CUS_RCV_TXNS_INTERFACE_T3 |1339 |
|*132 |  FULL TABLE SCAN | MTL_PARAMETERS | 1 |

5. High consistent block generation.
This makes it necessary to create buffer clones consistent with the SELECT statement. Buffer clones are created by applying undo records to a copy of the buffers being cloned, creating consistent versions of the blocks—known as consistent read (CR) block generation. The process of cloning buffers can result in higher access to the undo header block, and to undo blocks

6. Many sessions performing full table scan on one or few tables
This means that many processes will compete for the same blocks or range of blocks, and that competition can in turn lead to CBC latch contention.

How-to Analyzing CBC Latch Contention?
Step 1: Review the Distribution of Gets
To analyze CBC latch contention, you need to find whether the latch contention is widespread across many child latches or is limited to a smaller set of child latches. You can use the view v$latch_children to determine which is the case.

Querying to see the distribution of child latches
Select * from (
select child#, gets, misses,sleeps,
rank() over (partition by latch# order by gets desc ) rnk_gets,
rank() over (partition by latch# order by misses desc )rnk_misses,
rank() over (partition by latch# order by sleeps desc ) rnk_sleeps
from v$latch_children where name = 'cache buffers chains'
)
where rnk_gets <=20 and rnk_misses <=20 and rnk_sleeps <=20 order by rnk_gets / 

Step 2: Identify the SQL Statements Involved Your next step is to identify SQL statements involved in the latch contention.
Use the SQL in Listing 12-10 to show the top SQL statements suffering from latch contention. Keep in mind that any given SQL statement identified by the query in Listing 12-10 could be merely a victim of latch contention, and not a root cause

Listing 12-10. Querying to see Latch Free events and SQL hash values
select substr(w.event, 1, 28) event, s.sql_hash_value, count(*)
from v$session_wait w, v$session s, v$process p
where s.sid=w.sid
and p.addr = s.paddr
and s.username is not null
and w.event not like '%pipe%'
and w.event not like 'SQL*%'
group by substr(w.event, 1, 28), sql_hash_value
order by 3
/

EVENT SQL_HASH_VALUE COUNT(*)
------------------------------ -------------- ----------
enqueue 3740270 1
enqueue 747790152 1
...
latch free 622474477 3
latch free 1509082258 58

You need to drill down to identify the specific latch type at the root of the contention. Column p2 in v$sesssion_wait identifies the latch# in the v$latch dynamic performance view.

Listing 12-11. Finding the latch type and SQL hash value
Set lines 160 pages 100
Column event format A35
Column name format A35
select x.event, x.sql_hash_value,
case when x.event like 'latch%' then
l.name
else ' '
end name,
x.cnt from (
select substr(w.event, 1, 28) event, s.sql_hash_value, w.p2,count(*) cnt
from v$session_wait w, v$session s, v$process p
where s.sid=w.sid
and p.addr = s.paddr
and s.username is not null
and w.event not like '%pipe%'
and w.event not like 'SQL*%'
group by substr(w.event, 1, 28), sql_hash_value,w.p2
) x,
v$latch l
where
x.p2 = l.latch#(+)
order by cnt
/
EVENT SQL_HASH_VALUE NAME CNT
----------------------------------- -------------- ------------------------- ----
…
Latch free 1509082258 cache buffers chains 56

From Oracle Database version 10g onward, Active Session History4 (ASH) provides sufficient information to identify SQL statements suffering from CBC latch contention.

Listing 12-12. Querying Active Session History to identify SQL_IDs
select event, sql_id, sql_child_number, count(*) cnt from v$active_session_history where event like 'latch%' and sample_time > sysdate-(1/24)
group by event, sql_id, sql_child_number
order by 4 desc
/

It is possible that another process is holding the latches in Listing 12-12 and thus not allowing SQL from the application in question to acquire those latches. This scenario is rare but needs to be analyzed. You need to verify that the hash value acquiring or holding the latch matches the hash value from Listing 12-11. You should execute the SQL in Listing 12-13 repeatedly (at least 10 times) to verify that the hash
values match.

Listing 12-13. Querying for SQL hash values acquiring or holding a latch
select s.sql_hash_value, lh.* from
v$latchholder lh, v$session s where
lh.sid=s.sid
order by s.sql_hash_value
/
SQL_HASH_VALUE PID SID LADDR NAME GETS
-------------- ------- ---------- ---------------- ------------------- ----------
0 418 10766 0000000ABBF5E3E8 library cache pin 16775103
1509082258 418 10766 0000000ACA8A1158 cache buffers chains 461959

Step 3: Identify Objects Causing the Contention
The script in Listing 12-14 queries to find objects that are hanging from the hash chain protected by those child latches. Output is sorted in part by the tch column, placing the more suspect objects first in the output. Objects with higher tch values are problematic and most probably causing the latch contention. You may need to execute the query a few times to get consistent data while latch contention is occurring. In the example output, you can see that the inv.mtl_onhand_quantities table is causing the contention.

Listing 12-14. Querying to find objects involved in latch contention
with bh_lc as
(select
lc.addr, lc.child#, lc.gets, lc.misses, lc.immediate_gets, lc.immediate_misses,
lc.spin_gets, lc.sleeps,
bh.hladdr, bh.tch tch, bh.file#, bh.dbablk, bh.class, bh.state, bh.obj
from
v$session_wait sw,
v$latchname ld,
v$latch_children lc,
x$bh bh
where lc.addr =sw.p1raw
and sw.p2= ld.latch#
and ld.name='cache buffers chains'
and lower(sw.event) like '%latch%'
and bh.hladdr=lc.addr
)
select bh_lc.hladdr, bh_lc.tch, o.owner, o.object_name, o.object_type,
bh_lc.child#,
bh_lc.gets, bh_lc.misses, bh_lc.immediate_gets,
bh_lc.immediate_misses, spin_gets, sleeps
from
bh_lc, dba_objects o
where bh_lc.obj = o.data_object_id(+)
order by 1,2 desc
/
HLADDR TCH OWNER OBJECT_NAME OBJECT_TYP
---------------- --- ------- ------------------------------ ----------...
0000001029B81010 203 INV MTL_ONHAND_QUANTITIES_DETAIL TABLE...
0000001029B81010 81 ONT OE_ORDER_HOLDS_ALL_C3 INDEX...
...

Step 4: Review Execution Plan of Problem Statements
In step 2, SQL statements causing latch contention were identified. In step 3 objects involved in CBC latch contention were identified. Data from steps 2 and 3 can be correlated and matched. This correlation can be done by reviewing the execution plans of the SQL statements from step 2.

How-to resolve CBC Latch Contention?
1. Eliminate Full Table Scan on Small Tables

2. Eliminate Full Index Scan on Small Indices
Similar to full scans on small tables, full index scans on smaller indices also will result in many concurrent processes accessing a small set of buffers, in turn leading to CBC latch contention. Eliminate full index scans with proper indexing techniques such as careful selection of index columns and choosing proper index types (such as btree or bitmap indexes).

3. Eliminate Leaf Block Contention with Partitioning
Processes that insert rows with primary key values generated from a sequence will insert rows in the rightmost leaf block of the associated primary or unique index btree. All concurrent inserts will suffer from CBC latch contention accessing that leaf block. The root cause here is that synthetic keys are inserted in just a few leaf blocks of index.
To eliminate leaf block contention, you can hash partition the involved table on the primary key.
Also, if a table cannot be partitioned, you may be able to convert the primary key and unique key indices into hash-partitioned indices over the nonpartitioned table

4. Favor Hash Joins Over Tightly-Nested Loop Joins
If the number of rows from the outer row source in a nested loops (NL) join is very high, and the inner probed table is small, processes can contend for the latches that protect the small inner table. Sometimes you can resolve this contention by converting the nested loops join to a hash join.

5. Tune Inefficient Indices
Table blocks are accessed for each row returned from an index scan. Excessive accesses to the table blocks can cause CBC latch contention. When too many blocks are scanned with index range scans (because of index inefficiency), concurrent processes will contend for the same set of buffers and latches, and that leads to CBC latch contention.

6. Reduce CPU Usage
If CPU usage is very high in the database server, that can also lead to latch contention. Ironically, high CPU usage is also a side effect of latch contention. Knowing whether high CPU usage is the cause or the result of contention can be a challenge. If you suspect latch contention from high CPU usage, you should reduce CPU usage and then measure for latch contention again.

What’s Shared Pool Latch ?
Shared pool latches are held while allocating or releasing space from the shared pool. Essentially, shared pool child latches protect changes to the shared pool free lists.
Prior to Oracle version 9i, the shared pool was just one big heap protected by just one shared pool latch, and it was a major contributor to latch contention. But from Oracle version 9i onward, the shared pool is split into multiple subheaps, and each subheap is protected by a shared pool child latch. This change has reduced the incidence of trouble with shared pool latch contention. Still, it is important to understand the problem of shared pool latch contention.

Structures in the Shared Pool
Each of the subheaps in the shared pool is further split into smaller subheaps. The fixed table x$ksmss externalizes shared pool subheaps.

Listing 12-20. Query to list shared pool subheaps
select ksmdsidx, count(*) , sum(ksmsslen) from x$ksmss
where ksmdsidx>0
group by ksmdsidx
order by 1
/
KSMDSIDX COUNT(*) SUM(KSMSSLEN)
---------- ---------- -------------
1 229 268435536
2 237 268435520
3 241 268435520
4 235 268435512
5 234 268439728
6 247 268435616
7 248 268435552

Listing 12-21. Query showing the number of shared pool latches
select count(*) from v$latch_children where name='shared pool'
COUNT(*)
----------
7

The number of subheaps (and so the number of shared pool latches) is dependent upon initialization factors such as shared pool size, granule size, a few underscore parameters, the Oracle Database version, and even a few OS parameters.

Shared Pool Free Lists
Oracle’s internal algorithm tries to find a chunk at least as big as the request, and the search for a free chunk is not cheap.
From Oracle Database release 9i onward, these free chunks of memory are grouped by their size and arranged as free lists from a hash bucket. This arrangement of free chunks grouped by size improves free space search.

Shared Pool Reserved Free Lists
A small percentage of the shared pool is set aside as the shared pool reserved area. Each subheap of the shared pool allocates a fraction of itself as part of the shared pool reserved area. And even though the shared pool reserved area has separate free lists, it shares the same latch with the non-reserved area.
If there is no free space in the main part of the shared pool, and if a requested chunk size is above a threshold determined by the _shared_pool_reserved_min_alloc initialization parameter, then the chunk is allocated from the shared pool reserved area.

Causes of Shared Pool Latch Contention?
Shared pool latch contention is almost always due to shared pool fragmentation. If the shared pool is fragmented, then the shared pool latch will be held longer during this search.
1. Shared pool fragmentation
While a process is trying to allocate a chunk big enough to satisfy an allocation request, recreatable or freeable chunks can be flushed off the shared pool and those chunks need to be coalesced to create a bigger chunk. That coalescing might involve walking shared pool freelists many times and flushing chunks until suitable chunks can be found or created. This activity is performed under the protection of the shared pool child latch leading to latch contention

2. Incorrect configuration
If a pool is incorrectly or inadequately configured, then chunks may be allocated from the shared pool instead of, say, the large pool. For example, if the large pool is not configured, then RMAN communication buffers or parallel query slave communication buffers may
be allocated from the shared pool. Allocation of these buffers can deplete and fragment shared pool free space, leading to shared pool latch contention.

3. Reduced shared pool subheaps.
Multiple shared pool subheaps are very important to performance, since it is critical to have multiple shared pool latches. Each shared pool subheap is protected by a shared pool child latch. If the number of shared pool subheaps is reduced, then the number of shared pool child latches also will be reduced

4. High CPU usage in the server.
If CPU usage is very high in the server, then the process holding a latch might not get enough CPU cycles and go to sleep prematurely before completing the code path. That sleep can result in the process holding the latch longer than is ideal. Fortunately, this problem has been solved in many platforms through the use of facilities to prevent a process from being preempted on the CPU. Such an option is referred to as the no_preempt option. But the no_preempt option is not available in all platforms, and without such an option it becomes important to reduce CPU usage to resolve latch contention.

5. Too-frequently flushed shared pool.
Flushing the shared pool too often can lead to latch contention. This typically happens in databases upgraded from Oracle Database version 8i or earlier. Flushing the shared pool involves flushing unpinned, recreatable chunks. Latches must be held during the time in which the flushing occurs. The result is increased activity on the shared pool latches.

How-to analyze Shared pool Latch Contention?
Step 1: Review Distribution Among Child Latches
Check whether the distribution among various child latches of the shared pool is uniform. If the distribution of gets and misses is skewed, meaning that a few child latches have much more demand than others in the shared pool, it may mean that those few subheaps are undergoing intense pressure.

Listing 12-24 shows a situation in which just a few shared-pool child latches are acquired much
more frequently than other child latches. Such a situation suggests higher activity in a few shared pool
subheaps.

select child#, gets , misses, immediate_gets, immediate_misses
from v$latch_children where name='shared pool';
CHILD# GETS MISSES IMMEDIATE_GETS IMMEDIATE_MISSES
---------- ---------- ---------- -------------- ----------------
7 280 0 0 0
6 280 0 0 0
5 280 0 0 0
4 280 0 0 0
3 12907260 1035751 0 0
2 15153322 1276786 0 0
1 12826219 1154988 0 0
7 rows selected.

If the gets among children are skewed, then it is important to understand the type of chunks allocated in those subheaps

You can delve into the problem in at least two ways. Statspack and AWR reports have a section that will show the increase/decrease in various shared pool area sizes.

Listing 12-25. Statspack output showing SGA differences
SGA breakdown difference for DB: APDB Instance: APDB3 Snaps: 242425 -242426
Pool Name Begin value End value % Diff
------ ------------------------------ ---------------- ---------------- -------
shared library cache 44,448,832 44,463,992 0.03
shared messages 2,080,000 2,080,000 0.00
shared miscellaneous 40,601,264 38,964,144 -4.03
shared parameters 188,648 188,648 0.00
shared partitioning d 228,600 228,600 0.00
shared pl/sql source 3,288 3,288 0.00
shared processes 13,760,000 13,760,000 0.00
shared qmps connections 4,842,200 4,842,200 0.00

Another method of researching the problem is to query the fixed table x$ksmss. It can be queried to see which area has grown bigger. The SQL in Listing 12-26 prints the top 20 areas that are consuming space in the shared pool.
You may need to run a query or report multiple times. In some cases, it may necessary to monitor shared pool areas with a custom script to detect abnormal growth.

Listing 12-26. Query against x$ksmss to see which area is bigger
select ksmdsidx,ksmssnam, size_area from(
select ksmdsidx, ksmssnam,sum(ksmsslen) size_area from x$ksmss
where ksmssnam!='free memory'
group by ksmdsidx, ksmssnam
order by 3 desc
)
Where rownum<21; 

Step 2: Inspect Shared Pool Fragmentation
In most cases, shared pool latch contention is caused by shared pool free list fragmentation.

Step 3: Review Objects Causing Flushing
The fixed table x$ksmlru keeps track of the most recent flushes of the shared pool. If there is no chunk big enough to accommodate an allocation request, then chunks that can be recreated or freed might need to be flushed. This object flushing also will increase the activity on the shared pool latches. In Listing 12-30, column KSMLRNUM indicates the number of items flushed to accommodate an allocation request. KSMLRCOM shows the allocation comment for the allocation request. Reviewing that column will provide valuable clues about why there is higher activity on shared pool latches.

 Listing 12-30. Query to identify objects that are flushing the shared pool
Set lines 160 pages 100
Spool ksmlru.lst
Select * from x$ksmlru order by ksmlrnum;
Spool off

Interestingly, querying x$ksmlru will remove rows from that fixed table. Thus, it is better to spool the output to a file.

Step 4: Identify SQL Statements Using Literal Values
Another common reason—probably the most common—for shared pool fragmentation is repeated hard parsing of the SQL statements due to use of literal values in those statements However, most of these SQL statements will have the same execution plan, and hence the same plan hash value. The SQL in Listing 12-31 uses that fact to identify SQL statements that do not use bind variables.

Listing 12-31. Query to see the top 20 SQL statements that use literal values
select * from (
select plan_hash_value, count(distinct(hash_value)), sum(executions),
sum(parse_calls)
from v$sql
group by plan_hash_value
having count(distinct(hash_value)) > 10
order by 2 desc
) where rownum<21;
PLAN_HASH_VALUE COUNT(DISTINCT(HASH_VALUE)) SUM(EXECUTIONS) SUM(PARSE_CALLS)
--------------- --------------------------- --------------- ----------------
511709263 4939 4939 4939
1438036477 4907 4907 4907
3532438726 4814 4814 4814
...

How-to Resolve Shared Pool Latch Contention?
1. Avoid Unnecessary Heaps
Configure large pool, etc.

2. Avoid and Reduce Fragmentation by Sharing SQL Statements
SQL statements that are executed very frequently with literal values must be converted to use bind variables.

3. Avoid Setting _kghdsidx_count to 1
Set the _kghdsidx_count parameter to 1, oracle will allocates just one shared pool heap with just one shared pool latch. This means that the shared pool latch becomes a solitaire latch, and this can lead to shared pool latch contention.

4. Avoid Flushing the Shared Pool
You can flush the shared pool with the alter system flush shared_pool command. Flushing a shared pool throws away recreatable and freeable chunks. These chunks must be added to shared pool free lists under the protection of a shared pool child latch. Flushing the shared pool will increase activity against shared pool latches artificially and can lead to shared pool latch contention.
Sometimes flushing the shared pool can improve performance. However, if you find that flushing the shared pool improves performance, the correct action is not to continue to flush, but to find the root cause of the fragmentation and resolve that.

5. Avoid Shared Pool Reserved Free List Fragmentation
The shared pool reserved free list is searched only if there is not enough contiguous space in the main free list, and if an allocation request is larger than the value specified by the _shared_pool_reserved_min_alloc parameter. Up until version 11g, this parameter defaults to 4200. The
default value is sufficient in many cases. Setting the value of the parameter lower can lead to fragmentation in the reserved pool, in turn leading to latch contention.

What’s Library Cache Latch ?
The library cache holds parsed representations of SQL statements, execution heaps of procedures, functions, and packages. There are many other items, such as table definitions.

The library cache is organized as an array of hash buckets. Hash chains are hanging from these hash buckets. Library cache objects are attached to these hash chains, and these objects point to various heaps in the shared pool. Changes and inspections to these library cache hash chains are protected by library cache latches. The library cache latch is a non-solitaire latch, and there are many child latches protecting changes to the hash chains.

The parsing step involves too many operations to discuss in detail, but following is a subset of what happens:
1. First, a unique hash value is created from the SQL text, or from the text of the library cache object.
2. Using that hash value, a library cache bucket number is derived.
3. That library cache bucket is protected by a library cache child latch. The parsing process holds that latch while searching the hash chain for a match on the hash value (together with a few other criteria).
4. If match is found, then the parsed representation of the SQL statement (or other object) exists, and no additional hard parsing is needed.
5. If there is no match for a hash value, then an additional hard parsing step must be completed. A new library cache object must be added to the hash chain with pointers to various heaps for the object being parsed. Any change to library cache hash chains is done under the protection of a library cache child latch. Hard parsing can lead to contention for those child latches.

Causes of Library Cache Latch Contention?
Library cache latch contention is almost always caused by excessive parsing or by nonsharable child cursors. Use of literal values in SQL is the most common cause of library cache and shared pool latch contention.
1. Excessive parsing:
Caused by literal values in SQL
2. Side effects of hard parsing:
If there is no free space available to accommodate an incoming allocation request, existing cursor heaps might need to be flushed from the shared pool/library cache. This increases activity against shared pool and library cache latches
3. Creation of excessive child cursors:
That lack of sharing will result in longer hash chains, and in many objects hanging from any one chain. Processes will be searching through these longer hash chains, increasing activity against library cache child latches

How-to Analyze Library Cache Latch Contention?
The process for analyzing library cache latch contention is almost identical to analyzing shared pool latch contention.

How -to Resolve Library Cache Latch Contention
1. Use Bind Variables
It’s especially important to use bind variables for statements that have a high number of executions.

2. Avoid Flushing the Shared Pool
Unpinned recreatable chunks from the library cache are thrown away during a shared pool flush operation.Flushing artificially induces a higher amount of hard parse calls.

3. Adjust the session_cached_cursors Parameter
If a cursor is found in the session cursor cache, then the parsing step is cheaper than otherwise. Increase of this parameter comes at a cost, since cursors that are in the session cursor cache are pinned in the shared pool, and this can result in higher space usage in the shared pool.

4. Adjust the cursor_space_for_time Parameter to True
If there is a cursor open pointing to a cursor heap, then that cursor heap is less likely to be flushed out. Retaining the cursor heap improves the performance of subsequent executions of that SQL statement at the cost of increased shared pool usage.
As of Oracle Database version 10.2.0.5 and 11.1.0.7, this parameter is deprecated.

5. Control and Limit Histograms
If your application is collecting histograms on all columns, then rethink that practice, and narrow the scope of collection to only those columns that matter. That’s avoid unnecessary SQL plan for same SQL.

Library Cache Latches and Mutexes
From Oracle version 10g onward, most library cache latches have been replaced by mutexes.
Library cache objects are locked and pinned under the protection of a mutex associated with the parent cursor. Still, library cache hash chains are protected by library cache latches, and having longer hash chains can result in library cache latch contention.
Version 11g improved this area further. Each bucket in the library cache has its own mutex, and scanning through the library cache hash chain is also performed under the protection of a mutex governing that specific bucket.

What’s Enqueue Hash Chains Latch?
Enqueue hash chain latches protect structures handling enqueues, commonly known as locks. Higher amounts of enqueue activity can lead to enqueue hash chains latch contention

For example, if a process is trying to acquire a table level lock (lock type TM), then the process allocates a resource structure having
the combination of that table’s object_id and lock type as a resource key.

Following is a summary of the operational details of the enqueue latching mechanism
1. A hashing function is applied on a string with a combination of . For example, if the table scott.emp is to be locked, and that table has anobject ID of 6509, and if the object type is TM for table, then a hashing function is applied over the string “(TM, 6509)”.
2. The result of the hashing function uniquely identifies a hash bucket to which the resource will always be chained. Changes to or inspection of the hash chain are protected by the enqueue hash chains child latches. So, while holding an enqueue hash chains child latch, that hash chain is searched to see if that resource is already allocated.
3. If the resource exists in the hash chain, then the process tries to see whether the resource is locked in a compatible mode with the lock request mode (assuming that there are no other waiters). If the resource is available or if the resource is locked in compatible mode, then a locking structure is added to the holders’ queue.
4. If there is another process waiting to lock the resource, then a locking structure is added to the waiters’ queue, and the process waits until the resource is available.
5. If the resource does not exist in the hash chain at all, then a new resource structure is added to the enqueue hash chain. A locking structure is also added to the holders’ queue.
6. Changes to the enqueue hash chain are protected by enqueue hash chains latches.

If there is an enormous amount of locking activity, typically in the case of high-end OLTP applications, it is conceivable that there will be a higher amount of activity against enqueue structures and enqueue hash chains child latches. This leads to enqueue hash chains latch contention.

Causes of Enqueue Hash Chains Latch Contention?
1. Excessive locking activity:
OLTP applications tend to have thousands of short transactions per second. Thousands of such short transactions will result in excessive activity against resource structures, lock structures, and enqueue hash chains latches.

2. Deadlock detection algorithm:
Oracle’s deadlock detection algorithm is invoked periodically, and it holds a parent enqueue hash chains latch (thereby holding all child latches) while searching for deadlocks between processes. If there are numerous processes, locks, and resource combinations in the database concurrently, then the deadlock detection algorithm can take long enough to result in contention for enqueue hash chains latches.

How-to Analyze Enqueue Hash Chains Latch Contention?
Step 1: Review Distribution Among Child Latches.

Listing 12-35. Skew among enqueue hash chains child latches
select * from (
select child#, gets, misses, sleeps
from v$latch_children where name like 'enqueue hash%'
order by gets desc )
where rownum <=20;
CHILD# GETS MISSES SLEEPS
---------- ---------- ---------- ----------
28 12297672 211726 0
20 284429 2151 1
...

Step 2: Identify the Enqueue Causing Latch Contention
You can get the below result from the session wait event info in v$session also. p2 is the object_id.
An appropriate action is to enable SQL Trace at level 8 (with waits) on a process suffering from latch contention.

Listing 12-36. An example of SQL trace output lines (version 9.2.0.8)
WAIT #1: nam='enqueue' ela= 301 p1=1414332420 p2=9086 p3=0
WAIT #1: nam='latch free' ela= 1 p1=-1437405580 p2=19 p3=0
WAIT #1: nam='latch free' ela= 1 p1=-1437405580 p2=19 p3=1
WAIT #1: nam='latch free' ela= 1 p1=-1437405580 p2=19 p3=2

From the analysis in this section we can safely conclude that latches protecting TM enqueues are causing performance issues. We also know that the object_id causing trouble is 9086. Querying dba_objects for that ID will reveal the object_name and owner. You need to repeat this exercise for several times to confirm your analysis.

Step 3: Identify Lock Types with High Gets

Listing 12-39. Top 20 lock types by activity
-- v$enqueue_statistics top 20 only..
select * from (
select eq_name, eq_type, total_req#, total_wait#, succ_req#, failed_req#
from v$enqueue_statistics
order by total_Req# desc)
where rownum<21
/
EQ_NAME EQ TOTAL_REQ# TOTAL_WAIT# SUCC_REQ# FAILED_REQ#
------------------------- -- ---------- ----------- ---------- -----------
DML TM 18554 17630 18155 400
Session Migration SE 17709 0 17725 0
...

How-to Resolve Enqueue Hash Chains Latch Contention
1. Avoid Excessive Short Transactions
If possible, modify the code to avoid or reduce this type of activity, and try grouping small transactions into bigger ones.
Note that worrying about short transactions only matters if the lock type behind the contention is TM or TX.

2. Disable Table Level Locks
In some cases, it may be beneficial to avoid table-level locks

Listing 12-40. Disabling table level lock
Alter table emp disable table lock;
alter table backup.emp add (n2 number)
*
ERROR at line 1:
ORA-00069: cannot acquire lock -- table locks disabled for EMP
Alter table emp enable table lock;

Disabling table level locks means that there can be no DDL statements on the target table. But in a production OLTP database suffering from latch contention, not being able to issue DDLs against a table is a very small price to pay to resolve the problem. Of course, you can always enable locks during maintenance on that table

3. Reduce or Avoid Activity Against the Lock Type Causing Contention
Reduce or avoid activity against the lock type underlying the contention. For example, if the lock type is DX, try to reduce distributed transaction by redesigning the application architecture.

Advanced topic for Latch Contention Problems
There are a few exceptions and bugs that can cause excessive latch contention. This section tries to list them, but for informational and debugging purposes only.

v$latch_parent
The v$latch_parent view maintains statistics at the parent latch level. In some cases, if there are many child latches, then the RDBMS code will need to acquire all the child latches. For example, deadlock detection code holds the parent latch of enqueue hash chains during execution of the deadlock detection algorithm, and no child latch can be acquired by any other process until that parent latch is released.

spin_count
The parameter spin_count determines amount of time the process will spin on the CPU before going to sleep. This parameter affects the behavior of all latches. Rarely, adjusting the spin_count is a good solution. Adjusting the spin count is unnecessary, and there are only few situations in which it might need to be adjusted.

_latch_classes and _latch_class_N
If it is not possible to resolve the root cause of the latch contention, then the _latch_classes and _latch_class_n parameters can be used to modify the spin count just for few latches from Oracle Database version 9iR2. For example, if you want to increase the spin count for library cache latches, you can use the method demonstrated in Listing 12-41.
In the listing, which is from Oracle Database version 10.2.0.4. the spin_count for _latch_class_1 is set to 18000, and the _latch_classes parameter specifies that latch number 214 belongs to latch_class_1. Latch number 214 is for the library cache latch.

Listing 12-41. _latch_classes and _latch_class_1 parameter changes for library cache latches
Select latch#, name from v$latch where name='library cache';
LATCH# NAME
---------- --------------------------------------------------
214 library cache
*._latch_class_1=18000
*._latch_classes='214:1'

_latch_wait_posting and _enable_reliable_latch_waits
The parameter _latch_wait_posting (obsoleted as of 11g) controls how long wait latches such as library cache latches are handled. By default, this parameter is set to 1,If a latch is not available after a certain period, the process sleeps until posted by the latch holder.
If _enable_reliable_latch_waits is set to true, then all latch posts are considered reliable. A process waiting for a latch will go to sleep and not wake up until posted. The result is reduced CPU usage, tending to reduce the symptoms of latch contention.

Summary
As always, it is best to understand the root cause of a problem, in this case a latch contention problem, and to resolve that root cause. It is
always better to resolve a root cause than to apply a band-aid.

I read this chapter at least 3 times. I believe it’s very helpful on understanding what’s going on in oracle although maybe we cannot solve it from DB side. This is the last chapter I will write about # Expert Oracle Practices #.

How-to avoid ssh prompt for password

Sometimes, we don’t want to use password authentication when using ssh. For example, I wrapped ssh in my script. I want to it always uses key authentication. If there is no key setup, just exit.

I googled it. Most of them are given the option “PasswordAuthentication no”. But in my test, it didn’t work somehow (probably add it the ssh config file will work, but that’s not what I want):

 ssh dbox036 date
The authenticity of host 'dbox036(10.1.1.1)' can't be established.
RSA key fingerprint is 50:ca:af:6c:27:b3:8d:74:e7:27:97:46:36:43:67:78.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'dbox036,10.1.1.1' (RSA) to the list of known hosts.
oracle@dbox036's password: 
Permission denied, please try again.
oracle@dbox036's password: 

I read ssh document and found parameter “NumberOfPasswordPrompts” can be used:

$ ssh  -o "NumberOfPasswordPrompts 0" dbox036
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).

If you want to avoid the host key checking, add option “StrictHostKeyChecking no”

$ ssh -o "StrictHostKeyChecking no" -o "NumberOfPasswordPrompts 0" dbox036
Warning: Permanently added 'dbox036,10.1.1.1' (RSA) to the list of known hosts.
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).

If you want to avoid the “Warning” message, add option “LogLevel=quiet”, you will get nothing if it didn’t work

$ ssh -o "StrictHostKeyChecking no" -o "NumberOfPasswordPrompts 0"  -o LogLevel=quiet dbox039 date
$

How-to Find & Replace a string in all files within a directory

I run into this several times. I’d like to share how I solve it with shell and perl. Here are the steps:

Step 1. Get a list

for f in `find . -type f`
do
  r=`grep SOURCE_STRING $f`
  if [ "alex${r}" != "alex" ] ; then
    echo $f >> update.lst
  fi
done;

Step 2. Check the list

wc update.lst
cat update.lst
for f in `cat update.lst`
do
  r=`grep SOURCE_STRING $f`
  echo $f
  echo $r
done;

Step 3. Update files

for f in `cat update.lst`
do
    echo $f
    perl -i -pe 's/SOURCE_STRING/TARGET_STRING/g' $f
done;

Step 4. Check it again run Step 2 again

# Expert Oracle Practices # Statistics

I am reading Expert Oracle Practices by Oracle Database Administration from the Oak Table. I’d like to summary and share what I learned.

This chapter, Statistics by Jonathan Lewis, guides us on how to create a suitable gather statistics strategy.

Sometimes oracle optimizer just don’t know how many percent of data will return even with many many statistics, just like ourselves before we run the SQL. Take an example:

How many people in your company earn more than the average salary?
If everyone earns about the same amount, and there are no particularly high-paid or low-paid workers, it’s quite possible that the answer matches the default human response of “about half.”
But if you have even a small number of outliers, the figure could be a long way from 50 percent. Include a handful of highly paid executives, and they drag the average (mean) wage up a little bit, and suddenly nearly everyone is earning less than average.
If include a group of low-paid office cleaners who drag the average (mean) wage down a little bit, and suddenly nearly everyone is earning more than average.

So what does Oracle do? It “guesses” that the answer is 5 percent of the employees (for more than > or less than <). For equal (=), it guesses 1 percent.

According to Oracle, 5 percent more than average, 1 percent equal to average, and 5 percent less than average, so the optimizer has just lost 89 percent people somewhere!

Oracle does, indeed, give us a couple of possible strategies to address this type of problem.

  1. dynamic sampling, since 9i
  2. extended statistics, since 11g

Problems with Statistics

When problems appear, the DBA should be able to go to the developer (or the designer if there is one) and ask questions such as the following:

  1. How many trades do we do each day?
  2. How long does it take the status to change from new to closed?
  3. For deals that are not closed, how many should we assume are in each status?
  4. How many deals does the typical customer do per month?
  5. Are there any products that are traded far more than others?

Creating Statistics
The key ideas we have to hit with the strategy are as follows:

  1. We don’t want to spend all our time creating and maintaining the stats-gathering code.
  2. We don’t want the demand for machine resources to affect user activity.
  3. We would like to have a known upper limit on the work done collecting stats.
  4. Some statistics have to be created at specific points in time.
  5. Some statistics have to be carefully constructed lies.

A few guidelines that may be helpful in designing the strategy are as follows:

  1. Start by looking at the ideas Oracle Corp. has embedded in their automatic statscollection job.
  2. Remember to exclude system-generated objects from any driving SQL.
  3. Make sure you don’t exclude objects by accident.
  4. Some tables don’t need to have their statistics changed very often.
  5. Some tables need statistics changed more than once every 24 hours, perhaps as part of a loading process.
  6. You probably have time to use compute to gather stats on small tables—where small is a fairly arbitrary limit that you decide for your system.
  7. You can often gather reasonable stats on large tables (or individual partitions) with a very small sample size.
  8. At a global level, partitioned tables probably need special treatment (custom code).
  9. A few indexes will need special treatment to adjust their clustering_factor.
  10. You may want to adjust the number of distinct values for a few columns.
  11. A few columns will need histograms, and it’s probably best to write code to construct them.

We need to design a mechanism that is easy to manage but allows special handling for a table-driven approach for exceptions (critical objects at a detailed level).

In a nutshell,  create the mechanisms are as follows:

  1. A default mechanism that generates statistics for objects that need them (for example, Oracle’s gather stale)
  2. A mechanism to identify any objects that dropped out of sight of the default mechanism
  3. A mechanism for unlocking and locking stats on objects that must not be touched by the default mechanism
  4. A follow-up that checks whether any of the gathered stats need to be “fixed,” or whether any special cases need handling
  5. A custom mechanism for dealing with each partitioned object

When implementing such a system, remember three key points:

  1. Keep it simple—don’t let special cases proliferate.
  2. Think about how the system can be self-correcting.
  3. Keep an eye on how Oracle does the difficult bits. Don’t get left behind as Oracle changes strategies.

Finally, if there’s one thing you should remember above all others after reading this chapter, it’s this: the task of maintaining appropriate statistics should be part of the application code. Leaving it as a
DBA (or automatic) task is an abdication of responsibility that’s begging for performance problems to appear.

Others in this chapter
dbms_stats.set_column_stats()

dbms_stats.get_column_stats() — into pl/sql variables
adjust pl/sql variables — typically high-value, and num_distinct
dbms_stats.set_column_stats()

in 11g with the copy_table_stats()
dbms_stats.lock_table_stats()

Where I mention gathering stats, I generally use dbms_stats.gather_table_stats() with cascade set to true to collect index stats, method_opt set to for all columns size 1 to avoid histograms, and estimate_percent set to 100 because the tests usually involve small tables. I also tend to disable CPU costing (also known as system statistics) simply to ensure that my test cases are more likely to be repeatable.

# Expert Oracle Practices # Managing the Very Large Database

I am reading Expert Oracle Practices by Oracle Database Administration from the Oak Table. I’d like to summary and share what I learned.

This chapter, Managing the Very Large Database by Tim Gorman, will tell us the key point to manage VLDB. The key feature that makes VLDB possible in oracle is partition. The important concept is still divide and conquer. Awareness of the power of the EXCHANGE PARTITION operation is the key to optimizing data manipulation, and ultimately this operation facilitates a very large database. Let’s highlight some contents:

Deleting or Updating Millions of Rows
Compare the two approaches :

  1. A parallel UPDATE statement generates insane volumes of rollback/undo as part of normal processing, generates equally insane volumes of redo as part of normal processing, requires searches for the rows to manipulate prior to manipulating them, and performs all work within the Oracle buffer cache, requiring each parallel worker process to latch/lock buffers after they’re retrieved, before manipulating them.
  2. A parallel unrecoverable/nologging CREATE TABLE … AS SELECT operation, using parallel direct-path sessions, has no contention with other sessions in the database nor between its own parallel worker processes. It does not generate rollback/undo or redo, except for the Data Definition Language (DDL) commands involved with extent allocation.

The basic lesson from this is that the fastest mass update is actually an insert. When you have a partitioned table, you can use an insert command to perform an update operation.

Loading Millions of Rows
The scenario described in the previous section shows the basic five-step method of loading data into a partitioned table using the exchange partition:
1. Create a “temporary” table that will later be switched with a target partition in the partitioned target table.
2. Load the temporary table using the fastest methods appropriate for the situation (for example, direct-path insert).
3. Gather cost-based optimizer statistics on the just-loaded temporary table.
4. Create indexes on the just-loaded temporary table to match the LOCAL partitioned indexes on the partitioned target table, if any.
5. Exchange the target partition in the target table with the just-loaded temporary table.

Partition Pruning Pitfalls

  1. The partition key column is in a function or expression : A very common problem arises because of a matter of programming style in dealing with the time component of the DATE datatype.
  2. Compare data with different type causes implicit datatype conversion.

Method of partitioning

  1. Range partitioning uses a finite to infinite list of data values, searched using equivalence operators (that is, =) and range operators ( >, >=, <, <=, BETWEEN,LIKE, and so forth).
  2. List partitioning uses a finite list of identified data values plus DEFAULT for everything else, searched using equivalence operators only.
  3. Hash partitioning uses an infinite list of data values, searched using equivalence operators only.

Information Life Cycle Management
Let’s consider creating tablespaces to hold a month’s worth of ORDER_LINES partitions and their index partitions.

Now, with time-variant tablespaces, we can create a new tablespace on tier 1 storage, initially in read-write status. After two months, we change the tablespace from read-write to read-only status. After six months, we move that tablespace from tier 1 to tier 2 storage. Then, after 12 months, we move that tablespace from tier 2 to tier 3 storage, and after seven years, we drop that tablespace and all the partitions with it. The key here is that the tablespace must be in read-only status before we can consider moving it.

Two major methods to migrate tablespaces from one tier of storage to another:

  1. Using RMAN to first copy the data files in the tablespace to the new location, and then switch the identity of the data files from the old location to the new location
  2. Creating the new tablespace in the new location, using CREATE TABLE … AS SELECT to copy the data from the existing partitions to the new tables, and then using the exchange partition technique to switch the old partitions with the new tables in the new tablespace

Each of these two techniques has its advantages and its disadvantages. The advantage of the RMAN technique is simplicity and no need to take another backup of the moved tablespaces. The advantage of the copy/exchange technique is the opportunity to compress a previously uncompressed partition during the copy phase.

Add one more method : transport tablespaces from an OLTP db to an archive db.

Backup Optimization and Guaranteed Recovery
Key point: Dividing line between read-write and read-only

The majority of the volume of a database that is configured for ILM is read-only and needs to be backed up only infrequently, perhaps once or twice per year, if not less frequently, depending on how much you trust your backup media. It is only the read-write portion of the database, along with the archived redo log files, that need to be backed up frequently, on a daily or weekly basis

Luckily, this planning and design coincides precisely with the goals of ILM, and Oracle provides the toolkit for this in the features of partitioning, read-only tablespaces, and incremental backups by Oracle RMAN.

Breaking large objects into smaller ones, just for the sake of doing so, does not provide much benefit. However, the way in which those smaller objects are managed, how they are stored, according to a life cycle, makes all the difference in the world.

Further Notes on Storage
The question: Which should be used, file systems, “raw” devices, or Automated Space Management (ASM)?

Lots of people are suspicious of Oracle ASM because they feel it is too new, unfamiliar, and buggy. But I have worked with Oracle ASM in some of the most extreme database environments I have ever experienced, and it nicely complements the best features of the Oracle database without limitations.
In the world of very large databases, removing limitations and enabling all of the features available is vital. Try ASM, if you haven’t already. There is a learning curve, but you’ll be glad you invested the effort.

Limits of Which to Be Aware
Database Block Size, there are numerous limits based on database block size, so it worthwhile to choose wisely:

  1. Maximum size of an extent in a segment (4 billion database blocks for dictionarymanaged tablespaces, 2 billion database blocks for locally managed tablespaces)
  2. Maximum control file size (20,000 database blocks)
  3. Maximum data file size (4 million database blocks for SMALLFILE tablespaces, 4 billion database blocks for BIGFILE tablespaces)
  4. Maximum database size (65,533 times the maximum data file sizes cited earlier)

Number of Files in a Database
As recently as Oracle 11g R2, the limit on the number of data files in a database is 65,533.

So, here are two bits of advice for the planning and designing of the very large database:

  1. Be sure to take advantage of data file autoextension.
  2. Consider using BIGFILE tablespaces.

Starting with Oracle 10g, Oracle provides the concept of BIGFILE tablespaces, which are tablespaces that are permitted one and only one data file. This single data file is capable of growing to 4 billion database blocks, whereas the data files of traditional (SMALLFILE) tablespaces can grow to only 4 million database blocks.

Storage That Can Migrate
One such concept is hierarchical storage management (HSM). All of the volumes in the HSM are available at all times, but if a volume is not used for a prespecified period of time, most of the volume (except for a small header element) is migrated transparently off to lower-cost storage. HSM volumes are often ideal as tier 3, or archival-class, storage for infrequently used data.

Tablespaces created on (or migrated to) HSM volumes had better be dictionary managed, so as not to run into the file-open latency problem every time the DBA_EXTENTS and DBA_FREE_SPACE views are queried.

Parameter READ_ONLY_OPEN_DELAYED
The parameter READ_ONLY_OPEN_DELAYED changes this behavior at instance startup. Data files belonging to tablespaces that are read-only are only touched when they are accessed, not when the database is opened. So, as the number of data files grows larger, and as the number of read-only tablespaces increases, you can keep the time needed to restart the database instance to a minimum by setting this parameter to TRUE, especially if you have read-only tablespaces based on migrateable HSM storage.

# Expert Oracle Practices # Choosing a Performance Optimization Method

I am reading Expert Oracle Practices by Oracle Database Administration from the Oak Table. I’d like to summary and share what I learned.

This chapter, Choosing a Performance Optimization Method by Charles Hooper and Randolf Geist again. It’s a chapter to tell us how to make the right choice when there is a problem. Finding the right direction is the first very important step. Oracle performance issues are either system level or specified SQL/Module level most of the time. Sometimes system level problem may be caused by some SQL/Module level. Sometimes they are mixed.

The time model views V$SYS_TIME_MODEL and V$SESS_TIME_MODEL are very important to performance analysis because performance is all about time. I think OEM (oracle enterprise management) used it a lot. I didn’t pay enough attention to it at the past. I will revisit the “Sampling Performance with Low Overhead” script by the same authors in previous chapter.

Once again, this chapter mentioned “GATHER_PLAN_STATISTICS DISPLAY_CURSOR ALLSTATS LAST” method again and again

SQL>  ALTER SESSION SET STATISTICS_LEVEL='ALL';
SQL>  run the sql
SQL>  SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));
--add hint /*+ GATHER_PLAN_STATISTICS */ to the SQL also works if you cannot set statistics_level to all
--This one is one of my favorite. By comparing the oracle estimated rows and actual rows of each step, you can quickly know where is wrong in the plan.

Some hints mentioned

/*+ ORDERED */ (Alternatively, use a/*+ LEADING */ hint with the first couple of table aliases specified.)
/*+ OPTIMIZER_FEATURES_ENABLE(’9.2.0′) */
/*+ CARDINALITY(t 200) */, undocumented, but in V$SQL_HINT
/*+ OPT_ESTIMATE(TABLE, t, ROWS=200) */, used by SQL profile, should be safe to use as long as SQL profile is not retired
/*+ dynamic_sampling(t 10) */
NO_EXPAND hint, which allows the optimizer in recent Oracle releases to use the IN-LIST ITERATOR operation instead.

Set table statistics
EXEC DBMS_STATS.SET_TABLE_STATS(OWNNAME=>USER, TABNAME=>’TEMP_PART_PRICE_DATE’, NUMROWS=>200, NUMBLKS=>15, NO_INVALIDATE=>FALSE);.

Clearly describe a problem
a. How the problem was identified (for example, end-user complaints, order system hanging)
b. The severity of the problem (for example, orders cannot be processed)
c. A description of the steps required to reproduce the issue
d. Clear distinction between what is working and what is not working (for example, the order entry form is hanging, whereas the invoice processing is working fine)
e. The expected or acceptable behavior
f. What has been done so far to rectify the issue (for example, object statistics have been checked, but no anomalies were identified)

Reproduces the issue at will 
a. Covers the inefficient SQL identified
b. Is as generic as possible, allowing you to easily transfer the identified issue to Oracle Support if required
c. Shows evidence that the statement identified is inefficient (for example, the SQL*Plus timing facility reports that the statement takes 306 seconds to complete instead of expected subsecond execution time)

Others
“Verify That the Issue Is a Database Issue” is also very important. It didn’t have to be a database issue but we absolutely need prove that before say it.

Starting with release 10.1, the default METHOD_OPT value is FOR ALL COLUMNS SIZE AUTO, which might generate histograms on columns that do not improve the join selectivity/cardinality estimates

Oracle 11.2 introduced the automatic parallel degree (AUTO DOP) feature, which handles the degree of parallelism differently (controlled by the new parameters PARALLEL_DEGREE_POLICY and PARALLEL_MIN_TIME_THRESHOLD). The session and systemwide statistics in V$SESSTAT, V$MYSTAT, and V$SYSSTAT may be checked to determine the number of parallel executions that have been downgraded by examining the delta values for the statistics like ‘Parallel operations%’

The optimizer environment that existed during the initial hard parse (visible in the view V$SQL_OPTIMIZER_ENV) must match that of the session (visible in the view V$SES_OPTIMIZER_ENV) submitting the SQL statement to the database instance.

Metalink Doc ID 744664.1 “High Parse Time and Memory Usage for Query with Complex OR Predicates or IN-Lists After Upgrade to 10.2 or 11.1

Metalink Doc ID 32895.1, “SQL Parsing Flow Diagram”

Metalink Doc IDs 604022.1 and 144194.1 Use hints to obtain the desired execution plan and lock the plan by using an outline

Follow

Get every new post delivered to your Inbox.