# Expert Oracle Practices # Statistics

I am reading Expert Oracle Practices by Oracle Database Administration from the Oak Table. I’d like to summary and share what I learned.

This chapter, Statistics by Jonathan Lewis, guides us on how to create a suitable gather statistics strategy.

Sometimes oracle optimizer just don’t know how many percent of data will return even with many many statistics, just like ourselves before we run the SQL. Take an example:

How many people in your company earn more than the average salary?
If everyone earns about the same amount, and there are no particularly high-paid or low-paid workers, it’s quite possible that the answer matches the default human response of “about half.”
But if you have even a small number of outliers, the figure could be a long way from 50 percent. Include a handful of highly paid executives, and they drag the average (mean) wage up a little bit, and suddenly nearly everyone is earning less than average.
If include a group of low-paid office cleaners who drag the average (mean) wage down a little bit, and suddenly nearly everyone is earning more than average.

So what does Oracle do? It “guesses” that the answer is 5 percent of the employees (for more than > or less than <). For equal (=), it guesses 1 percent.

According to Oracle, 5 percent more than average, 1 percent equal to average, and 5 percent less than average, so the optimizer has just lost 89 percent people somewhere!

Oracle does, indeed, give us a couple of possible strategies to address this type of problem.

  1. dynamic sampling, since 9i
  2. extended statistics, since 11g

Problems with Statistics

When problems appear, the DBA should be able to go to the developer (or the designer if there is one) and ask questions such as the following:

  1. How many trades do we do each day?
  2. How long does it take the status to change from new to closed?
  3. For deals that are not closed, how many should we assume are in each status?
  4. How many deals does the typical customer do per month?
  5. Are there any products that are traded far more than others?

Creating Statistics
The key ideas we have to hit with the strategy are as follows:

  1. We don’t want to spend all our time creating and maintaining the stats-gathering code.
  2. We don’t want the demand for machine resources to affect user activity.
  3. We would like to have a known upper limit on the work done collecting stats.
  4. Some statistics have to be created at specific points in time.
  5. Some statistics have to be carefully constructed lies.

A few guidelines that may be helpful in designing the strategy are as follows:

  1. Start by looking at the ideas Oracle Corp. has embedded in their automatic statscollection job.
  2. Remember to exclude system-generated objects from any driving SQL.
  3. Make sure you don’t exclude objects by accident.
  4. Some tables don’t need to have their statistics changed very often.
  5. Some tables need statistics changed more than once every 24 hours, perhaps as part of a loading process.
  6. You probably have time to use compute to gather stats on small tables—where small is a fairly arbitrary limit that you decide for your system.
  7. You can often gather reasonable stats on large tables (or individual partitions) with a very small sample size.
  8. At a global level, partitioned tables probably need special treatment (custom code).
  9. A few indexes will need special treatment to adjust their clustering_factor.
  10. You may want to adjust the number of distinct values for a few columns.
  11. A few columns will need histograms, and it’s probably best to write code to construct them.

We need to design a mechanism that is easy to manage but allows special handling for a table-driven approach for exceptions (critical objects at a detailed level).

In a nutshell,  create the mechanisms are as follows:

  1. A default mechanism that generates statistics for objects that need them (for example, Oracle’s gather stale)
  2. A mechanism to identify any objects that dropped out of sight of the default mechanism
  3. A mechanism for unlocking and locking stats on objects that must not be touched by the default mechanism
  4. A follow-up that checks whether any of the gathered stats need to be “fixed,” or whether any special cases need handling
  5. A custom mechanism for dealing with each partitioned object

When implementing such a system, remember three key points:

  1. Keep it simple—don’t let special cases proliferate.
  2. Think about how the system can be self-correcting.
  3. Keep an eye on how Oracle does the difficult bits. Don’t get left behind as Oracle changes strategies.

Finally, if there’s one thing you should remember above all others after reading this chapter, it’s this: the task of maintaining appropriate statistics should be part of the application code. Leaving it as a
DBA (or automatic) task is an abdication of responsibility that’s begging for performance problems to appear.

Others in this chapter

dbms_stats.get_column_stats() — into pl/sql variables
adjust pl/sql variables — typically high-value, and num_distinct

in 11g with the copy_table_stats()

Where I mention gathering stats, I generally use dbms_stats.gather_table_stats() with cascade set to true to collect index stats, method_opt set to for all columns size 1 to avoid histograms, and estimate_percent set to 100 because the tests usually involve small tables. I also tend to disable CPU costing (also known as system statistics) simply to ensure that my test cases are more likely to be repeatable.

About Alex Zeng
I would be very happy if this blog can help you. I appreciate every honest comments. Please forgive me if I'm too busy to reply your comments in time.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: