# Expert Oracle Practices: Managing the Very Large Database

I am reading *Expert Oracle Practices: Oracle Database Administration from the Oak Table*. I'd like to summarize and share what I learned.

This chapter, Managing the Very Large Database by Tim Gorman, tells us the key points of managing a very large database (VLDB). The key feature that makes a VLDB possible in Oracle is partitioning. The important concept is still divide and conquer. Awareness of the power of the EXCHANGE PARTITION operation is the key to optimizing data manipulation, and ultimately this operation facilitates a very large database. Let's highlight some of the contents:

Deleting or Updating Millions of Rows
Compare the two approaches:

  1. A parallel UPDATE statement generates insane volumes of rollback/undo as part of normal processing, generates equally insane volumes of redo as part of normal processing, requires searches for the rows to manipulate prior to manipulating them, and performs all work within the Oracle buffer cache, requiring each parallel worker process to latch/lock buffers after they’re retrieved, before manipulating them.
  2. A parallel unrecoverable/nologging CREATE TABLE … AS SELECT operation, using parallel direct-path sessions, has no contention with other sessions in the database nor between its own parallel worker processes. It does not generate rollback/undo or redo, except for the Data Definition Language (DDL) commands involved with extent allocation.

The basic lesson from this is that the fastest mass update is actually an insert. When a table is partitioned, you can use an insert to perform an update: rebuild the affected rows into a new table with the changed values, then exchange it with the target partition.
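
As a sketch of this "update by insert" pattern (the ORDERS table, partition P_2008, and all column names here are hypothetical, not from the book):

```sql
-- Mass-"update" one partition of a range-partitioned ORDERS table by
-- rebuilding it with a direct-path, nologging CTAS, then swapping it in.
CREATE TABLE orders_2008_tmp
  NOLOGGING PARALLEL 8
AS
SELECT /*+ PARALLEL(o, 8) */
       o.order_id,
       o.order_date,
       'CLOSED' AS status          -- the "update": every row gets a new STATUS
FROM   orders PARTITION (p_2008) o;

-- After matching any LOCAL partitioned indexes on the new table, swap the
-- segments; this is a fast data-dictionary operation, not a data copy:
ALTER TABLE orders
  EXCHANGE PARTITION p_2008 WITH TABLE orders_2008_tmp
  INCLUDING INDEXES WITHOUT VALIDATION;
```

The CTAS does the heavy lifting below the buffer cache with direct-path writes, which is why it avoids the undo, redo, and latching costs of a conventional UPDATE.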

Loading Millions of Rows
The scenario described in the previous section shows the basic five-step method of loading data into a partitioned table using the exchange-partition technique:
1. Create a “temporary” table that will later be switched with a target partition in the partitioned target table.
2. Load the temporary table using the fastest methods appropriate for the situation (for example, direct-path insert).
3. Gather cost-based optimizer statistics on the just-loaded temporary table.
4. Create indexes on the just-loaded temporary table to match the LOCAL partitioned indexes on the partitioned target table, if any.
5. Exchange the target partition in the target table with the just-loaded temporary table.
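
The five steps above might be sketched as follows (all object names are hypothetical):

```sql
-- 1. Create a temporary table shaped like the target partition
CREATE TABLE order_lines_load
AS SELECT * FROM order_lines WHERE 1 = 0;

-- 2. Load it with a direct-path insert
INSERT /*+ APPEND */ INTO order_lines_load
SELECT * FROM staging_order_lines;
COMMIT;

-- 3. Gather optimizer statistics on the just-loaded table
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'ORDER_LINES_LOAD');

-- 4. Build indexes matching the LOCAL partitioned indexes on the target
CREATE INDEX order_lines_load_ix1 ON order_lines_load (order_id) NOLOGGING;

-- 5. Exchange the target partition with the just-loaded table
ALTER TABLE order_lines
  EXCHANGE PARTITION p_2009_06 WITH TABLE order_lines_load
  INCLUDING INDEXES WITHOUT VALIDATION;
```

Because the exchange is a metadata swap, the loaded data becomes visible in the partitioned table almost instantly, fully indexed and analyzed.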

Partition Pruning Pitfalls

  1. The partition key column is wrapped in a function or expression: a very common problem arising from a matter of programming style in dealing with the time component of the DATE datatype.
  2. Comparing data of different datatypes causes implicit datatype conversion, which can also defeat pruning.
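
Both pitfalls can be illustrated against a hypothetical ORDERS table range-partitioned on ORDER_DATE:

```sql
-- Pitfall 1: wrapping the partition key in a function defeats pruning
SELECT COUNT(*) FROM orders
WHERE  TRUNC(order_date) = DATE '2009-06-01';   -- scans all partitions

-- Rewritten with a bare key column, pruning works
SELECT COUNT(*) FROM orders
WHERE  order_date >= DATE '2009-06-01'
AND    order_date <  DATE '2009-06-02';         -- scans one partition

-- Pitfall 2: implicit conversion; if ORDER_NO is VARCHAR2, Oracle
-- rewrites the predicate as TO_NUMBER(order_no) = 12345
SELECT * FROM orders WHERE order_no = 12345;    -- pruning defeated
SELECT * FROM orders WHERE order_no = '12345';  -- pruning possible
```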

Methods of Partitioning

  1. Range partitioning uses a finite to infinite list of data values, searched using equivalence operators (that is, =) and range operators (>, >=, <, <=, BETWEEN, LIKE, and so forth).
  2. List partitioning uses a finite list of identified data values plus DEFAULT for everything else, searched using equivalence operators only.
  3. Hash partitioning uses an infinite list of data values, searched using equivalence operators only.
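
For reference, the three methods look like this in DDL (all tables hypothetical):

```sql
-- Range partitioning on a date column
CREATE TABLE orders_r (order_id NUMBER, order_date DATE)
PARTITION BY RANGE (order_date)
( PARTITION p_2009_06 VALUES LESS THAN (DATE '2009-07-01'),
  PARTITION p_max     VALUES LESS THAN (MAXVALUE) );

-- List partitioning on a discrete column, with DEFAULT for everything else
CREATE TABLE orders_l (order_id NUMBER, region VARCHAR2(10))
PARTITION BY LIST (region)
( PARTITION p_east VALUES ('EAST'),
  PARTITION p_west VALUES ('WEST'),
  PARTITION p_rest VALUES (DEFAULT) );

-- Hash partitioning spreads rows evenly; searches use equality only
CREATE TABLE orders_h (order_id NUMBER, customer_id NUMBER)
PARTITION BY HASH (customer_id) PARTITIONS 8;
```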

Information Life Cycle Management
Let’s consider creating tablespaces to hold a month’s worth of ORDER_LINES partitions and their index partitions.

Now, with time-variant tablespaces, we can create a new tablespace on tier 1 storage, initially in read-write status. After two months, we change the tablespace from read-write to read-only status. After six months, we move that tablespace from tier 1 to tier 2 storage. Then, after 12 months, we move that tablespace from tier 2 to tier 3 storage, and after seven years, we drop that tablespace and all the partitions with it. The key here is that the tablespace must be in read-only status before we can consider moving it.
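
That life cycle might be sketched in DDL like so (tablespace name and path are hypothetical; the physical moves use the migration methods described next):

```sql
-- Month 0: create the tablespace on tier 1 storage, read-write
CREATE TABLESPACE orders_2009_06
  DATAFILE '/tier1/orders_2009_06.dbf' SIZE 10G;

-- Month 2: freeze it; read-only is the precondition for any later move
ALTER TABLESPACE orders_2009_06 READ ONLY;

-- Months 6 and 12: migrate the data files to tier 2, then tier 3 storage

-- Year 7: retire the data and all the partitions it holds
DROP TABLESPACE orders_2009_06 INCLUDING CONTENTS AND DATAFILES;
```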

Two major methods to migrate tablespaces from one tier of storage to another:

  1. Using RMAN to first copy the data files in the tablespace to the new location, and then switch the identity of the data files from the old location to the new location
  2. Creating the new tablespace in the new location, using CREATE TABLE … AS SELECT to copy the data from the existing partitions to the new tables, and then using the exchange partition technique to switch the old partitions with the new tables in the new tablespace

Each of these two techniques has its advantages and its disadvantages. The advantage of the RMAN technique is simplicity and no need to take another backup of the moved tablespaces. The advantage of the copy/exchange technique is the opportunity to compress a previously uncompressed partition during the copy phase.
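
A sketch of the RMAN technique (paths hypothetical; the tablespace is already read-only, and is taken offline briefly for the switch):

```
SQL>  ALTER TABLESPACE orders_2009_06 OFFLINE;

RMAN> BACKUP AS COPY TABLESPACE orders_2009_06
        FORMAT '/tier2/orders_2009_06_%U.dbf';
RMAN> SWITCH TABLESPACE orders_2009_06 TO COPY;

SQL>  ALTER TABLESPACE orders_2009_06 ONLINE;
```

The SWITCH updates the control file to point at the copies on tier 2 storage; the old tier 1 files can then be deleted.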

One more method: transporting tablespaces from an OLTP database to an archive database.
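
A transportable-tablespace sketch (names and paths hypothetical; the tablespace set must be self-contained):

```sql
-- On the OLTP database: freeze the tablespace and export its metadata
ALTER TABLESPACE orders_2002 READ ONLY;
-- $ expdp system DIRECTORY=dp_dir DUMPFILE=orders_2002.dmp \
--     TRANSPORT_TABLESPACES=orders_2002

-- Copy the dump file and data files to the archive database, then:
-- $ impdp system DIRECTORY=dp_dir DUMPFILE=orders_2002.dmp \
--     TRANSPORT_DATAFILES='/arch/orders_2002.dbf'
```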

Backup Optimization and Guaranteed Recovery
Key point: the dividing line between read-write and read-only

The majority of the volume of a database that is configured for ILM is read-only and needs to be backed up only infrequently, perhaps once or twice per year, if not less frequently, depending on how much you trust your backup media. It is only the read-write portion of the database, along with the archived redo log files, that needs to be backed up frequently, on a daily or weekly basis.

Luckily, this planning and design coincides precisely with the goals of ILM, and Oracle provides the toolkit for this in the features of partitioning, read-only tablespaces, and incremental backups by Oracle RMAN.
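
RMAN supports this division directly: with backup optimization enabled, read-only data files that already have a sufficient backup are simply skipped. A sketch:

```
RMAN> CONFIGURE BACKUP OPTIMIZATION ON;

-- Daily/weekly: only the read-write portion and the archived logs change
RMAN> BACKUP INCREMENTAL LEVEL 1 DATABASE PLUS ARCHIVELOG;
```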

Breaking large objects into smaller ones, just for the sake of doing so, does not provide much benefit. However, the way in which those smaller objects are managed, how they are stored, according to a life cycle, makes all the difference in the world.

Further Notes on Storage
The question: which should be used, file systems, “raw” devices, or Automatic Storage Management (ASM)?

Lots of people are suspicious of Oracle ASM because they feel it is too new, unfamiliar, and buggy. But I have worked with Oracle ASM in some of the most extreme database environments I have ever experienced, and it nicely complements the best features of the Oracle database without limitations.
In the world of very large databases, removing limitations and enabling all of the features available is vital. Try ASM, if you haven’t already. There is a learning curve, but you’ll be glad you invested the effort.

Limits of Which to Be Aware
Database Block Size: there are numerous limits based on the database block size, so it is worthwhile to choose wisely:

  1. Maximum size of an extent in a segment (4 billion database blocks for dictionary-managed tablespaces, 2 billion database blocks for locally managed tablespaces)
  2. Maximum control file size (20,000 database blocks)
  3. Maximum data file size (4 million database blocks for SMALLFILE tablespaces, 4 billion database blocks for BIGFILE tablespaces)
  4. Maximum database size (65,533 times the maximum data file sizes cited earlier)

Number of Files in a Database
As recently as Oracle 11g R2, the limit on the number of data files in a database is 65,533.

So, here are two bits of advice for the planning and designing of the very large database:

  1. Be sure to take advantage of data file autoextension.
  2. Consider using BIGFILE tablespaces.

Starting with Oracle 10g, Oracle provides the concept of BIGFILE tablespaces, which are tablespaces that are permitted one and only one data file. This single data file is capable of growing to 4 billion database blocks, whereas the data files of traditional (SMALLFILE) tablespaces can grow to only 4 million database blocks.
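
A minimal example that combines both pieces of advice (path and sizes hypothetical):

```sql
CREATE BIGFILE TABLESPACE orders_big
  DATAFILE '/u01/oradata/orders_big_01.dbf'
  SIZE 10G AUTOEXTEND ON NEXT 1G MAXSIZE UNLIMITED;
```

One autoextending data file per tablespace keeps the data-file count far below the 65,533 limit while still allowing the tablespace to grow.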

Storage That Can Migrate
One such concept is hierarchical storage management (HSM). All of the volumes in the HSM are available at all times, but if a volume is not used for a prespecified period of time, most of the volume (except for a small header element) is migrated transparently off to lower-cost storage. HSM volumes are often ideal as tier 3, or archival-class, storage for infrequently used data.

Tablespaces created on (or migrated to) HSM volumes had better be dictionary managed, so as not to run into the file-open latency problem every time the DBA_EXTENTS and DBA_FREE_SPACE views are queried.

The parameter READ_ONLY_OPEN_DELAYED changes this behavior at instance startup. Data files belonging to tablespaces that are read-only are only touched when they are accessed, not when the database is opened. So, as the number of data files grows larger, and as the number of read-only tablespaces increases, you can keep the time needed to restart the database instance to a minimum by setting this parameter to TRUE, especially if you have read-only tablespaces based on migratable HSM storage.
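
The parameter is static, so it is set in the spfile and takes effect at the next restart (a sketch):

```sql
ALTER SYSTEM SET read_only_open_delayed = TRUE SCOPE = SPFILE;
-- After the restart, read-only data files are opened on first access,
-- not during ALTER DATABASE OPEN.
```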


About Alex Zeng
I would be very happy if this blog helps you. I appreciate every honest comment. Please forgive me if I'm too busy to reply to your comments in time.
