Case Study: Optimize a delete job

I recently optimized a delete job, and I'd like to share what I did.

Background:

It’s an 11gR2 database for a third-party application. There is a table that stores application performance data, and the application has a job to purge rows beyond the retention time. As we deployed the software quite widely, this table became very large: it receives about 100M new rows every day, and up to 10M/hour at peak time. That means the delete job needs to be able to delete that much data in time.

Problem:

The delete job could not catch up; as a result, the table grew bigger and bigger.

Immediate reaction:

  1. Truncate the table
  2. Let the application gather performance data at a lower level, so the purge job can keep up until we get it fixed.

Long-term solution:

Partition the table. This was my first thought, as this situation is exactly what partitioning is for. I didn’t want to create partitions every day/month; I could of course create partitions beforehand, but I wanted to reuse the purged partitions, the way RRD does, so that I wouldn’t need to keep creating/dropping partitions. For easy reference, I’ll call it circular partitioning; I’ll describe the details later. The vendor later said they could add this change to the release after next, so I still needed to find a short-term solution.

Short-term solution:

  1. Separate the delete job from other jobs. Previously the delete job was part of a big job that included lots of other time-consuming steps.
  2. Purge by rowid with a script instead of a simple delete SQL.
  3. Run parallel delete jobs.
  4. Pre-load the rowids to be deleted, as we already know which rows will be deleted in the next hour.

It turned out the first 3 changes were enough to delete 10M rows in 40 minutes. With the 4th one, the delete job can be done in 5 minutes, far less than the 1-hour target.

That’s the high-level description of the issue and the solutions. If you are interested in the details, let’s continue.

Background details:

There were 2 tables involved:

  1. TAB_SAMPLE_TIME has the map of sample_time and time_id.
  2. TAB_HIST_STAT has the detail data and a time_id column.  

In our case, the application takes a snapshot every 2 minutes, which means a new row is inserted into TAB_SAMPLE_TIME every 2 minutes and hundreds of thousands of rows are inserted into TAB_HIST_STAT at the same time.

The original delete SQL is like this:

delete from TAB_SAMPLE_TIME
 WHERE SAMPLE_TIME <= sys_extract_utc(systimestamp) - v_purge_interval
   AND ROLLUP_COUNTER IS NOT NULL;

delete from TAB_HIST_STAT  A
 WHERE NOT EXISTS(SELECT 1 FROM TAB_SAMPLE_TIME
 WHERE TIME_ID=A.TIME_ID);

Long-term solution details:

Let’s say we need to keep at most 7 days of data (currently 3 days), with a partition for each hour. We need to add a PARTITION_KEY column; it can be updated by the application, by a trigger, or computed as a virtual column (11g).
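As a sketch of the virtual-column option (an assumption on my part, not the vendor's code: the key is derived from CREATE_DATE, using the same epoch date as the slot formulas later in this post):

```sql
-- Hypothetical sketch: maintain PARTITION_KEY automatically as a
-- virtual column (11g), so neither the application nor a trigger
-- has to populate it. The expression must be deterministic.
ALTER TABLE TAB_HIST_STAT ADD (
  PARTITION_KEY NUMBER GENERATED ALWAYS AS (
    MOD((TRUNC(CREATE_DATE) - DATE '2011-01-01') * 24
        + TO_NUMBER(TO_CHAR(CREATE_DATE, 'hh24')), 200)
  ) VIRTUAL
);
```

Each row then lands in the hourly slot for its CREATE_DATE with no extra write cost.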

- 1*24 partitions to hold the current day's data.

- 7*24 partitions to keep, guaranteeing 7 days of history data.

- 8 partitions to be truncated: 8 free slots, so that if a truncate fails we have 8 hours to fix it.

Total: 8*24 + 8 = 200 partitions.

CREATE TABLE "TAB_HIST_STAT_PAR"
   (    "CID" NUMBER(38,0) NOT NULL ENABLE,
        "TIME_ID" NUMBER(38,0) NOT NULL ENABLE,
        "STAT_VAL" NUMBER(38,0) NOT NULL ENABLE,
        "CREATE_DATE" DATE DEFAULT SYSDATE,
        "PARTITION_KEY" NUMBER,
        CONSTRAINT "PK_TAB_HIST_STAT_P" PRIMARY KEY ("CID", "TIME_ID", "CREATE_DATE") ENABLE
   )
ORGANIZATION INDEX NOCOMPRESS
PARTITION BY RANGE (PARTITION_KEY)
(
PARTITION PART000 VALUES LESS THAN (1),
PARTITION PART001 VALUES LESS THAN (2),
PARTITION PART002 VALUES LESS THAN (3),
-- ... PART003 through PART198 ...
PARTITION PART199 VALUES LESS THAN (200),
PARTITION PARTMAX VALUES LESS THAN (MAXVALUE)
);

Calculate the current partition:

mod((trunc(sysdate) - to_date('01-JAN-2011','DD-MON-YYYY'))*24 + to_number(to_char(sysdate,'hh24')), 200)
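To make the formula concrete, a hypothetical worked example: at 05:00 on 02-JAN-2011, one full day plus 5 hours have elapsed since the epoch, so the slot is mod(1*24 + 5, 200) = 29.

```sql
-- Worked example of the slot formula for a fixed timestamp:
-- (1 day * 24 + 5 hours) mod 200 = 29, i.e. partition PART029.
SELECT 'PART' || TO_CHAR(
         MOD((TRUNC(DATE '2011-01-02') - DATE '2011-01-01') * 24 + 5, 200),
         'FM000') AS current_part
FROM dual;
```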

Truncate 8 partitions ahead every hour:

SELECT 'PART'||TO_CHAR(mod((trunc(sysdate) - to_date('01-JAN-2011','DD-MON-YYYY'))*24+to_number(to_char(sysdate,'hh24'))+1, 200),'FM000') FROM DUAL;

SELECT 'PART'||TO_CHAR(mod((trunc(sysdate) - to_date('01-JAN-2011','DD-MON-YYYY'))*24+to_number(to_char(sysdate,'hh24'))+2, 200),'FM000') FROM DUAL;

...

SELECT 'PART'||TO_CHAR(mod((trunc(sysdate) - to_date('01-JAN-2011','DD-MON-YYYY'))*24+to_number(to_char(sysdate,'hh24'))+8, 200),'FM000') FROM DUAL;
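The truncation itself can then be driven by dynamic SQL. A minimal sketch, assuming the partitioned table is named TAB_HIST_STAT_PAR as in the DDL above (the loop bound and epoch date mirror the formulas):

```sql
-- Hypothetical sketch: each hour, truncate the 8 slots ahead of the
-- current one so they are empty by the time the data wraps around.
DECLARE
  v_slot PLS_INTEGER;
BEGIN
  FOR i IN 1 .. 8 LOOP
    v_slot := MOD((TRUNC(SYSDATE) - DATE '2011-01-01') * 24
                  + TO_NUMBER(TO_CHAR(SYSDATE, 'hh24')) + i, 200);
    EXECUTE IMMEDIATE 'ALTER TABLE TAB_HIST_STAT_PAR TRUNCATE PARTITION PART'
                      || TO_CHAR(v_slot, 'FM000');
  END LOOP;
END;
/
```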


Short-term solution details:

  1. The original purge job has 2 parts: load rowids and delete rows. I found the load part took most of the time (> 30 minutes), and it will keep growing as the base table grows. So the idea is to separate load and delete; I created purge_load_rowid and purge_delete_by_rowid to do so.
  2. We keep 3 days of data and always delete the oldest data, so we can pre-load the rowids to be deleted. To do that, I added "+1/24" to the WHERE clause in the load part: "WHERE T.SAMPLE_TIME <= sys_extract_utc(systimestamp) - v_purge_interval + 1/24". So it loads the next hour's rowid data.
  3. The next problem is how to make the load and delete procedures work at the same time. To do that, I created 2 tables, TAB_ROWID_ODD and TAB_ROWID_EVEN, one for the load job and the other for the delete job, plus a table to record their status: TAB_ROWID_TAB_STATUS. The status life cycle is free -> loading -> loaded -> deleting -> free.
  4. To improve the delete speed, we need to know how the data was inserted into TAB_HIST_STAT. The data is inserted by SAMPLE_TIME, represented by TIME_ID in TAB_HIST_STAT, and deleting contiguous rows is the most efficient way in Oracle. So we'd better let every delete process delete contiguous data, instead of scanning all the data and deleting 1/10 of it. The best way is to split the data by range, hence the idea to range-partition TAB_ROWID_ODD and TAB_ROWID_EVEN by TIME_ID: I create the partitions using mod(TIME_ID, parallel_num).
  5. That's almost all, except for some little tricks. For instance, you need to run "execute immediate 'grant drop any table to APP_USER';" before "execute immediate 'truncate table '||v_tab_name||' reuse storage';" in the PL/SQL procedure to avoid an insufficient-privilege issue.

Some code clips:

3 additional tables created:

create table TAB_ROWID_ODD (
RID UROWID,
PARTITION_KEY number
)PARTITION BY RANGE (PARTITION_KEY) (
PARTITION PART00 VALUES LESS THAN (1),
PARTITION PART01 VALUES LESS THAN (2),
PARTITION PART02 VALUES LESS THAN (3),
PARTITION PART03 VALUES LESS THAN (4),
PARTITION PART04 VALUES LESS THAN (5),
PARTITION PART05 VALUES LESS THAN (6),
PARTITION PART06 VALUES LESS THAN (7),
PARTITION PART07 VALUES LESS THAN (8),
PARTITION PART08 VALUES LESS THAN (9),
PARTITION PART09 VALUES LESS THAN (10),
PARTITION PARTMAX VALUES LESS THAN (MAXVALUE)
);

create table TAB_ROWID_EVEN (
RID UROWID,
PARTITION_KEY number
)PARTITION BY RANGE (PARTITION_KEY) (
PARTITION PART00 VALUES LESS THAN (1),
PARTITION PART01 VALUES LESS THAN (2),
PARTITION PART02 VALUES LESS THAN (3),
PARTITION PART03 VALUES LESS THAN (4),
PARTITION PART04 VALUES LESS THAN (5),
PARTITION PART05 VALUES LESS THAN (6),
PARTITION PART06 VALUES LESS THAN (7),
PARTITION PART07 VALUES LESS THAN (8),
PARTITION PART08 VALUES LESS THAN (9),
PARTITION PART09 VALUES LESS THAN (10),
PARTITION PARTMAX VALUES LESS THAN (MAXVALUE)
);

create table TAB_ROWID_TAB_STATUS(
tab_name varchar2(32),
status varchar2(100),  --deleting, free, loading, loaded
deleting_thread number default 0,  --number of delete instances currently working on this table
last_update_date date
);

Load job in pl/sql:

create or replace procedure purge_load_rowid
AS
v_purge_interval number;
v_total PLS_INTEGER:=0;
v_starttime date;
v_endtime date;
v_cnt number := 0;
v_tab_name varchar2(1000) := 'null';
v_code varchar2(4000);
v_errm varchar2(4000);
v_sql varchar2(4000) ;

begin
 begin
   select tab_name into v_tab_name
    from (select * from TAB_ROWID_TAB_status where status='free' order by LAST_UPDATE_DATE) where rownum  < 2 ;
 EXCEPTION
   WHEN NO_DATA_FOUND THEN
     --if no table is available, use the oldest table regardless of its status; maybe a previous deleting/loading run went wrong
     select tab_name into v_tab_name
      from (select * from TAB_ROWID_TAB_status order by LAST_UPDATE_DATE) where rownum < 2 ;
 end;

 select interval_length/86400 into v_purge_interval from vpx_STAT_interval_def WHERE INTERVAL_SEQ_NUM=1;

 --load 1 more hour of data ahead
 delete from TAB_SAMPLE_TIME  T
  WHERE T.SAMPLE_TIME <= sys_extract_utc(systimestamp) - v_purge_interval + 1/24
    AND T.ROLLUP_COUNTER IS NOT NULL;

 /* OLD DELETE Statement
delete from TAB_HIST_STAT A
 WHERE NOT EXISTS(SELECT 1 FROM TAB_SAMPLE_TIME  T
 WHERE T.TIME_id=A.time_id); */
commit;

execute immediate 'grant drop any table to APP_USER';
execute immediate 'truncate table '||v_tab_name||'  reuse storage';
update TAB_ROWID_TAB_status set status='loading',last_update_date = sysdate, deleting_thread=0 where tab_name = v_tab_name;
commit;

select sysdate into v_starttime from dual;

execute immediate 'alter session set db_file_multiblock_read_count=128';
execute immediate 'alter session set sort_area_size=80000000';
execute immediate 'alter session set hash_area_size=80000000';

v_sql := 'insert into '||v_tab_name;
v_sql := v_sql||' select/*+ index_ffs(A PK_TAB_HIST_STAT) */ rowid as rid, mod(TIME_ID,10) PARTITION_KEY';
v_sql := v_sql||' from TAB_HIST_STAT A';
v_sql := v_sql||' WHERE NOT EXISTS (SELECT 1 FROM TAB_SAMPLE_TIME  T WHERE T.TIME_ID=A.TIME_ID)';
--v_sql := v_sql||' order by rowid';
execute immediate v_sql;

v_total := SQL%ROWCOUNT;
update TAB_ROWID_TAB_status set status='loaded',last_update_date = sysdate where tab_name = v_tab_name;

commit;

select sysdate into v_endtime from dual;
insert into TAB_PURGE_LOG(begin_time,end_time, row_deleted, what) values (v_starttime,v_endtime,v_total, 'load');
commit;

exception
 when others then
 v_code := SQLCODE;
 v_errm := SUBSTR(SQLERRM, 1, 128);
 if v_tab_name <> 'null' then
   --if load failed, still set the status to free so it can be reused next time
   update TAB_ROWID_TAB_status set status='free',last_update_date = sysdate where tab_name = v_tab_name;
 end if;
 insert into TAB_PURGE_LOG(begin_time,end_time, what, msg) values (v_starttime,sysdate,'load','Error code ' || v_code || ': ' || v_errm);
 commit;
end purge_load_rowid;
/

Delete job in pl/sql:

create or replace procedure purge_delete_by_rowid(p_num number)
AS
v_purge_interval number;
v_deleletotal PLS_INTEGER:=0;
v_starttime date;
v_endtime date;
type array is table of urowid INDEX BY PLS_INTEGER;
rids array;
batch_rows  PLS_INTEGER := 1000;
v_cnt number := 0;
TYPE rowid_cur IS REF CURSOR;
c1 rowid_cur;
v_tab_name varchar2(1000) := 'null';
v_par_name varchar2(1000) := 'null';
v_code varchar2(4000);
v_errm varchar2(4000);
v_sql varchar2(4000);

begin
-- New Delete Statement
select sysdate into v_starttime from dual;

select tab_name into v_tab_name
  from (select * from TAB_ROWID_TAB_status where status='loaded' order by last_update_date desc) where rownum < 2 ;
update TAB_ROWID_TAB_status set deleting_thread=deleting_thread+1,last_update_date = sysdate where tab_name = v_tab_name;
update TAB_ROWID_TAB_status set status='deleting',last_update_date = sysdate where deleting_thread=10 and tab_name = v_tab_name;
commit;
dbms_lock.sleep(10);            --sleep a while to wait for the status to be updated to deleting

execute immediate 'alter session set db_file_multiblock_read_count=128';
execute immediate 'alter session set sort_area_size=80000000';
execute immediate 'alter session set hash_area_size=80000000';

execute immediate 'select count(*) from '||v_tab_name||' where PARTITION_KEY = '||p_num into v_cnt;

v_sql := 'select rid from '||v_tab_name||' where PARTITION_KEY = '||p_num;
open c1 for v_sql;
    loop
      fetch c1 bulk collect into rids limit batch_rows;
      exit when rids.count = 0;
      forall i in rids.first..rids.last
       delete from  TAB_HIST_STAT A where rowid=rids(i) ;
      v_deleletotal := v_deleletotal + SQL%ROWCOUNT;
      commit;
     dbms_application_info.set_client_info('parallel '||p_num||' deleted '||v_deleletotal||' rows out of '||v_cnt);
    end loop;
close c1;

update TAB_ROWID_TAB_status set deleting_thread=deleting_thread-1,last_update_date = sysdate where tab_name = v_tab_name;
update TAB_ROWID_TAB_status set status='free',last_update_date = sysdate where deleting_thread=0 and tab_name = v_tab_name;
commit;

select sysdate into v_endtime from dual;
insert into TAB_PURGE_LOG(begin_time,end_time, row_deleted, parallel_num, what) values (v_starttime,v_endtime,v_deleletotal, p_num, 'delete');
commit;

exception
 when others then
 v_code := SQLCODE;
 v_errm := SUBSTR(SQLERRM, 1, 128);
 insert into TAB_PURGE_LOG(begin_time,end_time, what, msg) values (v_starttime,sysdate,'delete','Error code ' || v_code || ': ' || v_errm);
 commit;
end purge_delete_by_rowid;
/

The load job runs once per hour. The delete job needs to run as 10 simultaneous instances, as follows:

BEGIN purge_load_rowid; END;
BEGIN purge_delete_by_rowid(0); END;
BEGIN purge_delete_by_rowid(1); END;
BEGIN purge_delete_by_rowid(2); END;
....
BEGIN purge_delete_by_rowid(9); END;
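One way to schedule this is with DBMS_SCHEDULER; a sketch under my own assumptions (the job names are hypothetical, and the original post does not say which scheduler was used):

```sql
-- Hypothetical scheduling sketch: one hourly load job plus 10 hourly
-- delete jobs, one per PARTITION_KEY slot 0..9.
BEGIN
  DBMS_SCHEDULER.create_job(
    job_name        => 'PURGE_LOAD_JOB',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN purge_load_rowid; END;',
    repeat_interval => 'FREQ=HOURLY',
    enabled         => TRUE);
  FOR i IN 0 .. 9 LOOP
    DBMS_SCHEDULER.create_job(
      job_name        => 'PURGE_DELETE_JOB_' || i,
      job_type        => 'PLSQL_BLOCK',
      job_action      => 'BEGIN purge_delete_by_rowid(' || i || '); END;',
      repeat_interval => 'FREQ=HOURLY',
      enabled         => TRUE);
  END LOOP;
END;
/
```

The delete jobs coordinate through TAB_ROWID_TAB_STATUS, so they can all fire at the top of the hour.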

The object names are artificial for well-known reasons. As always, your comments are welcome.


About Alex Zeng
I would be very happy if this blog can help you. I appreciate every honest comment. Please forgive me if I'm too busy to reply to your comments in time.
