How to use hinted SQL to fix bad SQL plan

As I said in a previously blog Outline, SQL profile and SQL Plan Management(SPM), outline is the oldest approach to fix SQL plan, but it’s the easiest one to use, and works for all oracle version, and do NOT need extra license.

I’d like to elaborate the most frequently used scenario: a SQL is using a bad SQL plan. We can use SQL hints to get a GOOD plan. We want to use outline to fix the SQL plan to the GOOD one. In another word, we want to use a SQL plan with different SQL text(hints) but logically the same.

Let’s create an test scenario:
1. Create a test table


ALEX@ZENG: SQL> create table t (id number, name varchar2(100));

Table created.

ALEX@ZENG: SQL> insert into t select 1, object_name from dba_objects;

74562 rows created.

ALEX@ZENG: SQL> create index t_idx1 on t(id);

Index created.

ALEX@ZENG: SQL> exec dbms_stats.gather_table_stats('ALEX','T');

PL/SQL procedure successfully completed.

ALEX@ZENG: SQL> show parameter optimizer_features_enable

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_features_enable            string      10.2.0.4

2. Create the bad SQL plan scenario


ALEX@ZENG: SQL> update t set id=rownum;

74563 rows updated.

ALEX@ZENG: SQL> commit;

Commit complete.
ALEX@ZENG: SQL> set autotrace traceonly
ALEX@ZENG: SQL> select * from t where id=1;

1 row selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2153619298

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 74556 |  1820K|    74   (5)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    | 74556 |  1820K|    74   (5)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("ID"=1)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        319  consistent gets
          0  physical reads
          0  redo size
        593  bytes sent via SQL*Net to client
        520  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

3. Use hint to get a better plan

--it's better to use index scan because there's only one row with id=1 now
--We can get the plan by adding hint

ALEX@ZENG: SQL> explain plan for select /*+ index(t t_idx1) */ * from t where id=1;

Explained.

ALEX@ZENG: SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3292636276

--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        | 74556 |  1820K|   150   (2)| 00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T      | 74556 |  1820K|   150   (2)| 00:00:02 |
|*  2 |   INDEX RANGE SCAN          | T_IDX1 | 74556 |       |    55   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID"=1)

14 rows selected.

Since INDEX RANGE SCAN is better plan, we want to use it to replace the FULL TABLE SCAN plan.

Step 1. Get the bad SQL hash value


ALEX@ZENG: SQL> select address, hash_value, sql_text from v$sql where sql_text like 'select * from t%';

ADDRESS          HASH_VALUE SQL_TEXT
---------------- ---------- ------------------------------------------------------------
000000045F900938  716428968 select * from t where id=1
...

Step 2. Create a SQL file with hints

$ cat good.sql
select /*+ index(t t_idx1) */ * from t where id=1;
$

Step 3. Fix the SQL plan using script fix_plan.sh

#!/bin/bash

if [ $# -lt 3 ]; then
cat <<EOF
Fix given SQL plan using given sql in file
  usage:    fix_plan.sh <hash_value> <hinted_sql_file> <owner>
  examples: fix_plan.sh 716428968 good.sql ALEX
EOF
exit 1
fi

HASH_VALUE=$1
SQL_FILE=$2
OL_OWNER=$3

echo "HASH_VALUE : $HASH_VALUE"
echo "SQL_FILE   : $SQL_FILE"
echo "OL_OWNER   : $OL_OWNER"
echo ""

#Create outline by hash_value
function create_ol_from_hashvalue {
HASH_VALUE=$1
OL_NAME=$2
OL_OWNER=$3

#generate create outline sql
#I didn't use dbms_outln.create_outline, because it cannot create given name outline
# and there's no hash value in V$SQL and DBA_OUTLINES to associate the two 
# according to "How To Match a SQL Statement to a Stored Outline (Doc ID 743312.1)"
sqlplus -S "/ as sysdba" > /tmp/tmp_$OL_NAME.sql <<EOF
set feedback off
set serveroutput on size unlimited
declare
v_sqltext varchar2(32000);
begin
  --get sql text
  select dbms_lob.substr(SQL_FULLTEXT, 30000, 1 ) into v_sqltext from v\$sql where hash_value = $HASH_VALUE and rownum=1;

  dbms_output.put_line('alter session set current_schema = $OL_OWNER;');
  v_sqltext := 'create or replace outline $OL_NAME for category TEMP_PLAN on ' || chr(10) || v_sqltext || chr(10) ||';';
  dbms_output.put_line(v_sqltext);
  dbms_output.put_line('exit;');
end;
/
EOF

sqlplus -S "/ as sysdba" @/tmp/tmp_$OL_NAME.sql
}

#Create outline from sql file
function create_ol_from_sqlfile {
SQL_FILE=$1
OL_NAME=$2
OL_OWNER=$3

#generate create outline sql
cat > /tmp/tmp_$OL_NAME.sql <<EOF
alter session set current_schema = $OL_OWNER;
create or replace outline $OL_NAME for category TEMP_PLAN on
`cat $SQL_FILE`
exit;
EOF

sqlplus -S "/ as sysdba" @/tmp/tmp_$OL_NAME.sql

}

#Exchange outlines, make GOOD_SQL plan to GOOD_PLAN category
#Ref: How to Edit a Stored Outline to Use the Plan from Another Stored Outline (Doc ID 730062.1)
function exchange_outline {
OL1=$1
OL2=$2
OL_OWNER=$3

sqlplus -S "/ as sysdba" <<EOF
set feedback off
alter session set current_schema = $OL_OWNER;
create private outline OLFROM from $OL1;
create private outline OLTO from $OL2;
update ol$ set hintcount=(select hintcount from ol$ where ol_name='OLTO') where ol_name='OLFROM';
delete from ol$ where ol_name='OLTO';
update ol$ set ol_name='OLTO' where ol_name='OLFROM';
commit;
execute dbms_outln_edit.refresh_private_outline('OLTO');
create or replace outline $OL1 from private OLTO for category GOOD_PLAN;
drop outline $OL2;
exit;
EOF

}

#display outline
function display_outline {
OL_NAME=$1
OL_OWNER=$2

sqlplus -S "/ as sysdba" <<EOF
set pagesize 1000 linesize 160
set long 32000
col hint format a55
col join_pos format a45
col owner    format a12
col name     format a18
col ts       format a14
col h        format 999
col category format a12
col sql_text format a80
col used     format a6

select name, sql_text, category, used, to_char(TIMESTAMP, 'YY-mm-dd hh24:MI')
 from  dba_outlines
where name = '$OL_NAME' and OWNER = '$OL_OWNER';

select ol_name name, category, hint#, stage# stage, hint_text hint, join_pred join_pos
 from outln.ol\$hints
 where ol_name = '$OL_NAME'
 order by ol_name, hint#;
exit;
EOF
}

#main function
echo "1. Create outline OL_$HASH_VALUE for SQL $HASH_VALUE"
create_ol_from_hashvalue $HASH_VALUE OL_$HASH_VALUE $OL_OWNER

echo "2. Create outline OL_TEMP for SQL in $SQL_FILE"
create_ol_from_sqlfile $SQL_FILE OL_TEMP $OL_OWNER

echo "3. Exchange outline OL_$HASH_VALUE with OL_TEMP, and drop OL_TEMP"
exchange_outline OL_$HASH_VALUE OL_TEMP $OL_OWNER

echo "4. Display final outline for SQL $HASH_VALUE : OL_$HASH_VALUE in category GOOD_PLAN "
display_outline OL_$HASH_VALUE $OL_OWNER

Run it:

$./fix_plan.sh 716428968 good.sql ALEX
HASH_VALUE : 716428968
SQL_FILE : good.sql
OL_OWNER : ALEX

1. Create outline OL_716428968 for SQL 716428968

Session altered.

Outline created.

2. Create outline OL_TEMP for SQL in good.sql

Session altered.

Outline created.

3. Exchange outline OL_716428968 with OL_TEMP, and drop OL_TEMP
4. Display final outline for SQL 716428968 : OL_716428968 in category GOOD_PLAN

NAME SQL_TEXT CATEGORY USED TO_CHAR(TIMEST
------------------ -------------------------------------------------------------------------------- ------------ ------ --------------
OL_716428968 select * from t where id=1 GOOD_PLAN UNUSED 13-12-29 19:39

1 row selected.

NAME CATEGORY HINT# STAGE HINT JOIN_POS
------------------ ------------ ---------- ---------- ------------------------------------------------------- ---------------------------------------------
OL_716428968 GOOD_PLAN 1 1 OUTLINE_LEAF(@"SEL$1")
OL_716428968 GOOD_PLAN 2 1 IGNORE_OPTIM_EMBEDDED_HINTS
OL_716428968 GOOD_PLAN 3 1 OPTIMIZER_FEATURES_ENABLE('10.2.0.4')
OL_716428968 GOOD_PLAN 4 1 DB_VERSION('11.2.0.2')
OL_716428968 GOOD_PLAN 5 1 OPT_PARAM('_b_tree_bitmap_plans' 'false')
OL_716428968 GOOD_PLAN 6 1 OPT_PARAM('_index_join_enabled' 'false')
OL_716428968 GOOD_PLAN 7 1 OPT_PARAM('_optim_peek_user_binds' 'false')
OL_716428968 GOOD_PLAN 8 1 OPT_PARAM('_optimizer_skip_scan_enabled' 'false')
OL_716428968 GOOD_PLAN 9 1 OPT_PARAM('optimizer_index_cost_adj' 30)
OL_716428968 GOOD_PLAN 10 1 INDEX_RS_ASC(@"SEL$1" "T"@"SEL$1" ("T"."ID"))

10 rows selected.

Step 4. Flush the bad SQL plan

--if it's 10g, use grant to invalidate the SQL plan
SQL> grant select on ALEX.T to dba;

--if it's 11g, use procedure DBMS_SHARED_POOL.PURGE
SQL> select ADDRESS, HASH_VALUE from V$SQL where hash_value = 716428968;

ADDRESS          HASH_VALUE
---------------- ----------
000000045F900938  716428968

1 row selected.

SQL> exec DBMS_SHARED_POOL.PURGE ('000000045F900938, 716428968','C');

PL/SQL procedure successfully completed.

Step 5. Check the new plan

--it's IMPORTANT to set use_stored_outlines to use the stored outline category if not yet
--You can set it at session level and test the plan first if needed
SQL> alter system set use_stored_outlines = GOOD_PLAN;

System altered.

ALEX@ZENG: SQL> set autotrace traceonly
ALEX@ZENG: SQL> select * from t where id=1;

1 row selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3292636276

--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |     1 |    27 |     1   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T      |     1 |    27 |     1   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T_IDX1 |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID"=1)

Note
-----
   - outline "OL_716428968" used for this statement

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        185  consistent gets
          0  physical reads
          0  redo size
        597  bytes sent via SQL*Net to client
        520  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

There is a problem caused by “ghost” parameter use_stored_outlines. Why?
You cannot set it in inital file, and you cannot see it after setting it, and it’s lost its value after DB restart.
So we need to create a trigger to set it automatically when DB start:

SYS@ZENG: SQL> show parameter use_stored_outlines
SYS@ZENG: SQL>
SYS@ZENG: SQL> create or replace trigger enable_outlines_trig
--Ref : How to Enable USE_STORED_OUTLINES Permanently (Doc ID 560331.1)
after startup on database
begin
execute immediate('alter system set use_stored_outlines=GOOD_PLAN');
end;
/

Trigger created.

Until then, you can have a good sleep without worry about the bad SQL plan causes DB high session/load.

how to setup HTTPS for apache

I’m working on setup authentication for an internal website recently.To protect usernames and passwords, setting up HTTPS is very necessary. Here is my first trial on HTTPS/SSL:

Step 1. Check if apache has ssl module compiled 

$ grep mod_ssl apache/conf/httpd.conf
If below line is there, and you can start apache without problem, you don't need to recompile apache
"LoadModule ssl_module modules/mod_ssl.so"

The other way is to check whether the module is there.
If it's there, but it's not in the httpd.conf file, you can add it and try start apache
$ ls modules/mod_ssl.so

Step 2. Compile apache if it didn’t have ssl module

--download apache from http://httpd.apache.org/, and unzip it
cd /home/alexzeng/httpd-2.2.24
--config
#./configure --prefix=/alexzeng/apache --with-config-file-path=/alexzeng/apache/conf --enable-ssl --enable-http --enable-rewrite --enable-track-vars --enable-cgi --with-config-file-path=/opt/apache/conf --enable-modules=all --enable-mods-shared=all --enable-file-cache --enable-disk-cache --enable-cache --enable-mem-cache --enable-dumpio --enable-logio --enable-mime-magic --enable-headers --enable-usertrack --enable-version --enable-proxy --enable-proxy-connect --enable-proxy-http --enable-proxy-ftp --enable-proxy-ajp --enable-proxy-balancer --enable-so

--make
#make

--make install
#make install

Step 3. Config apache SSL
A. Load ssl module, it should already have these lines:

LoadModule ssl_module modules/mod_ssl.so
LoadModule alias_module modules/mod_alias.so
LoadModule rewrite_module modules/mod_rewrite.so
LoadModule dir_module modules/mod_dir.so
...

B. Enable httpd-ssl.conf, remove # from httpd-ssl.conf line:

# Secure (SSL/TLS) connections
Include conf/extra/httpd-ssl.conf

C. Config pages that need force using https at VirtualHost part:

NameVirtualHost *:80

   ServerName alexzeng.vip.wordpress.com
   DocumentRoot "/alexzeng/apache/htdocs"
   Redirect permanent /signin https://alexzeng.vip.wordpress.com/signin

D. Config extra/httpd-ssl.conf

Listen 443

#   General setup for the virtual host
DocumentRoot "/alexzeng/apache/htdocs"
ServerName alexzeng.vip.wordpress.com:443
...
#   SSL Engine Switch:
#   Enable/Disable SSL for this virtual host.
SSLEngine on

#   Server Certificate:
SSLCertificateFile "/alexzeng/apache/conf/server.crt"

#   Server Private Key:
SSLCertificateKeyFile "/alexzeng/apache/conf/server.key"
#   Server Certificate Chain:

Until now, the HTTPS is setup for the site. But we need to have 2 files: server cert server.crt, and its private key server.key.

The official way it to request it from public SSL certificate companies, such as VeriSign, because their cert is accepted by all browser by default. But it costs a few hundred dollar per year at least.

So many companies have their self-signed cert to reduce the cost. Especially for internal sites, a company can have their root cert installed for their employees OS image by default. In that way, access its signed certs site will be recognized as safe, no security alert, nor https crossed out in red. That’s the case in my company.

I got our IT team signed cert, but it’s in pfx format. I need to convert it to the 2 files server.crt and server.key. The processes are as follows:

$ openssl pkcs12 -in it.pfx  -nocerts -nodes -passin pass:"<password_from_IT>" | openssl rsa -out server.key
MAC verified OK
writing RSA key
$ openssl pkcs12 -in it.pfx  -clcerts -nokeys -nodes -passin pass:"<password_from_IT>" | openssl x509 -out server.crt
MAC verified OK
-- The pfx is created by Windows tools, so I use openssl rsa/x509 to remove some "Bag Attributes" lines.
-- Otherwise, you can just use -out option at the first without the sencond command in pipeline

--copy key to apache directory if needed
$ cp server.crt  server.key /alexzeng/apache/conf

--restart apache
$ sudo ./apachectl stop
$ sudo ./apachectl start

If a browser didn’t installed the companies’ root cert, it’ll get security alert when access the site. It can be avoided by import their root cert.

Besides that, we need to test the HTTPS by ourselves even without an internal team to sign the cert for us. We’ll make ourselves a certificate authority (CA) :)

How-to create self-signed cert for test:
a. Create a server private key: server.key

$ openssl genrsa -out server.key 1024
Generating RSA private key, 1024 bit long modulus
...................++++++
......++++++
e is 65537 (0x10001)

b. Create certificate signing request (CSR) : server.csr (using server.key)

$ openssl req -new -out server.csr -key server.key
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [XX]:US
State or Province Name (full name) []:aa
Locality Name (eg, city) [Default City]:aa
Organization Name (eg, company) [Default Company Ltd]:aa
Organizational Unit Name (eg, section) []:aa
Common Name (eg, your name or your server's hostname) []:alexzeng.vip.wordpress.com
Email Address []:
Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:
An optional company name []:
$

--Note:
"Country Name" must be a valid name
"Common Name" must be the same as ServerName in httpd.conf
"State or Province Name" must has some value

c. Create a certificate authority (CA) private key: ca.key

$ openssl genrsa  -out ca.key 1024
Generating RSA private key, 1024 bit long modulus
.......++++++
........++++++
e is 65537 (0x10001)

d. Create CA certificate : ca.crt (using ca.key)

$ openssl req  -new -x509 -days 365 -key ca.key -out ca.crt
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [XX]:US
State or Province Name (full name) []:aa
Locality Name (eg, city) [Default City]:aa
Organization Name (eg, company) [Default Company Ltd]:aa
Organizational Unit Name (eg, section) []:aa
Common Name (eg, your name or your server's hostname) []:alexzeng.vip.wordpress.com
Email Address []:
$

e. Sign a certificate by my own CA: server.crt (using certificate signing request server.csr, CA private key ca.key, and CA certificate ca.crt)

$ sudo openssl ca -in server.csr -out server.crt -cert ca.crt -keyfile ca.key
Using configuration from /etc/pki/tls/openssl.cnf
Check that the request matches the signature
Signature ok
Certificate Details:
        Serial Number: 1 (0x1)
        Validity
            Not Before: Dec 26 07:20:02 2013 GMT
            Not After : Dec 26 07:20:02 2014 GMT
        Subject:
            countryName               = US
            stateOrProvinceName       = aa
            organizationName          = aa
            organizationalUnitName    = aa
            commonName                = alexzeng.vip.wordpress.com
        X509v3 extensions:
            X509v3 Basic Constraints:
                CA:FALSE
            Netscape Comment:
                OpenSSL Generated Certificate
            X509v3 Subject Key Identifier:
                70:45:AB:98:23:51:BB:88:23:20:EA:21:21:3C:6A:8A:E2:0A:97:B8
            X509v3 Authority Key Identifier:
                keyid:46:73:F6:1F:85:74:10:D6:B4:5B:AB:B6:2E:1C:5D:A8:97:08:55:4C
Certificate is to be certified until Dec 26 07:20:02 2014 GMT (365 days)
Sign the certificate? [y/n]:y

1 out of 1 certificate requests certified, commit? [y/n]y
Write out database with 1 new entries
Data Base Updated
$ ls -lt
total 140
-rw-r--r-- 1 root  root   3048 Dec 26 00:20 server.crt
-rw-rw-r-- 1 dbbox dbbox   948 Dec 26 00:19 ca.crt
-rw-rw-r-- 1 dbbox dbbox   887 Dec 26 00:18 ca.key
-rw-rw-r-- 1 dbbox dbbox   643 Dec 26 00:18 server.csr
-rw-rw-r-- 1 dbbox dbbox   887 Dec 26 00:10 server.key

Usage of these files:

2 files are used in httpd-ssl.conf:
server.key (Server private key) -> this one will keep at server side
server.crt (Server certificate) -> this one will send to client when users access HTTPS

The other 3 files owned by CA (Certificate Authority) are used only during sign processes:
server.csr (certificate signing request)
ca.key (certificate authority private key)
ca.crt (certificate authority certificate)

Problems I got during these processes:
1. index.txt and serial file are missing when sign the cert

$ sudo openssl ca -in server.csr -out server.crt -cert ca.crt -keyfile ca.key
Using configuration from /etc/pki/tls/openssl.cnf
/etc/pki/CA/index.txt: No such file or directory
unable to open '/etc/pki/CA/index.txt'
140528819345224:error:02001002:system library:fopen:No such file or directory:bss_file.c:355:fopen('/etc/pki/CA/index.txt','r')
140528819345224:error:20074002:BIO routines:FILE_CTRL:system lib:bss_file.c:357:

$ sudo openssl ca -in server.csr -out server.crt -cert ca.crt -keyfile ca.key
Using configuration from /etc/pki/tls/openssl.cnf
/etc/pki/CA/serial: No such file or directory
error while loading serial number
140414051186504:error:02001002:system library:fopen:No such file or directory:bss_file.c:355:fopen('/etc/pki/CA/serial','r')
140414051186504:error:20074002:BIO routines:FILE_CTRL:system lib:bss_file.c:357:

--create them to avoid the issue
$ sudo su -
# touch /etc/pki/CA/index.txt
# echo 01 > /etc/pki/CA/serial

2. openssl permission issue

$ openssl ca -in server.csr -out server.crt -cert ca.crt -keyfile ca.key
Using configuration from /etc/pki/tls/openssl.cnf
I am unable to access the /etc/pki/CA/newcerts directory
/etc/pki/CA/newcerts: Permission denied

--solution
My openssl is installed as root, so run the command with sudo
If openssl is created by the login account itself, you don't need to sudo, and the configure file of openssl will located at your installation place.

3. Error “libtool: install: invalid libtool wrapper script xxx” when “make install” apache.

--error message
libtool: install: invalid libtool wrapper script `htpasswd'
libtool: install: invalid libtool wrapper script `htdigest'
libtool: install: invalid libtool wrapper script `rotatelogs'
libtool: install: invalid libtool wrapper script `logresolve'
libtool: install: invalid libtool wrapper script `ab'
libtool: install: invalid libtool wrapper script `htdbm'
libtool: install: invalid libtool wrapper script `htcacheclean'
libtool: install: invalid libtool wrapper script `httxt2dbm'
libtool: install: invalid libtool wrapper script `checkgid'
make[2]: *** [program-install] Error 1

--It's caused by libtool is not installed at this host.
--Here is fix steps;
$ rpm -qa | grep libtool
$ sudo yum search libtool
$ sudo yum install libtool

--remove all installed files, and redo make, and make install
$ rm -rf /alexzeng/apache/
$ make
$ make install

This is a basic HTTPS setting for newbies like me :)

If you need more advanced features, you can reference more options and how-to at apache site:
http://httpd.apache.org/docs/2.2/ssl/ssl_faq.html
http://httpd.apache.org/docs/current/ssl/ssl_howto.html

How to add hint for SQL using oracle view

A developer asked me to tune a SQL recently. It turned out that the column is very skewed, and Oracle cannot find the optimized plan. Analyzing the table column with histograms may solve the problem, may not solve the issue because it’s quite sensitive to data selected, and it may have side impacts to other SQLs. As the developer can change the SQL easily, I choose to add hints for the SQL. The SQL is based on view which is different from adding hints on tables. Here is the steps to add hints for a SQL on view:

Step 1. Get the sub-query name of the table we want to add hint on:

SQL>  explain plan for select parent_task_id,task_id, title, status_label from bugtracking where BUGTRACKING_STATUS <5;

Explained.

ARADMIN@TRACE:prod SQL> select * from table(DBMS_XPLAN.DISPLAY(NULL, NULL, 'ALL'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2453567761

----------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       | 25202 |  2928K|  3280   (5)| 00:00:19 |
|   1 |  NESTED LOOPS OUTER|       | 25202 |  2928K|  3280   (5)| 00:00:19 |
|*  2 |   TABLE ACCESS FULL| T508  | 25202 |  2534K|  3272   (5)| 00:00:19 |
|*  3 |   INDEX UNIQUE SCAN| IB508 |     1 |    16 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$E213D7C1
   2 - SEL$E213D7C1 / T508@SEL$3  <-- this is the subquery name in SQL/view
   3 - SEL$E213D7C1 / B508@SEL$2

....

We want to add an index hint on table T508. It’s in query block SEL$3.

Step 2. Add hints at the subquery

SQL>  explain plan for       select/*+ index(@SEL$3 T508 I508_660000000_1) */ parent_task_id,task_id, title, status_label from bugtracking where BUGTRACKING_STATUS <5;

Explained.

ARADMIN@TRACE:prod SQL> select * from table(DBMS_XPLAN.DISPLAY(NULL, NULL, 'ALL'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2546157666

-------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  | 25202 |  2928K|  9024   (1)| 00:00:51 |
|   1 |  NESTED LOOPS OUTER          |                  | 25202 |  2928K|  9024   (1)| 00:00:51 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T508             | 25202 |  2534K|  9016   (1)| 00:00:51 |
|*  3 |    INDEX RANGE SCAN          | I508_660000000_1 | 25202 |       |    40   (3)| 00:00:01 |
|*  4 |   INDEX UNIQUE SCAN          | IB508            |     1 |    16 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$E213D7C1
   2 - SEL$E213D7C1 / T508@SEL$3
   3 - SEL$E213D7C1 / T508@SEL$3
   4 - SEL$E213D7C1 / B508@SEL$2

As you can see, the index hint on view format is /*+ index(SUBQUERY_NAME TABLE_NAME INDEX_NAME) */

Cacti host run out of capacity

Our cacti host in production run out of  capacity recently. We use cacti to create graphs for MySQL, including Innodb and Memory engine db, and MongoDB. The key benefits of cacti is that it’s easy for users to understand, and our developers can easily check DB performance metrics by themselves. It’s also easy for DBA to setup them because it didn’t need to setup/maintain agents at each DB host.

The benefit of easy-setup is also causes problem: all poller actions have to be done at cacti server. We run into performance problem about a year ago: the poller cannot finish all poll items in 1 minute. I replaced the php poller with spine which is is written in native C and more powerful. It started to work fine without problem. As we have more and more hosts added, I adjusted the “Maximum Concurrent Poller Processes” and “Maximum Threads per Process” at the same steps, and cacti kept hold its position to finish pollers in 1 minute.

At the same time, the hosts load kept increase, and reached 45 recently on this 24 virtual CPU(2 cores) physical host. It starts to run timeout for some hosts recently. I tried to adjust “Maximum Concurrent Poller Processes” and “Maximum Threads per Process”, but it didn’t help. The host load 45 is already much more than its 24 CPU number. It’s already overloaded. We can upgrade cacti host to more powerful host to scale-up, but it didn’t solve the scale-out problem. It’ll run into the same problem sooner or later.

At this time, cacti handles ~800 hosts with 18k datasources and 18k RRDs in 1 minute. The  “Maximum Concurrent Poller Processes”  is 3 and “Maximum Threads per Process” is 60. It finishes each round in 57 seconds in average. The serever CPU mode is ” Intel(R) Xeon(R) CPU   X5670  @ 2.93GHz”, 2 cores with 24 VCPU.

Although cacti has “Distributed and remote polling” in it’s road map, but the release date is unknown. That’ll help solve the problem of putting all load on a standalone host. We decided to stop adding more hosts to cacti, and pursuit the other solution.

Debug and fix cacti graph trees and hosts missing problem

Recently we run into a problem that some trees and hosts were disappeared suddenly from graph page, but they existed in Management “Graph Trees” list.

I googled around and didn’t find a useful clue. So I debug the problem myself. I checked around and opened the graph page to see if there’s any error. I guess it may caused by some JavaScript errors, and silently ignored by web browser. So I opened chrome’s console. I found there’s an error as follow:

cacti_error

It showed something’s wrong with the host “crp-wikidbstg02_3307″. So it’s a problem caused or triggered by this host, either run into a cacti bug, or this host has some problems in configuration. One way is to check the detail config data of this host in cacti and find out the flaw. I chose the other way, an easy way: just delete this host, and let the auto-add job to re-add it later. As expected, the missed trees and hosts were back in graph page, even after the host is re-added. 

If you run into the same problem, you may try this way to see if it’s a similar problem.

Debug a simple PK update SQL takes 500+ seconds in Oracle

We run into a problem recently in Oracle database at Solaris. A simple PK update SQL takes 500+ seconds to finish. It didn’t happen all the time, less than 1% chances. It causes app timeout when it happens.

We observed more wait time on “SQL*Net Message to client” in statspack data, no other obviously abnormal events beside this. We suspect it’s caused by network issue. SA told us the network usage is only 20%~30%. We did ping test from db host to client. It didn’t show any obviously problems.

Later, we did a test to switch 2 DB’s host in a db family. It makes the problem moved to the other host. At this point, we are sure it’s not DB problem. It must be host or network issue. Later we found the “TCP retransmission” rate is high at the problematic host. Finally, SA changed the network interface to fix this issue.

We can check the “TCP retransmission” using below scripts:

while true loop
do
 echo -n "`date` : "
 netstat -s -P tcp | grep tcpRetrans
 sleep 1
done;

You can reference this webpage for more about the TCP retrasmission issue.

How-to fix Mysql slave after relay log corrupted

We run into issues like “Relay log read failure: Could not parse relay log event entry” many times on Mysql slave node. Most of time in our cases, it’s due to Linux host crashed unexpectedly.

The detail error is like this:

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: alexzeng.wordpress.com
                  Master_User: mysql_rep
                  Master_Port: 3306
                Connect_Retry: 60
             Master_Log_File: LBMS-bin.000012
          Read_Master_Log_Pos: 239005305
               Relay_Log_File: LBMS-relay-bin.000004
                Relay_Log_Pos: 221245258
        Relay_Master_Log_File: LBMS-bin.000012
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 1594
                   Last_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 221245113
              Relay_Log_Space: 239007065
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 1594
               Last_SQL_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
1 row in set (0.00 sec)

In this case, we can check if the bin log at master node is still available.

        Relay_Master_Log_File: LBMS-bin.000012
        ...
          Exec_Master_Log_Pos: 221245113

If it’s still available at master node, we can reset slave to that point, and let it recover by itself as follows:

Be careful, using Relay_Master_Log_File value in “show slave status\G” for master_log_file, not Master_Log_File value. They’re the same in this case.


mysql> slave stop;
Query OK, 0 rows affected (0.00 sec)

mysql> change master to master_log_file='LBMS-bin.000012',master_log_pos=221245113;
Query OK, 0 rows affected (0.04 sec)

mysql> start slave;
Query OK, 0 rows affected (0.00 sec)

Check slave status again and again, make sure Exec_Master_Log_Pos is increasing. Seconds_Behind_Master should be decreasing. If master is too busy, the delay may still increase, but it’ll catch up sooner or later.


mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: alexzeng.wordpress.com
                  Master_User: mysql_rep
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: LBMS-bin.000012
          Read_Master_Log_Pos: 239014770
               Relay_Log_File: LBMS-relay-bin.000002
                Relay_Log_Pos: 1905303
        Relay_Master_Log_File: LBMS-bin.000012
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 223150164
              Relay_Log_Space: 17770064
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 159147
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
1 row in set (0.00 sec)

I’d like to end this blog with Peter Zaitsev‘s article How SHOW SLAVE STATUS relates to CHANGE MASTER TO words :

It is very important to understand the difference between IO Thread position in Master logs (Master_Log_File:Read_Master_Log_Pos) and SQL Thread position in Master logs (Relay_Master_Log_File:Exec_Master_Log_Pos).

It’s really help us understanding the differences of these values.

Follow

Get every new post delivered to your Inbox.