How to set up HTTPS for Apache

I’ve been working on setting up authentication for an internal website recently. To protect usernames and passwords, setting up HTTPS is necessary. Here is my first trial of HTTPS/SSL:

Step 1. Check whether Apache has the SSL module compiled

$ grep mod_ssl apache/conf/httpd.conf
If the line below is there and Apache starts without problems, you don't need to recompile Apache:
"LoadModule ssl_module modules/mod_ssl.so"

The other way is to check whether the module file exists.
If it exists but is not loaded in httpd.conf, add the LoadModule line above and try starting Apache.
$ ls modules/mod_ssl.so
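
The grep above also matches a commented-out line, so here is a minimal sketch of a stricter check. The function name has_ssl_module is my own invention, and it assumes the LoadModule line sits in the file you pass in (not in an included file).

```shell
# Succeed only if the given httpd.conf has an active (uncommented)
# "LoadModule ssl_module" line. has_ssl_module is a hypothetical helper name.
has_ssl_module() {
  grep -Eq '^[[:space:]]*LoadModule[[:space:]]+ssl_module[[:space:]]' "$1"
}

# Example:
#   has_ssl_module apache/conf/httpd.conf && echo "mod_ssl is loaded"
```

A line such as "#LoadModule ssl_module modules/mod_ssl.so" is not counted, because the pattern only allows whitespace before LoadModule.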

Step 2. Compile Apache if it doesn't have the SSL module

--download Apache from http://httpd.apache.org/ and unpack it
cd /home/alexzeng/httpd-2.2.24
--config
#./configure --prefix=/alexzeng/apache --with-config-file-path=/alexzeng/apache/conf --enable-ssl --enable-http --enable-rewrite --enable-track-vars --enable-cgi --with-config-file-path=/opt/apache/conf --enable-modules=all --enable-mods-shared=all --enable-file-cache --enable-disk-cache --enable-cache --enable-mem-cache --enable-dumpio --enable-logio --enable-mime-magic --enable-headers --enable-usertrack --enable-version --enable-proxy --enable-proxy-connect --enable-proxy-http --enable-proxy-ftp --enable-proxy-ajp --enable-proxy-balancer --enable-so

--make
#make

--make install
#make install

Step 3. Configure Apache SSL
A. Load the SSL module. httpd.conf should already have these lines:

LoadModule ssl_module modules/mod_ssl.so
LoadModule alias_module modules/mod_alias.so
LoadModule rewrite_module modules/mod_rewrite.so
LoadModule dir_module modules/mod_dir.so
...

B. Enable httpd-ssl.conf by removing the # from the Include line for httpd-ssl.conf in httpd.conf:

# Secure (SSL/TLS) connections
Include conf/extra/httpd-ssl.conf

C. Force HTTPS for the pages that need it, in the VirtualHost section:

NameVirtualHost *:80
<VirtualHost *:80>
   ServerName alexzeng.vip.wordpress.com
   DocumentRoot "/alexzeng/apache/htdocs"
   Redirect permanent /signin https://alexzeng.vip.wordpress.com/signin
</VirtualHost>
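
To confirm that the redirect works, one option is to look at the response headers. This is only a sketch: is_https_redirect is a made-up helper that reads headers on stdin and checks for status 301 (what "Redirect permanent" sends) plus an https:// Location.

```shell
# Read HTTP response headers on stdin; succeed only if the status is 301
# and the Location header points at an https:// URL.
# is_https_redirect is a hypothetical helper name.
is_https_redirect() {
  awk '
    NR == 1 && $2 == "301" { status_ok = 1 }
    tolower($1) == "location:" && $2 ~ /^https:\/\// { https = 1 }
    END { exit !(status_ok && https) }
  '
}

# Example (hostname taken from the config above):
#   curl -sI http://alexzeng.vip.wordpress.com/signin | is_https_redirect && echo "redirect OK"
```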

D. Configure extra/httpd-ssl.conf

Listen 443

#   General setup for the virtual host
DocumentRoot "/alexzeng/apache/htdocs"
ServerName alexzeng.vip.wordpress.com:443
...
#   SSL Engine Switch:
#   Enable/Disable SSL for this virtual host.
SSLEngine on

#   Server Certificate:
SSLCertificateFile "/alexzeng/apache/conf/server.crt"

#   Server Private Key:
SSLCertificateKeyFile "/alexzeng/apache/conf/server.key"
#   Server Certificate Chain:

At this point, HTTPS is configured for the site. But we still need 2 files: the server certificate server.crt and its private key server.key.

The official way is to request a certificate from a public SSL certificate company, such as VeriSign, because their certificates are accepted by all browsers by default. But it costs at least a few hundred dollars per year.

So many companies sign certificates with their own CA to reduce the cost. Especially for internal sites, a company can install its root certificate in the employees' OS image by default. That way, sites with certificates signed by that root are recognized as safe: no security alert, and no https crossed out in red. That’s the case in my company.

I got a certificate signed by our IT team, but it was in pfx format. I needed to convert it to the 2 files server.crt and server.key. The process is as follows:

$ openssl pkcs12 -in it.pfx  -nocerts -nodes -passin pass:"<password_from_IT>" | openssl rsa -out server.key
MAC verified OK
writing RSA key
$ openssl pkcs12 -in it.pfx  -clcerts -nokeys -nodes -passin pass:"<password_from_IT>" | openssl x509 -out server.crt
MAC verified OK
-- The pfx was created by Windows tools, so I pipe the output through openssl rsa/x509 to remove some "Bag Attributes" lines.
-- Otherwise, you can just use the -out option on the first command and skip the second command in the pipeline
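
If you only want to drop the "Bag Attributes" lines, a plain text filter can do it too. This is a different technique from the openssl rsa/x509 pipeline above (it does not re-encode anything), and pem_only is a hypothetical helper name.

```shell
# Keep only the PEM block(s) from stdin, dropping "Bag Attributes" and
# any other lines printed around them. pem_only is a hypothetical helper name.
pem_only() {
  sed -n '/^-----BEGIN /,/^-----END /p'
}

# Example:
#   openssl pkcs12 -in it.pfx -clcerts -nokeys -nodes -passin pass:"<password_from_IT>" | pem_only > server.crt
```

Unlike openssl rsa, this filter leaves the key in whatever PEM format pkcs12 wrote it.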

--copy key to apache directory if needed
$ cp server.crt  server.key /alexzeng/apache/conf

--restart apache
$ sudo ./apachectl stop
$ sudo ./apachectl start

If a browser doesn't have the company's root cert installed, it will show a security alert when accessing the site. This can be avoided by importing the root cert.

Besides that, we need to be able to test HTTPS ourselves, even without an internal team to sign the cert for us. We’ll make ourselves a certificate authority (CA) :)

How to create a self-signed cert for testing:
a. Create a server private key: server.key

$ openssl genrsa -out server.key 1024
Generating RSA private key, 1024 bit long modulus
...................++++++
......++++++
e is 65537 (0x10001)

b. Create certificate signing request (CSR) : server.csr (using server.key)

$ openssl req -new -out server.csr -key server.key
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [XX]:US
State or Province Name (full name) []:aa
Locality Name (eg, city) [Default City]:aa
Organization Name (eg, company) [Default Company Ltd]:aa
Organizational Unit Name (eg, section) []:aa
Common Name (eg, your name or your server's hostname) []:alexzeng.vip.wordpress.com
Email Address []:
Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:
An optional company name []:
$

--Note:
"Country Name" must be a valid 2-letter country code
"Common Name" must be the same as ServerName in httpd.conf
"State or Province Name" must have some value

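The prompts can be skipped by passing the DN on the command line with the -subj option of openssl req. A small sketch, assuming the same field values as above; build_subj is a name I made up, and it simply skips empty fields.

```shell
# Build an OpenSSL -subj string from the DN fields asked for in the prompts.
# args: country state locality organization org_unit common_name
# build_subj is a hypothetical helper name; empty fields are skipped.
build_subj() {
  subj=""
  for pair in "C=$1" "ST=$2" "L=$3" "O=$4" "OU=$5" "CN=$6"; do
    case "$pair" in
      *=) ;;                       # empty value: leave the field out
      *)  subj="$subj/$pair" ;;
    esac
  done
  printf '%s\n' "$subj"
}

# Example:
#   openssl req -new -key server.key -out server.csr \
#     -subj "$(build_subj US aa aa aa aa alexzeng.vip.wordpress.com)"
```
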
c. Create a certificate authority (CA) private key: ca.key

$ openssl genrsa  -out ca.key 1024
Generating RSA private key, 1024 bit long modulus
.......++++++
........++++++
e is 65537 (0x10001)

d. Create CA certificate : ca.crt (using ca.key)

$ openssl req  -new -x509 -days 365 -key ca.key -out ca.crt
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [XX]:US
State or Province Name (full name) []:aa
Locality Name (eg, city) [Default City]:aa
Organization Name (eg, company) [Default Company Ltd]:aa
Organizational Unit Name (eg, section) []:aa
Common Name (eg, your name or your server's hostname) []:alexzeng.vip.wordpress.com
Email Address []:
$

e. Sign the certificate with my own CA: server.crt (using the certificate signing request server.csr, the CA private key ca.key, and the CA certificate ca.crt)

$ sudo openssl ca -in server.csr -out server.crt -cert ca.crt -keyfile ca.key
Using configuration from /etc/pki/tls/openssl.cnf
Check that the request matches the signature
Signature ok
Certificate Details:
        Serial Number: 1 (0x1)
        Validity
            Not Before: Dec 26 07:20:02 2013 GMT
            Not After : Dec 26 07:20:02 2014 GMT
        Subject:
            countryName               = US
            stateOrProvinceName       = aa
            organizationName          = aa
            organizationalUnitName    = aa
            commonName                = alexzeng.vip.wordpress.com
        X509v3 extensions:
            X509v3 Basic Constraints:
                CA:FALSE
            Netscape Comment:
                OpenSSL Generated Certificate
            X509v3 Subject Key Identifier:
                70:45:AB:98:23:51:BB:88:23:20:EA:21:21:3C:6A:8A:E2:0A:97:B8
            X509v3 Authority Key Identifier:
                keyid:46:73:F6:1F:85:74:10:D6:B4:5B:AB:B6:2E:1C:5D:A8:97:08:55:4C
Certificate is to be certified until Dec 26 07:20:02 2014 GMT (365 days)
Sign the certificate? [y/n]:y

1 out of 1 certificate requests certified, commit? [y/n]y
Write out database with 1 new entries
Data Base Updated
$ ls -lt
total 140
-rw-r--r-- 1 root  root   3048 Dec 26 00:20 server.crt
-rw-rw-r-- 1 dbbox dbbox   948 Dec 26 00:19 ca.crt
-rw-rw-r-- 1 dbbox dbbox   887 Dec 26 00:18 ca.key
-rw-rw-r-- 1 dbbox dbbox   643 Dec 26 00:18 server.csr
-rw-rw-r-- 1 dbbox dbbox   887 Dec 26 00:10 server.key

Usage of these files:

2 files are used in httpd-ssl.conf:
server.key (server private key) -> this one stays on the server side
server.crt (server certificate) -> this one is sent to the client when users access the site over HTTPS

The other 3 files, owned by the CA (Certificate Authority), are used only during the signing process:
server.csr (certificate signing request)
ca.key (certificate authority private key)
ca.crt (certificate authority certificate)

Problems I ran into during these processes:
1. The index.txt and serial files are missing when signing the cert

$ sudo openssl ca -in server.csr -out server.crt -cert ca.crt -keyfile ca.key
Using configuration from /etc/pki/tls/openssl.cnf
/etc/pki/CA/index.txt: No such file or directory
unable to open '/etc/pki/CA/index.txt'
140528819345224:error:02001002:system library:fopen:No such file or directory:bss_file.c:355:fopen('/etc/pki/CA/index.txt','r')
140528819345224:error:20074002:BIO routines:FILE_CTRL:system lib:bss_file.c:357:

$ sudo openssl ca -in server.csr -out server.crt -cert ca.crt -keyfile ca.key
Using configuration from /etc/pki/tls/openssl.cnf
/etc/pki/CA/serial: No such file or directory
error while loading serial number
140414051186504:error:02001002:system library:fopen:No such file or directory:bss_file.c:355:fopen('/etc/pki/CA/serial','r')
140414051186504:error:20074002:BIO routines:FILE_CTRL:system lib:bss_file.c:357:

--create them to avoid the issue
$ sudo su -
# touch /etc/pki/CA/index.txt
# echo 01 > /etc/pki/CA/serial
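
The same bookkeeping files can also be created in a directory you own, instead of /etc/pki/CA. A minimal sketch: init_ca_dir is a hypothetical helper, and it assumes you then point openssl ca at a copy of openssl.cnf (via -config) whose "dir" setting is that directory.

```shell
# Create the files "openssl ca" expects (index.txt, serial, newcerts/)
# in a CA directory of your choice. init_ca_dir is a hypothetical helper name.
init_ca_dir() {
  ca_dir=$1
  mkdir -p "$ca_dir/newcerts" || return 1
  [ -f "$ca_dir/index.txt" ] || : > "$ca_dir/index.txt"     # empty certificate database
  [ -f "$ca_dir/serial" ]    || echo 01 > "$ca_dir/serial"  # next serial number, in hex
}

# Example:
#   init_ca_dir "$HOME/myCA"
```

Existing index.txt and serial files are left alone, so running it twice does not reset the serial number.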

2. openssl permission issue

$ openssl ca -in server.csr -out server.crt -cert ca.crt -keyfile ca.key
Using configuration from /etc/pki/tls/openssl.cnf
I am unable to access the /etc/pki/CA/newcerts directory
/etc/pki/CA/newcerts: Permission denied

--solution
My openssl was installed as root, so run the command with sudo.
If openssl was installed by the login account itself, you don't need sudo, and the openssl configuration file will be located in your installation directory.

3. Error “libtool: install: invalid libtool wrapper script xxx” during “make install” of Apache.

--error message
libtool: install: invalid libtool wrapper script `htpasswd'
libtool: install: invalid libtool wrapper script `htdigest'
libtool: install: invalid libtool wrapper script `rotatelogs'
libtool: install: invalid libtool wrapper script `logresolve'
libtool: install: invalid libtool wrapper script `ab'
libtool: install: invalid libtool wrapper script `htdbm'
libtool: install: invalid libtool wrapper script `htcacheclean'
libtool: install: invalid libtool wrapper script `httxt2dbm'
libtool: install: invalid libtool wrapper script `checkgid'
make[2]: *** [program-install] Error 1

--It's caused by libtool not being installed on this host.
--Here are the fix steps:
$ rpm -qa | grep libtool
$ sudo yum search libtool
$ sudo yum install libtool

--remove all installed files, then redo make and make install
$ rm -rf /alexzeng/apache/
$ make
$ make install

This is a basic HTTPS setup for newbies like me :)

If you need more advanced features, you can find more options and how-tos on the Apache site:
http://httpd.apache.org/docs/2.2/ssl/ssl_faq.html
http://httpd.apache.org/docs/current/ssl/ssl_howto.html


Cacti host ran out of capacity

Our cacti host in production ran out of capacity recently. We use cacti to create graphs for MySQL, including InnoDB and Memory engine DBs, and MongoDB. The key benefit of cacti is that it’s easy for users to understand, and our developers can easily check DB performance metrics by themselves. It’s also easy for DBAs to set up because there is no need to set up or maintain agents on each DB host.

The benefit of easy setup also causes a problem: all poller actions have to be done on the cacti server. We ran into a performance problem about a year ago: the poller could not finish all poll items in 1 minute. I replaced the PHP poller with spine, which is written in native C and is more powerful. After that it worked fine. As more and more hosts were added, I raised “Maximum Concurrent Poller Processes” and “Maximum Threads per Process” in step, and cacti kept finishing each polling round in 1 minute.

At the same time, the host's load kept increasing, and recently reached 45 on this physical host with 24 virtual CPUs (2 cores). Polling has started to time out for some hosts. I tried adjusting “Maximum Concurrent Poller Processes” and “Maximum Threads per Process”, but it didn’t help. A load of 45 is already far above the 24 CPUs; the host is overloaded. We could upgrade the cacti host to a more powerful one to scale up, but that doesn’t solve the scale-out problem. It would run into the same problem sooner or later.

At this time, cacti handles ~800 hosts with 18k data sources and 18k RRDs in 1 minute. “Maximum Concurrent Poller Processes” is 3 and “Maximum Threads per Process” is 60. Each round finishes in 57 seconds on average. The server CPU model is “Intel(R) Xeon(R) CPU X5670 @ 2.93GHz”, 2 cores with 24 VCPUs.
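
To put these numbers side by side, here is a bit of back-of-the-envelope arithmetic as a sketch. The three function names are made up for illustration; the inputs are the figures quoted above.

```shell
# Rough poller capacity arithmetic (all helper names are hypothetical).
poller_threads()   { echo $(( $1 * $2 )); }   # poller processes * threads per process
items_per_second() { echo $(( $1 / $2 )); }   # data sources / seconds per round (integer division)
load_per_cpu()     { awk -v l="$1" -v c="$2" 'BEGIN { printf "%.2f\n", l / c }'; }

# Example with the figures above:
#   poller_threads 3 60        # concurrent spine threads
#   items_per_second 18000 57  # data sources polled per second
#   load_per_cpu 45 24         # load average relative to the number of virtual CPUs
```

A load-per-CPU value above 1 means more runnable work than CPUs, which matches the overload described above.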

Although cacti has “Distributed and remote polling” on its road map, the release date is unknown. That feature would help solve the problem of putting all the load on a standalone host. We decided to stop adding more hosts to cacti and pursue another solution.

Debug and fix cacti graph trees and hosts missing problem

Recently we ran into a problem where some trees and hosts suddenly disappeared from the graph page, even though they still existed in the Management “Graph Trees” list.

I googled around and didn’t find a useful clue, so I debugged the problem myself. I opened the graph page to see if there was any error. I guessed it might be caused by JavaScript errors that the web browser silently ignored, so I opened Chrome’s console and found the following error:

[Screenshot: cacti_error]

It showed that something was wrong with the host “crp-wikidbstg02_3307”. So the problem was caused or triggered by this host: either it hit a cacti bug, or the host had some configuration problem. One way would be to check this host's detailed config data in cacti and find the flaw. I chose the other, easier way: just delete this host, and let the auto-add job re-add it later. As expected, the missing trees and hosts were back on the graph page, and they stayed there even after the host was re-added.

If you run into the same problem, you may try this to see if it’s a similar issue.

Debug Cassandra JVM thread 100% CPU usage issue

Recently, one of our Cassandra nodes ran into an issue: one CPU thread was at 100% utilization while the others were almost idle. The node showed “Down” in nodetool now and then.

Java supports threads, and in theory threads can run on different CPU threads without problems. So why was one CPU at 100% while the others were idle? What was the 100% utilized CPU doing? Here is what I did:

1. Find out the thread ID that uses 100% CPU
On this Linux host, I ran top to see the CPU usage, typed 1 to show the usage of each CPU thread, then typed H to show the threads of processes. When one CPU thread was at 100% usage, the top thread looked like this:

 35376 cassandr  20   0 28.8g  10g 1.2g R 99.7  8.0  11:13.08 java

Its CPU usage is 99.7%
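
For a non-interactive version of this step, something like the sketch below may work. It uses ps instead of top, which is a swapped-in technique: as far as I know, the pcpu value from ps is averaged over the thread's lifetime, while top shows recent usage, so the numbers can differ. busiest_tid is a hypothetical helper name.

```shell
# Read "TID %CPU" pairs on stdin and print the TID with the highest
# CPU percentage. busiest_tid is a hypothetical helper name.
busiest_tid() {
  awk '$2 ~ /^[0-9.]+$/ && ($2 + 0 > max || tid == "") { max = $2 + 0; tid = $1 }
       END { if (tid != "") print tid }'
}

# Example (Linux procps; <pid> is the Java process ID):
#   ps -L -o tid= -o pcpu= -p <pid> | busiest_tid
```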

2. Find out what the thread is doing
I used “jstack -l <pid>” to dump the call stacks of all JVM threads. I needed to find the thread with ID 35376. Thread IDs in the jstack dump (the nid field) are in hexadecimal, while they are decimal in the top output. Decimal 35376 equals hex 8a30. I found it in the output:

...
"VM Thread" prio=10 tid=0x00007f2a78313000 nid=0x8a30 runnable
...
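
The decimal-to-hex conversion and the lookup can be scripted. A small sketch with two made-up helper names, tid_to_nid and find_thread; it assumes the dump uses the "nid=0x..." form shown above.

```shell
# Convert a decimal thread ID from top into the nid form used by jstack.
# Both helper names are hypothetical.
tid_to_nid() { printf '0x%x\n' "$1"; }

# Print the jstack line(s) on stdin that belong to a decimal thread ID.
find_thread() {
  grep -E "nid=$(tid_to_nid "$1")( |\$)"
}

# Example:
#   tid_to_nid 35376                      # prints 0x8a30
#   jstack -l <pid> | find_thread 35376
```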

So I knew that “VM Thread” was the culprit. From my basic understanding, its main job is GC (Garbage Collection). If there is no memory leak in the application's Java code, GC can be improved by adjusting the JVM heap size parameters. First I needed to check the current heap usage.

3. Get JVM heap usage
We can just run “jmap -heap <pid>”:

$ ./jmap -heap 122576
Attaching to process ID 122576, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 20.10-b01

using parallel threads in the new generation.
using thread-local object allocation.
Concurrent Mark-Sweep GC

Heap Configuration:
   MinHeapFreeRatio = 40
   MaxHeapFreeRatio = 70
   MaxHeapSize      = 8589934592 (8192.0MB)
   NewSize          = 2147483648 (2048.0MB)
   MaxNewSize       = 2147483648 (2048.0MB)
   OldSize          = 5439488 (5.1875MB)
   NewRatio         = 2
   SurvivorRatio    = 8
   PermSize         = 21757952 (20.75MB)
   MaxPermSize      = 85983232 (82.0MB)

Heap Usage:
New Generation (Eden + 1 Survivor Space):
   capacity = 1932787712 (1843.25MB)
   used     = 1932773920 (1843.2368469238281MB)
   free     = 13792 (0.013153076171875MB)
   99.99928641930438% used
Eden Space:
   capacity = 1718091776 (1638.5MB)
   used     = 1718091768 (1638.4999923706055MB)
   free     = 8 (7.62939453125E-6MB)
   99.99999953436713% used
From Space:
   capacity = 214695936 (204.75MB)
   used     = 214682152 (204.73685455322266MB)
   free     = 13784 (0.01314544677734375MB)
   99.99357975737371% used
To Space:
   capacity = 214695936 (204.75MB)
   used     = 0 (0.0MB)
   free     = 214695936 (204.75MB)
   0.0% used
concurrent mark-sweep generation:
   capacity = 6442450944 (6144.0MB)
   used     = 6442450896 (6143.999954223633MB)
   free     = 48 (4.57763671875E-5MB)
   99.99999925494194% used
Perm Generation:
   capacity = 58068992 (55.37890625MB)
   used     = 34732264 (33.123268127441406MB)
   free     = 23336728 (22.255638122558594MB)
   59.81206630898639% used

The New Generation is at 99.999% usage, and the concurrent mark-sweep (old) generation is just as full. This can be a problem because when the code creates a new object, it needs to get memory from the New Generation. If it’s full, the JVM needs to scan the memory area to free memory. Until that’s done, the program can do nothing but wait.
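
Reading the percentages by eye gets tedious, so here is a sketch of a filter that lists the regions above a threshold. full_regions is a hypothetical helper, and it assumes the output layout shown above: a header line ending in ":" followed later by a "<number>% used" line.

```shell
# List heap regions from "jmap -heap" output (on stdin) whose usage is at or
# above a limit in percent (default 95). full_regions is a hypothetical helper name.
full_regions() {
  awk -v limit="${1:-95}" '
    /:$/      { region = $0; sub(/:$/, "", region) }   # remember the latest header line
    /% used$/ { pct = $1; sub(/%/, "", pct)
                if (pct + 0 >= limit) printf "%s %.1f\n", region, pct }
  '
}

# Example:
#   ./jmap -heap <pid> | full_regions 95
```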

4. Adjust JVM heap size
I set the parameters in cassandra-env.sh :

MAX_HEAP_SIZE="16G"
HEAP_NEWSIZE="4G"

The script passes them to the JVM options as follows:

JVM_OPTS="$JVM_OPTS -Xms${MAX_HEAP_SIZE}"
JVM_OPTS="$JVM_OPTS -Xmx${MAX_HEAP_SIZE}"
JVM_OPTS="$JVM_OPTS -Xmn${HEAP_NEWSIZE}"

5. Check result
After restarting Cassandra and letting it run for a while, I ran jmap again:

$ ./jmap -heap 35339
Attaching to process ID 35339, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 20.10-b01

using parallel threads in the new generation.
using thread-local object allocation.
Concurrent Mark-Sweep GC

Heap Configuration:
   MinHeapFreeRatio = 40
   MaxHeapFreeRatio = 70
   MaxHeapSize      = 17179869184 (16384.0MB)
   NewSize          = 4294967296 (4096.0MB)
   MaxNewSize       = 4294967296 (4096.0MB)
   OldSize          = 5439488 (5.1875MB)
   NewRatio         = 2
   SurvivorRatio    = 8
   PermSize         = 21757952 (20.75MB)
   MaxPermSize      = 85983232 (82.0MB)

Heap Usage:
New Generation (Eden + 1 Survivor Space):
   capacity = 3865509888 (3686.4375MB)
   used     = 1271912480 (1212.9902648925781MB)
   free     = 2593597408 (2473.447235107422MB)
   32.90413210294704% used
Eden Space:
   capacity = 3436052480 (3276.875MB)
   used     = 1013629048 (966.671989440918MB)
   free     = 2422423432 (2310.203010559082MB)
   29.499812761881913% used
From Space:
   capacity = 429457408 (409.5625MB)
   used     = 258283432 (246.31827545166016MB)
   free     = 171173976 (163.24422454833984MB)
   60.14180386428449% used
To Space:
   capacity = 429457408 (409.5625MB)
   used     = 0 (0.0MB)
   free     = 429457408 (409.5625MB)
   0.0% used
concurrent mark-sweep generation:
   capacity = 12884901888 (12288.0MB)
   used     = 3744285600 (3570.8290100097656MB)
   free     = 9140616288 (8717.170989990234MB)
   29.059480875730515% used
Perm Generation:
   capacity = 62611456 (59.7109375MB)
   used     = 37563512 (35.82335662841797MB)
   free     = 25047944 (23.88758087158203MB)
   59.99463101449038% used

In another case, with Tomcat and a home-grown Java application, we got a similar 100% CPU usage on one thread. But the solution was the opposite: I reduced the JVM heap size to fix it, because the default max JVM heap size was 32GB, which was much more than enough. The program performed well at the start, but ran slower and slower. Setting too large a heap size can cause a similar problem: the bigger the JVM heap, the more work GC needs to do, similar to the Oracle shared pool size.

jmap and jstack are available in the JDK. If you don’t have the JDK, you need to download the JDK with exactly the same version (including the minor version) as the corresponding JRE; otherwise the tools will complain about an incompatible version.

A kind reminder: I am a newbie to Java, so my understanding may be wrong. Please use your own judgement on the contents.

References:
http://java.sys-con.com/node/1611555
http://middlewaremagic.com/weblogic/?tag=young-generation

Backup Cacti script

This is a simple script to back up cacti.

#!/bin/bash

if [ $# -lt 2 ]; then
cat <<EOF
  Backup cacti binary
  usage:  backup_cacti.sh <BACKUP_DIR> <BACKUP_KEEP_DAYS>
  examples:      
          backup_cacti.sh /data01/cacti/cacti_backup 7       --backup cacti directory, and keep 7 days
EOF
  exit 1
fi

# arguments
BACKUP_DIR="$1"
BACKUP_KEEP_DAYS="$2"

# configuration
#The cacti home should be the real home, not softlink, otherwise tar will only tar the softlink
CACTI_DIR="cacti-0.8.7g"
CACTI_BASE="/home/mysql"
CACTI_HOME="$CACTI_BASE/$CACTI_DIR"
MAIL_TO="alexzeng@wordpress.com"

# prepare backup
DATE=`date +"%Y.%m.%d"`
BACKUP_FILE="$BACKUP_DIR/cacti.$DATE.tar.gz"

# convert rrd files to xml files, which can be restored to rrd files later
cd "$CACTI_HOME/rra" || exit 1
mkdir -p "$CACTI_HOME/rraxml"
rm -f "$CACTI_HOME"/rraxml/*
# only dump rrd files updated in the last 7 days
for f in `find *.rrd -type f -mtime -7 `
do
  /usr/bin/rrdtool dump "$f" "$CACTI_HOME/rraxml/$f.xml"
done

# back up all cacti files except the rra files, because they cannot be used directly for restore
cd "$CACTI_BASE" || exit 1
tar --exclude='rra' -zcf "$BACKUP_FILE" "$CACTI_DIR"
if [ $? -ne 0 ]; then
  mailx -s "Backup cacti cron job $0 failed" $MAIL_TO <<EOF
Backup files list :
`ls -lt $BACKUP_DIR/cacti.*.tar.gz`
EOF
else
  cat /dev/null > $CACTI_HOME/log/cacti.log
fi

# delete old backup files; stop if the backup directory is not reachable
cd "$BACKUP_DIR" || exit 1
find . -name "cacti.*.tar.gz" -ctime +"$BACKUP_KEEP_DAYS" -exec rm -f {} \;

It converts the rra files to XML files for backup purposes (copying rra files directly may not work, for example when restoring on a host with a different architecture).
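
For the way back, the XML dumps can be turned into rrd files again with rrdtool restore. A minimal sketch: restore_rra is a hypothetical helper, the RRDTOOL variable is my own addition so the path can be overridden, and the file naming assumes the foo.rrd.xml names produced by the script above.

```shell
# Path to rrdtool; can be overridden from the environment (my own convention).
RRDTOOL=${RRDTOOL:-/usr/bin/rrdtool}

# Rebuild .rrd files from XML dumps: <xml_dir>/foo.rrd.xml -> <rra_dir>/foo.rrd
# restore_rra is a hypothetical helper name.
restore_rra() {
  xml_dir=$1
  rra_dir=$2
  for xml in "$xml_dir"/*.rrd.xml; do
    [ -e "$xml" ] || continue               # no dump files found
    name=$(basename "$xml" .xml)            # foo.rrd.xml -> foo.rrd
    "$RRDTOOL" restore "$xml" "$rra_dir/$name" || return 1
  done
}

# Example (after untarring the backup):
#   restore_rra /home/mysql/cacti-0.8.7g/rraxml /home/mysql/cacti-0.8.7g/rra
```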

Hello world!

Welcome to WordPress.com. This is your first post. Edit or delete it and start blogging!

This is specially for Alex Zeng. He is an Oracle Jockey.