One year later... fail.

A lot has changed in the last year... except for this blog. Today marks the one-year anniversary of my most recent post. Perhaps this next year will be a better blog year.

Configuring Debian for Ruby on Rails

After configuring server after server over the years, I've settled on Debian as my "distro of choice". Some time ago, I started keeping track of the commands I'd routinely type when setting up a new server, and those commands have morphed into a Perl script which I now use instead. This script is designed to take an out-of-the-box Debian 4.0 standard installation and turn it into a lean mean Apache-Mongrel-MySQL-Rails serving machine with minimal effort.

Getting and using the script

This script assumes you have a standard Debian 4.0 installation (with networking already set up) and that you are logged in as root. If you choose to use this script, you do so at your own risk. To run it, type:

debian:~# wget http://svn.bountysource.com/fishplate/scripts/debian_install.pl
debian:~# perl debian_install.pl
Enter full hostname (example: server1.yourdomain.com): server1.mydomain.com
Install ruby, rails, gem, etc (y,n) [Y]?
Install mysql server (y,n) [Y]?
Install apache (y,n) [Y]?
Install mongrel/mongrel_cluster (y,n) [Y]?
Reboot when done (y,n) [Y]?

Then sit back and watch as the script sets up everything for you and then reboots!

What the script does

  • updates the core Debian packages and sets up a crontab to do so once a week
  • installs sshd just in case it's not there already
  • installs some common packages all servers should have (compile tools, dnstools/traceroute, rsync, subversion, mysql client, etc)
  • sets up the hostname
  • installs exim4 for sending mail
  • sets the server's timezone to UTC and sets up a crontab to sync the time once a day
  • optionally installs ruby, rubygems, irb, rails, imagemagick, and a few other common gems
  • optionally installs mysql-server
  • optionally installs apache and enables some common apache modules
  • optionally installs and configures mongrel and mongrel_rails
  • optionally reboots

Plans for the future

Without a doubt, I'll be making enhancements to this script over time as I try to automate repetitive tasks I find myself doing again and again. This is just my take on the "ideal Debian setup". Feel free to suggest any changes/enhancements.

Apache Tuning: MaxClients and Keep-Alive

When a post about statisfy.net made it to digg's front page, we experienced an onslaught of traffic (shocker!). Within a few minutes of getting dugg, I noticed that Apache would sometimes disregard requests entirely. If I reloaded, it would work fine. This post goes through what I learned about how Apache handles requests and what I changed to allow more requests to be handled.

MaxClients

I started digging around on the server and found the Apache error log spitting out this message over and over:

[error] server reached MaxClients setting, consider raising the MaxClients setting

After some quick research (thanks Google and Aaron from RHG), I learned a few things about Apache and how it handles requests. Out of the box, Apache is set up to have at most 16 child processes (ServerLimit). Each of these child processes will run with 25 threads (ThreadsPerChild). If you multiply these two numbers, you'll get that Apache should be able to handle 400 simultaneous requests... however, Apache also has an out-of-the-box limit of 150 simultaneous requests (MaxClients). At first, I changed just MaxClients to 400... but then I figured "more is better", right? So, I upped ServerLimit to 20 and set MaxClients to 500. My Apache config now had a section that looked like this:

<IfModule mpm_worker_module>
    ServerLimit          20
    StartServers          5
    MaxClients          500
    MinSpareThreads      25
    MaxSpareThreads      75
    ThreadsPerChild      25
    MaxRequestsPerChild   0
</IfModule>
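To make the arithmetic above concrete, here is a minimal Python sketch of how the three settings interact. It is a simplified model, not Apache's actual logic: the function name effective_capacity is made up for illustration, and it assumes the usable limit is simply the smaller of MaxClients and ServerLimit × ThreadsPerChild.

```python
def effective_capacity(server_limit: int, threads_per_child: int, max_clients: int) -> int:
    """Rough upper bound on simultaneous requests under the worker MPM.

    Simplified model (an assumption, not Apache source code): the thread
    ceiling is processes times threads per process, and MaxClients caps it.
    """
    # Thread slots that can exist at all.
    thread_ceiling = server_limit * threads_per_child
    # Whichever limit is smaller wins.
    return min(thread_ceiling, max_clients)


# Out-of-the-box values from the post: 16 * 25 = 400 threads, capped at 150.
print(effective_capacity(16, 25, 150))
# Tuned values from the config above: 20 * 25 = 500 threads, MaxClients 500.
print(effective_capacity(20, 25, 500))
```

The point of the sketch is that raising MaxClients alone only helps up to ServerLimit × ThreadsPerChild, which is why both values were raised together.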

After restarting Apache, the load average immediately doubled. My first reaction was that I broke something, but I went to statisfy.net, reloaded a few times, and (to my surprise) watched it load right away without any problems. Aaron suggested that "the load average doubled because you're now handling more requests... before you were just discarding them"... seems logical.

Keep-Alive optimization

After a while, I noticed that Apache would still hit the 500 MaxClients limit from time to time. I set up mod_status so I could watch what was actually happening at the Apache level, and I noticed that probably 70% of the threads were in a Keep-Alive state. This means that the server isn't actually processing a request but rather just waiting for another request from that user. I tried disabling Keep-Alive entirely and the simultaneous connections problem went away, but the browser experience was noticeably slower... so a better solution was needed. I tried playing around with the Keep-Alive timeout, but I couldn't find a good balance between simultaneous connections and browser experience.
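For anyone who wants to put a number on the Keep-Alive share rather than eyeball it, here is a rough Python sketch. It assumes you have copied the scoreboard string from the mod_status page, where each character is one worker slot ("K" for Keep-Alive, "W" for sending a reply, "R" for reading a request, "_" for idle, "." for an unused slot). The function name keepalive_share and the sample string are made up for illustration and are not real data from the server.

```python
from collections import Counter


def keepalive_share(scoreboard: str) -> float:
    """Fraction of busy worker slots that are sitting in Keep-Alive ("K")."""
    counts = Counter(scoreboard)
    # "_" is an idle worker and "." is an unused slot; neither is busy.
    busy = sum(n for state, n in counts.items() if state not in "_.")
    return counts["K"] / busy if busy else 0.0


# Hypothetical scoreboard: 7 Keep-Alive, 2 writing, 1 reading, 2 idle, 4 unused.
sample = "KKKKKKKWWR__...."
print(f"{keepalive_share(sample):.0%}")
```

A high share here means most of the MaxClients budget is being spent on connections that are waiting rather than working.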

Statisfy has two distinct use cases for people requesting files from the server. The first use case is the user who is actually on the statisfy.net site watching statistics go by. The second use case is a user visiting a remote site with the statisfy.net javascript embedded and their browser hitting the statisfy.net server to record their statistics. In this second use case, there are (almost) always two requests made and no more beyond that. The first request is for the static file stats.js and the second request is for /stats/record?blah_blah_blah. I realized that if a client requests /stats/record, we can safely close their connection rather than doing Keep-Alive because chances are they won't be making another request for a little while.

To accomplish this, I enabled mod_headers and mod_setenvif and added the following settings to my Apache config:

SetEnvIf Request_URI "/stats/record" CloseConnectionAfterRequest
Header set Connection "close" env=CloseConnectionAfterRequest
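The effect of that rule can be sketched in a few lines of Python. This is only an illustration of the decision being made, not how Apache implements it: the function name connection_header is hypothetical, the "keep-alive" fallback stands in for whatever the server's normal Keep-Alive settings would do, and the pattern is assumed to behave like an unanchored regular expression matched against the request path.

```python
import re

# Same pattern as the recording URL described above; matched anywhere in the path.
CLOSE_PATTERN = re.compile(r"/stats/record")


def connection_header(request_path: str) -> str:
    """Return the Connection header this sketch would send for a request path."""
    if CLOSE_PATTERN.search(request_path):
        # Recording hits are one-offs, so free the thread right away.
        return "close"
    # Everything else keeps the normal Keep-Alive behavior (simplified).
    return "keep-alive"


print(connection_header("/stats/record"))
print(connection_header("/stats.js"))
print(connection_header("/"))
```

Note that because the match is unanchored, any path containing /stats/record would also be closed, which is fine here since no other URL on the site looks like that.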

I restarted Apache and watched mod_status for a while and noticed we went from close to 500 simultaneous connections down to about 100. Furthermore, the browser experience wasn't affected at all because Keep-Alive would be active for the browsers that were actually going to use it. Success!!!