Moving Over to Nginx

Running XP-Dev.com has its own set of unique problems, and it has not always been easy. I’ve always tried to run the whole infrastructure on a shoe-string budget while at the same time trying not to compromise on quality.

One of those problems is hardware resources.

The truth is: Apache is a memory hog, and to keep things scalable for serving Subversion repositories, I decided to move all the PHP websites out of apache and run them under nginx and PHP-CGI (sudo apt-get install php5-cgi). To be honest, I did not notice any difference in the performance of the websites (apache/mod_php vs nginx/fastcgi/php-cgi); the main motivation for this exercise was to limit the maximum amount of memory that my non-critical PHP websites take, and at the same time give apache more room to grow for serving the Subversion repositories. I could have run two apache installations and given them different limits (by tweaking MaxSpare*, MaxRequests* and friends), but that is an outright pain to manage. Moreover, I needed a simple webserver that could just serve static content as well.

And let’s not forget users of virtual private servers (VPS) with limited amounts of memory. Nginx and PHP-CGI are a much more appropriate solution for those memory-limited configurations.

I had a look around, and it basically came down to lighttpd or nginx as a replacement for serving the PHP websites. I picked nginx as there were some odd bugs with lighttpd serving large files. The FastCGI performance is almost the same (I did not really do any scientific benchmarks). However, the part that really got me sold on these two was that they use a master-slave threading model, rather than the (out of date) one thread/process per client model, which does not scale at all. Both of them are event driven, rather than “client socket” driven. BTW, this includes the awesome J2EE web container Jetty (if you use the SelectChannelConnector).

Migrating the websites across from apache to nginx/fastcgi/php-cgi was an absolute breeze and here are a few pointers that will help ease the burden.

Strategy

Just to clarify, in the apache/mod_php world, PHP files are served via the apache process itself. The strategy under nginx is to get nginx to pass on the request to another set of long running php-cgi processes that do the actual PHP processing. The response will then be passed back to nginx, which will send it back to the web browser.
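For orientation, here is roughly what the top-level server block looks like once everything below is wired together. This is only a sketch: the server name is a placeholder, and the document root simply mirrors the one used in the PHP snippet further down.

server {
    listen       80;
    server_name  example.com;                # placeholder, use your own hostname
    root         /home/rs/local/wordpress;   # document root used in the PHP snippet below

    # static files are served by nginx directly;
    # requests for .php files are handed off to the php-cgi pool (see "Serving PHP files")
}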

Documentation

Use the English Nginx wiki extensively. There’s a lot of documentation there on configuring and tweaking nginx, especially the module reference pages. Here’s a quick and dirty howto on getting nginx+fastcgi and php-cgi working.
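On Debian/Ubuntu, something along these lines should pull in the pieces (package names are from that era and may differ on your distro):

$ sudo apt-get install nginx php5-cgi
$ sudo apt-get install lighttpd   # only needed for the spawn-fcgi helper used below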

PHP FastCGI Start/Stop Scripts

Save yourself the trouble of writing a custom PHP FastCGI start/stop script. Install lighttpd and use its spawn-fcgi wrapper script; it’s really going to save you a lot of painful hours. I wrote a simple wrapper around that script, as I wanted php-cgi to start up on every server boot, or whenever I wanted a quick restart of the processes. You might want to adjust the pidfile and cgidir variables for your setup.

#!/bin/bash

# (Re)start the php-cgi FastCGI pool via lighttpd's spawn-fcgi helper.
# Must be run as root; adjust pidfile and cgidir for your setup.

me=`whoami`
if [ $me != "root" ]; then
        echo Not root!
        exit 1
fi

pidfile=/root/php.PID
cgidir=/var/run/php-cgi
sock=$cgidir/unix.sock

# Create the socket directory on first run and hand it over to www-data
[ ! -d $cgidir ] && echo creating $cgidir && mkdir $cgidir && chown www-data:www-data $cgidir

# If a previous instance is running, kill it before spawning a new one
if [ -f $pidfile ]; then
        pid=`cat $pidfile`
        echo Killing $pid
        kill $pid
        rm $pidfile
        sleep 1
fi

# If an old socket is lying around, make sure www-data owns it
[ -S $sock ] && chown www-data:www-data $sock

# Spawn 5 php-cgi children listening on the unix socket, running as www-data
/usr/bin/spawn-fcgi -f /usr/bin/php-cgi -s $sock -C 5 -P $pidfile -u www-data -g www-data
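One way to have the pool come up on every boot is to call the wrapper from /etc/rc.local (the script path here is just an example, use wherever you saved it):

/root/start-php-cgi.sh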

Stop serving .htaccess

Plenty of web apps out there have built-in support for apache, and include .htaccess files in their distribution to reduce the configuration overhead for the installer. However, nginx will serve these files by default, which may be fine in most cases, but it’s always good practice to deny access to them. A simple bit of nginx config does the trick:

location ~ /\.ht {
    deny  all;
}

Serving PHP files

To serve PHP files, nginx passes the request on to the PHP-CGI handlers:

location ~ .*\.php$ {
	fastcgi_pass   unix:/var/run/php-cgi/unix.sock;
	fastcgi_index  index.php;
	include /etc/nginx/fastcgi_params;
	fastcgi_param  SCRIPT_FILENAME  /home/rs/local/wordpress/$fastcgi_script_name;
}

Notice that I’ve included a /etc/nginx/fastcgi_params file above. This file contains all the regular FastCGI directives, and I’ve put it in a separate file to avoid too much repetition. The content of /etc/nginx/fastcgi_params is below:

fastcgi_param  QUERY_STRING       $query_string;
fastcgi_param  REQUEST_METHOD     $request_method;
fastcgi_param  CONTENT_TYPE       $content_type;
fastcgi_param  CONTENT_LENGTH     $content_length;

fastcgi_param  SCRIPT_NAME        $fastcgi_script_name;
fastcgi_param  REQUEST_URI        $request_uri;
fastcgi_param  DOCUMENT_URI       $document_uri;
fastcgi_param  DOCUMENT_ROOT      $document_root;
fastcgi_param  SERVER_PROTOCOL    $server_protocol;

fastcgi_param  GATEWAY_INTERFACE  CGI/1.1;
fastcgi_param  SERVER_SOFTWARE    nginx/$nginx_version;

fastcgi_param  REMOTE_ADDR        $remote_addr;
fastcgi_param  REMOTE_PORT        $remote_port;
fastcgi_param  SERVER_ADDR        $server_addr;
fastcgi_param  SERVER_PORT        $server_port;
fastcgi_param  SERVER_NAME        $server_name;

WordPress Rewrite

The final tip is for all the WordPress junkies out there. To get nice URLs for WordPress, you will need the following rewrite directive. If I’m not mistaken, one is given to you for apache when you set up custom URLs via the admin screen, but not for nginx:

if (!-e $request_filename) {
    rewrite ^(.+)$ /index.php?q=$1 last;
}
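As an aside, on more recent nginx versions I believe the same effect can be achieved without the if block by using try_files; a sketch of the alternative would look like this:

location / {
    try_files $uri $uri/ /index.php?$args;
}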

And that’s about it. I really do hope these tips will help someone out there. I know it would have shaved a couple of hours off my setup time had I known them beforehand.

Free Subversion Hosting

Many people over the past few months have been asking the same questions over and over again about the services over at XP-Dev.com. I don’t mind answering them with the same answers, but I think it is time to put all of these questions in one place and discuss them.

Why are you offering Subversion Hosting for free? Is it too good to be true?

Let me set something straight:

I offer it free because I really do not believe that anyone should pay for something so simple to setup and run as Subversion.

Here is the reality: I set up Apache with mod_dav_svn, mod_dav, mod_ssl and mod_auth_mysql once. Believe me: only once and never ever ever ever (ever!) touched it again. No, I am not kidding – only once! No tinkering needed, it just runs like Forrest Gump (no pun intended to all you Gump fans out there).

It does cost $$$ to host it, including my time to add more features to it. Disk space and bandwidth are getting cheaper. They are not free, but then again, if you average the cost across the number of users that I have on XP-Dev.com, the figure looks really, really small. It is a cost nonetheless, which I’ll try to cover below.

So, we’ve established that it does cost money; how are you covering these costs? Are you really rich?

OK. I wish I was rich, but the truth is – I am not. I could claim I was rich and lie to you all, but then I would not get any glory every time I look at my monthly bank statements.

So, where does the money come from to pay for the services ? Well, at the moment, I am paying for it. But I won’t be doing this forever.

I have got a few models to generate revenue and these models will be implemented in the next few months. I can’t reveal them to the public just yet, but rest assured that the usage of Subversion and project tracking on XP-Dev.com will always remain free. This is how I started and envisaged XP-Dev.com, and that is how it will always be.

Free Subversion Hosting and Project Tracking on XP-Dev.com is a life-time guarantee.

You’re offering a free service. There’s a catch to it, right? Are you selling our code to someone else?

No. Nada. No catch. I am not a petty code trader. I don’t go around knocking on other people’s doors saying “PHP codez $4 per line! .. $3.50 per line! .. $3.40 per line! ..”. I could not be the least bit bothered about what everyone else is coding. I have my own ideas to push forward and materialise (one of them is XP-Dev.com, and there are a lot more in the pipeline).

So, your code is safe on our servers. No one other than the people you have permissioned is looking at your repositories. We do have backups that run every night and are copied off-site, but they are all encrypted before leaving the server.

I put all my code on XP-Dev.com. I am a consumer of my own service. I believe that anyone who offers a service should always be one of their own users/clients/customers. You should see your service from the customer’s point of view.

If someone else looked at my code and data, I’d be really worried. I respect that tremendously and try my very best to lock down the server.

What you see is what you get – WYSIWYG. There are no catches at all. Your code and data are safe. We have a “no prying eyes” and “mind your own business” policy.

OK. So it is a genuine service that is FREE with no strings attached. Then I suppose it will have to be an overloaded, slow service?

Never! This is one of the things that comes out of being a consumer of your own service. If the services do get slow, there’s going to be one really noisy, angry, verbal user – me. And I’m really scared of him.

On a serious note, I’d be disappointed with myself if the service ever dropped to an unacceptable quality. At the moment it’s fast, and I intend to keep it that way. If it ever becomes slow, I’ll be at the front of the queue shouting.

I’m not too sure if this is a good thing or a bad thing – I’ve only ever worked in the front office of investment banks, building real-time (well, near real-time) trading and pricing systems. They are all high-performance, scalable systems. The systems I work on can cost a trader anywhere between $100,000 and $500,000 if latency goes up a nudge above 10ms (yes, that’s milliseconds!). XP-Dev.com is a testament to my experience building & architecting these crazy systems (trust me, they are crazy!). If performance degrades, it will be a major failure on my part, and I’m a really proud person 🙂 .

It is a great service. How can I help?

This reply is a cliché. There are a few ways you can help.

If you are not a user, register now!

If you are a user, and have any problems, queries or just want to say thank you, then please tell me, or email admin@xp-dev.com. Every single non-spam email that goes there gets a reply. If you don’t get a reply in a few hours, then it’s probably SpamAssassin acting up. You should use this form instead.

If you are a user, or not even one just yet – you can help by telling your friends, mom, dad, brothers, sisters, relatives, neighbours, cats, dogs, fish and everyone else about XP-Dev.com. Digg it, Buzz it, Reddit. Do whatever. Just keep spreading the word. I really appreciate it.

If you have any other questions or concerns, please post them as comments to this blog entry, or do contact me directly.

Ext3 – handling a large number of files in a directory

If you’ve used Linux in the past, I am pretty sure that you’ve heard of the Ext3 file system. It is one of the most common file system formats out there, used mainly on Linux-based systems.

I’ve noticed something really annoying about how it handles a large number of files in a single directory. Essentially, I have a directory with almost a million files, and I found that creating a new file in this directory took ages (in the region of tens of seconds), which is not ideal at all for my purposes.

After some reading and much research, I learnt that Ext3 stores directory entries in a flat, unindexed list, and this causes much of the headache when a directory holds many files. There are a couple of options.

One: restructure the directory so that it does not contain that many files. I did some tests, and on a default (untuned) Ext3 partition, each subsequent write degrades horribly past roughly 2,000 files. So keeping a directory to within 2,000 files should be fine.
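For illustration, here is a rough sketch (not something from my actual setup) of bucketing files into hashed subdirectories so that no single directory grows too large:

import hashlib
import os

def bucketed_path(base, name):
    # Two-level fan-out, e.g. 'foo.dat' -> base/ab/cd/foo.dat.
    # 256 * 256 buckets keeps each directory tiny even for millions of files.
    h = hashlib.md5(name).hexdigest()
    d = os.path.join(base, h[:2], h[2:4])
    if not os.path.isdir(d):
        os.makedirs(d)
    return os.path.join(d, name)

print bucketed_path('/tmp/listdir', 'somefile.txt')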

Second: enable the dir_index option on the Ext3 file system. Run the following as root and you should find that things improve a lot. Do note that the indexing will take up more disk space, but then hard disk space is not too expensive nowadays:

$ sudo tune2fs -O dir_index /dev/hda1
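As far as I’m aware, dir_index only takes effect for directories created after the flag is set; to rebuild the indexes for existing directories you can run e2fsck with -D against the unmounted file system, along these lines:

$ sudo umount /dev/hda1
$ sudo e2fsck -fD /dev/hda1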

Finally, you could just use something like ReiserFS, which stores directory contents in a balanced tree; it is pretty darn fast and you don’t have to muck around tweaking things.

If your main partition is Ext3 and you can’t really afford to reformat it as ReiserFS, there is an alternative: create a blank file, format it as a ReiserFS file system and mount it using the loopback device.

So, let’s create the file first. The size depends on how much data you need to handle; in this example, I’ll just create a ~100MB file full of zeros:

$ dd if=/dev/zero of=reiser.img bs=1k count=100000

Next, format the file using ReiserFS as below. It will complain about the file ‘reiser.img’ not being a special block device (and we know that!). Just say yes and carry on.

$ mkreiserfs -f reiser.img

Finally, mount it wherever you would like to read/write the files (you need to do this as root):

$ sudo mount -t reiserfs -o loop reiser.img /tmp/listdir

You might need to chown the mount point so that your normal user can write into it. Moreover, if you need it mounted at every boot, do remember to put it in /etc/fstab!
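For reference, the /etc/fstab entry would look something like this (adjust the image path to wherever you created it):

/path/to/reiser.img  /tmp/listdir  reiserfs  loop  0  0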

FYI, I used the Python script below to see how long it took to write new files:

import time

# Create `count` empty files in /tmp/listdir and time how long each open() takes,
# printing a progress line every 1000 files and the average time at the end.
count = 1000000
total = 0.0
for i in range(count):
    if i % 1000 == 0:
        print 'Creating %i' % i
    start = time.time()
    open('/tmp/listdir/%s' % i, 'w').close()
    total += (time.time() - start)
print 'Avg is %0.8f' % (total / count)