Posts Tagged ‘web’

5 Tips for Building a Web SaaS

Sunday, February 15th, 2009

I have encountered tons of problems, some small, some large, while building XP-Dev.com as a web software as a service (SaaS). To be honest, I was a little too naive and didn’t foresee some of these issues, and I really do hope this post will help someone out there who’s thinking of building a SaaS for the masses.

1. Case Sensitivity

Case sensitivity of your unique keys is really important. For example, when you’re building a user database, you need to consider whether User1 is the same as uSeR1 and vice versa. Do note that email addresses are, in practice, treated as case insensitive (the domain part always is, and most mail providers ignore case in the part before the @ as well), and you’ll need to be able to cope with that in your application code. If you’re hosting your application on a Linux box, do remember that in general, Unix filesystems are case sensitive, and if you had a directory for User1 on your server, you could have another directory for uSeR1 as well.
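A minimal sketch of one way to handle this in application code, assuming usernames should compare case-insensitively. The helper names `normalize_username` and `is_taken` are made up for illustration and are not taken from XP-Dev.com:

```python
def normalize_username(raw: str) -> str:
    """Return a canonical form used for uniqueness checks and lookups."""
    # casefold() is a more aggressive lower() intended for caseless matching.
    return raw.strip().casefold()


def is_taken(candidate: str, existing: set) -> bool:
    """Check a candidate against a set that already holds normalized names."""
    return normalize_username(candidate) in existing


existing_users = {normalize_username("User1")}
print(is_taken("uSeR1", existing_users))  # True
print(is_taken("User2", existing_users))  # False
```

The same normalized value can also be used for anything derived from the username, such as a directory name on a case-sensitive filesystem.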

On the database side, MySQL has a small bug (sorry, nifty feature) that will actually help you solve this headache a little. If you declare a column as a VARCHAR and leave it on the default, case-insensitive collation, searches on it are case insensitive, i.e. if you search for uSeR1, you will get back User1.

If you’re finding some weirdness around Hibernate, MySQL and case sensitivity, do have a look at my past blog post about it.

2. Internationalization and Unicode

Just build everything on UTF-8, from the beginning, on each-and-every-file, on each-and-every-request. It will save you a whole load of headaches later on when you’re considering releasing your SaaS to the non-English speaking world (and that’s a HUGE motha-** of a world that you don’t want to miss out on).

Use UTF-8 database tables. Depending on your installation, you’ll find that MySQL uses latin1, and that doesn’t bode too well for Asian characters and many accented ones. The trick is to use the ‘CHARACTER SET‘ option when creating your database, and setting ‘charset=utf8‘ when creating tables.
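To see why latin1 is a problem, here is a small sketch that checks whether a piece of text can be represented in a given character set at all. It uses Python's codecs as a stand-in for the database's character set, and the helper name `fits_in` is hypothetical:

```python
def fits_in(text: str, charset: str) -> bool:
    """Return True if every character in text can be encoded in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


sample = "日本語"
print(fits_in(sample, "latin-1"))  # False
print(fits_in(sample, "utf-8"))    # True
```

Text that does not fit has to be rejected, mangled or replaced with placeholders somewhere along the way, which is exactly the headache that starting on UTF-8 avoids.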

Do use the awesome W3C HTML Validator to ensure that web browsers are reading your SaaS using the correct encoding.

3. Login/Register Lifecycle

Ahh yes, authentication! There’s a ton of research put into answering the question ‘How do I authenticate users on a website ?’. But my gripe is not about the authentication itself – it’s about doing the right thing after authentication.

Here’s a common scenario – User1 visits http://example.com/some/private/service/ which is an authenticated service – i.e. User1 needs to login to example.com to be able to access it. The problem is that some SaaS out there immediately redirect User1 to their ‘dashboard‘ or ‘homepage‘ on example.com – http://example.com/userhomepage.

This will frustrate users as they have to:

  • Remember the initial URL http://example.com/some/private/service/
  • Login
  • Retype http://example.com/some/private/service/ in the browser’s address bar
  • Press Enter.

The same applies if the user has not even registered for your SaaS.

The solution here seems pretty obvious – keep track of the last URL that a user hit before reaching your authentication pages, and upon successful registration or authentication, just redirect the user back to the original URL.

Most web frameworks will have support for this functionality in one form or another. Do look it up and get it in before the site goes live.
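If your framework does not do it for you, the idea can be sketched roughly as below. This is only an illustration: the `/login` path, the `next` parameter name and both function names are assumptions, and `/userhomepage` is borrowed from the example above as the fallback.

```python
from urllib.parse import urlencode, urlsplit

LOGIN_PATH = "/login"                   # assumed location of the login page
DEFAULT_AFTER_LOGIN = "/userhomepage"   # fallback when there is nothing to return to


def login_redirect_url(requested_path: str) -> str:
    """Build the login URL, remembering where the user was trying to go."""
    return LOGIN_PATH + "?" + urlencode({"next": requested_path})


def target_after_login(next_param) -> str:
    """Pick the post-login destination, accepting only local paths."""
    if not next_param:
        return DEFAULT_AFTER_LOGIN
    parts = urlsplit(next_param)
    is_local = (
        not parts.scheme
        and not parts.netloc
        and next_param.startswith("/")
        and not next_param.startswith("//")
        and "\\" not in next_param
    )
    # Refuse anything that could send the user to another site.
    return next_param if is_local else DEFAULT_AFTER_LOGIN


print(login_redirect_url("/some/private/service/"))
print(target_after_login("/some/private/service/"))
print(target_after_login("http://evil.example/"))
```

The check in `target_after_login` is there because a remembered URL that comes back from the browser is user input, and blindly redirecting to it would let someone craft a login link that bounces your users to another site.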

4. URLs and Permalinks

Keep everything in nice encapsulated URLs. This is a subjective area (that has been debated to an extent that it’s no longer funny), but I think URLs that do not contain query parameters are:

  • Easier to remember
  • Search engine/SEO friendly
  • Cleaner to regenerate in code

For example, instead of having:

http://example.com/some/private?service=login

You could instead have:

http://example.com/some/private/login

If you’re using a modern web server like Apache, Nginx or Lighttpd, they all provide some mechanism of rewriting URLs so you don’t have to modify your code too much.
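What such a rewrite does can be sketched in a few lines. This is an illustration of the mapping only, not real web server configuration, and the rule and function name are made up for the example URLs above:

```python
import re

# Hypothetical rule: /some/private/<service> -> /some/private?service=<service>
PRETTY_URL = re.compile(r"^/some/private/(?P<service>[A-Za-z0-9_-]+)/?$")


def to_internal(path: str) -> str:
    """Map a clean public path to the query-string form the code expects."""
    match = PRETTY_URL.match(path)
    if match is None:
        return path  # not covered by the rule, leave untouched
    return "/some/private?service=" + match.group("service")


print(to_internal("/some/private/login"))  # /some/private?service=login
print(to_internal("/about"))               # /about
```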

5. Application Level Permissions Layers

Most SaaS are essentially database driven applications, and they all access the database under a single user. In some complicated setups, this can actually go to 2 users – one for reading and one for writing. In an even more complicated setup, each user of the SaaS will have a database login.

All of these are essentially not enough.

And here’s why – in a world where a normalised database structure is all the hype, there’s a high degree of certainty that data for User1 sits on the same table as data for User2. As far as I know most databases don’t really have row level permissioning and hence, having to rely on your database as your permission layer just does not work.

There is one setup where I thought that it might work – give each user a new table or database. But clearly this is a solution that simply won’t scale.

So, what’s the alternative ? Embed it into your application code. The decorator or facade patterns are extremely powerful for implementing this. Moreover, you can do complex permissioning, e.g. User1 can read the business object during weekends, but not when User2 is logged in at the same time. OK, fine – it’s a bad example, but you get the point.

Why bother going through all this trouble ? Well, here’s a generic use case:

MyCalendar app is a web SaaS online calendar offering. Each user can have multiple calendars, and they are all private to the user. To retrieve a calendar, all a user has to do is visit http://example.com/calendar/<calendar id>/ where calendar id is an identifier on a database table.

Say User1 has calendars with calendar ids 240, 252 and 362. If MyCalendar app didn’t have application level permissioning, User2 would happily be able to view all 3 of User1’s calendars.

So, the natural question to ask is “Do users actually try to do that ?”. YES! They will. I’m not sure whether they are curious, or looking for a security hole, but you will find some users exploring the URLs. What I mean by that is, say User2 has a calendar id of 5442. He/She will try to visit the URLs for calendar id 5440-5449, even though there are no direct links to those calendars that they can see (except 5442).

Using a database driven web framework like Rails or Django is all well and good, but remember to implement some application level permissioning if you have any private data.
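Here is a rough sketch of the MyCalendar scenario with an ownership check wrapped around the view using a decorator. Everything in it is hypothetical: the dictionary stands in for a database table, and the names `owner_only` and `view_calendar` are invented for illustration.

```python
import functools


class PermissionDenied(Exception):
    """Raised when a user asks for a calendar they do not own."""


# Stand-in for a calendars table: calendar id -> owning user.
CALENDAR_OWNERS = {240: "User1", 252: "User1", 362: "User1", 5442: "User2"}


def owner_only(func):
    """Decorator that lets a call through only for the calendar's owner."""
    @functools.wraps(func)
    def wrapper(current_user, calendar_id):
        owner = CALENDAR_OWNERS.get(calendar_id)
        # Unknown ids are treated like forbidden ones, so probing
        # neighbouring ids reveals nothing about which calendars exist.
        if owner is None or owner != current_user:
            raise PermissionDenied(f"calendar {calendar_id} is not available")
        return func(current_user, calendar_id)
    return wrapper


@owner_only
def view_calendar(current_user, calendar_id):
    return f"calendar {calendar_id} for {current_user}"


print(view_calendar("User1", 240))
try:
    view_calendar("User2", 240)
except PermissionDenied as exc:
    print("denied:", exc)
```

The point of putting the check in a decorator is that every entry point that touches a calendar can share it, instead of each view remembering to compare owners by hand.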

There You Have It

5 simple tips that will save you a ton of hassle if you’re building a SaaS. As always, feedback is appreciated.

Moving Over to Nginx

Tuesday, December 16th, 2008

Running XP-Dev.com has its set of unique problems, and it has not always been easy. I’ve always tried to run the whole infrastructure on a shoestring budget while trying not to compromise on quality.

One of the problems is hardware resource.

The truth is: Apache is a memory hog, and to keep things scalable for serving Subversion repositories, I decided to move all PHP websites off Apache and run them under nginx and PHP-CGI (sudo apt-get install php5-cgi). To be honest, I did not notice any difference in performance of the web sites (apache/mod_php vs nginx/fastcgi/php-cgi), however, the main motivation of this exercise is to limit the maximum amount of memory that my non-critical PHP web sites take, and at the same time, give Apache more room to grow for serving the Subversion repositories. I could have had two Apache installations, and given them different limits (by tweaking MaxSpare*, MaxRequests* and friends), but that’s an outright pain to manage. Moreover, I needed a simple webserver that can just serve static content as well.

And let’s not forget the users of virtual private servers (VPS) with a limited amount of memory. Nginx and PHP-CGI is a much more appropriate solution for those memory limited configurations.

I had a look around, and it was basically down to lighttpd or nginx as a replacement to serve the PHP websites, and I picked nginx as there were some odd bugs with lighttpd serving large files. The FastCGI performance is almost the same (I did not really do any scientific benchmarks). However, the part that really got me sold on these two was that they use a master-slave threading model, rather than the (out of date) one thread/process per client model, which does not scale at all. Both of them are event driven, rather than “client socket” driven. BTW, this includes the awesome J2EE web container Jetty (if you use the SelectChannelConnector).

Migrating the websites across from apache to nginx/fastcgi/php-cgi was an absolute breeze and here are a few pointers that will help ease the burden.

Strategy

Just to clarify, in the apache/mod_php world, PHP files are served via the apache process itself. The strategy under nginx is to get nginx to pass on the request to another set of long running php-cgi processes that do the actual PHP processing. The response will then be passed back to nginx, which will send it back to the web browser.

Documentation

Use the English Nginx wiki extensively. There’s a lot of documentation there on configuring and tweaking nginx, especially the module reference pages. Here’s a quick and dirty howto on getting nginx+fastcgi and php-cgi working.

PHP FastCGI Start/Stop Scripts

Save yourself the trouble of writing a custom PHP FastCGI start/stop script. Install lighttpd and use their spawn-fcgi script wrapper. It’s really going to save you a lot of painful hours. I wrote a simple wrapper around that script as I wanted PHP CGI to start up on every server bootup, or if I wanted a quick restart of the processes. You might want to adjust the variables pidfile and cgidir for your setup.

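#!/bin/bash
# Restart the PHP FastCGI workers: stop any running set, then spawn a new one.

# Must run as root so that spawn-fcgi can switch to the www-data user.
if [ "$(whoami)" != "root" ]; then
        echo "Not root!"
        exit 1
fi

pidfile=/root/php.PID
cgidir=/var/run/php-cgi
sock=$cgidir/unix.sock

# Read the old PID only if the pidfile exists (avoids a cat error on first run).
pid=""
[ -f "$pidfile" ] && pid=$(cat "$pidfile")

# Create the socket directory, owned by the web server user.
[ ! -d "$cgidir" ] && echo "creating $cgidir" && mkdir "$cgidir" && chown www-data:www-data "$cgidir"

if [ "$pid" != "" ]; then
        echo "Killing $pid"
        kill $pid
        rm "$pidfile"
        sleep 1
fi

# A Unix socket is not a regular file, so test with -S rather than -f.
[ -S "$sock" ] && chown www-data:www-data "$sock"

/usr/bin/spawn-fcgi -f /usr/bin/php-cgi -s "$sock" -C 5 -P "$pidfile" -u www-data -g www-data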

Stop serving .htaccess

Plenty of web apps out there have built in support for Apache, and include .htaccess files in their distribution to reduce the configuration overhead for the installer. However, nginx will serve these files by default, which may be fine for most cases, but it’s always good practice to deny access to them. A simple config for nginx does the trick:

location ~ /\.ht {
    deny  all;
}
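To get a feel for which request paths that location catches, the same regular expression can be tried out in Python. This is only an approximation of nginx's matching (nginx uses PCRE, and a `~` location is a case-sensitive, unanchored regex match), and the sample paths are made up:

```python
import re

# Same pattern as in the nginx location block.
HT_PATTERN = re.compile(r"/\.ht")


def is_denied(uri: str) -> bool:
    """True if the path contains a segment starting with '.ht'."""
    return HT_PATTERN.search(uri) is not None


for uri in ["/.htaccess", "/blog/.htpasswd", "/index.php", "/what.html"]:
    print(uri, is_denied(uri))
```

Note that the pattern needs a slash directly before `.ht`, so ordinary files such as `/what.html` are not affected.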

Serving PHP files

To serve PHP files, nginx will pass the request to the PHP-CGI handlers.

location ~ .*\.php$ {
	fastcgi_pass   unix:/var/run/php-cgi/unix.sock;
	fastcgi_index  index.php;
	include /etc/nginx/fastcgi_params;
	fastcgi_param  SCRIPT_FILENAME  /home/rs/local/wordpress/$fastcgi_script_name;
}

Notice that I’ve included a /etc/nginx/fastcgi_params file above. This file contains all the regular FastCGI directives, and I’ve put it in a separate file to avoid too much repetition. The content of the file /etc/nginx/fastcgi_params is below:

fastcgi_param  QUERY_STRING       $query_string;
fastcgi_param  REQUEST_METHOD     $request_method;
fastcgi_param  CONTENT_TYPE       $content_type;
fastcgi_param  CONTENT_LENGTH     $content_length;

fastcgi_param  SCRIPT_NAME        $fastcgi_script_name;
fastcgi_param  REQUEST_URI        $request_uri;
fastcgi_param  DOCUMENT_URI       $document_uri;
fastcgi_param  DOCUMENT_ROOT      $document_root;
fastcgi_param  SERVER_PROTOCOL    $server_protocol;

fastcgi_param  GATEWAY_INTERFACE  CGI/1.1;
fastcgi_param  SERVER_SOFTWARE    nginx/$nginx_version;

fastcgi_param  REMOTE_ADDR        $remote_addr;
fastcgi_param  REMOTE_PORT        $remote_port;
fastcgi_param  SERVER_ADDR        $server_addr;
fastcgi_param  SERVER_PORT        $server_port;
fastcgi_param  SERVER_NAME        $server_name;

WordPress Rewrite

The final tip is for all those WordPress junkies out there. To get nice URLs for WordPress, you will need the following rewrite directive. If I’m not mistaken, one will be given to you for Apache when you’re setting up custom URLs via the admin screen, but not for nginx:

if (!-e $request_filename) {
    rewrite ^(.+)$ /index.php?q=$1 last;
}
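In other words: anything that does not exist on disk gets handed to index.php with the original path in the q parameter. Here is a rough Python model of that logic, where a set of paths stands in for the filesystem check that `-e` performs. The function name and sample paths are made up:

```python
import re


def rewrite(uri: str, existing_paths: set) -> str:
    """Model of: if (!-e $request_filename) { rewrite ^(.+)$ /index.php?q=$1 last; }"""
    if uri in existing_paths:
        return uri  # real files (images, CSS, ...) are served as they are
    return re.sub(r"^(.+)$", r"/index.php?q=\1", uri)


on_disk = {"/wp-content/themes/default/style.css"}
print(rewrite("/wp-content/themes/default/style.css", on_disk))
print(rewrite("/2008/12/moving-over-to-nginx/", on_disk))
```

The second call prints /index.php?q=/2008/12/moving-over-to-nginx/, while the stylesheet path comes back unchanged.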

And that’s about it. I really do hope these tips will help someone out there. I know it would have shaved a couple hours off my setup time had I known them beforehand.

Pareto principle applied to referer links

Thursday, November 13th, 2008

Anyone else notice how 80% of your traffic comes from 20% of your referers ?

To be honest, for me and the sites I run (including XP-Dev.com) the rule is more 90%-10%. The top 10% of the referers bring in 90% of the visitors.
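If you want to check the ratio for your own site, a small sketch like the one below will do. The visit counts are made-up numbers purely for illustration (not XP-Dev.com statistics), and `top_share` is a hypothetical helper name:

```python
import math


def top_share(visits_per_referrer, top_fraction):
    """Share of all visits brought in by the top fraction of referrers."""
    ranked = sorted(visits_per_referrer, reverse=True)
    top_n = max(1, math.ceil(len(ranked) * top_fraction))
    total = sum(ranked)
    return sum(ranked[:top_n]) / total if total else 0.0


# Invented visit counts for ten referrers.
visits = [900, 20, 15, 15, 10, 10, 10, 10, 5, 5]
print(top_share(visits, 0.10))  # 0.9
print(top_share(visits, 0.20))  # 0.92
```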