Mac OS X – List Open Ports

In the world of Linux, you could use netstat to list all ports that are open on your system. I tend to use the following in Linux:

netstat -aep | grep ':\*'

However, netstat in Mac OS X behaves extremely differently. To be able to list open ports on Mac OS X, you could use something along the lines of:

sudo lsof -i -P | grep -i "listen"

For example:

localhost:~$ sudo lsof -i -P | grep -i "listen" 
launchd 1 root 27u IPv6 0xf38ca75ebb725cfd 0t0 TCP localhost:631 (LISTEN)
launchd 1 root 28u IPv4 0xf38ca75ebb726c1d 0t0 TCP localhost:631 (LISTEN)
launchd 1 root 30u IPv6 0xf38ca75ebb72591d 0t0 TCP *:22 (LISTEN)
launchd 1 root 31u IPv4 0xf38ca75ebb7264cd 0t0 TCP *:22 (LISTEN)
polard 79 root 6u IPv4 0xf38ca75ebdb15c1d 0t0 TCP localhost:49152 (LISTEN)
...

Freedom and All That

I’ve just finished posting a new blog entry on XP-Dev.com on the definition of Stories, Tasks and Bugs from XP-Dev.com’s point of view. However, the beauty of it all is that users make XP-Dev.com their own. They end up using it for various other purposes other than software development (and agile at that!). They are free to use it whichever way they see fit.

In my blog entry, I said:

However, this article (and future ones) are only guidelines – you are free to decide on how you’d like to use XP-Dev.com.

“Free to decide” – these words triggered a memory from the past.

Back in 2003 I was doing an summer internship/job with Lec here in the UK and was rushing back to their office from Victoria station in London. I was running late and did not have enough time to read the notice board on which platform I was meant to catch the train from.

I ran to the first person who looked like he worked at the station, and I asked him which is the next train to Bognor Regis (Lec HQ). Apparently there were 2 – one that was leaving at that moment, and another that was leaving in 30 minutes. I wasn’t too sure which one my ticket was for, and showed him my tickets and asked “I’ve got these tickets – which train can I board?”.

As soon as I finished my question, he immediately replied “Take which ever one you’d like – it’s a Free country”. He didn’t bother even looking at the tickets!

With adrenaline still rushing through my heart, I thanked him, ran to the train and boarded it. The train left a minute later.

I have always comeback to this little episode whenever I think of usages of a tool or even idea – let it be a handy tool (screwdriver, etc) or even an edit (one of the team asking whether its OK to use Vim instead of Emacs). I have a 2 second flashback to that moment in Victoria station and reply to them in the same manner – “Its a free country. Do whatever you want.”

And this is the beauty about building tools that people use, like XP-Dev.com. Everyone has their own way of using it and it should be just like that. No hard rules, everyone gets to do it their own way, and everyone is happy.

There’s a saying “With freedom comes responsibility” – so, just don’t do anything unlawful.

XP-Dev.com: Another Milestone

So, it has been 3 months and a bit since the upgrade to the new platform that runs the current version of XP-Dev.com, and it has been eventful. There was the release that took a whole day, and then were some functional releases as well. The latest release bring some really cool features to XP-Dev.com.

Another milestone has been reached, and the functionality gap has been narrowing drastically the past 3 months. However, I will be brave enough to admit that the gap is still there, and at least for me, there’s still a mountain to climb ahead.

Enjoy the new releases, and as usual, your feedback is appreciated. Do give a shout in the forums, or just raise a support ticket. You can contact me via this blog as well.

Managing People in 15 minutes, or is it Micro-Management ?

I enjoy reading HarvardBusiness.org. There are some really insightful articles there, and I think about some of the better ones days after reading them.

When I first read How to Manage People in 15 Minutes a Day by Daisy Wademan Dowling, I thought that it was a fairly good article. It was short and to the point. 10 minutes went by and that’s when its struck me: it’s rubbish.

Now, the article gives out 4 pointers on how to manage people in 15 minutes a day. My feeling is that its geared towards the busy manager/executive who will need to squeeze out every second of the working day to ensure that the team is working at 100%.

These are the 4 tips that were listed:

  • Turn dead time into development time. Walking back to your office after a meeting? Use those two minutes to give your direct report feedback on the presentation, and on how he could do better next time. He didn’t have a speaking role? Ask him how he thought the meeting went and how he might have made certain points differently — and then offer feedback on that. Direct, in-the-moment feedback is your single best tool for developing people.
  • Constantly spot dead time. Look for every two-minute stretch in your day during which you could be talking to someone else — most often, that’s travel time — and convert each into a coaching opportunity. Walking down to Starbucks to get a coffee? Driving to the airport? Headed out to your car at the end of the day? Ask one of your people to come along with — and talk to them about their goals and priorities.
  • Show up in their workspace. Employees expect you to stay in your seat. Don’t. Once per day, get up and walk over to the desk of someone you haven’t spoken to recently. Take two minutes to ask her what she’s working on. Once she’s done answering, respond “What do you need from me to make that project/transaction successful?” Message to employee: I know who you are, I’ve got high expectations — and I’ve got your back.
  • Make two calls per day. On your way home from work, call (or email) two people you met with that day, and offer “feedforward.” “I like what you’ve done with the Smithers account. Next time, let’s try to keep marketing costs down. Thanks for your hard work.” Always make “thank you” a part of the message. Employees who feel appreciated, and know that you’re trying to develop their skills, stay engaged over the long run.

I won’t go into the details and analyse each and every point. However, here’s my feedback to these 4 tips:

Points 1. and 2. is essentially time management. Its not rocket science and the subject been there since the dawn of modern management. So, lets not kid ourselves that we’re actually gaining new knowledge by coining it in new, fashionable ways. Nothing new here. Telling managers that they need to make sure every second of their time is used effectively is just pointless. Everyone knows about the importance of time management, but most get stuck on how to implement/execute it. For execution, there’s a plethora of seminars, books, papers, conferences around the subject.

When I was managing teams and projects, there were days that felt endless. I usually started work at 7am, and at 8am sharp 5 people decide to whack in 5 different meetings throughout the day at different times. Managing those days were challenging, but its was not rocket science. On days like those, every 5 minutes free time was spent with someone, which is a trait I picked up from my director at that time. I’ve sat in countless of meetings that I had lunch in. You just have to make every moment count.

However, the rubbish part comes on points 3. and 4. If you read them carefully and picture yourself as the manager, it makes sense. But lets remember – management is not about you, it’s about them. Now, put yourself in the place of the your direct report and all of a sudden points 3. and 4. suddenly feel like you’re being micro-managed. This is not the way to lead a team. To be honest, if I had done that to my teams in the past, the consequences would have been dire. Admittedly, managing developers is different from managing a team of say, sales people. However, micro-management is always a bad thing to do. It undermines, demoralises and makes the person feel utterly useless.

Now, it’s not normal to find such low quality content on HarvardBusiness.org, and this one has to be one of the worst I’ve read. Infact, head over to the article right now and just read the comments that were submitted. Not entirely good feedback at all.

5 Tips for Building a Web SaaS

I have encountered tons of problems, some small, some large while building XP-Dev.com as a web software as a service (SaaS). To be honest, I was a little too naive and didn’t foresee some of these issues, and I really do hope it will help someone out there who’s thinking of building a SaaS for the masses.

1. Case Sensitivity

Case sensitivity of your unique keys is really important. For example, when you’re building a user database, you need to consider whether User1 is the same as uSeR1 and vice versa. Do note that email addresses are case in-sensitive and you’ll need to be able to cope with that in your application code. If you’re hosting your application on a Linux box, do remember that in general, Unix filesystems are case sensitive, and if you had a directory for User1 on your server, you could have another directory for uSeR1 as well.

On the database side, MySQL has a small bug nifty feature that will actually help you solve this headache a little. If you declare a column as a VARCHAR, searches on it are case insensitive, i.e. if you search for uSeR1, you will get back User1.

If you’re finding some weirdness around Hibernate, MySQL and case sensitivity, do have a look at by past blog about it.

2. Internationalization and Unicode

Just build everything on UTF-8, from the beginning, on each-and-every-file, on each-and-every-request. It will save you a whole load of headache later on when you’re considering releasing your SaaS to the non-english speaking world (and that’s a HUGE motha-** of a world that you don’t want to miss out on).

Use UTF-8 database tables. Depending on your installation, you’ll find that MySQL uses latin1, and that doesn’t bode too well with them accented and asian characters. The trick is to use the ‘CHARACTER SET‘ option when creating your database, and setting ‘charset=utf8‘ when creating tables.

Do use the awesome W3C HTML Validator to ensure that web browsers are reading your SaaS using the correct encoding:

3. Login/Register Lifecycle

Ahh yes, authentication! There’s a ton of research put into answering the question ‘How do I authenticate users on a website ?’. But my gripe is not about the authentication itself – it’s about doing the right thing after authentication.

Here’s a common scenario – User1 visits http://example.com/some/private/service/ which is an authenticated service – i.e. User1 needs to login to example.com to be able to access it. The problem is that some SaaS out there immediately redirect User1 to their ‘dashboard‘ or ‘homepage‘ on example.com – http://example.com/userhomepage.

This will frustrate users as they have to:

Remember the initial URL http://example.com/some/private/service/
Login
Retype http://example.com/some/private/service/ in the browser’s address bar
Press Enter.

The same applied if the user has not even registered for your SaaS.

The solution here seems pretty obvious – keep track of the last URL that a user hit before reaching your authentication pages, and upon successful registration or authentication, just redirect the user back to the original URL.

Most web frameworks will have support for this functionality in one form or another. Do look it up and get it in before the site goes live.

4. URLs and Permalinks

Keep everything in nice encapsulated URLs. This is a subjective area (that has been debated to an extent that it’s no longer funny), but I think having URLs that do not contain query paramaters are:

Easier to remember
Search engine/SEO friendly
Cleaner to regenerate in code

For example, instead of having:

http://example.com/some/private?service=login

You could instead have:

http://example.com/some/private/login

If you’re using a modern web server like Apache, Nginx or Lighttpd, they all provide some mechanism of rewriting URLs so you don’t have to modify your code too much.

5. Application Level Permissions Layers

Most SaaS are essentially database driven applications, and they all access the database under a single user. In some complicated setups, this can actually go to 2 users – one for reading and one for writing. An even more complicated setup, each user of the SaaS will have a database login.

All of these are essentially not enough.

And here’s why – in a world where a normalised database structure is all the hype, there’s a high degree of certainty that data for User1 sits on the same table as data for User2. As far as I know most databases don’t really have row level permissioning and hence, having to rely on your database as your permission layer just does not work.

There is one setup where I thought that it might work – give each user a new table or database. But clearly this is a solution that simply won’t scale.

So, what’s the alternative ? Embed it into your application code. The decorator or facade patterns are extremely powerful for implementing this. Moreover, you can do complex permissioning, for e.g. User1 can read the business object during weekends, but not when User2 is logged in at the same time. OK, fine – ts a bad example, but you get the point.

Why bother going through all this trouble ? Well, here’s a generic use case:

MyCalendar app is a web SaaS online calendar offering. Each user can have multiple calendars, and they are all private to the user. To retrieve a calendar, all a user has to do is visit http://example.com/calendar/<calendar id>/ where calendar id is an identifier on a database table.

Say User1 has calendars with calendar ids 240, 252 and 362. If MyCalendar app didn’t have application level permissioning, User2 would happily be able to view all 3 of User1’s calendar.

So, the natural question to ask is “Do users actually try to do that ?”. YES! They will. I’m not sure whether they are curious, or looking for a security hole, but you will find some users exploring the URLs. What I mean by that is, say User2 has a calendar id of 5442. He/She will try to visit the URLs for calendar id 5440-5449, even though there are no direct links to those calendars that they can see (except 5442).

Using a database driven web framework like Rails and Django is all well and good, but remember to implement some application level permissioning if you have any private data.
There You Have It

5 simple tips that will save you a ton of hassle if you’re building an SaaS. As always, feedback is appreciated.

Taking the Stairs

I have been awfully quiet since the beginning of the new year and I have to apologise. It has been extremely busy at work, and along with the ambitious plans that I have for XP-Dev.com, I never seem to have any spare time left.

Now, one of the things that I have missed out was blogging about my resolution for this year. 2008 has been a real roller coaster for me – both, professionally and personally, and I do want to share one of my observations which happened at the end of December 2008.

Lazy ?

I work (at an investment bank doing IT “stuff”) on the 6th floor of my building, and one day, the lift stopped at the 2nd floor to let someone in. This lady (she looked young, maybe late 20s) steps in and calls for the 3rd floor. Yes, she just went up one floor, and to make matters worse, there’s a fully functional escalator literally 20 metres away from the lefts.

At that moment, I was just thinking “Jeez, one floor? AND there’s an escalator just there. This is just pushing it. She should have just taken the stairs” (escalators in this case).

And that’s when it struck me – 2008 is the year the world paid the price for the finance industry not taking the stairs.

Her action to take the lifts instead of the escalators, cost me and 4 others in the lift a few seconds each, but it was just plain lazy and inconsiderate.

Reflection

I have only ever worked for 3.5 years, all of it in investment banking. In my short professional career, I have been a developer, a mentor, a manager, an architect, and a general noise generator (the actual label was worse, filled with profanity). I have seen systems and businesses fall to their knees, while others have run perfectly, some flourished and even some failures to take off (i.e. projects/businesses that seem to just go on with no delivery/product/end in sight).

Looking back on these 3.5 years, I noticed that the ones that failed had just taken outright shortcuts. The finance industry, in their goal of getting rich fast took plenty of shortcuts. They were not prudent enough to sit back, reanalyse the situation and figure out the right way to move forward (and actually do it even though its hard work).

Everyone is paying the price for it. Jobs are being cut everywhere. Businesses are failing – all due to the finance industry not taking the stairs. Their actions of taking shortcuts to generate revenue has cost everyone else a lot of grief – physical (monetary, etc) and emotional (job loss, security, etc).

Wake Up

Personally, I hate having to do a half-ass job. I hate taking shortcuts. I like to get to the bottom of things and nip the problem in the bud. I have to admit that there have been times when I did take the shortcuts.

So, my goal for 2009 is simple – take the freaking stairs, and you should too. Whenever you’re in a situation that requires a decision, or solving a problem, just keep asking yourself “What is the actual problem? What is the right solution?” and just do it!

Asymmetric Follow, Pub/Sub and Systems Design

James Governor mentions a very interesting pattern of web 2.0 – asymmetrical follow. In a nutshell, it’s basically an unbalanced communication network – you have some nodes on a network (hubs) that tend to have a lot of inbound links compared to others. In other words, it is a situation where popular people whose words/thoughts/opinions/tweets get read by the masses, and they (the popular people) do not reciprocate.

The thing is, asymmetrical follow exists everywhere. Celebrities have a lot of inbound links (tabloids, fans, press, etc), but they do not necessarily have a reverse link back. Blogs by their nature are asymmetrical as well – the blogger publishes a post thats read by visitors, and comments on the blogs, or even pingbacks do not necessarily get read or responded to by the poster. Its not being rude or anything anti-social about it, but its a pattern, and James is right – it is core to Web 2.0. Back in Dec ‘07, JP mentions that Twitter is neither a push or a pull network, but it is actually publish-subscribe.

The point of this post surrounds what James mentions in his article:

But Twitter wasn’t designed for whales. It was designed for small shoals of fish. Which brings us to one of the big issues with Asymmetrical Follow – it introduces unexpected scaling problems. Twitter’s architecture didn’t cope all that well at first, but has performed a lot better since the message broker was re-architected using Scala LIFT, a new web application programming framework). The technical approach that is most appropriate to support Asymmetrical Follow is well known in the world of high scale enterprise messaging- its called Publish And Subscribe.

Publish-Subscribe is a very common pattern in technology. Having worked in 2 investment banks, I have seen plenty of implementations that do the exact same thing: Publishers fire data once to a middleware, and that middleware layer sends that data off to many subscribers. Sounds simple enough to implement, right ? Well, it’s not.

Designing a good, reliable, highly performant Publish-Subscribe framework is not easy. Getting the initial bits is simple and trivial, but the problem that a lot of people face is scalability. If you are looking to build a Pub/Sub layer on your own, then the first thing you have to do is stop and take a reality check. Its not worth the trouble. Buy it from someone, or reuse another framework (like Twitter have done with Scala LIFT). I am not kidding. I have seen millions of dollars go down the drain in missed opportunities, direct trading losses, etc all due to poorly designed and implemented Pub/Sub layers.

Pub/Sub frameworks are a lot like caches (for e.g. memcached) but with a twist. Not only do you have to cache data, but you have to tell subscribers when this data has changed. In fact, they are closer to finite state machines than caches.

Here are a few that I have used in the past and highly recommend any of them:

Now, if you’re still stubborn and think you’re up for the challenge, then here are a few pointers:

Design it really, really well

Sit down with a few people and walk them through your design. Your design has to look into how memory is managed, the threading model, the communication mechanism, etc. Find as many defects as possible and don’t take it personally. Do this before you even write a single line of code.

Have a solid, clean API

I have used some really arcane APIs in the past, and oddly, some of them are provided by electronic markets (no names mentioned here). Remember, the API will be used by publishers and subscribers. The cleaner the API, the less bugs it will introduce in publishers/subscribers code.

Non-blocking IO

If you’re using TCP sockets as a communication layer, do use select() (non-blocking IO). You need to break away from the one-thread-per-client model. That model, while easy to code to, just does not scale at all. I have been in way too many situations where I have inherited a system that uses the one-thread-per-client model, and all of a sudden it does not work in production because they’ve just scaled from 30 connected clients to 3000. BTW, if you’re developing in Java, I highly recommend using Apache MINA to reduce the stress of writing non-blocking IO code.

Watch out for data state inconsistencies

A common approach that a lot of frameworks use is to send a snapshot followed by updates of changes.

Publishers should send the following messages upon startup:

  1. An initial message saying “This is the beginning of my initial data”
  2. The initial data itself
  3. A final message saying “This is the end of my initial data”

From then on, publishers should just send updates.

Subscribers will get the reverse. A call to subscribe() should result in at least the following callbacks:

  1. A callback saying “This is the beginning of the data”
  2. The initial snapshot data itself
  3. A callback saying “This is the end of the data”

From then on, subscribers should just send updates.

Handling the subscribe() call in your framework is going to be tricky. You’ll need to be careful of locking your cache to ensure that no one updates it while you’re taking the initial data for the subscriber. Alternatively, you could create a snapshot copy of the cache, but keep an eye on your memory usage.

Correctness over performance

Don’t worry so much about reducing latency from 200ms to 20ms. Getting your implementation correct is far more important than performance. I’m not saying performance is not important, BUT you need to get it to work correctly before looking into performance.

Build a load test framework

You will definitely need one of these. There have been way too many times in the past where I need to reproduce a production problem related to scalability, only to find that the original authors of the system did not bother building a load test framework.

I hope by now you would realise that designing and implementing a Publish/Subscribe framework is not trivial. Buy it off the shelf as someone out there has gone through all of this pain for you.

I am a big believer that your system architecture should reflect the underlying business. It will not be a good fit if you are trying to retrofit an incorrect architecture as you will end up having loads of problems (scalability, maintenence, etc) in the long run. Asymmetric follow is here to stay, and in your next project think about how it is going to affect your architecture and what you need to do at the initial outset to get it right.

Moving Over to Nginx

Running XP-Dev.com has its set of unique problems, and it has not always been easy. I’ve always tried to run the whole infrastructure on a shoe-string budget at the same time trying not to compromise on quality.

One of the problems is hardware resource.

The truth is: Apache is a memory hog, and to keep things scalable for serving Subversion repositories, I decided to remove all PHP websites out from apache and run them under nginx and PHP-CGI (sudo apt-get install php5-cgi). To be honest, I did not notice any difference in performance of the web sites (apache/mod_php vs nginx/fastcgi/php-cgi), however, the main motivation of this exercise is to limit the maximum amount of memory that my non-critical PHP web sites take, and at the same ti

me, giving apache more room to grow for serving the Subversion repositories. I could have had two apache installations, and give them different limits (by tweaking MaxSpare*MaxRequests* and friends), but that’s an outright pain to manage. Moreover, I needed a simple webserver that can just serve static content as well.

And lets not forget the users of virtual private servers (VPS) with limited amount of memory. Nginx and PHP-CGI is a much appropriate solution for those memory limited configurations.

I had a look around, and it was basically down to lighttpd or nginx as a replacement to serve the PHP websites, and I picked nginx as there were some odd bugs with lighttpd serving large files. The FastCGI performance is almost the same (I did not really do any scientific benchmarks). However, the part that really got me sold on these two was that it used a master-slave threading model, rat

her than the (out of date) one thread/process per client model, which does not scale at all. Both of them are event driven, rather than “client socket” driven. BTW, this includes the awesome J2EE web container Jetty (if you use the SelectChannelConnector).

Migrating the websites across from apache to nginx/fastcgi/php-cgi was an absolute breeze and here are a few pointers that will help ease the burden.

Strategy

Just to clarify, in the apache/mod_php world, PHP files are served via the apache process itself. The strategy under nginx is to get nginx to pass on the request to another set of long running php-cgi processes that do the actual PHP processing. The response will then be passed back to nginx, which will send it back to the web browser.

Documentation

Use the English Nginx wiki extensively. There’s a lot of documentation there on configuring and tweaking nginx, especially the module reference pages. Here’s a quick and dirty howto on getting nginx+fastcgi and php-cgi working.

PHP FastCGI Start/Stop Scripts

Save yourself the trouble of writing a custom PHP FastCGI start/stop script. Install lighttpd and use their spawn-fcgi script wrapper. Its really going to save you a lot of painful hours. I wrote a simple wrapper around that script as I wanted PHP cgi to startup on every server bootup, or if I wanted a quick restart of the processes. You might rant to adjust the variables pidfile and cgidir for your setup.

#!/bin/bash

me=`whoami`
if [ $me != "root" ]; then
        echo Not root!
        exit 1
fi

pidfile=/root/php.PID
pid=`cat $pidfile`
cgidir=/var/run/php-cgi
sock=$cgidir/unix.sock

[ ! -d $cgidir ] && echo creating $cgidir && mkdir $cgidir && chown www-data.www-data $cgidir

if [ "$pid" != "" ]; then
        echo Killing $pid
        kill $pid
        rm $pidfile
        sleep 1
fi

[ -f $sock ] && chown www-data.www-data $sock

/usr/bin/spawn-fcgi -f /usr/bin/php-cgi -s $sock -C 5 -P $pidfile -u www-data -g www-data

Stop serving .htaccess

Plenty of web apps out there have built in support for apache, and include .htaccess files in their distribution to reduce the configuration overhead for the installer. However, nginx will serve these files by default, which maybe fine for most of the cases, but its always good practice to deny access to it. Simple config for nginx does the trick

location ~ /\.ht {
    deny  all;
}

Serving PHP files

To serve PHP files, nginx will pass the request to the PHP-CGI handlers.

location ~ .*\.php$ {
	fastcgi_pass   unix:/var/run/php-cgi/unix.sock;
	fastcgi_index  index.php;
	include /etc/nginx/fastcgi_params;
	fastcgi_param  SCRIPT_FILENAME  /home/rs/local/wordpress/$fastcgi_script_name;
}

Notice that I’ve included a /etc/nginx/fastcgi_params file above. This file contains all the regular FastCGI directives, and I’ve put it in a seperate file to avoid too much repetition. The content of the file /etc/nginx/fastcgi_params is below:

fastcgi_param  QUERY_STRING       $query_string;
fastcgi_param  REQUEST_METHOD     $request_method;
fastcgi_param  CONTENT_TYPE       $content_type;
fastcgi_param  CONTENT_LENGTH     $content_length;

fastcgi_param  SCRIPT_NAME        $fastcgi_script_name;
fastcgi_param  REQUEST_URI        $request_uri;
fastcgi_param  DOCUMENT_URI       $document_uri;
fastcgi_param  DOCUMENT_ROOT      $document_root;
fastcgi_param  SERVER_PROTOCOL    $server_protocol;

fastcgi_param  GATEWAY_INTERFACE  CGI/1.1;
fastcgi_param  SERVER_SOFTWARE    nginx/$nginx_version;

fastcgi_param  REMOTE_ADDR        $remote_addr;
fastcgi_param  REMOTE_PORT        $remote_port;
fastcgi_param  SERVER_ADDR        $server_addr;
fastcgi_param  SERVER_PORT        $server_port;
fastcgi_param  SERVER_NAME        $server_name;

WordPress Rewrite

The final tip is for all those WordPress junkies out there. To get nice urls for WordPress, you will need the following rewrite directive. If I’m not mistaken, one will be given to you for apache when you’re setting up custom urls via the admin screen, but not for nginx:

if (!-e $request_filename) {
    rewrite ^(.+)$ /index.php?q=$1 last;
}

And that’s about it. I really do hope these tips will help someone out there. I know it would have shaved a couple hours off my setup time had I known them beforehand.

Spring and Jetty Integration

Jetty is a pretty darn awesome J2EE web container. With amazing features like non-blocking IO, continuations and immediate integration with Cometd – I feel that it is a solid, production ready container.

I hate war files, I hate web.xml files – there’s just way too much black magic that needs to be done to get things up and running. It is nice once someone has done the dirty work, and got the initial web.xml constructed, but I wouldn’t want to be that person who starts it all off.

Oh – another thing – I absolutely LOVE dependency injection. Using the web.xml approach, you’ll almost always have to start off a servlet of some sort to initialise various services that you’ll need – moreover the easiest way to access these services on other servlets is to use singletons, and we all know why singletons are bad!

So, I end up using Jetty in an embedded setup, and used to write various wrappers around the configuration so that I can do most of the common things with minimal code. A good example will be to setup a bunch of contexts and a DefaultServlet for regular file serving. However, the way Jetty is written makes it really easy to be used in Spring – everything is a simple bean with a bunch of setters.

To start off, lets write down the bean. Notice I’ve added an init-method attribute to start(). If you don’t want Spring to kick off your server, then just grab hold of the bean and call start() on it explicitly.

<bean name="WebServer" init-method="start">
</bean>

Then, lets add some connectors to it:

<property name="connectors">
  <list>
  <bean name="LocalSocket">
      <property name="host" value="localhost"/>
      <property name="port" value="8080"/>
  </bean>
  </list>
</property>

You will need some handlers (one of them will be a context handler to serve your servlets). I’ve added a logging handler so that the server logs requests in the same format as apache’s combined log.

<property name="handlers">
  <list>
    <bean>
      <property name="contextPath" value="/"/>
      <property name="sessionHandler">
        <bean/>
      </property>
      <property name="resourceBase" value="/var/www"/>
      <property name="servletHandler">
        <bean>
          <property name="servlets"> <!-- servlet definition -->
            <list>
            <!-- default servlet -->
            <bean>
              <property name="name" value="DefaultServlet"/>
              <property name="servlet">
                <bean/>
              </property>
              <property name="initParameters">
                <map>
                  <entry key="resourceBase" value="/var/www"/>
                </map>
              </property>
            </bean>
            </list>
          </property>
          <property name="servletMappings">
            <list><!-- servlet mapping -->
            <bean>
              <property name="pathSpecs">
                <list><value>/</value></list>
              </property>
              <property name="servletName" value="DefaultServlet"/>
            </bean>
            </list>
          </property>
        </bean>
      </property>
    </bean>
    <!-- log handler -->
    <bean>
      <property name="requestLog">
        <bean>
          <property name="append" value="true"/>
          <property name="filename" value="/var/log/jetty/request.log.yyyy_mm_dd"/>
          <property name="extended" value="true"/>
          <property name="retainDays" value="999"/>
          <property name="filenameDateFormat" value="yyyy-MM-dd"/>
        </bean>
      </property>
    </bean>
  </list>
</property>

And thats about it. If you need to add more servlets, then all you have to do is add an entry to ServletHandler’s properties for servlets and servletMappings.

Now, image if I had to get a reference to a DAO, or some other service in my servlet – it’s just going to be a matter of adding a member, exposing it via a setter and whacking in the dependency on the servlet Spring config above. All done in nice dependency injected way. No more overriding init() on the servlet and picking up some context attribute via some magic string. The best part of this

Here’s the whole Spring config – hack away to your needs!

<bean name="WebServer" init-method="start">
<property name="connectors">
  <list>
  <bean name="LocalSocket">
    <property name="host" value="localhost"/>
    <property name="port" value="8080"/>
  </bean>
  </list>
</property>
<property name="handlers">
  <list>
    <bean>
      <property name="contextPath" value="/"/>
      <property name="sessionHandler">
        <bean/>
      </property>
      <property name="resourceBase" value="/var/www"/>
      <property name="servletHandler">
        <bean>
          <property name="servlets"> <!-- servlet definition -->
            <list>
            <!-- default servlet -->
            <bean>
              <property name="name" value="DefaultServlet"/>
              <property name="servlet">
                <bean/>
              </property>
              <property name="initParameters">
                <map>
                  <entry key="resourceBase" value="/var/www"/>
                </map>
              </property>
            </bean>
            </list>
          </property>
          <property name="servletMappings">
            <list><!-- servlet mapping -->
            <bean>
              <property name="pathSpecs">
                <list><value>/</value></list>
              </property>
              <property name="servletName" value="DefaultServlet"/>
            </bean>
            </list>
          </property>
        </bean>
      </property>
    </bean>
    <!-- log handler -->
    <bean>
      <property name="requestLog">
        <bean>
          <property name="append" value="true"/>
          <property name="filename" value="/var/log/jetty/request.log.yyyy_mm_dd"/>
          <property name="extended" value="true"/>
          <property name="retainDays" value="999"/>
          <property name="filenameDateFormat" value="yyyy-MM-dd"/>
        </bean>
      </property>
    </bean>
  </list>
</property>
</bean>

Converting PEM certificates and private keys to JKS

If there is one irritating, arcane issue about Java, it is their SSL and Crypto framework. It is a pile of mess. I remember using openssl as a library about 3-4 years ago in a project that was pretty crypto heavy and their library can be used by any junior developer – it’s that simple to use.

However, Java’s crypto framework is just absolutely irritating to use – tons of unnecessary boiler plate, and not enough of self discovery of file formats (as an example). Try to do SSL client certificate authentication from ground up and you’ll know what I mean. Knife, wrist – sound familiar ?

Last night, I had to convert some PEM formatted certificates and private keys to JKS (was getting SSL nicely configured under Jetty). I remember doing this a few years back and there were molehillsmountains of issues to jump across and I did pull my hair out back then. Last night was no different. However, I did manage to solve it and ended up with much less hair.

So, to save everyone else the trouble (and their hair!), I’m jotting down some notes here on how to convert a certificate and private key in PEM format into Java’s keystore and truststore in JKS format.

The Keystore

If we’re starting with PEM format, we need to convert the certificate and key to a PKCS12 file. We’ll use openssl for that:

Remember to use a password for the command below, otherwise, the Jetty converter (the following step) will barf in your face!

openssl pkcs12 -export -out cert.pkcs12 \
  -in cert.pem -inkey key.pem

Once that’s done, you need to convert the pkcs12 to a JKS. Here, I will be using a small utility that comes bundled with Jetty called PKCS12Import. You can download the necessary library (you’ll need the main jetty.jar) which can be a huge download for such a small thing, or just grab the jar from here. Run the following command and use the password from the step above and your keystore password:

java -cp /path/to/jetty-6.1.7.jar \
  org.mortbay.jetty.security.PKCS12Import \
  cert.pkcs12 keystore.jks

The Truststore

Next, you’ll almost definitely need to import the certificate into your truststore whenever you need to do anything related to SSL.

First, export the certificate as a DER:

openssl x509 -in cert.pem -out cert.der -outform der

Then import it into the truststore:

keytool -importcert -alias mycert -file cert.der \
  -keystore truststore.jks \
  -storepass password

And that’s it! You have your key in the keystore, and your certificate in the truststore. Hope this helps some of you out there.