New XP-Dev.com (finally!) Released

Folks, there’s a new version of XP-Dev.com out. This release features a new platform (which I will blog about soon!) and some key features that everyone has been asking for profusely, namely:

Subversion imports and exports (tons, upon tons of users have asked for this)
Allowing anonymous (public repositories) checkouts
Multiuser project and task management (tons of users have asked for this)

There are some obvious bugs which I will sort out in the next few days. However, under the new platform, extending and adding more features to XP-Dev.com will be a breeze (and unit tested of course)! There is a whole lineup of features coming up, and I will keep everyone posted about them. These are exciting times for XP-Dev.com and we really appreciate all the support that’s been given to us.

I personally would love to see XP-Dev.com being the best agile tool out there, and we’ll get there!

If you’re a current user, give it a whirl – any feedback will be great – good and bad!

If you’re a new user – register now and see what it can do to improve your development deliveries.

Free Subversion Hosting

Over the past few months, many people have been asking the same questions over and over again about the services at XP-Dev.com. I don’t mind answering them with the same answers, but I think it is time to put all of these questions into one place and discuss them.

Why are you offering Subversion Hosting for free ? Is it too good to be true ?

Let me set something straight:

I offer it free because I really do not believe that anyone should pay for something so simple to set up and run as Subversion.

Here is the reality: I set up Apache using mod_dav_svn, mod_dav, mod_ssl and mod_auth_mysql once. Believe me: only once, and I have never ever ever ever (ever!) touched it again. No, I am not kidding – only once! No tinkering needed, it just runs like Forrest Gump (no pun intended to all you Gump fans out there).

It does cost $$$ to host it, including my time to add more features to it. Disk space and bandwidth are getting cheaper. They are not free, but then again, if you average the cost across the number of users that I have on XP-Dev.com, the figure looks really, really small. It is a cost nonetheless, which I’ll try to cover below.

So, we’ve established it does cost money, how are you covering these costs ? Are you really rich ?

OK. I wish I was rich, but the truth is – I am not. I could claim I was rich and lie to you all, but then I would not get any glory every time I look at my monthly bank statements.

So, where does the money come from to pay for the services ? Well, at the moment, I am paying for it. But I won’t be doing this forever.

I have got a few models to generate revenue and these models will be implemented in the next few months. I can’t reveal them to the public just yet, but rest assured that the usage of Subversion and project tracking on XP-Dev.com will always remain free. This is how I started and envisaged XP-Dev.com, and that is how it will always be.

Free Subversion Hosting and Project Tracking on XP-Dev.com is a life-time guarantee.

You’re offering a free service. There’s a catch to it, right ? Are you selling our code to someone else ?

No. Nada. No catch. I am not a petty code trader. I don’t go around knocking on other people’s doors saying “PHP codez $4 per line! .. $3.50 per line! .. $3.40 per line! ..”. I could not be less bothered about what everyone else is coding. I have my own ideas to push forward and materialise (one of them is XP-Dev.com, there are a lot more in the pipeline).

So, your code is safe on our servers. No one other than the people you have given permission to can look at your repositories. We do have backups that run every night and are copied off-site, but they are all encrypted before leaving the server.

I put all my code on XP-Dev.com. I am a consumer of my own service. I believe that anyone who offers a service should always be one of their own users/clients/customers. You should see your service from the customer’s point of view.

If someone else looked at my code and data, I’d be really worried, and I expect you feel the same about yours. I respect that tremendously and try my very best to lock down the server.

What you see is what you get – WYSIWYG. There are no catches at all. Your code and data are safe. We have a “no prying eyes” and “mind your own business” policy.

OK. So it is a genuine service that is FREE with no strings attached. Then I suppose it will have to be an overloaded, slow service ?

Never! This is one of the things that comes out of being a consumer of your own service. If the services do get slow, there’s going to be one really noisy, angry, verbal user – me. And I’m really scared of him.

On a serious note, I’d be disappointed with myself if the service ever drops to an unacceptable quality. At the moment it’s fast and I intend on keeping it that way. If it ever becomes slow, I’ll be there at the front of the queue shouting.

I’m not too sure if this is a good thing, or a bad thing – I’ve only ever worked in the Front Office for Investment Banks building real-time (well, it’s near real-time) trading and pricing systems. They are all high performance scalable systems. The systems I work on can cost a trader anywhere between $100,000 and $500,000 if latency goes up a nudge above 10ms (yes, that’s milliseconds!). XP-Dev.com is a testament to my experience building & architecting these crazy systems (trust me, they are crazy!). If performance degrades, it will be a major failure on my part and I’m a really proud person :) .

It is a great service. How can I help ?

This reply is a cliche. There are a few ways you can help.

If you are not a user, register now!

If you are a user, and have any problems, queries or just want to say thank you, then please tell me, or email admin@xp-dev.com. Every single non-spam email that goes there gets a reply. If you don’t get a reply in a few hours, then it’s probably SpamAssassin acting up. You should use this form instead.

If you are a user, or not even one just yet – you can help by telling your friends, mom, dad, brothers, sisters, relatives, neighbours, cats, dogs, fish and everyone else about XP-Dev.com. Digg it, Buzz it, Reddit. Do whatever. Just keep spreading the word. I really appreciate it.

If you have any other questions or concerns, please post them as comments to this blog entry, or do contact me directly.

Python and Multi-threading

It has been a few days since Python 2.6 came out, and the word on the street is that it’s meant to ease the transition into Python 3k. Python 3k is not backwards compatible with the 2.x releases. I haven’t had much time on my hands to get down and dirty with the new 2.6 release, but have had some time to read up on it.

Most people know that Python does have a threading API that is pretty darn close to Java’s. However, the way it has been implemented is that all threads need to grab hold of the Global Interpreter Lock to ensure that only one thread at any one time can execute within the Python VM. This is to ensure that all threads have the same “view” of all variables. Apparently they tried to avoid this by making the Python VM thread safe, but it took a terrible performance hit.

Java tends to get around this by having a rather complex memory model within the Java VM where each thread has its own virtual memory. That’s why you have to synchronize various sections of your code to ensure that threads see the same variable states. I highly recommend reading up on Doug Lea’s article on synchronization and the Java Memory Model for anyone who wants to do very intensive multi-threaded applications in Java.

So, what are the implications of having to grab hold of the Global Interpreter Lock in Python ? The problem is that it is not TRUE multi-threading. Only one thread runs Python code at any one time, so you, as the programmer and designer (you DO design your solutions first, right?), will have to plan for when threads go to sleep or block and allow other threads to run. The VM will not spread CPU-bound work across cores for you, and one might say that it really is closer to a single-threaded VM. From past experience, I’ve found Python’s threads to be really useful when I’m making blocking calls (e.g. grabbing a DB connection, blocking APIs (yuck!)), and can do something else in the background while the main thread is sleeping. You could get around this problem by using sub-processes, but there was no easy way to do it, and you had to add a lot of boilerplate code every single time. There was just no support for a clean, truly multi-threaded interface out of a standard installation.
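As a rough illustration of the blocking-call case, here is a minimal sketch using the standard threading module. The blocking_call function is a hypothetical stand-in (it just sleeps) for something like waiting on a DB connection, and the names and delays are assumptions made up for the example:

```python
import threading
import time


def blocking_call(results, index, delay):
    """Hypothetical stand-in for a blocking call, e.g. waiting on a DB."""
    time.sleep(delay)  # sleeping releases the GIL, so other threads can run
    results[index] = index * 2


def run_in_threads(count, delay):
    """Run `count` blocking calls in concurrent threads; return (results, elapsed)."""
    results = [None] * count
    threads = [
        threading.Thread(target=blocking_call, args=(results, i, delay))
        for i in range(count)
    ]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every call to finish
    return results, time.time() - start


if __name__ == "__main__":
    results, elapsed = run_in_threads(4, 0.2)
    print("results=%s elapsed=%.2fs" % (results, elapsed))
```

Because the threads spend their time waiting rather than computing, the four calls overlap and the total should be close to one delay rather than four. The same trick would not help if blocking_call were doing heavy computation instead.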

Now, in Python 2.6, there’s a new package for creating sub-processes called multiprocessing. After a quick glance, it looks very similar to the threading API, BUT instead of running threads, it creates a child process which has its own memory and in turn does not need to share its Global Interpreter Lock. My own prediction is that it comes at the cost of process creation overhead and less efficient use of memory. However, you do end up with a TRUE multi-threaded application that really uses all the available processor cores on a multi-core CPU. Considering that RAM is getting cheaper, and processors are getting more cores built into them, I think this is a fair trade off.
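To give a feel for it, here is a minimal sketch using multiprocessing.Pool. The count_primes function and the limits are made up purely for illustration (any CPU-bound function would do), and the pool size of 4 is an assumption about the machine:

```python
import multiprocessing


def count_primes(limit):
    """CPU-bound toy task: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


if __name__ == "__main__":
    limits = [2000, 3000, 4000, 5000]
    # Each worker is a separate process with its own interpreter and its own GIL.
    pool = multiprocessing.Pool(processes=4)
    try:
        # map() splits the inputs across the workers and collects the results.
        print(pool.map(count_primes, limits))
    finally:
        pool.close()
        pool.join()
```

The worker function lives at module level and the pool is created under the `__main__` guard, which keeps the script safe on platforms that start child processes by re-importing the main module.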

As always, and this applies to Java as well – writing a true multi-threaded application is not trivial, and always do your homework before you get started! In the past, I always had to fall back to Java for the more intensive applications that I wrote because I always thought creating sub-processes in Python was too tedious. From now on, I have no excuses! The new package in Python 2.6 looks very neat and removes the need to write tons of boilerplate.

Over the top

Telford and Wrekin Council have decided to interrogate any adult visitors to their Telford Town Park who are not accompanied by children, to “safe-guard” children. There will be a backlash to this, and the council will end up paying for it – there will be tangible and non-tangible costs associated with enforcing such a silly policy.

While I am all for protecting children from the less sociable elements in society, I strongly feel that there are right ways to do it that do not have a drastic effect on the general public. I feel sorry for the folks who frequent the park for a quick stroll, a jog or just to let off some steam: the council tax that they have dutifully paid has now kept them out of their own park.

I just wonder whether the council will be able to keep their 4 stars the next time the auditors come around.

One down for EBay

Apparently the French courts have ruled against EBay to the tune of £30m in a bizarre case (at least from my point of view).

LVMH (who own a number of designer brands from Christian Dior to Louis Vuitton) sued EBay for allowing EBay’s users to sell counterfeit designer goods. I think this is just ridiculous. EBay are in the business of providing a service to perform online auctions, and really should not have to take the blame for what their users sell. LVMH really should have gone after the counterfeiters themselves.

The concept is pretty simple: there are many companies out there that provide a “service” for their end consumers, and it’s their consumers who actually make the conscious decision to utilise it. If someone doesn’t like how it is used, the service provider really should not be blamed. The end consumer is at fault. The same argument can be applied when an organisation like RIAA sues an ISP – it’s not the ISP who downloaded all those songs, it’s the users!

So, what’s next ? A person who just had his house robbed sues the car company that the robbers used as a getaway vehicle ?

Ext3 – handling a large number of files in a directory

If you’ve used Linux in the past, I am pretty sure that you’ve heard of the Ext3 file system. It is one of the most common file system formats out there, used mainly on Linux based systems.

I’ve noticed something really annoying about how it handles a large number of files in a single directory. Essentially, I have a directory with almost a million files and I found that creating a new file in this directory took ages (in the region of tens of seconds), which is not ideal at all for my purpose.

After some reading, and much research, I learnt that Ext3 by default stores directory entries in a flat table, and this causes much of the headache when a directory has many files in it. There are a couple of options.

One, restructure the directory so that it does not contain that many files. I did some tests, and in a default (untuned) Ext3 partition, each subsequent write degrades horribly once a directory holds more than about 2000 files. So, keeping each directory to within 2000 files should be fine.
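One way to do the restructuring is to spread the files over numbered sub-directories. The sketch below assumes the files are named by an integer (as in my test script further down); the bucket_path and create_bucketed helpers and the 1000-files-per-directory figure are illustrative choices, not anything Ext3 requires:

```python
import os


def bucket_path(root, index, per_dir=1000):
    """Return the path for file number `index`, with at most `per_dir` files per sub-directory."""
    bucket = '%04d' % (index // per_dir)  # e.g. files 2000..2999 land in '0002'
    return os.path.join(root, bucket, str(index))


def create_bucketed(root, index, per_dir=1000):
    """Create an empty file for `index`, making its bucket directory if needed."""
    path = bucket_path(root, index, per_dir)
    directory = os.path.dirname(path)
    if not os.path.isdir(directory):
        os.makedirs(directory)
    open(path, 'w').close()
    return path
```

With a million files and 1000 per bucket, that gives 1000 sub-directories of 1000 files each, so both the top level and every bucket stay under the 2000 mark.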

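Second, enable the dir_index option on the Ext3 file system. Run the following as root and you should find that it improves a lot. Only directories created afterwards are indexed; existing directories need an e2fsck -fD run on the unmounted file system to be converted. Do note that the indexing will take up more space, but then hard disk space is not too expensive nowadays: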

$ sudo tune2fs -O dir_index /dev/hda1

Finally, just use something like ReiserFS which stores directory contents in a balanced tree, which is pretty darn fast and you don’t have to muck around tweaking things.

If you’ve got your main partition as Ext3, and can’t really afford to reformat it as ReiserFS, there might be an alternative: create a blank file, format that as a ReiserFS file system and mount it using loopback.

So, let’s create the file first. The size depends on how much data you need to handle, and in this example, I’ll just create a ~100MB file full of zeros:

$ dd if=/dev/zero of=reiser.img bs=1k count=100000

Next, format the file using ReiserFS as below. It will complain about the file ‘reiser.img’ not being a special block device (and we know that!). Just say yes and carry on.

$ mkreiserfs -f reiser.img

Finally, mount it wherever you would like to read and write files (you need to do this as root):

$ sudo mount -t reiserfs -o loop reiser.img /tmp/listdir

You might need to do some chown so that your normal user can write into it. Moreover, if you need it mounted at boot, do remember to put it in /etc/fstab !

FYI, I used the Python script below to see how long it took to write new files:

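import os
import time

count = 1000000
total = 0.0
for i in range(count):
    if i % 1000 == 0:
        # Progress marker every 1000 files
        print('Creating %i' % i)
    start = time.time()
    # Time only the creation (open + close) of one empty file
    open(os.path.join('/tmp/listdir', str(i)), 'w').close()
    total += (time.time() - start)
# Average time per file creation, in seconds
print('Avg is %0.8f' % (total / count))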
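A single overall average hides where the slowdown kicks in. The variation below is only a sketch: the time_creates helper, the batch size and the throwaway temp directory are assumptions for illustration. It reports an average per batch, so you can see the create time grow as the directory fills up:

```python
import os
import tempfile
import time


def time_creates(directory, count, batch):
    """Create `count` empty files; return the average create time for each full batch."""
    averages = []
    batch_total = 0.0
    for i in range(count):
        start = time.time()
        open(os.path.join(directory, str(i)), 'w').close()
        batch_total += time.time() - start
        if (i + 1) % batch == 0:
            averages.append(batch_total / batch)
            batch_total = 0.0
    return averages


if __name__ == "__main__":
    # A throwaway directory is used here; point it at the file system under test.
    target = tempfile.mkdtemp()
    for n, avg in enumerate(time_creates(target, 5000, 1000)):
        print('files %d-%d: avg %0.8f s' % (n * 1000, (n + 1) * 1000 - 1, avg))
```

To compare file systems, run it once against a directory on the plain Ext3 partition and once against the loopback ReiserFS mount, with the same count and batch size.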