Ext3 – handling large number of files in a directory

If you’ve used Linux in the past, I am pretty sure that you’ve heard of the Ext3 file system. It is one of the most common file system format out there, used mainly on Linux based systems.

I’ve noticed something really annoying about how it handles large number of files in a single directory. Essentially, I have a directory with almost a million files and I found that creating a new file in this directory took ages (in the region of tens of seconds), which is not ideal at all for my purpose.

After some reading, and much research, I learnt that Ext3 stores directory indices in a flat table, and this causes much of the headache when a directory has many files in a directory. There are a couple of options.

One, restructure the directory so that it does not contain that many files. I did some tests, and in a default (untuned) Ext3 partition, each subsequent write degrades horribly past the 2000 file limit. So, keeping the items in a directory to within 2000 files should be fine.

Second, is to enable the dir_index option on the Ext3 file system. Run the following as root and you should find that it improves a lot. Do note that the indexing will take up much more space, but then hard disk space is not too expensive nowadays:

$ sudo tune2fs -O dir_index /dev/hda1

Finally, just use something like ReiserFS which stores directory contents in a balanced tree, which is pretty darn fast and you don’t have to muck around tweaking things.

If you’ve got your main partition as an Ext3, and can’t really afford to reformat it into ReiserFS, there might be an alternative: create a blank file and format that as a ReiserFS file system and mount it using loopback.

So, lets create the file first. This depends on how much data you need to handle, and in this example, I’ll just create a ~100MB file full of zeros:

$ dd if=/dev/zero of=reiser.img bs=1k count=100000

Next, format the file using ReiserFS as below. It will complain about the file ‘reiser.img’ not being a special block device (and we know that!). Just say yes and carry on.

$ mkreiserfs -f reiser.img

Finally, mount it where you would like to read/write files into it (need to do this as root):

$ sudo mount -t reiserfs -o loop reiser.img /tmp/listdir

You might need to do some chown so that your normal user can write into it. Moreover, if you need it to startup during boot, do remember to put it in /etc/fstab !

FYI, I used a Python script below to see how long it took to write new files:

import os
import time

count = 1000000
total = 0.0
for i in range(count):
	if i % 1000 == 0:
		print 'Creating %i' % i
	start = time.time()
	open('/tmp/listdir/%s' % i, 'w').close()
	total += (time.time() - start)
print 'Avg is %0.8f' % (total / count)