Optimizing the EXT3 file system on CentOS

Ext3 is a very capable file system with excellent fault tolerance and a long track record of stability. While it performs well, it's by no means the fastest file system out there. There are some things you can do to give ext3 a boost when you just want speed.

Some of the methods listed here will reduce the information kept about your file system as a trade-off for speed. Not all users will see gains from these methods, as the benefit depends on the type of I/O access you have. Please take some time to identify your I/O requirements before trying these optimization methods.

Mount Options

noatime

This is one of the quickest and easiest performance gains. This mount option tells the system not to update inode access times. It is a good option for web servers, news servers or other workloads that read files very frequently. Example:

/dev/VolGroup00/LogVol00 / ext3 defaults,noatime 1 1
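
To try noatime without rebooting, the file system can be remounted with the new option and the result checked in /proc/mounts. The following is a minimal sketch rather than a definitive procedure; the helper name mount_opts is made up for this example, and the remount itself needs root.

```shell
# mount_opts MOUNTPOINT [FILE]
# Hypothetical helper: print the options currently in effect for MOUNTPOINT,
# read from a mounts-format FILE (defaults to /proc/mounts). If the mount
# point is listed more than once, the last entry wins.
mount_opts() {
    awk -v mp="$1" '$2 == mp { opts = $4 } END { print opts }' "${2:-/proc/mounts}"
}

# Typical use, as root:
#   mount -o remount,noatime /
#   mount_opts /        # the list should now include noatime
```

The fstab entry is still needed so that the option survives a reboot.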

commit

This file system option controls how often the file system is told to sync data and metadata. The default value is 5 seconds, but you can extend this for a performance gain. The downside is that if your system loses power or crashes without writing out data, you could lose up to that many seconds' worth of data. The value you set here is entirely up to you, based on the performance of your system and how much data you can afford to lose.

/dev/VolGroup00/LogVol00 / ext3 defaults,commit=120 1 1

data

This option has 3 possible values to choose from. When other journaled file systems like XFS and JFS write metadata to the disk, they do just that. Ext3 goes the extra mile to protect your files, and by default also writes out the data associated with that metadata. This is basically the idea behind the 'data=ordered' method, which writes the data to the main file system before committing the metadata to the journal.

To make ext3 behave like XFS and other file systems, set 'data=writeback' in your mount options. The writeback mode does not preserve data ordering when writing to the disk, so commits to the journal may happen before the data is written to the file system. This method is faster because journal commits do not have to wait for the data, but it is not quite as neurotic about protecting your data as the default.

The last data option, journal, is pretty much the polar opposite of the ordered option, forcing the data to write to the journal first, and then to the file system. This mode is usually the slowest, but can outperform the other options in limited cases where you need to read from AND write to the disk at the same time. As always, other people don't have exactly the same needs you do, so their benchmarks are a guide, not a rule. Play around and see which options work best for you.

/dev/VolGroup00/LogVol00 / ext3 defaults,data=writeback 1 1
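
One practical wrinkle, offered as a hedged note rather than part of the recipe above: ext3 does not let the data mode be changed on a remount, so on the root file system the fstab entry alone may not take effect. One commonly used alternative is to store the mode as a default mount option in the superblock with tune2fs. The sketch below shows that command in a comment (run it as root, against the device from the example above) together with a small made-up helper, data_mode, that reports which mode an option string asks for.

```shell
# data_mode OPTIONS
# Hypothetical helper: print the journaling mode named in a comma-separated
# option string. Falls back to "ordered" when no data= option is present;
# this ignores any default stored in the superblock.
data_mode() {
    mode=$(printf '%s\n' "$1" | tr ',' '\n' | sed -n 's/^data=//p' | tail -n 1)
    echo "${mode:-ordered}"
}

data_mode "defaults,data=writeback"   # prints: writeback
data_mode "defaults,noatime"          # prints: ordered

# Storing writeback as the default mode in the superblock, as root:
#   tune2fs -o journal_data_writeback /dev/VolGroup00/LogVol00
```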


Disk Elevators

CentOS 4 has 4 disk elevators (noop, anticipatory, deadline and cfq), which minimize head seeks by re-ordering and merging requests to read or write data from common areas of the disk. They can improve performance, but the gains may be less pronounced on systems using RAID, as the elevators do not take spindle striping into account.

A good explanation of the elevator options can be found in the June 2005 issue of Red Hat Magazine.
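
As a rough illustration, the elevator is usually chosen at boot with the elevator= kernel parameter (for example, elevator=deadline appended to the kernel line in grub.conf). Kernels that expose /sys/block/<device>/queue/scheduler also show the active elevator in square brackets in that file; whether your kernel provides it is an assumption to check. The helper name active_elevator and the device sda below are placeholders.

```shell
# active_elevator FILE
# Hypothetical helper: print the entry shown in [brackets] in a scheduler
# file such as /sys/block/sda/queue/scheduler.
active_elevator() {
    sed -n 's/.*\[\(.*\)\].*/\1/p' "$1"
}

# Typical use:
#   active_elevator /sys/block/sda/queue/scheduler
```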

RAID Math

The biggest performance gain you can achieve on a RAID array is to make sure you format the volume aligned to your RAID stripe size. This alignment value is referred to as the stride. By setting up the file system so that its writes match the RAID layout, you avoid overlap calculations and adjustments on the file system, and make it easier for the system to write out to the disk. The net result is that your system is able to write faster, and you get better performance. To understand how the stride math actually works, you need to know a couple of things about the RAID setup you're using.

The stride calculation works like this: take the number of disks and multiply it by the chunk size of the RAID array. This gives you your stripe size. Then take the stripe size and divide it by the block size of the file system. This gives you the stride value to use when formatting the volume. This can be a little complex, so some examples are listed below.

For example, if you have a 4 drive RAID 5 array using 64K chunks, your stripe size will be 256K. Given a 4K file system block size, you would then have a stride of 64 (256K / 4K = 64). For a 4 disk RAID 0 array, it would also be 64 (4 x 64K / 4K = 64). For a 4 disk RAID 10 array, it would be 32 ((4/2) x 64K / 4K = 32).
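
The arithmetic above is easy to script. The sketch below follows this article's formula (disks times chunk size, divided by the file system block size); the function names are invented for the example. Be aware that the mke2fs man page in later e2fsprogs releases describes stride as the chunk size in file system blocks and takes the full stripe through a separate stripe width option, so the second function prints that per-chunk figure for comparison. Check the man page for your version before formatting.

```shell
# stripe_blocks DISKS CHUNK_KB BLOCK_KB
# Stripe size in file system blocks, as calculated in this article:
# (disks x chunk size) / block size.
stripe_blocks() {
    echo $(( $1 * $2 / $3 ))
}

# chunk_blocks CHUNK_KB BLOCK_KB
# A single chunk expressed in file system blocks.
chunk_blocks() {
    echo $(( $1 / $2 ))
}

stripe_blocks 4 64 4    # 4 disk RAID 0, 64K chunks, 4K blocks: prints 64
stripe_blocks 2 64 4    # 4 disk RAID 10 (4/2 = 2 disks): prints 32
chunk_blocks 64 4       # one 64K chunk in 4K blocks: prints 16
```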

When you create an ext3 partition in this manner, you would format it like this:

mkfs.ext3 -E stride=64 -O dir_index /dev/XXXX

The dir_index listed above is the last tweak mentioned here. The dir_index option allows ext3 to use hashed b-trees to speed up lookups in large directories. It's not a big gain, but it will help.
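
For a file system that already exists, dir_index can be checked and switched on with tune2fs instead of reformatting. This is a sketch under the assumption that your e2fsprogs supports these options; has_feature is a made-up helper name, and /dev/XXXX is the same placeholder used above.

```shell
# has_feature NAME
# Hypothetical helper: succeeds if NAME appears on the "Filesystem features:"
# line of `tune2fs -l` output read from standard input.
has_feature() {
    grep '^Filesystem features:' | tr -s ' ' '\n' | grep -qx -- "$1"
}

# Typical use, as root:
#   tune2fs -l /dev/XXXX | has_feature dir_index && echo "dir_index is on"
#
# Enabling it on an existing, unmounted file system:
#   tune2fs -O dir_index /dev/XXXX
#   e2fsck -fD /dev/XXXX    # -D rebuilds existing directories with the index
```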

HowTos/Disk Optimization (last edited 2007-03-22 14:27:40 by JimPerrin)