How to Convert a CentOS 5 System to RAID1 Using a Rescue Disk

1. Introduction

1.1. About

This document intends to describe the process of converting a default CentOS 5 system to RAID 1. Hopefully, you'll be able to extrapolate the technique to a more complex system, since describing all possible system configurations and their conversions to all possible RAID configurations is beyond the patience of this author.

Using the rescue method allows for copying the existing filesystems in a non-live, read-only way, which is safer for the system.

1.2. Caveats

To limit the process, a few restrictions...

1.3. Prerequisites

To effectively use this document, it is suggested you have available...

1.4. Disclaimer

There is absolutely no warranty of any kind. Systems vary too much for this document to cover every possibility. System administrator beware. If your system breaks, you get to keep all of the pieces (is this copyrighted somewhere?).

2. Boot into Rescue Mode

Rescue mode is the last resort for fixing a broken system. A rescue boot method of some kind should always be available.

2.1. Boot from the Install Media

This might be a CD, DVD, USB device or PXE image. Creating the install media is beyond the scope of this document, but assumed to exist since there is a running system that was installed. If install media is not available, please check http://www.centos.org.

2.2. Choose Rescue Mode

At the Boot prompt, type

linux rescue

then press enter.

2.3. Choose Language

Select desired language using arrow keys then press enter.

2.4. Choose Keyboard Type

Select desired keyboard type using arrow keys then press enter.

2.5. Network Interfaces

The network interfaces are unnecessary. Use arrow keys to highlight No then press enter.

2.6. Skip Mounting Current System

At the Rescue screen, use the arrow keys to choose Skip then press enter. We'll mount the existing filesystems manually later.

2.7. Rescue Mode

Congratulations! You're in rescue mode!

Rescue mode is useful if you need to...

3. Setup New Device

At this point, the new device should already be physically present in the system and correctly connected. Generally, this was done with the system powered down.

3.1. Determine New Device Name

Assuming the new storage device is already physically in the system and connected with the appropriate data and/or power cables, the existing and new device names can usually be determined by using...

fdisk -l

The example system returns...

Disk /dev/sda: 250.0 GB, 250000000000 bytes
255 heads, 63 sectors/track, 30394 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14       30394   244035382+  8e  Linux LVM

Disk /dev/sdb: 250.0 GB, 250000000000 bytes
255 heads, 63 sectors/track, 30394 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdb doesn't contain a valid partition table

In this case, the existing system is using /dev/sda and the new device is /dev/sdb. They could easily be /dev/hda and /dev/hdc, if they are ATA. If possible, they should be on different interfaces. With SATA and SCSI, it isn't apparent from device names which interface they are on, but, with ATA, /dev/hda and /dev/hdb would indicate they are on the same interface. This might be unavoidable, though not desirable.
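
When several disks are present, picking the blank one out of the listing by eye is error-prone. The following is a minimal sketch, assuming the fdisk message wording shown in the example above; the function name blank_disks is made up for illustration. It reads fdisk -l output on standard input and prints the devices reported as having no valid partition table.

```shell
#!/bin/sh
# Hypothetical helper: print every device that fdisk reports as having
# no valid partition table. Reads "fdisk -l" output on standard input.
# Assumes the message wording shown in the example listing.
blank_disks() {
    sed -n "s|^Disk \(/dev/[^ ]*\) doesn't contain a valid partition table.*|\1|p"
}

# Usage (in rescue mode); 2>&1 is there in case the message is
# written to standard error:
#   fdisk -l 2>&1 | blank_disks
```

On the example system this would be expected to print /dev/sdb. Treat the result as a hint only and confirm the device by size or model before partitioning it.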

3.2. Create New Partitions

Tradition dating from SunOS days suggests putting /boot on a RAID 1 device is a potential security and fault tolerance issue. Manually replicating /boot on a second spindle (and as the tradition was extended, the / partition image on Solaris) provides a safety backup, because having two accessible /boot partitions increases the chances of keeping a usable /boot available.

The argument can be made that this makes for a more robust system, as it permits an alternative boot recovery path if the primary is corrupted by an update or by malicious intent, at the minimal cost (a mount, a single rsync command, and a umount) of keeping the secondary synchronized to the master. If and when boot device recovery is needed, simple edits at the grub command line specifying the alternate /root device and so forth can 'save the day'. Additionally, the 'two /root' approach does not require local console mode to put in place.

Once the primary /boot has been modified and successfully tested, it can be manually synchronized from time to time. It is not at all clear that there is a 'win' in using RAID to 'spread' the content over two spindles, compared to keeping two plain old /boot partitions on differing spindles for recovery alternation and robustness against failure. This is because a RAID layer adds complexity and failure points. That said, some people want to have RAID everywhere for reasons that others may not agree with, and this write-up continues to discuss that RAID approach.

Our goal in this article is to document a system approach that tolerates device failure using raid, rather than other approaches. We are ignoring other considerations. We are going to put /boot on RAID to keep it simple.

/!\ Note: 'to keep it simple' asserted in the prior sentence is more complex than simply carrying two plain old partitions on different physical drives, and periodically refreshing the secondary from the master via rsync, supra.

The default install creates a relatively complex setup for / and we're going to make it even more complex by adding RAID 1. The advantage with LVM is that you can move stuff around to take advantage of changing space requirements.

Using the partition utility, fdisk...

fdisk /dev/sdb

For a new partition, type n and enter. For a primary partition, type p and enter. For the first partition, type 1 and enter. To start the new partition at the beginning of the device, press enter. To limit the new partition to 100M, type +100M and enter. To change the partition type to RAID, type t and enter, then fd and enter.

For a second new partition, type n and enter. Make it a primary partition, type p and enter. Make it the second partition, type 2 and enter. To start the new partition at the next cylinder, press enter. To use the remainder of the device, press enter. To change the partition type, type t and enter. Since there are multiple partitions, we need to select 2 and press enter. Choose RAID by typing fd and enter.

To verify partition configuration, type p and enter.

To write changes, type w and enter.

The system should respond...

The number of cylinders for this disk is set to 30394.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-30394, default 1): 
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-30394, default 30394): +100M

Command (m for help): t
Selected partition 1
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 2
First cylinder (14-30394, default 14): 
Using default value 14
Last cylinder or +size or +sizeM or +sizeK (14-30394, default 30394): 
Using default value 30394

Command (m for help): t
Partition number (1-4): 2
Hex code (type L to list codes): fd
Changed system type of partition 2 to fd (Linux raid autodetect)

Command (m for help): p

Disk /dev/sdb: 250.0 GB, 250000000000 bytes
255 heads, 63 sectors/track, 30394 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1          13      104391   fd  Linux raid autodetect
/dev/sdb2              14       30394   244035382+  fd  Linux raid autodetect

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

For those looking for a cut and paste method...

echo -e "n\np\n1\n\n+100M\nt\nfd\nn\np\n2\n\n\nt\n2\nfd\np\nw" | fdisk /dev/sdb
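
The one-liner above is hard to review. A minimal sketch of the same keystrokes, one per line, is below; the function name raid_layout_keys is hypothetical, and the sequence is the one from the transcript. It is worth printing and reading the sequence before piping it into fdisk, since fdisk writes the partition table as soon as it receives w.

```shell
#!/bin/sh
# Hypothetical helper: emit the fdisk keystrokes used in this section,
# one per line. Empty strings stand for pressing enter on a default.
# Rows: new 100M partition 1, set type fd, new partition 2 using the
# rest of the disk, set type fd, print the table, write it.
raid_layout_keys() {
    printf '%s\n' \
        n p 1 '' +100M \
        t fd \
        n p 2 '' '' \
        t 2 fd \
        p w
}

# Review first:   raid_layout_keys | nl -ba
# Then apply:     raid_layout_keys | fdisk /dev/sdb
```

Keeping the keystrokes in a function makes it easier to reuse the same layout on the second device later, assuming both devices are the same size.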

3.3. Make RAID Devices

Create a RAID 1 device for /boot. The second device doesn't exist, yet, so it needs a placeholder. The second device will be added later.

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 missing

The response should be...

mdadm: array /dev/md0 started.

Create a second RAID 1 device for /.

mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdb2 missing

The response should be...

mdadm: array /dev/md1 started.

3.4. Format New /boot Device

The default install puts an ext3 partition on /boot. Labels help the system determine where the partition should be mounted. To stick with that standard...

mkfs.ext3 -L '/boot' /dev/md0

Expected response...

mke2fs 1.39 (29-May-2006)
Filesystem label=/boot
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
26104 inodes, 104320 blocks
5216 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=67371008
13 block groups
8192 blocks per group, 8192 fragments per group
2008 inodes per group
Superblock backups stored on blocks:
        8193, 24577, 40961, 57345, 73729

Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 30 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

3.5. Setup New / Partition

Setup /dev/md1 as a physical volume using...

lvm pvcreate /dev/md1

Create a new volume group...

lvm vgcreate RaidSys /dev/md1

Create a new swap logical volume...

lvm lvcreate --size 4G --name swap RaidSys

Create a new root logical volume...

lvm lvcreate --size 200G --name Root RaidSys

Format the new swap device...

mkswap /dev/RaidSys/swap

Format the new / device...

mkfs.ext3 -L '/' /dev/RaidSys/Root
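
The 4G and 200G sizes used above have to fit inside the volume group. As a rough worked check, the sketch below subtracts the logical volume sizes from the size of md1, taking the 244035264 1K blocks reported for md1 in /proc/mdstat later in this document. The function name lv_headroom_kib is an illustrative assumption, and LVM metadata and extent rounding are ignored, so the real free space will be slightly smaller.

```shell
#!/bin/sh
# Hypothetical helper: report how many KiB remain after carving the
# given logical volumes (sizes in GiB) out of a device of pv_kib KiB.
# Ignores LVM metadata and extent rounding, so treat the result as an
# upper bound on free space.
lv_headroom_kib() {
    pv_kib=$1
    shift
    total_gib=0
    for gib in "$@"; do
        total_gib=$((total_gib + gib))
    done
    echo $((pv_kib - total_gib * 1024 * 1024))
}

# md1 on the example system is 244035264 1K blocks; swap is 4G, Root 200G.
lv_headroom_kib 244035264 4 200
```

For the example system this prints 30125760, roughly 28.7 GiB left unallocated for future logical volumes or for growing the existing ones.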

3.6. Mount Partitions

Mount all the devices where we can find them...

mkdir /mnt/boot.old
mount -o ro /dev/sda1 /mnt/boot.old

mkdir /mnt/boot.new
mount /dev/md0 /mnt/boot.new

lvm vgchange --available y VolGroup00
mkdir /mnt/root.old
mount -o ro /dev/VolGroup00/LogVol00 /mnt/root.old

mkdir /mnt/root.new
mount /dev/RaidSys/Root /mnt/root.new

4. Move Data to New Device

4.1. Synchronize /boot

The rsync utility is supposed to preserve ownership and modification dates and, with the X flag, extended attributes. Add the H flag to preserve hard links (thanks Phil). Unfortunately, it doesn't appear to preserve SELinux attributes. A trailing slash on the source copies the directory's contents, including hidden files that a * glob would skip. Synchronize the data...

rsync -avXH /mnt/boot.old/ /mnt/boot.new/

For those that prefer tar, this will create an un-compressed archive from the old /boot and un-archive it to the new device...

tar -C /mnt/boot.old -cf - . | tar -C /mnt/boot.new -xf -

4.2. Synchronize /

Again, using the rsync utility...

rsync -avXH /mnt/root.old/ /mnt/root.new/

Or tar...

tar -C /mnt/root.old -cf - . | tar -C /mnt/root.new -xf -
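
A quick sanity check after copying is to compare the file listings of the old and new trees. The sketch below is one way to do that, assuming find and sort are available in the rescue environment; the function name same_listing is hypothetical. It compares names only, not file contents.

```shell
#!/bin/sh
# Hypothetical helper: compare the relative path listings of two trees.
# Exits 0 when both contain the same set of names (contents are not
# compared). Hidden files are included in the comparison.
same_listing() {
    old_list=$(cd "$1" && find . | sort)
    new_list=$(cd "$2" && find . | sort)
    [ "$old_list" = "$new_list" ]
}

# Usage:
#   same_listing /mnt/boot.old /mnt/boot.new && echo "boot listings match"
#   same_listing /mnt/root.old /mnt/root.new && echo "root listings match"
```

A mismatch does not say which side is wrong. Running find on each tree and comparing the two outputs by hand will show the differing names.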

4.3. SELinux Relabel

To request that SELinux attributes be relabeled at boot time, create a flag file...

touch /mnt/root.new/.autorelabel

4.4. Make New Device Bootable

There are a few maintenance tasks to complete before the new system is bootable.

4.4.1. Install grub

Make new RAID device bootable...

grub
root (hd1,0)
setup (hd1)
quit

4.4.2. Edit grub Configuration File

The root device is different, and grub needs to know so it can tell the kernel. Edit /mnt/boot.new/grub/menu.lst. The install media includes joe, which will emulate nano and vi. Use whichever command set you are comfortable with. Everywhere it says /dev/VolGroup00/LogVol00, change it to /dev/RaidSys/Root. Save the changes.
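
After editing, it is easy to miss an occurrence. A minimal sketch for checking is below, assuming grep is available in the rescue environment; the function name stale_refs is made up for illustration. It lists, with line numbers, any lines that still mention the old volume group.

```shell
#!/bin/sh
# Hypothetical helper: list lines in a file that still mention the old
# volume group. Exits 0 if stale references remain, 1 if none do
# (this is grep's own exit status).
stale_refs() {
    grep -n 'VolGroup00' "$1"
}

# Usage:
#   stale_refs /mnt/boot.new/grub/menu.lst && echo "still needs editing"
```

The same check applies to any other file that names the old root device.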

4.4.3. Edit File System Table

Once the system boots back up, it is going to need to know the new root device. Edit /mnt/root.new/etc/fstab, changing all occurrences of /dev/VolGroup00/LogVol00 to /dev/RaidSys/Root. Normally, /boot is mounted by label, but we now have a /boot on each drive, so change LABEL=/boot to /dev/md0. Change /dev/VolGroup00/LogVol01 to /dev/RaidSys/swap. Save the changes.
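
For those who prefer not to edit by hand, the three substitutions can be sketched with sed. This assumes the default names used in this document (VolGroup00, LogVol00, LogVol01, LABEL=/boot) and that sed is available in the rescue environment; the function name retarget_fstab is hypothetical. The original file is kept alongside with an .orig suffix.

```shell
#!/bin/sh
# Hypothetical helper: apply the three fstab changes described above,
# keeping the original as <file>.orig. Assumes the default names
# (VolGroup00, LogVol00, LogVol01, LABEL=/boot) used in this document.
retarget_fstab() {
    cp -p "$1" "$1.orig" &&
    sed -e 's|/dev/VolGroup00/LogVol00|/dev/RaidSys/Root|g' \
        -e 's|/dev/VolGroup00/LogVol01|/dev/RaidSys/swap|g' \
        -e 's|^LABEL=/boot|/dev/md0|' \
        "$1.orig" > "$1"
}

# Usage: retarget_fstab /mnt/root.new/etc/fstab
```

Review the result before rebooting. A system with a customized fstab may need changes this sketch does not cover.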

4.4.4. Create New initrd

The old system didn't need RAID drivers at boot time. It does now. We need to setup the initrd with RAID drivers. This is easiest with a running system. The virtual directories proc, dev, sys and selinux can be bound to the rescue system's copies to appear like a running system.

umount /mnt/boot.new
mount /dev/md0 /mnt/root.new/boot
mount -o bind /proc /mnt/root.new/proc
mount -o bind /dev /mnt/root.new/dev
mount -o bind /sys /mnt/root.new/sys
mount -o bind /selinux /mnt/root.new/selinux

Now, enter the new system...

chroot /mnt/root.new

Determine the installed kernel versions by looking at the current initrd images...

ls /boot/*.img

On the test system, it returns...

/boot/initrd-2.6.18-128.1.6.el5.img  /boot/initrd-2.6.18-128.el5.img

This indicates 2.6.18-128.1.6.el5 is the latest kernel version. Make a new initrd...

mkinitrd -f /boot/initrd-2.6.18-128.1.6.el5.img 2.6.18-128.1.6.el5
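
The version argument passed to mkinitrd is the image file name with the initrd- prefix and .img suffix removed. A minimal sketch of that derivation is below; the function name initrd_versions is an assumption made for illustration, and it only prints the versions without rebuilding anything.

```shell
#!/bin/sh
# Hypothetical helper: print the kernel version embedded in each
# initrd image name found in the given directory (default /boot).
initrd_versions() {
    for img in "${1:-/boot}"/initrd-*.img; do
        [ -e "$img" ] || continue      # glob matched nothing
        ver=${img##*/initrd-}          # drop the directory and prefix
        echo "${ver%.img}"             # drop the suffix
    done
}

# Usage (inside the chroot): initrd_versions /boot
```

Each printed version should also have a matching directory under /lib/modules, since mkinitrd takes its modules from there.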

Go back to rescue mode...

exit

4.5. Cleanup

Return the system to the unmounted state prior to rebooting. This tells the system it is time to write any cached changes, if it hasn't already.

umount /mnt/root.new/selinux
umount /mnt/root.new/sys
umount /mnt/root.new/dev
umount /mnt/root.new/proc
umount /mnt/root.new/boot
umount /mnt/root.new
umount /mnt/boot.old
umount /mnt/root.old

5. Boot from New Device

We are ready to test our single drive RAID 1 system. We haven't made any changes to our current device, so we could ignore all the work we've done and reboot from our current device.

5.1. Move Devices

We've cleanly unmounted our storage devices, so we could simply power off the system, or we can shut down the system with...

exit

which reboots the system. Since we need it off for the next step, when it has finished shutting down and is starting to boot back up, turn it off and remove power.

Move the second storage device onto the first interface. Leave the original storage device disconnected for our tests. Update BIOS configuration for hardware changes.

5.2. Test System

Without the rescue media, turn the system on. It should come up as it did before we started.

Log in, go to a command line and verify we are running on RAID devices by typing...

mount

We should get something like...

/dev/mapper/RaidSys-Root on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/md0 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)

The important parts are /dev/mapper/RaidSys-Root on / type ext3 (rw) and /dev/md0 on /boot type ext3 (rw). Our system is now RAID/LVM.

6. Expand RAID Partitions to Second Device

We have a working single drive RAID 1 system. The system is verified as functional and the data is verified as existing. The system is ready to expand into a complete multi-drive RAID 1 system.

Past this point, the original storage device gets modified. Be very sure that our new RAID system is working.

The option exists to expand onto the second drive while the system is live. Apart from the simplicity of staying in rescue mode, there isn't any reason not to.

6.1. Re-install Original/Second Storage Device

Shut down the system and install the original storage device, now the secondary storage device, on the secondary interface.

6.2. Boot from Rescue Media

Follow the instructions from the previous section, Boot into Rescue Mode.

6.3. Activate RAID Devices

The installation media doesn't include a default mdadm.conf file, so it must be created. An easy method is...

mdadm --examine --scan > /etc/mdadm.conf

Next, activate all discovered RAID devices with...

mdadm --assemble --scan

6.4. Convert Partitions on Second Device

We need to setup the secondary storage device so it can be paired with the other RAID device(s). Using fdisk, first delete the old partitions, then create new ones...

fdisk /dev/sdb

Delete the second partition, type d and enter, then 2 and enter.

Delete the first partition, type d and enter.

Create the /boot partition, type n for new and enter, p for primary and enter, 1 for first partition and enter, enter for starting at the beginning of the available space, then +100M and enter. Change the /boot partition type by pressing t and enter, then fd and enter.

Create the new main partition by pressing n for new and enter, p for primary and enter, 2 for second partition and enter, enter to start at the beginning of available space, then enter for using all available space. Change the partition type by pressing t and enter, 2 and enter for second partition, then fd and enter.

Verify the new table by pressing p and enter to show it.

Write the new partition table to the device by pressing w and enter.

The number of cylinders for this disk is set to 30394.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): d
Partition number (1-4): 2
Command (m for help): d
Selected partition 1

Command (m for help): n

Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-30394, default 1): 
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-30394, default 30394): +100M

Command (m for help): t
Selected partition 1
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)

Command (m for help): n

Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 2
First cylinder (14-30394, default 14): 
Using default value 14
Last cylinder or +size or +sizeM or +sizeK (14-30394, default 30394): 
Using default value 30394

Command (m for help): t
Partition number (1-4): 2
Hex code (type L to list codes): fd
Changed system type of partition 2 to fd (Linux raid autodetect)

Command (m for help): p

Disk /dev/sdb: 250.0 GB, 250000000000 bytes
255 heads, 63 sectors/track, 30394 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1          13      104391   fd  Linux raid autodetect
/dev/sdb2              14       30394   244035382+  fd  Linux raid autodetect

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

6.5. Expand RAID /boot Partition to Second Device

To see the current RAID status...

cat /proc/mdstat

Should return something similar to...

Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] 
md1 : active raid1 sda2[0]
      244035264 blocks [2/1] [U_]
      
md0 : active raid1 sda1[0]
      104320 blocks [2/1] [U_]
      
unused devices: <none>

6.5.1. Add Second /boot RAID Partition to RAID Device

Add the first partition of the second storage device to the first RAID device using mdadm...

mdadm /dev/md0 -a /dev/sdb1

6.5.2. Verify /boot RAID Device

We want to verify the second device has been added...

cat /proc/mdstat

Should return...

Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] 
md1 : active raid1 sda2[0]
      244035264 blocks [2/1] [U_]
      
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
      
unused devices: <none>

Notice how md0 has two in-use devices [UU].

6.6. Expand RAID / Partition to Second Device

6.6.1. Add Second / RAID Partition to RAID Device

Similar to /boot...

mdadm /dev/md1 -a /dev/sdb2

6.6.2. Verify / RAID Device

Immediately after adding second device to md1...

cat /proc/mdstat

Should return something like...

Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] 
md1 : active raid1 sdb2[2] sda2[0]
      244035264 blocks [2/1] [U_]
      [>....................]  recovery =  0.1% (254976/244035264) finish=63.7min speed=63744K/sec
      
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
      
unused devices: <none>

This is a larger partition and takes longer to sync. The estimated completion time here is 63.7min.

Once md1 is completely synchronized, cat /proc/mdstat should return...

Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] 
md1 : active raid1 sdb2[1] sda2[0]
      244035264 blocks [2/2] [UU]
      
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
      
unused devices: <none>

Sync completion is preferred prior to rebooting.
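
Rather than re-running cat by hand, the wait can be scripted. The sketch below assumes the /proc/mdstat format shown above, where a missing or still-rebuilding member appears as an underscore in the status brackets; the function name md_all_synced is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only when no array in the given mdstat
# file (default /proc/mdstat) has a missing member, shown as "_" in
# the status brackets, e.g. [U_]. Returns 2 if the file is unreadable.
md_all_synced() {
    file=${1:-/proc/mdstat}
    [ -r "$file" ] || return 2
    ! grep -Eq '\[[U_]*_[U_]*\]' "$file"
}

# Usage: poll once a minute until every array shows all members up.
#   until md_all_synced; do sleep 60; done
```

The check only looks at the status brackets, so it treats a degraded array that is not rebuilding the same as one that is mid-sync. Look at /proc/mdstat directly if the loop never finishes.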

6.7. Install grub on Second Device

Make second storage device bootable...

grub
root (hd1,0)
setup (hd1)
quit

6.8. Reboot Complete RAID 1 System

To leave rescue mode, type...

exit

When appropriate, remove access to install media.

7. Conclusion

The process should now be complete and the system should be running.

8. Further Reading

CentOS 5 Documentation, Deployment Guide, Chapter 4. Redundant Array of Independent Disks (RAID)

CentOS 5 Documentation, LVM Administrator's Guide, Chapter 2. LVM Components

man pages for mdadm, lvm


This page created by Ed Heron.

HowTos/CentOS5ConvertToRAID (last edited 2010-07-16 15:35:50 by AlanBartlett)