How to repair a software RAID5 volume with more than one failed disk

Author: LevShamardin

A single failed disk in a RAID5 volume is bad enough, but sometimes things get worse and the volume loses more than one disk. In that case mdadm will not let you start the volume. Because of how RAID5 organizes data internally, not all of the data can be recovered once more than one disk has failed. In many cases, however, most of the data can still be rescued, because usually only part of a failed disk is actually unreadable and the rest can still be read. To recover data from such a volume you first need to start it. This page describes the steps that allow you to start a RAID5 volume that has more than one failed disk.

1. Big fat warning

These instructions may irreversibly damage your data. It is highly recommended to make a copy of all drives in your failed RAID volume, including the failed drives, and to perform all recovery procedures on the copies. Note that even copying data from a failing disk without special equipment may cause further damage to that disk. Proceed at your own risk.

2. How to start a volume with more than one failed disk

These instructions assume two failed disks in one RAID5 volume, but they can be generalized to any number of failed disks. We will assume that the RAID volume consists of the partitions /dev/sda1, /dev/sdb1, /dev/sdc1, /dev/sdd1 and /dev/sde1, and that the two drives /dev/sdc and /dev/sdd have failed.

You will need at least one new working hard drive, which will be used in place of one of the failed drives.
Make a full copy of one of the failed disks to the new disk. Choose the disk that presumably has fewer failed areas. From now on this disk is called the 'less failed disk', and the other failed disk is called the 'more failed disk'. Replace the less failed disk with the copy you have just made. Do not carry out the following steps on the failing disks themselves, because the volume will stop and break again as soon as a disk fails. In our case we copy /dev/sdc to a new disk and replace the old, failing /dev/sdc with the copy.
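The page does not name a copying tool. As one possible approach, GNU ddrescue can copy a failing disk while skipping unreadable areas and recording them in a map file. The sketch below only prints the commands for review and runs nothing. The helper name ddrescue_plan, the target disk /dev/sdf and the map file name sdc.map are assumptions made for illustration.

```shell
# Hypothetical helper: print a two-pass GNU ddrescue plan for review.
# It only echoes the commands; nothing is copied until you run them yourself.
ddrescue_plan() {
    src=$1   # failing source disk, e.g. /dev/sdc
    dst=$2   # new target disk (assumed name, e.g. /dev/sdf); it will be overwritten
    map=$3   # map file in which ddrescue records good and bad areas
    # Pass 1: copy everything that reads easily, without scraping bad areas (-n).
    echo "ddrescue -f -n $src $dst $map"
    # Pass 2: retry the bad areas up to 3 times using direct disc access (-d).
    echo "ddrescue -f -d -r3 $src $dst $map"
}

# Example: ddrescue_plan /dev/sdc /dev/sdf sdc.map
```

Check the target device name carefully before running the printed commands, because -f allows ddrescue to overwrite a block device.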
Discover the parameters of the RAID5 volume. Run mdadm -E on any of the non-failed disks:
```
mdadm -E /dev/sdb1
```
Note the Chunk Size and Layout.
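To avoid misreading the output, the relevant fields can be filtered out of the mdadm -E report. This is a minimal sketch that assumes the report prints lines of the form "Key : Value" for Raid Devices, Chunk Size and Layout. The helper name raid_params is hypothetical.

```shell
# Hypothetical helper: read "mdadm -E" output on stdin and print the
# fields needed to recreate the volume, in the order they appear.
raid_params() {
    awk -F ' : ' '
        { key = $1; gsub(/^[ \t]+|[ \t]+$/, "", key) }   # trim the padded key
        key == "Raid Devices" { print "devices=" $2 }
        key == "Chunk Size"   { print "chunk=" $2 }
        key == "Layout"       { print "layout=" $2 }
    '
}

# Example: mdadm -E /dev/sdb1 | raid_params
```

The values are printed exactly as mdadm reports them, so a chunk size may carry a unit suffix such as K.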
Recreate the RAID volume in degraded mode. To do this you must know the following:
- XXX - the number of disks in your RAID volume.
- YYY - the chunk size of your RAID volume, determined in the previous step.
- ZZZ - the layout of the RAID volume, determined in the previous step.
- Which of the drives is missing (that is, the more failed disk, /dev/sdd1 in our case).
Execute a command like:
```
mdadm --create /dev/md0 -n XXX -c YYY -l 5 -p ZZZ --assume-clean /dev/sda1 /dev/sdb1 /dev/sdc1 missing /dev/sde1 
```
Adapt this command to your RAID layout and to the exact position of your missing drive: list the member devices in their original order and put the argument 'missing' in the slot of the more failed disk.
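Since a misplaced 'missing' argument or a wrong device count is easy to type, the command line can be assembled and reviewed before it is run. The sketch below only prints the command. The helper name build_create_cmd is hypothetical, and the chunk size 64 and layout left-symmetric in the example are sample values, not values taken from any real volume.

```shell
# Hypothetical helper: print (not run) the mdadm --create command.
# Arguments: chunk size, layout, the more failed disk, then ALL member
# devices in their original order (including the more failed disk).
build_create_cmd() {
    chunk=$1; layout=$2; missing=$3
    shift 3
    count=$#            # number of disks in the volume (XXX)
    members=""
    for dev in "$@"; do
        if [ "$dev" = "$missing" ]; then
            members="$members missing"   # keep the slot, leave the disk out
        else
            members="$members $dev"
        fi
    done
    echo "mdadm --create /dev/md0 -n $count -c $chunk -l 5 -p $layout --assume-clean$members"
}

# Example (sample chunk size and layout):
# build_create_cmd 64 left-symmetric /dev/sdd1 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
```

With the example arguments the helper prints the same command as shown above, with XXX, YYY and ZZZ filled in.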

The RAID volume is now up and running in degraded mode, and you can try to recover your data. Alternatively, you can replace the missing drive with a new one and rebuild the RAID, since it no longer contains any failing disks. Most of the data on the recreated RAID volume should be uncorrupted, and if you were lucky enough that the disk failures fell on empty filesystem areas, all of the data may even be intact.
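To confirm that the volume really came up in degraded mode, the array status in /proc/mdstat can be inspected. This is a rough sketch that assumes the usual status notation, in which each member is shown as U (up) or _ (missing) inside square brackets. The helper name degraded_arrays is hypothetical.

```shell
# Hypothetical helper: read /proc/mdstat-style text on stdin and print
# the names of arrays whose member status contains a missing slot ("_").
degraded_arrays() {
    awk '
        /^md[0-9]+ :/          { name = $1 }     # remember the current array
        /\[[U_]*_[U_]*\]/      { print name }    # e.g. [UUU_U] means one slot is missing
    '
}

# Example: degraded_arrays < /proc/mdstat
```

Before copying data off the volume, it is safer to work read-only at first, for example by checking the filesystem with fsck -n and mounting it with mount -o ro.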

This is a read-only archived version of wiki.centos.org
