Replacing failed LVM RAID devices
This KB article shows how to replace a failed disk (for example, an NVMe drive) on a system that uses LVM mirroring.
The failed device may show up as missing or as failed. An example of a missing device:
root@host# pvscan
WARNING: Device for PV 7UX2U1-FI4a-6qvq-3jYe-hAx9-R1Ly-1757mG not found or rejected by a filter.
Couldn't find device with uuid 7UX2U1-FI4a-6qvq-3jYe-hAx9-R1Ly-1757mG.
PV [unknown] VG nvmevg lvm2 [2.91 TiB / 2.91 TiB free]
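To see which logical volumes are affected, lvs can list the backing devices for each LV; legs that sat on the missing PV are typically reported as [unknown] in the Devices column (adjust the VG name to your system):
root@host# lvs -a -o +devices nvmevg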
Pull the failed device and replace it. For NVMe devices on a Dell system, I had to do the following to work out which drive bay to pull:
- Find the failed device in dmesg. The log should list its PCI bus ID in hex. You may have to identify the working devices and deduce which one is missing. In this example the bus ID is 3E.
[4854236.865225] nvme nvme1: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
[4854236.887356] nvme 0000:3e:00.0: irq 144 for MSI/MSI-X
[4854297.736360] nvme nvme1: I/O 18 QID 0 timeout, disable controller
[4854297.750052] nvme nvme1: Identify Controller failed (7)
[4854297.750121] nvme nvme1: Removing after probe failure status: -5
[4854297.765763] nvme1n1: detected capacity change from 3200631791616 to 0
[4854297.765813] blk_update_request: I/O error, dev nvme1n1, sector 6227709184
- Use idracadm to find the drive slot matching the decimal PCI bus ID. In this example, the bus is 3E hex, or 62 decimal, and the drive is in Bay 23.
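To convert the hex bus ID from dmesg into the decimal number that idracadm reports, a quick shell printf will do (substitute your own bus ID):
root@host# printf '%d\n' 0x3e
62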
root@host# /opt/dell/srvadmin/bin/idracadm7 hwinventory|less
...
[InstanceID: Disk.Bay.23:Enclosure.Internal.0-1:PCIeExtender.Slot.1]
Device Type = PCIDevice
BusNumber = 62
DataBusWidth = 4x or x4
Description = Express Flash NVMe 3.2TB 2.5" U.2 (P4600)
DeviceDescription = PCIe SSD in Slot 23 in Bay 1
...
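After swapping in the replacement drive, if it does not appear as a block device (check with lsblk), a PCI bus rescan can sometimes bring it up without a reboot; this assumes the slot and kernel support NVMe hot-plug:
root@host# echo 1 > /sys/bus/pci/rescan
root@host# lsblk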
Next, rebuild the LVM RAID:
- Remove the missing device from the volume group if it is still listed. In this example, the failed device is "nvme1n1", the working device is "nvme0n1", and the volume group is "nvmevg".
root@host# vgreduce --removemissing nvmevg
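If vgreduce refuses because logical volumes still have segments on the missing PV, the --force variant will clear them; note that this discards whatever lived on the missing disk, so review lvs -a -o +devices first:
root@host# vgreduce --removemissing --force nvmevg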
- Make the new device a PV using pvcreate
root@host# pvcreate /dev/nvme1n1
Physical volume "/dev/nvme1n1" successfully created.
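If pvcreate instead complains that the replacement disk already carries an old filesystem, RAID, or LVM signature (for example, a reused drive), wipefs can clear it first; double-check the device name, since this is destructive:
root@host# wipefs -a /dev/nvme1n1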
root@host# pvscan
PV /dev/nvme0n1 VG nvmevg lvm2 [2.91 TiB / 11.21 GiB free]
PV /dev/nvme1n1 lvm2 [2.91 TiB]
- Extend the VG onto the new PV using vgextend
root@host# vgextend nvmevg /dev/nvme1n1
Volume group "nvmevg" successfully extended - List the logical volumes on the volume group
root@host# lvdisplay nvmevg
--- Logical volume ---
LV Path /dev/nvmevg/homea
LV Name homea
VG Name nvmevg
...
- Convert each logical volume to RAID-1 across the old device (nvme0n1) and the new device (nvme1n1). Answer "y" to the "Are you sure?" prompt.
root@host# lvconvert -m 1 /dev/nvmevg/homea /dev/nvme0n1 /dev/nvme1n1
Are you sure you want to convert linear LV nvmevg/homea to raid1 with 2 images enhancing resilience? [y/n]: y
WARNING: Monitoring nvmevg/homea failed.
Logical volume nvmevg/homea successfully converted.
- Verify that the new device is now part of the mirror with lsblk:
root@host# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1 259:1 0 2.9T 0 disk
├─nvmevg-homea_rmeta_0 253:3 0 4M 0 lvm
│ └─nvmevg-homea 253:6 0 2.9T 0 lvm /home/ecegrid/a
└─nvmevg-homea_rimage_0 253:4 0 2.9T 0 lvm
└─nvmevg-homea 253:6 0 2.9T 0 lvm /home/ecegrid/a
nvme1n1 259:0 0 2.9T 0 disk
├─nvmevg-homea_rmeta_1 253:5 0 4M 0 lvm
│ └─nvmevg-homea 253:6 0 2.9T 0 lvm /home/ecegrid/a
└─nvmevg-homea_rimage_1 253:8 0 2.9T 0 lvm
└─nvmevg-homea 253:6 0 2.9T 0 lvm /home/ecegrid/a
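The new mirror legs resynchronize in the background, so the filesystem stays usable while the copy completes. Progress can be watched with lvs; the copy_percent (Cpy%Sync) field reaches 100 once the mirror is fully in sync:
root@host# lvs -a -o name,copy_percent,devices nvmevg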
You have now replaced a disk! Congratulations!