I had the following setup: three 10 TB HDDs in an LVM RAID5 configuration, with LUKS2 encryption on top and a btrfs filesystem inside.
Since storage was running low, I added another 16 TB HDD (it was cheaper than a 10 TB one), added it as a physical volume in LVM, added it to the volume group, and ran a resync so that LVM could grow the RAID. Then I resized the btrfs filesystem to the maximum.
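For reference, the steps were roughly the following; the VG/LV names and the device node are placeholders, not my actual ones:

pvcreate /dev/sdd                          # the new 16 TB disk
vgextend vg_raid /dev/sdd
lvextend -l +100%FREE vg_raid/data         # grow the raid5 LV; LVM syncs the new extents
lvs -a -o name,segtype,sync_percent        # waited here until the resync reached 100%
btrfs filesystem resize max /mnt/raid5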
I noticed that shortly after the btrfs resize, errors began to appear in dmesg whenever I write to the filesystem:
[53034.840728] btrfs_dev_stat_print_on_error: 299 callbacks suppressed
[53034.840731] BTRFS error (device dm-15): bdev /dev/mapper/data errs: wr 807, rd 0, flush 0, corrupt 0, gen 0
[53034.841289] BTRFS error (device dm-15): bdev /dev/mapper/data errs: wr 808, rd 0, flush 0, corrupt 0, gen 0
[53034.844993] BTRFS error (device dm-15): bdev /dev/mapper/data errs: wr 809, rd 0, flush 0, corrupt 0, gen 0
[53034.845893] BTRFS error (device dm-15): bdev /dev/mapper/data errs: wr 810, rd 0, flush 0, corrupt 0, gen 0
[53034.846154] BTRFS error (device dm-15): bdev /dev/mapper/data errs: wr 811, rd 0, flush 0, corrupt 0, gen 0
I can rule out hardware problems, since I reproduced this on another computer in a virtual machine. The dmesg errors appear when I write larger files (around 400 MB) to the filesystem, but not with something small like a text file. The checksum is also wrong after copying a file within the RAID:
gallifrey raid5 # dd if=/dev/urandom of=original.img bs=40M count=100
0+100 records in
0+100 records out
3355443100 bytes (3.4 GB, 3.1 GiB) copied, 54.0163 s, 62.1 MB/s
gallifrey raid5 # cp original.img copy.img
gallifrey raid5 # md5sum original.img copy.img
29867131c09cc5a6e8958b2eba5db4c9 original.img
59511b99494dd4f7cf1432b19f4548c4 copy.img
gallifrey raid5 # btrfs device stats /mnt/raid5
[/dev/mapper/data].write_io_errs 811
[/dev/mapper/data].read_io_errs 0
[/dev/mapper/data].flush_io_errs 0
[/dev/mapper/data].corruption_errs 0
[/dev/mapper/data].generation_errs 0
I already resynced the entire LVM RAID, ran smartctl checks multiple times (it shouldn't be a hardware problem, but still), and ran btrfs scrub start -B /mnt/raid5 as well as btrfs check -p --force /dev/mapper/data; none of them returned any error whatsoever.
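For completeness, those checks looked roughly like this (VG/LV names are placeholders again):

lvchange --syncaction check vg_raid/data                  # LVM-level RAID scrub
lvs -o name,raid_sync_action,raid_mismatch_count vg_raid  # inspect the mismatch counter afterwards
smartctl -a /dev/sda                                      # repeated for each member disk
btrfs scrub start -B /mnt/raid5
btrfs check -p --force /dev/mapper/data                   # --force is needed to check while mounted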
This happened on kernels 5.15.11 and 5.10.27.
lvm version:
gallifrey raid5 # lvm version
LVM version: 2.02.188(2) (2021-05-07)
Library version: 1.02.172 (2021-05-07)
Driver version: 4.45.0
My goal is for future writes to the array to be uncorrupted. The already corrupted files can be deleted, but I would like to keep the good files, or at least not delete them.
According to the btrfs man page, write_io_errs means that a write to the underlying block device failed. In my case that points to LVM and/or LUKS2 as the problem.
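One thing I still plan to verify is whether the layer sizes agree after the resize, since a LUKS mapping smaller than what btrfs believes it has would make writes beyond the end of the mapping fail. A minimal sketch (the LV path is a placeholder):

blockdev --getsize64 /dev/vg_raid/data   # size of the RAID5 LV
blockdev --getsize64 /dev/mapper/data    # size of the LUKS mapping on top of it
btrfs filesystem show /mnt/raid5         # the device size btrfs thinks it can use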
Any suggestions? If more information is needed, I'm happy to provide it.
Cheers