Bug #19385
closed
rbd stuck on read/write op to ceph_osd
Description
Sometimes, after a massive OSD restart (a server with a shelf), an rbd image (mapped and mounted with XFS) gets stuck on a read or write operation:
cat /sys/kernel/debug/ceph/*/osdc
90611073 osd1455 7.bb66db73 rbd_data.4ded385238e1f29.0000000000906ea3 read
It does not change over time.
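This check can be scripted. Below is a minimal sketch (the `stuck_tids` helper name and the two-snapshot approach are my own, not part of any ceph tooling) that reports request IDs present in two osdc snapshots taken some time apart:

```shell
# Hypothetical helper: print tids that appear in two osdc snapshots
# taken some time apart -- candidates for stuck ops. Assumes the
# request-line layout of a 4.x kernel client (tid in column 1).
# Caveat: linger (watch) lines are numeric too and persist by design,
# so they will also show up; ignore the rbd_header entries.
stuck_tids() {
  # $1, $2: files holding "cat /sys/kernel/debug/ceph/*/osdc" output
  tmp=$(mktemp)
  awk '/^[0-9]/ {print $1}' "$1" | sort > "$tmp"
  awk '/^[0-9]/ {print $1}' "$2" | sort | comm -12 "$tmp" -
  rm -f "$tmp"
}
```

A tid reported by this sketch has survived the interval between the two snapshots and is worth a closer look.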
In monc:
have osdmap 499581
want next osdmap
...
This osdmap 499581 is the latest, because the ceph monitor reports: osdmap e499581: 2304 osds: 2304 up, 2304 in
Note: if I restart osd1455, the problem goes away and everything works normally.
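The osdmap comparison above can also be done mechanically. A sketch, assuming the monc debugfs format shown ("have osdmap &lt;epoch&gt;") and that `ceph osd dump` starts with an "epoch N" line; the `monc_epoch` helper name is hypothetical:

```shell
# Hypothetical helper: extract the client's osdmap epoch from the
# monc debugfs output ("have osdmap <epoch>") fed on stdin.
monc_epoch() {
  awk '/^have osdmap/ {print $3; exit}'
}

# Usage sketch (on the client; needs debugfs mounted and the ceph CLI):
#   client=$(cat /sys/kernel/debug/ceph/*/monc | monc_epoch)
#   cluster=$(ceph osd dump | awk '/^epoch/ {print $2; exit}')
#   [ "$client" = "$cluster" ] || echo "client is behind: $client < $cluster"
```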
Updated by Ilya Dryomov about 7 years ago
There are known issues with resending code in older kernels and manually restarting OSDs is indeed one of the workarounds. All of those bugs should be fixed in kernels 4.7 and later. The patches were too extensive to be backported, so I'd recommend upgrading to 4.9 (latest longterm kernel).
Updated by Ilya Dryomov about 7 years ago
- Status changed from New to Need More Info
- Priority changed from Normal to Low
Let me know if you can reproduce this on 4.9.z.
Updated by Sergey Jerusalimov over 6 years ago
I see something strange (kernel 4.9) in osdc (/sys/kernel/debug/ceph/):
REQUESTS 0 homeless 0
LINGER REQUESTS
18446462598732840963 osd214 7.3d786535 [214,1136,1358]/214 [214,1136,1358]/214 rbd_header.4dec4d3238e1f29 0x24 0 WC/0
18446462598732840962 osd516 7.8f5c1cf6 [516,1259,749]/516 [516,1259,749]/516 rbd_header.4decffe238e1f29 0x24 0 WC/0
18446462598732840961 osd2025 7.c5fc8f85 [2025,260,1099]/2025 [2025,260,1099]/2025 rbd_header.4dec4ce238e1f29 0x24 0 WC/0
but everything works normally.
Are these entries bogus?
Updated by Ilya Dryomov over 6 years ago
These are so-called watch requests, one per mapping. They appear when you map an image and disappear when you unmap it.
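As a sanity check, the number of linger entries can be compared to the number of mapped images. A minimal sketch (the `linger_count` helper name is mine), assuming the osdc layout shown above, where linger lines follow the "LINGER REQUESTS" header until a "BACKOFFS" header (newer kernels) or end of file:

```shell
# Count the linger (watch) entries in osdc output fed on stdin.
linger_count() {
  awk '/^BACKOFFS/ {f=0}
       f && NF {n++}
       /^LINGER REQUESTS/ {f=1}
       END {print n+0}'
}

# Usage sketch: with one watch per mapped image, these should match:
#   cat /sys/kernel/debug/ceph/*/osdc | linger_count
#   rbd showmapped | tail -n +2 | wc -l
```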
Updated by Sergey Jerusalimov over 6 years ago
Hi Ilya,
I think I may have found a bug in kernel rbd (version 4.9.43).
At the moment when a few OSDs go up/down, I get an error on the client side (mounted rbd):
[2607280.364832] XFS (rbd1): Metadata corruption detected at xfs_allocbt_read_verify+0x6c/0xd0 [xfs], xfs_allocbt block 0x17fff93ff0
[2607280.364881] XFS (rbd1): Unmount and run xfs_repair
[2607280.364897] XFS (rbd1): First 64 bytes of corrupted metadata buffer:
[2607280.364917] ffff881ac70c7000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[2607280.364940] ffff881ac70c7010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[2607280.364963] ffff881ac70c7020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[2607280.364985] ffff881ac70c7030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[2607280.365012] XFS (rbd1): metadata I/O error: block 0x17fff93ff0 ("xfs_trans_read_buf_map") error 117 numblks 8
[2607280.365038] XFS (rbd1): xfs_do_force_shutdown(0x1) called from line 315 of file fs/xfs/xfs_trans_buf.c. Return address = 0xffffffffa07a008e
[2607280.388672] XFS (rbd1): I/O Error Detected. Shutting down filesystem
[2607280.388693] XFS (rbd1): Please umount the filesystem and rectify the problem(s)
[2607280.388723] Buffer I/O error on dev rbd1, logical block 13078470483, lost async page write
[2607280.388747] Buffer I/O error on dev rbd1, logical block 13078470484, lost async page write
[2607280.388782] Buffer I/O error on dev rbd1, logical block 13078470485, lost async page write
[2607280.388814] Buffer I/O error on dev rbd1, logical block 13078470486, lost async page write
[2607280.388836] Buffer I/O error on dev rbd1, logical block 13078470487, lost async page write
[2607280.388857] Buffer I/O error on dev rbd1, logical block 13078470488, lost async page write
[2607280.388879] Buffer I/O error on dev rbd1, logical block 13078470489, lost async page write
[2607280.388901] Buffer I/O error on dev rbd1, logical block 13078470490, lost async page write
[2607280.388922] Buffer I/O error on dev rbd1, logical block 13078470491, lost async page write
[2607280.388944] Buffer I/O error on dev rbd1, logical block 13078470492, lost async page write
I remounted the rbd, and the problem was solved.
Updated by Ilya Dryomov over 6 years ago
Hi Sergey,
Which version of ceph?
What precedes "[2607280.364832] XFS (rbd1): Metadata corruption detected at xfs_allocbt_read_verify+0x6c/0xd0 [xfs], xfs_allocbt block 0x17fff93ff0" in the kernel log (at least 100 lines)?
Any matches for "misdirected" in the cluster or OSD logs?
Updated by Sergey Jerusalimov over 6 years ago
Ilya Dryomov wrote:
Hi Sergey,
Which version of ceph?
What precedes "[2607280.364832] XFS (rbd1): Metadata corruption detected at xfs_allocbt_read_verify+0x6c/0xd0 [xfs], xfs_allocbt block 0x17fff93ff0" in the kernel log (at least 100 lines)?
Any matches for "misdirected" in the cluster or OSD logs?
Ceph version 10.2.9
I grepped the OSD logs from when this event happened (zcat /var/log/ceph/ceph-osd.*.log.1.gz | grep -i misdirected) and found no entries.
The last ~100 lines:
[1797261.005743] libceph: osd2301 up
[1797261.005743] libceph: osd2303 up
[1797266.830113] libceph: osd405 up
[1797266.830115] libceph: osd570 up
[1797338.171855] libceph: osd1137 down
[1797343.302111] libceph: osd1137 up
[1797373.361325] libceph: osd2043 down
[1797379.213712] libceph: osd2043 up
[1797834.314543] INFO: task xfsaild/rbd2:2368 blocked for more than 300 seconds.
[1797834.314575] Not tainted 4.9.43-11 #1
[1797834.314588] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1797834.314611] xfsaild/rbd2 D 0 2368 2 0x00000000
[1797834.314613] Call Trace:
[1797834.314616] ---[ now 2017-09-08 13:18:55+03 ]---
[1797834.318400] INFO: task kworker/u66:0:475176 blocked for more than 300 seconds.
[1797834.318434] Not tainted 4.9.43-11 #1
[1797834.318454] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1797834.318487] kworker/u66:0 D 0 475176 2 0x00000000
[1797834.318503] Workqueue: writeback wb_workfn (flush-251:32)
[1797834.318516] Call Trace:
[1797834.318519] ---[ now 2017-09-08 13:18:55+03 ]---
[1797834.318529] INFO: task kworker/2:7:281782 blocked for more than 300 seconds.
[1797834.318562] Not tainted 4.9.43-11 #1
[1797834.318575] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1797834.318596] kworker/2:7 D 0 281782 2 0x00000000
[1797834.318648] Workqueue: xfs-cil/rbd2 xlog_cil_push_work [xfs]
[1797834.318649] Call Trace:
[1797834.318650] ---[ now 2017-09-08 13:18:55+03 ]---
[1797834.318657] INFO: task kworker/2:2:327901 blocked for more than 300 seconds.
[1797834.318677] Not tainted 4.9.43-11 #1
[1797834.318690] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1797834.318711] kworker/2:2 D 0 327901 2 0x00000000
[1797834.318733] Workqueue: xfs-sync/rbd2 xfs_log_worker [xfs]
[1797834.318734] Call Trace:
[1797834.318735] ---[ now 2017-09-08 13:18:55+03 ]---
[1798145.630858] INFO: task xfsaild/rbd2:2368 blocked for more than 300 seconds.
[1798145.630888] Not tainted 4.9.43-11 #1
[1798145.630900] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1798145.630922] xfsaild/rbd2 D 0 2368 2 0x00000000
[1798145.630925] Call Trace:
[1798145.630927] ---[ now 2017-09-08 13:24:06+03 ]---
[1798145.631531] INFO: task kworker/u66:0:475176 blocked for more than 300 seconds.
[1798145.631552] Not tainted 4.9.43-11 #1
[1798145.631564] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1798145.631585] kworker/u66:0 D 0 475176 2 0x00000000
[1798145.631592] Workqueue: writeback wb_workfn (flush-251:32)
[1798145.631593] Call Trace:
[1798145.631606] ---[ now 2017-09-08 13:24:06+03 ]---
[1798145.631615] INFO: task kworker/2:7:281782 blocked for more than 300 seconds.
[1798145.631642] Not tainted 4.9.43-11 #1
[1798145.631654] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1798145.631674] kworker/2:7 D 0 281782 2 0x00000000
[1798145.631735] Workqueue: xfs-cil/rbd2 xlog_cil_push_work [xfs]
[1798145.631736] Call Trace:
[1798145.631737] ---[ now 2017-09-08 13:24:06+03 ]---
[1798145.631744] INFO: task kworker/2:2:327901 blocked for more than 300 seconds.
[1798145.631763] Not tainted 4.9.43-11 #1
[1798145.631775] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1798145.631795] kworker/2:2 D 0 327901 2 0x00000000
[1798145.631816] Workqueue: xfs-sync/rbd2 xfs_log_worker [xfs]
[1798145.631816] Call Trace:
[1798145.631817] ---[ now 2017-09-08 13:24:06+03 ]---
[1859420.360623] Process accounting resumed
[1919221.056578] init: atop main process (438154) terminated with status 15
[1945838.550847] Process accounting resumed
[1963013.589650] libceph: osd331 down
[1963340.964016] libceph: osd331 weight 0x0 (out)
[1974457.437649] libceph: osd331 does not exist
[1974548.082977] libceph: osd331 weight 0x10000 (in)
[1974548.082978] libceph: osd331 up
[2032247.410336] Process accounting resumed
[2118645.018933] Process accounting resumed
[2205035.944405] Process accounting resumed
[2212525.304262] libceph: osd984 down
[2212842.890254] libceph: osd984 weight 0x0 (out)
[2228479.105051] libceph: osd984 does not exist
[2228772.920789] libceph: osd984 weight 0x10000 (in)
[2228772.920790] libceph: osd984 up
[2325988.047460] init: atop main process (192190) terminated with status 15
[2325988.091022] Process accounting resumed
[2377845.778386] Process accounting resumed
[2391246.256656] libceph: osd336 [2a02:6b8:b000:20f:feaa:14ff:fedb:7b68]:6848 socket closed (con state OPEN)
[2391255.414440] libceph: osd336 [2a02:6b8:b000:20f:feaa:14ff:fedb:7b68]:6848 socket closed (con state CONNECTING)
[2391255.713959] libceph: osd336 [2a02:6b8:b000:20f:feaa:14ff:fedb:7b68]:6848 socket closed (con state CONNECTING)
[2391256.706088] libceph: osd336 [2a02:6b8:b000:20f:feaa:14ff:fedb:7b68]:6848 socket closed (con state CONNECTING)
[2391258.690148] libceph: osd336 [2a02:6b8:b000:20f:feaa:14ff:fedb:7b68]:6848 socket closed (con state CONNECTING)
[2391260.384495] libceph: osd336 down
[2391262.509868] libceph: osd336 up
[2464236.904167] Process accounting resumed
[2550678.256678] Process accounting resumed
[2587593.503805] libceph: osd308 down
[2587917.626492] libceph: osd308 weight 0x0 (out)
[2607280.364832] XFS (rbd1): Metadata corruption detected at xfs_allocbt_read_verify+0x6c/0xd0 [xfs], xfs_allocbt block 0x17fff93ff0
[2607280.364881] XFS (rbd1): Unmount and run xfs_repair
[2607280.364897] XFS (rbd1): First 64 bytes of corrupted metadata buffer:
[2607280.364917] ffff881ac70c7000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[2607280.364940] ffff881ac70c7010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[2607280.364963] ffff881ac70c7020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[2607280.364985] ffff881ac70c7030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[2607280.365012] XFS (rbd1): metadata I/O error: block 0x17fff93ff0 ("xfs_trans_read_buf_map") error 117 numblks 8
[2607280.365038] XFS (rbd1): xfs_do_force_shutdown(0x1) called from line 315 of file fs/xfs/xfs_trans_buf.c. Return address = 0xffffffffa07a008e
[2607280.388672] XFS (rbd1): I/O Error Detected. Shutting down filesystem
[2607280.388693] XFS (rbd1): Please umount the filesystem and rectify the problem(s)
[2607280.388723] Buffer I/O error on dev rbd1, logical block 13078470483, lost async page write
[2607280.388747] Buffer I/O error on dev rbd1, logical block 13078470484, lost async page write
[2607280.388782] Buffer I/O error on dev rbd1, logical block 13078470485, lost async page write
[2607280.388814] Buffer I/O error on dev rbd1, logical block 13078470486, lost async page write
[2607280.388836] Buffer I/O error on dev rbd1, logical block 13078470487, lost async page write
[2607280.388857] Buffer I/O error on dev rbd1, logical block 13078470488, lost async page write
[2607280.388879] Buffer I/O error on dev rbd1, logical block 13078470489, lost async page write
[2607280.388901] Buffer I/O error on dev rbd1, logical block 13078470490, lost async page write
[2607280.388922] Buffer I/O error on dev rbd1, logical block 13078470491, lost async page write
[2607280.388944] Buffer I/O error on dev rbd1, logical block 13078470492, lost async page write
[2607393.391976] libceph: osd1651 down
[2607404.653040] libceph: osd1640 down
[2607438.638985] libceph: osd1465 down
[2607445.615269] libceph: osd1343 down
[2608145.709479] libceph: osd1651 up
[2608212.492767] libceph: osd1343 up
[2608376.422946] libceph: osd1465 up
[2608630.041840] libceph: osd1640 up
[2610304.474025] VFS: umount (789497) detach xfs /dev/rbd1 (47) / (16384) parent ext4 /dev/md2 (21) /ceph_rbd_test_prod (1179653)
[2610304.474028] VFS: umount (789497) detach xfs /dev/rbd1 (47) / (16384) parent ext4 /dev/md2 (21) /ceph_rbd_test_prod (1179653)
[2610304.655681] XFS (rbd1): Unmounting Filesystem
[2610306.723927] XFS (rbd1): Mounting V4 Filesystem
[2610307.830989] XFS (rbd1): Starting recovery (logdev: internal)
[2610315.253023] XFS (rbd1): Ending recovery (logdev: internal)
[2610315.253045] VFS: mount (789498) attach xfs /dev/rbd1 (47) / (16384) parent ext4 /dev/md2 (21) /ceph_rbd_test_prod (1179653)
[2637077.777978] Process accounting resumed
Maybe it's an XFS bug in the 4.9 kernel.
Updated by Ilya Dryomov over 6 years ago
What about the cluster log, e.g. /var/log/ceph/ceph.log?
Updated by Sergey Jerusalimov over 6 years ago
Ilya Dryomov wrote:
What about the cluster log, e.g. /var/log/ceph/ceph.log?
No "misdirected" records found in /var/log/ceph/ceph.1.log either.
Updated by Ilya Dryomov over 6 years ago
What about other, not .1, log files?
Are you sure that what you are grepping over covers the time of "[2607280.364832] XFS (rbd1): Metadata corruption detected at xfs_allocbt_read_verify+0x6c/0xd0 [xfs], xfs_allocbt block 0x17fff93ff0"?
Updated by Sergey Jerusalimov over 6 years ago
Yes, I tried all the logs:
zcat /var/log/ceph/ceph-osd.*.log.*.gz|grep -i misdirected
cat /var/log/ceph/ceph-osd.*.log|grep -i misdirected
zcat /var/log/ceph/ceph.log.*.gz|grep -i misdirected
cat /var/log/ceph/ceph.log|grep -i misdirected
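For reference, the four commands above can be collapsed into one pass. A sketch (the `scan_logs` name is mine) relying on zgrep reading both plain and gzip-compressed files:

```shell
# Print the names of ceph log files (current or rotated) that mention
# "misdirected". zgrep transparently decompresses .gz files.
scan_logs() {
  # $1: directory holding the ceph logs, e.g. /var/log/ceph
  zgrep -il misdirected "$1"/ceph*.log "$1"/ceph*.log.*.gz 2>/dev/null
}
```

The `ceph*.log` glob covers both the cluster log (ceph.log) and the per-daemon logs (ceph-osd.N.log).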
Updated by Sergey Jerusalimov over 6 years ago
I found many fixes in XFS, but I'm not sure whether they affect the 4.9 kernel.
Updated by Ilya Dryomov over 6 years ago
At first glance, it doesn't seem to be related to OSDs going up/down. No evidence of this being a ceph bug either.
If it's reproducible, perhaps try a newer kernel?
Updated by Sergey Jerusalimov over 6 years ago
OK, I'll try to reproduce it on a test cluster with a 4.9 kernel.
Updated by Sergey Jerusalimov over 6 years ago
All the XFS fixes are now ported to 4.9: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/?h=v4.9.51
I think this resolves the problem.
Updated by Sergey Jerusalimov over 6 years ago
The problem is no longer reproducible.
I think the ticket can be closed.
Updated by Ilya Dryomov over 6 years ago
- Status changed from Need More Info to Closed