Bug #19385
closed
rbd stuck on read/write op to ceph_osd
Description
Sometimes, after a massive OSD restart (a server with a shelf), an rbd image (mapped and mounted with XFS) gets stuck on a read or write operation:
cat /sys/kernel/debug/ceph/*/osdc
90611073 osd1455 7.bb66db73 rbd_data.4ded385238e1f29.0000000000906ea3 read
It does not change over time.
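This check can be scripted. Below is a minimal sketch (the `stuck_tids` helper name and the two-snapshot approach are my own, not part of any ceph tooling) that reports request IDs present in two osdc snapshots taken some time apart:

```shell
# Hypothetical helper: print tids that appear in two osdc snapshots
# taken some time apart -- candidates for stuck ops. Assumes the
# request-line layout of a 4.x kernel client (tid in column 1).
# Caveat: linger (watch) lines are numeric too and persist by design,
# so they will also show up; ignore the rbd_header entries.
stuck_tids() {
  # $1, $2: files holding "cat /sys/kernel/debug/ceph/*/osdc" output
  tmp=$(mktemp)
  awk '/^[0-9]/ {print $1}' "$1" | sort > "$tmp"
  awk '/^[0-9]/ {print $1}' "$2" | sort | comm -12 "$tmp" -
  rm -f "$tmp"
}
```

A tid reported by this sketch has survived the interval between the two snapshots and is worth a closer look.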
In monc:
have osdmap 499581
want next osdmap
...
This osdmap 499581 is the latest, because the ceph monitor reports: osdmap e499581: 2304 osds: 2304 up, 2304 in
Note: if I restart osd1455, the problem goes away and everything works normally.
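The osdmap comparison above can also be done mechanically. A sketch, assuming the monc debugfs format shown ("have osdmap &lt;epoch&gt;") and that `ceph osd dump` starts with an "epoch N" line; the `monc_epoch` helper name is hypothetical:

```shell
# Hypothetical helper: extract the client's osdmap epoch from the
# monc debugfs output ("have osdmap <epoch>") fed on stdin.
monc_epoch() {
  awk '/^have osdmap/ {print $3; exit}'
}

# Usage sketch (on the client; needs debugfs mounted and the ceph CLI):
#   client=$(cat /sys/kernel/debug/ceph/*/monc | monc_epoch)
#   cluster=$(ceph osd dump | awk '/^epoch/ {print $2; exit}')
#   [ "$client" = "$cluster" ] || echo "client is behind: $client < $cluster"
```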
Updated by Ilya Dryomov about 7 years ago
There are known issues with resending code in older kernels and manually restarting OSDs is indeed one of the workarounds. All of those bugs should be fixed in kernels 4.7 and later. The patches were too extensive to be backported, so I'd recommend upgrading to 4.9 (latest longterm kernel).
Updated by Ilya Dryomov about 7 years ago
- Status changed from New to Need More Info
- Priority changed from Normal to Low
Let me know if you can reproduce this on 4.9.z.
Updated by Sergey Jerusalimov over 6 years ago
I see something strange (kernel 4.9) in osdc (/sys/kernel/debug/ceph/):
REQUESTS 0 homeless 0
LINGER REQUESTS
18446462598732840963 osd214 7.3d786535 [214,1136,1358]/214 [214,1136,1358]/214 rbd_header.4dec4d3238e1f29 0x24 0 WC/0
18446462598732840962 osd516 7.8f5c1cf6 [516,1259,749]/516 [516,1259,749]/516 rbd_header.4decffe238e1f29 0x24 0 WC/0
18446462598732840961 osd2025 7.c5fc8f85 [2025,260,1099]/2025 [2025,260,1099]/2025 rbd_header.4dec4ce238e1f29 0x24 0 WC/0
but everything works normally.
Are these entries bogus?
Updated by Ilya Dryomov over 6 years ago
These are so-called watch requests, one per mapping. They appear when you map an image and disappear when you unmap it.
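As a sanity check, the number of linger entries can be compared to the number of mapped images. A minimal sketch (the `linger_count` helper name is mine), assuming the osdc layout shown above, where linger lines follow the "LINGER REQUESTS" header until a "BACKOFFS" header (newer kernels) or end of file:

```shell
# Count the linger (watch) entries in osdc output fed on stdin.
linger_count() {
  awk '/^BACKOFFS/ {f=0}
       f && NF {n++}
       /^LINGER REQUESTS/ {f=1}
       END {print n+0}'
}

# Usage sketch: with one watch per mapped image, these should match:
#   cat /sys/kernel/debug/ceph/*/osdc | linger_count
#   rbd showmapped | tail -n +2 | wc -l
```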
Updated by Sergey Jerusalimov over 6 years ago
Hi Ilya,
I think I may have found a bug in kernel rbd (version 4.9.43).
At the moment when a few OSDs go up/down, I get an error on the client side (mounted rbd):
[2607280.364832] XFS (rbd1): Metadata corruption detected at xfs_allocbt_read_verify+0x6c/0xd0 [xfs], xfs_allocbt block 0x17fff93ff0
[2607280.364881] XFS (rbd1): Unmount and run xfs_repair
[2607280.364897] XFS (rbd1): First 64 bytes of corrupted metadata buffer:
[2607280.364917] ffff881ac70c7000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[2607280.364940] ffff881ac70c7010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[2607280.364963] ffff881ac70c7020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[2607280.364985] ffff881ac70c7030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[2607280.365012] XFS (rbd1): metadata I/O error: block 0x17fff93ff0 ("xfs_trans_read_buf_map") error 117 numblks 8
[2607280.365038] XFS (rbd1): xfs_do_force_shutdown(0x1) called from line 315 of file fs/xfs/xfs_trans_buf.c. Return address = 0xffffffffa07a008e
[2607280.388672] XFS (rbd1): I/O Error Detected. Shutting down filesystem
[2607280.388693] XFS (rbd1): Please umount the filesystem and rectify the problem(s)
[2607280.388723] Buffer I/O error on dev rbd1, logical block 13078470483, lost async page write
[2607280.388747] Buffer I/O error on dev rbd1, logical block 13078470484, lost async page write
[2607280.388782] Buffer I/O error on dev rbd1, logical block 13078470485, lost async page write
[2607280.388814] Buffer I/O error on dev rbd1, logical block 13078470486, lost async page write
[2607280.388836] Buffer I/O error on dev rbd1, logical block 13078470487, lost async page write
[2607280.388857] Buffer I/O error on dev rbd1, logical block 13078470488, lost async page write
[2607280.388879] Buffer I/O error on dev rbd1, logical block 13078470489, lost async page write
[2607280.388901] Buffer I/O error on dev rbd1, logical block 13078470490, lost async page write
[2607280.388922] Buffer I/O error on dev rbd1, logical block 13078470491, lost async page write
[2607280.388944] Buffer I/O error on dev rbd1, logical block 13078470492, lost async page write
I remounted the rbd, and the problem was solved.
Updated by Ilya Dryomov over 6 years ago
Hi Sergey,
Which version of ceph?
What precedes "[2607280.364832] XFS (rbd1): Metadata corruption detected at xfs_allocbt_read_verify+0x6c/0xd0 [xfs], xfs_allocbt block 0x17fff93ff0" in the kernel log (at least 100 lines)?
Any matches for "misdirected" in the cluster or OSD logs?
Updated by Sergey Jerusalimov over 6 years ago
Ilya Dryomov wrote:
Hi Sergey,
Which version of ceph?
What precedes "[2607280.364832] XFS (rbd1): Metadata corruption detected at xfs_allocbt_read_verify+0x6c/0xd0 [xfs], xfs_allocbt block 0x17fff93ff0" in the kernel log (at least 100 lines)?
Any matches for "misdirected" in the cluster or OSD logs?
Ceph version 10.2.9
I grepped the OSD logs from when this event happened (zcat /var/log/ceph/ceph-osd.*.log.1.gz | grep -i misdirected) and found no entries.
The last ~100 lines:
[1797261.005743] libceph: osd2301 up
[1797261.005743] libceph: osd2303 up
[1797266.830113] libceph: osd405 up
[1797266.830115] libceph: osd570 up
[1797338.171855] libceph: osd1137 down
[1797343.302111] libceph: osd1137 up
[1797373.361325] libceph: osd2043 down
[1797379.213712] libceph: osd2043 up
[1797834.314543] INFO: task xfsaild/rbd2:2368 blocked for more than 300 seconds.
[1797834.314575] Not tainted 4.9.43-11 #1
[1797834.314588] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1797834.314611] xfsaild/rbd2 D 0 2368 2 0x00000000
[1797834.314613] Call Trace:
[1797834.314616] ---[ now 2017-09-08 13:18:55+03 ]---
[1797834.318400] INFO: task kworker/u66:0:475176 blocked for more than 300 seconds.
[1797834.318434] Not tainted 4.9.43-11 #1
[1797834.318454] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1797834.318487] kworker/u66:0 D 0 475176 2 0x00000000
[1797834.318503] Workqueue: writeback wb_workfn (flush-251:32)
[1797834.318516] Call Trace:
[1797834.318519] ---[ now 2017-09-08 13:18:55+03 ]---
[1797834.318529] INFO: task kworker/2:7:281782 blocked for more than 300 seconds.
[1797834.318562] Not tainted 4.9.43-11 #1
[1797834.318575] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1797834.318596] kworker/2:7 D 0 281782 2 0x00000000
[1797834.318648] Workqueue: xfs-cil/rbd2 xlog_cil_push_work [xfs]
[1797834.318649] Call Trace:
[1797834.318650] ---[ now 2017-09-08 13:18:55+03 ]---
[1797834.318657] INFO: task kworker/2:2:327901 blocked for more than 300 seconds.
[1797834.318677] Not tainted 4.9.43-11 #1
[1797834.318690] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1797834.318711] kworker/2:2 D 0 327901 2 0x00000000
[1797834.318733] Workqueue: xfs-sync/rbd2 xfs_log_worker [xfs]
[1797834.318734] Call Trace:
[1797834.318735] ---[ now 2017-09-08 13:18:55+03 ]---
[1798145.630858] INFO: task xfsaild/rbd2:2368 blocked for more than 300 seconds.
[1798145.630888] Not tainted 4.9.43-11 #1
[1798145.630900] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1798145.630922] xfsaild/rbd2 D 0 2368 2 0x00000000
[1798145.630925] Call Trace:
[1798145.630927] ---[ now 2017-09-08 13:24:06+03 ]---
[1798145.631531] INFO: task kworker/u66:0:475176 blocked for more than 300 seconds.
[1798145.631552] Not tainted 4.9.43-11 #1
[1798145.631564] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1798145.631585] kworker/u66:0 D 0 475176 2 0x00000000
[1798145.631592] Workqueue: writeback wb_workfn (flush-251:32)
[1798145.631593] Call Trace:
[1798145.631606] ---[ now 2017-09-08 13:24:06+03 ]---
[1798145.631615] INFO: task kworker/2:7:281782 blocked for more than 300 seconds.
[1798145.631642] Not tainted 4.9.43-11 #1
[1798145.631654] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1798145.631674] kworker/2:7 D 0 281782 2 0x00000000
[1798145.631735] Workqueue: xfs-cil/rbd2 xlog_cil_push_work [xfs]
[1798145.631736] Call Trace:
[1798145.631737] ---[ now 2017-09-08 13:24:06+03 ]---
[1798145.631744] INFO: task kworker/2:2:327901 blocked for more than 300 seconds.
[1798145.631763] Not tainted 4.9.43-11 #1
[1798145.631775] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1798145.631795] kworker/2:2 D 0 327901 2 0x00000000
[1798145.631816] Workqueue: xfs-sync/rbd2 xfs_log_worker [xfs]
[1798145.631816] Call Trace:
[1798145.631817] ---[ now 2017-09-08 13:24:06+03 ]---
[1859420.360623] Process accounting resumed
[1919221.056578] init: atop main process (438154) terminated with status 15
[1945838.550847] Process accounting resumed
[1963013.589650] libceph: osd331 down
[1963340.964016] libceph: osd331 weight 0x0 (out)
[1974457.437649] libceph: osd331 does not exist
[1974548.082977] libceph: osd331 weight 0x10000 (in)
[1974548.082978] libceph: osd331 up
[2032247.410336] Process accounting resumed
[2118645.018933] Process accounting resumed
[2205035.944405] Process accounting resumed
[2212525.304262] libceph: osd984 down
[2212842.890254] libceph: osd984 weight 0x0 (out)
[2228479.105051] libceph: osd984 does not exist
[2228772.920789] libceph: osd984 weight 0x10000 (in)
[2228772.920790] libceph: osd984 up
[2325988.047460] init: atop main process (192190) terminated with status 15
[2325988.091022] Process accounting resumed
[2377845.778386] Process accounting resumed
[2391246.256656] libceph: osd336 [2a02:6b8:b000:20f:feaa:14ff:fedb:7b68]:6848 socket closed (con state OPEN)
[2391255.414440] libceph: osd336 [2a02:6b8:b000:20f:feaa:14ff:fedb:7b68]:6848 socket closed (con state CONNECTING)
[2391255.713959] libceph: osd336 [2a02:6b8:b000:20f:feaa:14ff:fedb:7b68]:6848 socket closed (con state CONNECTING)
[2391256.706088] libceph: osd336 [2a02:6b8:b000:20f:feaa:14ff:fedb:7b68]:6848 socket closed (con state CONNECTING)
[2391258.690148] libceph: osd336 [2a02:6b8:b000:20f:feaa:14ff:fedb:7b68]:6848 socket closed (con state CONNECTING)
[2391260.384495] libceph: osd336 down
[2391262.509868] libceph: osd336 up
[2464236.904167] Process accounting resumed
[2550678.256678] Process accounting resumed
[2587593.503805] libceph: osd308 down
[2587917.626492] libceph: osd308 weight 0x0 (out)
[2607280.364832] XFS (rbd1): Metadata corruption detected at xfs_allocbt_read_verify+0x6c/0xd0 [xfs], xfs_allocbt block 0x17fff93ff0
[2607280.364881] XFS (rbd1): Unmount and run xfs_repair
[2607280.364897] XFS (rbd1): First 64 bytes of corrupted metadata buffer:
[2607280.364917] ffff881ac70c7000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[2607280.364940] ffff881ac70c7010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[2607280.364963] ffff881ac70c7020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[2607280.364985] ffff881ac70c7030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[2607280.365012] XFS (rbd1): metadata I/O error: block 0x17fff93ff0 ("xfs_trans_read_buf_map") error 117 numblks 8
[2607280.365038] XFS (rbd1): xfs_do_force_shutdown(0x1) called from line 315 of file fs/xfs/xfs_trans_buf.c. Return address = 0xffffffffa07a008e
[2607280.388672] XFS (rbd1): I/O Error Detected. Shutting down filesystem
[2607280.388693] XFS (rbd1): Please umount the filesystem and rectify the problem(s)
[2607280.388723] Buffer I/O error on dev rbd1, logical block 13078470483, lost async page write
[2607280.388747] Buffer I/O error on dev rbd1, logical block 13078470484, lost async page write
[2607280.388782] Buffer I/O error on dev rbd1, logical block 13078470485, lost async page write
[2607280.388814] Buffer I/O error on dev rbd1, logical block 13078470486, lost async page write
[2607280.388836] Buffer I/O error on dev rbd1, logical block 13078470487, lost async page write
[2607280.388857] Buffer I/O error on dev rbd1, logical block 13078470488, lost async page write
[2607280.388879] Buffer I/O error on dev rbd1, logical block 13078470489, lost async page write
[2607280.388901] Buffer I/O error on dev rbd1, logical block 13078470490, lost async page write
[2607280.388922] Buffer I/O error on dev rbd1, logical block 13078470491, lost async page write
[2607280.388944] Buffer I/O error on dev rbd1, logical block 13078470492, lost async page write
[2607393.391976] libceph: osd1651 down
[2607404.653040] libceph: osd1640 down
[2607438.638985] libceph: osd1465 down
[2607445.615269] libceph: osd1343 down
[2608145.709479] libceph: osd1651 up
[2608212.492767] libceph: osd1343 up
[2608376.422946] libceph: osd1465 up
[2608630.041840] libceph: osd1640 up
[2610304.474025] VFS: umount (789497) detach xfs /dev/rbd1 (47) / (16384) parent ext4 /dev/md2 (21) /ceph_rbd_test_prod (1179653)
[2610304.474028] VFS: umount (789497) detach xfs /dev/rbd1 (47) / (16384) parent ext4 /dev/md2 (21) /ceph_rbd_test_prod (1179653)
[2610304.655681] XFS (rbd1): Unmounting Filesystem
[2610306.723927] XFS (rbd1): Mounting V4 Filesystem
[2610307.830989] XFS (rbd1): Starting recovery (logdev: internal)
[2610315.253023] XFS (rbd1): Ending recovery (logdev: internal)
[2610315.253045] VFS: mount (789498) attach xfs /dev/rbd1 (47) / (16384) parent ext4 /dev/md2 (21) /ceph_rbd_test_prod (1179653)
[2637077.777978] Process accounting resumed
Maybe it's an XFS bug in the 4.9 kernel.
Updated by Ilya Dryomov over 6 years ago
What about the cluster log, e.g. /var/log/ceph/ceph.log?
Updated by Sergey Jerusalimov over 6 years ago
Ilya Dryomov wrote:
What about the cluster log, e.g. /var/log/ceph/ceph.log?
No "misdirected" records found in /var/log/ceph/ceph.1.log either.
Updated by Ilya Dryomov over 6 years ago
What about other, not .1, log files?
Are you sure that what you are grepping over covers the time of "[2607280.364832] XFS (rbd1): Metadata corruption detected at xfs_allocbt_read_verify+0x6c/0xd0 [xfs], xfs_allocbt block 0x17fff93ff0"?
Updated by Sergey Jerusalimov over 6 years ago
Yes, I tried all the logs:
zcat /var/log/ceph/ceph-osd.*.log.*.gz|grep -i misdirected
cat /var/log/ceph/ceph-osd.*.log|grep -i misdirected
zcat /var/log/ceph/ceph.log.*.gz|grep -i misdirected
cat /var/log/ceph/ceph.log|grep -i misdirected
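For reference, the four commands above can be collapsed into one pass. A sketch (the `scan_logs` name is mine) relying on zgrep reading both plain and gzip-compressed files:

```shell
# Print the names of ceph log files (current or rotated) that mention
# "misdirected". zgrep transparently decompresses .gz files.
scan_logs() {
  # $1: directory holding the ceph logs, e.g. /var/log/ceph
  zgrep -il misdirected "$1"/ceph*.log "$1"/ceph*.log.*.gz 2>/dev/null
}
```

The `ceph*.log` glob covers both the cluster log (ceph.log) and the per-daemon logs (ceph-osd.N.log).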
Updated by Sergey Jerusalimov over 6 years ago
I found many fixes in XFS, but I'm not sure whether they affect the 4.9 kernel.
Updated by Ilya Dryomov over 6 years ago
At first glance, it doesn't seem to be related to OSDs going up/down. No evidence of this being a ceph bug either.
If it's reproducible, perhaps try a newer kernel?
Updated by Sergey Jerusalimov over 6 years ago
OK, I'll try to reproduce it on a test cluster with a 4.9 kernel.
Updated by Sergey Jerusalimov over 6 years ago
All the XFS fixes are now ported to 4.9: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/?h=v4.9.51
I think this resolves the problem.
Updated by Sergey Jerusalimov over 6 years ago
The problem is no longer reproducible.
I think the ticket can be closed.
Updated by Ilya Dryomov over 6 years ago
- Status changed from Need More Info to Closed