Bug #14901
misdirected requests on 4.2 during rebalancing
Status: Closed
Description
On ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299), kernel: Linux server4 4.2.0-27-generic #32~14.04.1-Ubuntu SMP Fri Jan 22 15:32:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
We're getting rbd errors after a rebalance:
/var/log/syslog:Feb 26 11:00:08 server4 kernel: [1364195.648855] rbd: rbd0: result -6 xferred 1000
Feb 26 11:00:08 server4 kernel: [1364195.648857] blk_update_request: I/O error, dev rbd0, sector 470086560
Feb 26 11:00:11 server4 kernel: [1364198.837281] rbd: rbd0: read 4000 at 3711fd7000 (3d7000)
Feb 26 11:00:11 server4 kernel: [1364198.837287] rbd: rbd0: result -6 xferred 4000
Feb 26 11:00:11 server4 kernel: [1364198.837290] blk_update_request: I/O error, dev rbd0, sector 461962936
Feb 26 11:00:11 server4 kernel: [1364198.846163] rbd: rbd0: read 1000 at 3711fd7000 (3d7000)
Feb 26 11:00:11 server4 kernel: [1364198.846166] rbd: rbd0: result -6 xferred 1000
Feb 26 11:00:11 server4 kernel: [1364198.846168] blk_update_request: I/O error, dev rbd0, sector 461962936
Feb 26 11:00:11 server4 kernel: [1364198.934545] blk_update_request: I/O error, dev rbd0, sector 461961904
Feb 26 11:21:34 server4 kernel: [1365481.867101] rbd: rbd0: read 1000 at 3482027000 (27000)
Feb 26 11:21:34 server4 kernel: [1365481.867106] rbd: rbd0: result -6 xferred 1000
Feb 26 11:21:34 server4 kernel: [1365481.867109] blk_update_request: I/O error, dev rbd0, sector 440467768
Feb 26 11:21:34 server4 kernel: [1365481.875329] EXT4-fs warning (device rbd0): htree_dirblock_to_tree:959: inode #13763564: lblock 0: comm mc: error -5 reading directory block
Feb 26 11:21:41 server4 kernel: [1365488.690117] rbd: rbd0: read 1000 at 3482027000 (27000)
Feb 26 11:21:41 server4 kernel: [1365488.690122] rbd: rbd0: result -6 xferred 1000
Feb 26 11:21:41 server4 kernel: [1365488.690125] blk_update_request: I/O error, dev rbd0, sector 440467768
Feb 26 11:21:41 server4 kernel: [1365488.698154] EXT4-fs warning (device rbd0): htree_dirblock_to_tree:959: inode #13763564: lblock 0: comm mc: error -5 reading directory block
Feb 26 11:21:47 server4 kernel: [1365494.610039] rbd: rbd0: read 1000 at 3482027000 (27000)
Feb 26 11:21:47 server4 kernel: [1365494.610044] rbd: rbd0: result -6 xferred 1000
Feb 26 11:21:47 server4 kernel: [1365494.610047] blk_update_request: I/O error, dev rbd0, sector 440467768
Feb 26 11:21:47 server4 kernel: [1365494.617864] EXT4-fs warning (device rbd0): htree_dirblock_to_tree:959: inode #13763564: lblock 0: comm mc: error -5 reading directory block
Feb 26 11:21:56 server4 kernel: [1365503.442835] rbd: rbd0: read 1000 at 3482027000 (27000)
Feb 26 11:21:56 server4 kernel: [1365503.442840] rbd: rbd0: result -6 xferred 1000
Feb 26 11:21:56 server4 kernel: [1365503.442844] blk_update_request: I/O error, dev rbd0, sector 440467768
Feb 26 11:21:56 server4 kernel: [1365503.450469] EXT4-fs warning (device rbd0): htree_dirblock_to_tree:959: inode #13763564: lblock 0: comm mc: error -5 reading directory block
Feb 26 11:22:05 server4 kernel: [1365512.593962] rbd: rbd0: read 1000 at 348203e000 (3e000)
Feb 26 11:22:05 server4 kernel: [1365512.593969] rbd: rbd0: result -6 xferred 1000
Feb 26 11:22:05 server4 kernel: [1365512.593972] blk_update_request: I/O error, dev rbd0, sector 440467952
This has happened several times already after a cluster rebalance, and only with rbd devices. TGT iSCSI targets seem to be fine.
Updated by Ilya Dryomov about 8 years ago
Is there anything else in dmesg? What is the general state of the cluster? Are there any warnings in ceph -s or in the OSD logs? Do you use rbd snapshots and/or clones? How exactly are you increasing the number of PGs (the commands you are issuing)? Are those errors transient or do they persist? Can you reproduce with "debug ms = 1" set on the OSDs?
We increased pg_num from 512 to 1024 and saw the rbd errors for the first time. The next time, one of the OSDs in our cluster failed due to an HDD failure, which triggered a rebalance, and we saw rbd I/O errors again. Finally, yesterday we additionally added 9 new OSDs; rebalancing happened again and, as a result, we saw rbd I/O errors once more. To recover the rbd device and the filesystem we unmount the filesystem, unmap the rbd device, then map the rbd device and mount the filesystem again.
We don't use snapshots and/or clones.
Updated by Ruslan Usifov about 8 years ago
Our dmesg log is clean and doesn't have any suspicious records (only the boot log, that's all). The last time the rbd errors happened, we had launched a massive rsync. The cluster was in a healthy state, like now:
    cluster df060192-7369-42dd-bb9c-db18bb6f3355
     health HEALTH_OK
     monmap e5: 6 mons at {ceph1=192.168.30.101:6789/0,ceph2=192.168.30.102:6789/0,ceph3=192.168.30.103:6789/0,ceph4=192.168.30.104:6789/0,ceph5=192.168.30.105:6789/0,ceph6=192.168.30.106:6789/0}
            election epoch 222, quorum 0,1,2,3,4,5 ceph1,ceph2,ceph3,ceph4,ceph5,ceph6
     osdmap e5982: 49 osds: 49 up, 49 in
            flags sortbitwise
      pgmap v1069321: 1024 pgs, 1 pools, 6193 GB data, 1564 kobjects
            12411 GB used, 38762 GB / 51174 GB avail
                1024 active+clean
  client io 6403 B/s wr, 1 op/s
Updated by Ruslan Usifov about 8 years ago
We found the following warnings in the OSD logs while running the massive rsync:
2016-02-26 11:06:25.969815 7f16750af700 0 log_channel(cluster) log [WRN] : client.157872 192.168.31.220:0/3211780995 misdirected client.157872.1:795379 pg 2.ebfbb119 to osd.19 not [27,19] in e5982/5982
2016-02-26 11:06:26.046774 7f16750af700 0 log_channel(cluster) log [WRN] : client.157872 192.168.31.220:0/3211780995 misdirected client.157872.1:795381 pg 2.ebfbb119 to osd.19 not [27,19] in e5982/5982
2016-02-26 11:06:26.800276 7f16728aa700 0 log_channel(cluster) log [WRN] : client.157872 192.168.31.220:0/3211780995 misdirected client.157872.1:795390 pg 2.c5547424 to osd.19 not [22,19] in e5982/5982
2016-02-26 11:06:26.907875 7f16750af700 0 log_channel(cluster) log [WRN] : client.157872 192.168.31.220:0/3211780995 misdirected client.157872.1:795393 pg 2.c5547424 to osd.19 not [22,19] in e5982/5982
2016-02-26 11:06:51.353003 7f16728aa700 0 log_channel(cluster) log [WRN] : client.157872 192.168.31.220:0/3211780995 misdirected client.157872.1:795598 pg 2.76468424 to osd.19 not [22,19] in e5982/5982
2016-02-26 11:06:51.558168 7f1669e82700 0 -- 192.168.30.103:6805/1974 >> 192.168.30.101:6825/2555 pipe(0x7f169c13e000 sd=97 :6805 s=0 pgs=0 cs=0 l=0 c=0x7f16aa043b80).accept connect_seq 325 vs existing 325 state standby
2016-02-26 11:06:51.558786 7f1669e82700 0 -- 192.168.30.103:6805/1974 >> 192.168.30.101:6825/2555 pipe(0x7f169c13e000 sd=97 :6805 s=0 pgs=0 cs=0 l=0 c=0x7f16aa043b80).accept connect_seq 326 vs existing 325 state standby
2016-02-26 11:06:52.166995 7f16728aa700 0 log_channel(cluster) log [WRN] : client.157872 192.168.31.220:0/3211780995 misdirected client.157872.1:795601 pg 2.76468424 to osd.19 not [22,19] in e5982/5982
2016-02-26 11:15:17.370983 7f165f45f700 0 -- 192.168.30.103:6805/1974 >> 192.168.30.101:6817/2398 pipe(0x7f169f2de000 sd=211 :6805 s=0 pgs=0 cs=0 l=0 c=0x7f16aa0426e0).accept connect_seq 351 vs existing 351 state standby
192.168.31.220 is the IP of the host with the mapped rbd device.
Updated by Ilya Dryomov about 8 years ago
- Subject changed from Rbd read errors after rebalance to misdirected requests on 4.2 during rebalancing
- Status changed from New to 12
- Assignee set to Ilya Dryomov
Updated by Ilya Dryomov about 8 years ago
- Project changed from Ceph to Linux kernel client
Updated by Ilya Dryomov about 8 years ago
- Project changed from Linux kernel client to Ceph
Hi Ruslan,
Can you do
$ ceph osd getmap 5982 >osdmap.5982
and attach that file?
Updated by Ilya Dryomov about 8 years ago
- Project changed from Ceph to Linux kernel client
Updated by Ruslan Usifov about 8 years ago
- File osdmap.5982 osdmap.5982 added
Hello,
Done; the result is in the attachment.
Updated by Ilya Dryomov about 8 years ago
Hmm. Can you do the same for each epoch from 5970 to 5981 and attach a .tar with those?
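Grabbing a range of epochs like this is easy to script. A minimal sketch, assuming a configured `ceph` CLI with `osd getmap` available; the `dump_osdmaps` helper name is made up here:

```shell
# Sketch: fetch a range of historical osdmap epochs and bundle them for attaching.
# Assumes the `ceph` CLI is configured; `dump_osdmaps` is a hypothetical helper name.
dump_osdmaps() {
  local first=$1 last=$2 e
  for e in $(seq "$first" "$last"); do
    ceph osd getmap "$e" > "osdmap.$e"     # one binary osdmap per epoch
  done
  tar -cf "osdmap.$first-$last.tar" osdmap.*   # bundle all dumped maps
}
# Usage: dump_osdmaps 5970 5981
```

Note that old osdmap epochs are periodically trimmed by the monitors, so very old epochs may no longer be retrievable.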
Updated by Ruslan Usifov about 8 years ago
- File osdmap.tar osdmap.tar added
OSD maps from 5970 to 5981 are in the attachment.
Updated by Ilya Dryomov about 8 years ago
Was the massive rsync you've provided warnings for done during rebalancing or after rebalancing completed? Do you have more instances of these warnings? If so, provide all of them, please.
Could you also attach more osdmaps? 5982..6000 and, say, 5800..5969?
Updated by Ruslan Usifov about 8 years ago
- File osd-logs.tar osd-logs.tar added
- File osdmap.5800-5969.tar.gz osdmap.5800-5969.tar.gz added
- File osdmap.5982-6000.tar.gz osdmap.5982-6000.tar.gz added
We waited until the rebalance was done, then launched the rsync.
And yes, of course, the misdirected requests happen on more than one node.
We've attached all the logs we collected that contain misdirected WRN records.
I should also mention that we have multiple rbd devices; the rbd device we are talking about is mapped on 192.168.31.220.
OSD maps 5982..6000 and 5800..5969 are in the appropriate attachments.
We must also note that after the situation described in this issue we had another disk failure, so the osdmaps in the attachments may be misleading and not show the full picture.
Updated by Ilya Dryomov about 8 years ago
This seems to be more serious than I thought. Could you send over as many osdmaps going back from 5800 as possible (they are periodically trimmed, but I think you should be able to get down to ~5600)? Another thing that could help is the full syslog for Feb 26 +/- a day.
Updated by Ruslan Usifov about 8 years ago
- File syslog.7.gz syslog.7.gz added
- File osdmap.5600-5800.tar.gz osdmap.5600-5800.tar.gz added
We attached the syslog from 192.168.31.220 for Feb 26, 2016.
We tried to dump the osdmaps, but maps only exist starting from epoch 5732.
Updated by Ruslan Usifov about 8 years ago
- File syslog.27.02.2016.gz syslog.27.02.2016.gz added
Additionally attached the syslog for 2016-02-27.
Updated by Ilya Dryomov about 8 years ago
What about a syslog for the night of Feb 26 - the attached one starts at ~7 o'clock? I need to see syslogs for the entire 26th from both 31.220 and 30.4.
Updated by Ruslan Usifov about 8 years ago
Sorry, but the older logs have already been rotated and we no longer have them.
Updated by Ilya Dryomov about 8 years ago
It's not immediately obvious what the problem is - there are a few different possibilities and not enough information to go on. Would you be willing to try to reproduce this with logging enabled? Ideally, I would like you to install a custom kernel (.deb package built on our gitbuilders) on 31.220 and 30.4 (or at least one of them) and enable more verbose logging on the OSD side. If installing a custom kernel is something you can't do, verbose OSD logs and some limited logs from the kernel you've got installed might still help...
Updated by Ruslan Usifov about 8 years ago
We will try! But where can we get the custom kernels, and what is the reproduction scenario? Set the weight of some OSD to 0? Or something else?
Updated by Ilya Dryomov about 8 years ago
If I had a reproducer, I wouldn't be asking you to reproduce ;) I would suggest trying to replicate what you described above as close as possible; changing weights to trigger rebalancing is a good idea. I tried those things on my local cluster but didn't see a single misdirected request - probably because my cluster is too small or I'm doing something different from what you did.
Would you be open to trying to replicate one of the occurrences (increasing PG count, OSD failure, adding OSDs) on that same cluster without installing a custom kernel, but with logging enabled? "debug ms = 1" for MONs and OSDs and
# modprobe rbd
# echo 'file osdmap.c +pt' >/sys/kernel/debug/dynamic_debug/control
# echo 'file osd_client.c +pt' >/sys/kernel/debug/dynamic_debug/control
# echo 'format get_osd -pt' >/sys/kernel/debug/dynamic_debug/control
# echo 'format put_osd -pt' >/sys/kernel/debug/dynamic_debug/control
on the kernel client boxes? That might be just enough and we won't have to bother with a custom kernel.
Updated by Ilya Dryomov almost 8 years ago
- Category set to rbd
- Status changed from 12 to Need More Info
- Priority changed from High to Low
- Tags deleted (rbd)
OSD client has been rewritten in 4.7.
Updated by Ilya Dryomov almost 8 years ago
- Category changed from rbd to libceph
Updated by Ilya Dryomov almost 8 years ago
- Status changed from Need More Info to In Progress
- Priority changed from Low to Urgent
OK, so this is not fixed in 4.7 (yet). After two more reports, I went through all the logs again and managed to reproduce at least one scenario.
Updated by Ilya Dryomov almost 8 years ago
- Status changed from In Progress to Fix Under Review
Updated by Ruslan Usifov almost 8 years ago
Hello
Our HDDs are not very new and may fail quite often, so what do you need from us to help you? The debug output consumes a lot of space, so we can't keep it enabled all the time.
Updated by Ilya Dryomov almost 8 years ago
Nothing - I think I fixed it. The patch just needs to be reviewed and will be merged ASAP, hopefully into the final 4.7 kernel.
Updated by Ruslan Usifov almost 8 years ago
Perhaps a stupid question: is there any chance we could try this new rbd driver on kernel 4.4? We use Ubuntu 14.04, and it is difficult for us to use unstable kernels.
Updated by Ilya Dryomov almost 8 years ago
Not stupid at all. The fix is set to be backported to 3.15+, it might take a bit more time for it to propagate to Ubuntu kernels, but it'll get there eventually.
Updated by Ilya Dryomov almost 8 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Ilya Dryomov almost 8 years ago
libceph: apply new_state before new_up_client on incrementals
merged into 4.7-final. Additional backport for 3.10-3.14 submitted.
Updated by Ruslan Usifov over 7 years ago
Is there any way to know when backport will hit 4.4 Ubuntu kernel?
Updated by Ilya Dryomov over 7 years ago
A couple of weeks from now? It's up to the Ubuntu kernel team.
Updated by Ilya Dryomov over 7 years ago
- Status changed from Pending Backport to Resolved
In 3.14.75, 3.18.39, 4.1.30, 4.4.17 and 4.6.6.
Updated by Ilya Dryomov about 7 years ago
- Has duplicate Bug #18807: I/O error on rbd device after adding new OSD to Crush map added
Updated by Ilya Dryomov about 7 years ago
- Related to Bug #19122: pre-jewel "osd rm" incrementals are misinterpreted (kernel client) added