Bug #14901: misdirected requests on 4.2 during rebalancing

Added by Ruslan Usifov about 8 years ago. Updated about 7 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
libceph
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
rbd
Crash signature (v1):
Crash signature (v2):

Description

On ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299), kernel: Linux server4 4.2.0-27-generic #32~14.04.1-Ubuntu SMP Fri Jan 22 15:32:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

We're getting errors with rbd after a rebalance.

/var/log/syslog:
Feb 26 11:00:08 server4 kernel: [1364195.648855] rbd: rbd0: result -6 xferred 1000
Feb 26 11:00:08 server4 kernel: [1364195.648857] blk_update_request: I/O error, dev rbd0, sector 470086560
Feb 26 11:00:11 server4 kernel: [1364198.837281] rbd: rbd0: read 4000 at 3711fd7000 (3d7000)
Feb 26 11:00:11 server4 kernel: [1364198.837287] rbd: rbd0: result -6 xferred 4000
Feb 26 11:00:11 server4 kernel: [1364198.837290] blk_update_request: I/O error, dev rbd0, sector 461962936
Feb 26 11:00:11 server4 kernel: [1364198.846163] rbd: rbd0: read 1000 at 3711fd7000 (3d7000)
Feb 26 11:00:11 server4 kernel: [1364198.846166] rbd: rbd0: result -6 xferred 1000
Feb 26 11:00:11 server4 kernel: [1364198.846168] blk_update_request: I/O error, dev rbd0, sector 461962936
Feb 26 11:00:11 server4 kernel: [1364198.934545] blk_update_request: I/O error, dev rbd0, sector 461961904
Feb 26 11:21:34 server4 kernel: [1365481.867101] rbd: rbd0: read 1000 at 3482027000 (27000)
Feb 26 11:21:34 server4 kernel: [1365481.867106] rbd: rbd0: result -6 xferred 1000
Feb 26 11:21:34 server4 kernel: [1365481.867109] blk_update_request: I/O error, dev rbd0, sector 440467768
Feb 26 11:21:34 server4 kernel: [1365481.875329] EXT4-fs warning (device rbd0): htree_dirblock_to_tree:959: inode #13763564: lblock 0: comm mc: error -5 reading directory block
Feb 26 11:21:41 server4 kernel: [1365488.690117] rbd: rbd0: read 1000 at 3482027000 (27000)
Feb 26 11:21:41 server4 kernel: [1365488.690122] rbd: rbd0: result -6 xferred 1000
Feb 26 11:21:41 server4 kernel: [1365488.690125] blk_update_request: I/O error, dev rbd0, sector 440467768
Feb 26 11:21:41 server4 kernel: [1365488.698154] EXT4-fs warning (device rbd0): htree_dirblock_to_tree:959: inode #13763564: lblock 0: comm mc: error -5 reading directory block
Feb 26 11:21:47 server4 kernel: [1365494.610039] rbd: rbd0: read 1000 at 3482027000 (27000)
Feb 26 11:21:47 server4 kernel: [1365494.610044] rbd: rbd0: result -6 xferred 1000
Feb 26 11:21:47 server4 kernel: [1365494.610047] blk_update_request: I/O error, dev rbd0, sector 440467768
Feb 26 11:21:47 server4 kernel: [1365494.617864] EXT4-fs warning (device rbd0): htree_dirblock_to_tree:959: inode #13763564: lblock 0: comm mc: error -5 reading directory block
Feb 26 11:21:56 server4 kernel: [1365503.442835] rbd: rbd0: read 1000 at 3482027000 (27000)
Feb 26 11:21:56 server4 kernel: [1365503.442840] rbd: rbd0: result -6 xferred 1000
Feb 26 11:21:56 server4 kernel: [1365503.442844] blk_update_request: I/O error, dev rbd0, sector 440467768
Feb 26 11:21:56 server4 kernel: [1365503.450469] EXT4-fs warning (device rbd0): htree_dirblock_to_tree:959: inode #13763564: lblock 0: comm mc: error -5 reading directory block
Feb 26 11:22:05 server4 kernel: [1365512.593962] rbd: rbd0: read 1000 at 348203e000 (3e000)
Feb 26 11:22:05 server4 kernel: [1365512.593969] rbd: rbd0: result -6 xferred 1000
Feb 26 11:22:05 server4 kernel: [1365512.593972] blk_update_request: I/O error, dev rbd0, sector 440467952

This has happened several times already after a cluster rebalance, and only with rbd devices. TGT iSCSI targets seem to be fine.


Files

osdmap.5982 (32.1 KB) osdmap.5982 Ruslan Usifov, 03/02/2016 05:47 PM
osdmap.tar (400 KB) osdmap.tar Ruslan Usifov, 03/03/2016 08:11 AM
osd-logs.tar (310 KB) osd-logs.tar Ruslan Usifov, 03/04/2016 09:15 AM
osdmap.5800-5969.tar.gz (643 KB) osdmap.5800-5969.tar.gz Ruslan Usifov, 03/04/2016 09:33 AM
osdmap.5982-6000.tar.gz (66.3 KB) osdmap.5982-6000.tar.gz Ruslan Usifov, 03/04/2016 09:33 AM
syslog.7.gz (6.5 KB) syslog.7.gz Ruslan Usifov, 03/04/2016 03:00 PM
osdmap.5600-5800.tar.gz (286 KB) osdmap.5600-5800.tar.gz Ruslan Usifov, 03/04/2016 03:05 PM
syslog.27.02.2016.gz (2.83 KB) syslog.27.02.2016.gz Ruslan Usifov, 03/04/2016 03:15 PM

Related issues: 2 (0 open, 2 closed)

Related to Linux kernel client - Bug #19122: pre-jewel "osd rm" incrementals are misinterpreted (kernel client) - Resolved - Ilya Dryomov - 03/01/2017

Has duplicate Linux kernel client - Bug #18807: I/O error on rbd device after adding new OSD to Crush map - Duplicate - Ilya Dryomov - 02/03/2017

Actions #1

Updated by Ilya Dryomov about 8 years ago

Is there anything else in dmesg? What is the general state of the cluster? Any warnings in ceph -s or the OSD logs?

Do you use rbd snapshots and/or clones?

How exactly are you increasing the number of PGs (commands you are issuing)? Are those errors transient or do they persist?

Can you reproduce with "debug ms = 1" set on the OSDs?

We increased pg_count from 512 to 1024 and saw the rbd errors for the first time. The next time, one of the OSDs in our cluster failed due to an HDD failure, which triggered a Ceph rebalance, and we saw rbd I/O errors again. Finally, yesterday we added 9 new OSDs, rebalancing happened again, and as a result we saw rbd I/O errors once more. To recover the rbd device and filesystem we unmount the filesystem, unmap the rbd device, then map it again and mount the filesystem.
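
For reference, the recovery sequence looks roughly like this (the device, pool/image, and mount point names below are placeholders, not the exact ones on our hosts):

# umount /mnt/rbd0
# rbd unmap /dev/rbd0
# rbd map <pool>/<image>
# mount /dev/rbd0 /mnt/rbd0

After re-mapping, the device usually comes back as /dev/rbd0 again if it is the only mapped image.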

We don't use snapshots or clones.

Actions #2

Updated by Ruslan Usifov about 8 years ago

Our dmesg log is clean and doesn't have any suspicious records (only the boot log, that's all). The last time the rbd errors happened, we had launched a massive rsync. Ceph was in a healthy state, as it is now:

    cluster df060192-7369-42dd-bb9c-db18bb6f3355
     health HEALTH_OK
     monmap e5: 6 mons at {ceph1=192.168.30.101:6789/0,ceph2=192.168.30.102:6789/0,ceph3=192.168.30.103:6789/0,ceph4=192.168.30.104:6789/0,ceph5=192.168.30.105:6789/0,ceph6=192.168.30.106:6789/0}
            election epoch 222, quorum 0,1,2,3,4,5 ceph1,ceph2,ceph3,ceph4,ceph5,ceph6
     osdmap e5982: 49 osds: 49 up, 49 in
            flags sortbitwise
      pgmap v1069321: 1024 pgs, 1 pools, 6193 GB data, 1564 kobjects
            12411 GB used, 38762 GB / 51174 GB avail
                1024 active+clean
  client io 6403 B/s wr, 1 op/s

Actions #3

Updated by Ruslan Usifov about 8 years ago

We found the following warnings in the OSD logs when we ran the massive rsync:

2016-02-26 11:06:25.969815 7f16750af700  0 log_channel(cluster) log [WRN] : client.157872 192.168.31.220:0/3211780995 misdirected client.157872.1:795379 pg 2.ebfbb119 to osd.19 not [27,19] in e5982/5982
2016-02-26 11:06:26.046774 7f16750af700  0 log_channel(cluster) log [WRN] : client.157872 192.168.31.220:0/3211780995 misdirected client.157872.1:795381 pg 2.ebfbb119 to osd.19 not [27,19] in e5982/5982
2016-02-26 11:06:26.800276 7f16728aa700  0 log_channel(cluster) log [WRN] : client.157872 192.168.31.220:0/3211780995 misdirected client.157872.1:795390 pg 2.c5547424 to osd.19 not [22,19] in e5982/5982
2016-02-26 11:06:26.907875 7f16750af700  0 log_channel(cluster) log [WRN] : client.157872 192.168.31.220:0/3211780995 misdirected client.157872.1:795393 pg 2.c5547424 to osd.19 not [22,19] in e5982/5982
2016-02-26 11:06:51.353003 7f16728aa700  0 log_channel(cluster) log [WRN] : client.157872 192.168.31.220:0/3211780995 misdirected client.157872.1:795598 pg 2.76468424 to osd.19 not [22,19] in e5982/5982
2016-02-26 11:06:51.558168 7f1669e82700  0 -- 192.168.30.103:6805/1974 >> 192.168.30.101:6825/2555 pipe(0x7f169c13e000 sd=97 :6805 s=0 pgs=0 cs=0 l=0 c=0x7f16aa043b80).accept connect_seq 325 vs existing 325 state standby
2016-02-26 11:06:51.558786 7f1669e82700  0 -- 192.168.30.103:6805/1974 >> 192.168.30.101:6825/2555 pipe(0x7f169c13e000 sd=97 :6805 s=0 pgs=0 cs=0 l=0 c=0x7f16aa043b80).accept connect_seq 326 vs existing 325 state standby
2016-02-26 11:06:52.166995 7f16728aa700  0 log_channel(cluster) log [WRN] : client.157872 192.168.31.220:0/3211780995 misdirected client.157872.1:795601 pg 2.76468424 to osd.19 not [22,19] in e5982/5982
2016-02-26 11:15:17.370983 7f165f45f700  0 -- 192.168.30.103:6805/1974 >> 192.168.30.101:6817/2398 pipe(0x7f169f2de000 sd=211 :6805 s=0 pgs=0 cs=0 l=0 c=0x7f16aa0426e0).accept connect_seq 351 vs existing 351 state standby

192.168.31.220 is the IP of the host where the rbd device is mapped.

Actions #4

Updated by Ilya Dryomov about 8 years ago

  • Subject changed from Rbd read errors after rebalance to misdirected requests on 4.2 during rebalancing
  • Status changed from New to 12
  • Assignee set to Ilya Dryomov
Actions #5

Updated by Ilya Dryomov about 8 years ago

  • Project changed from Ceph to Linux kernel client
Actions #6

Updated by Ilya Dryomov about 8 years ago

  • Project changed from Linux kernel client to Ceph

Hi Ruslan,

Can you do

$ ceph osd getmap 5982 >osdmap.5982

and attach that file?

Actions #7

Updated by Ilya Dryomov about 8 years ago

  • Project changed from Ceph to Linux kernel client
Actions #8

Updated by Ruslan Usifov about 8 years ago

Hello

Done, the result is in the attachment.

Actions #9

Updated by Ilya Dryomov about 8 years ago

Hmm. Can you do the same for each from 5970 to 5981 and attach a .tar with those?
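
Something along these lines should do it (the directory and tarball names are just examples):

$ mkdir osdmaps
$ for e in $(seq 5970 5981); do ceph osd getmap $e > osdmaps/osdmap.$e; done
$ tar cf osdmap.tar osdmaps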

Actions #10

Updated by Ruslan Usifov about 8 years ago

The osdmaps from 5970 to 5981 are in the attachment.

Actions #11

Updated by Ilya Dryomov about 8 years ago

Was the massive rsync you've provided warnings for done during rebalancing or after rebalancing completed? Do you have more instances of these warnings? If so, provide all of them, please.
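
Something like this on each OSD host should collect them (assuming the default log location):

$ grep misdirected /var/log/ceph/ceph-osd.*.log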

Could you also attach more osdmaps? 5982..6000 and, say, 5800..5969?

Actions #12

Updated by Ruslan Usifov about 8 years ago

We waited until the rebalance was done, then launched rsync.

And yes, of course, the misdirected warnings don't happen on only one node.

We've attached all the logs we collected that contain the misdirected WRN records.

I should also mention that we have multiple rbd devices; the rbd device we are talking about is mapped on 192.168.31.220.

OSD maps 5982..6000 and 5800..5969 are in the corresponding attachments.

We should also say that after the situation described in this issue we had another disk failure, so the osdmaps in the attachments may be off and may not show the full picture.

Actions #13

Updated by Ilya Dryomov about 8 years ago

This seems to be more serious than I thought. Could you send over as many osdmaps going back from 5800 as possible (they are periodically trimmed, but I think you should be able to get down to ~5600)? Another thing that could help is the full syslog for Feb 26 +/- a day.

Actions #14

Updated by Ruslan Usifov about 8 years ago

We've attached the syslog from 192.168.31.220 for Feb 26, 2016.

We dumped the osdmaps, but maps only exist starting from epoch 5732.

Actions #15

Updated by Ruslan Usifov about 8 years ago

Additionally attached: the syslog for 27-02-2016.

Actions #16

Updated by Ilya Dryomov about 8 years ago

What about a syslog for the night of Feb 26 - the attached one starts at ~7 o'clock? I need to see syslogs for the entire 26th from both 31.220 and 30.4.

Actions #17

Updated by Ruslan Usifov about 8 years ago

Sorry, but the older logs have already been rotated and we no longer have them.

Actions #18

Updated by Ilya Dryomov about 8 years ago

It's not immediately obvious what the problem is - there are a few different possibilities and not enough information to go on. Would you be willing to try to reproduce this with logging enabled? Ideally, I would like you to install a custom kernel (.deb package built on our gitbuilders) on 31.220 and 30.4 (or at least one of them) and enable more verbose logging on the OSD side. If installing a custom kernel is something you can't do, verbose OSD logs and some limited logs from the kernel you've got installed might still help...

Actions #19

Updated by Ruslan Usifov about 8 years ago

We will try! But where can we get the custom kernels, and what is the reproduction scenario? Setting the weight of some OSD to 0, or something else?

Actions #20

Updated by Ilya Dryomov about 8 years ago

If I had a reproducer, I wouldn't be asking you to reproduce ;) I would suggest trying to replicate what you described above as closely as possible; changing weights to trigger rebalancing is a good idea. I tried those things on my local cluster but didn't see a single misdirected request - probably because my cluster is too small or I'm doing something different from what you did.
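
For example (the osd id here is only an illustration), marking an OSD out and then back in should trigger data movement in both directions:

# ceph osd out 19
  (wait for rebalancing to finish, then run the rsync workload)
# ceph osd in 19
  (wait again, run the workload again)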

Would you be open to trying to replicate one of the occurrences (increasing PG count, OSD failure, adding OSDs) on that same cluster without installing a custom kernel, but with logging enabled? "debug ms = 1" for MONs and OSDs and

# modprobe rbd
# echo 'file osdmap.c +pt' >/sys/kernel/debug/dynamic_debug/control
# echo 'file osd_client.c +pt' >/sys/kernel/debug/dynamic_debug/control
# echo 'format get_osd -pt' >/sys/kernel/debug/dynamic_debug/control
# echo 'format put_osd -pt' >/sys/kernel/debug/dynamic_debug/control

on the kernel client boxes? That might be just enough and we won't have to bother with a custom kernel.
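
If it helps, "debug ms = 1" can usually be turned on at runtime with injectargs instead of editing ceph.conf and restarting (the exact invocation may vary slightly between releases; mon.ceph1 below is just an example id):

# ceph tell 'osd.*' injectargs '--debug-ms 1'
# ceph tell mon.ceph1 injectargs '--debug-ms 1'
  (repeat for each monitor)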

Actions #21

Updated by Ilya Dryomov almost 8 years ago

  • Category set to rbd
  • Status changed from 12 to Need More Info
  • Priority changed from High to Low
  • Tags deleted (rbd)

The OSD client has been rewritten in 4.7.

Actions #22

Updated by Ilya Dryomov almost 8 years ago

  • Category changed from rbd to libceph
Actions #23

Updated by Ilya Dryomov almost 8 years ago

  • Status changed from Need More Info to In Progress
  • Priority changed from Low to Urgent

OK, so this is not fixed in 4.7 (yet). After two more reports, I went through all the logs again and managed to reproduce at least one scenario.

Actions #24

Updated by Ilya Dryomov almost 8 years ago

  • Status changed from In Progress to Fix Under Review
Actions #25

Updated by Ruslan Usifov almost 8 years ago

Hello

Our HDDs are not very new and may fail quite often, so what do you need from us to help you? The debug output consumes a lot of space, so we can't keep it enabled all the time.

Actions #26

Updated by Ilya Dryomov almost 8 years ago

Nothing - I think I fixed it. The patch just needs to be reviewed and will be merged ASAP, hopefully into the final 4.7 kernel.

Actions #27

Updated by Ruslan Usifov almost 8 years ago

Perhaps a stupid question: is there any chance we could try this new rbd driver on kernel 4.4? We use Ubuntu 14.04, and it is difficult for us to use unstable kernels.

Actions #28

Updated by Ilya Dryomov almost 8 years ago

Not stupid at all. The fix is set to be backported to 3.15+; it might take a bit more time for it to propagate to Ubuntu kernels, but it'll get there eventually.

Actions #29

Updated by Ilya Dryomov almost 8 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #30

Updated by Ilya Dryomov almost 8 years ago

libceph: apply new_state before new_up_client on incrementals

merged into 4.7-final. Additional backport for 3.10-3.14 submitted.

Actions #31

Updated by Ruslan Usifov over 7 years ago

Is there any way to know when the backport will hit the 4.4 Ubuntu kernel?

Actions #32

Updated by Ilya Dryomov over 7 years ago

A couple of weeks from now? It's up to the Ubuntu kernel team.

Actions #33

Updated by Ilya Dryomov over 7 years ago

  • Status changed from Pending Backport to Resolved

In 3.14.75, 3.18.39, 4.1.30, 4.4.17 and 4.6.6.

Actions #34

Updated by Ilya Dryomov about 7 years ago

In 3.16.39.

Actions #35

Updated by Ilya Dryomov about 7 years ago

  • Has duplicate Bug #18807: I/O error on rbd device after adding new OSD to Crush map added
Actions #36

Updated by Ilya Dryomov about 7 years ago

  • Related to Bug #19122: pre-jewel "osd rm" incrementals are misinterpreted (kernel client) added