Bug #14901: misdirected requests on 4.2 during rebalancing

Added by Ruslan Usifov about 8 years ago. Updated about 7 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
libceph
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
rbd
Crash signature (v1):
Crash signature (v2):

Description

On ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299), kernel: Linux server4 4.2.0-27-generic #32~14.04.1-Ubuntu SMP Fri Jan 22 15:32:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

We're getting errors with rbd after a rebalance.

/var/log/syslog:
Feb 26 11:00:08 server4 kernel: [1364195.648855] rbd: rbd0: result -6 xferred 1000
Feb 26 11:00:08 server4 kernel: [1364195.648857] blk_update_request: I/O error, dev rbd0, sector 470086560
Feb 26 11:00:11 server4 kernel: [1364198.837281] rbd: rbd0: read 4000 at 3711fd7000 (3d7000)
Feb 26 11:00:11 server4 kernel: [1364198.837287] rbd: rbd0: result -6 xferred 4000
Feb 26 11:00:11 server4 kernel: [1364198.837290] blk_update_request: I/O error, dev rbd0, sector 461962936
Feb 26 11:00:11 server4 kernel: [1364198.846163] rbd: rbd0: read 1000 at 3711fd7000 (3d7000)
Feb 26 11:00:11 server4 kernel: [1364198.846166] rbd: rbd0: result -6 xferred 1000
Feb 26 11:00:11 server4 kernel: [1364198.846168] blk_update_request: I/O error, dev rbd0, sector 461962936
Feb 26 11:00:11 server4 kernel: [1364198.934545] blk_update_request: I/O error, dev rbd0, sector 461961904
Feb 26 11:21:34 server4 kernel: [1365481.867101] rbd: rbd0: read 1000 at 3482027000 (27000)
Feb 26 11:21:34 server4 kernel: [1365481.867106] rbd: rbd0: result -6 xferred 1000
Feb 26 11:21:34 server4 kernel: [1365481.867109] blk_update_request: I/O error, dev rbd0, sector 440467768
Feb 26 11:21:34 server4 kernel: [1365481.875329] EXT4-fs warning (device rbd0): htree_dirblock_to_tree:959: inode #13763564: lblock 0: comm mc: error -5 reading directory block
Feb 26 11:21:41 server4 kernel: [1365488.690117] rbd: rbd0: read 1000 at 3482027000 (27000)
Feb 26 11:21:41 server4 kernel: [1365488.690122] rbd: rbd0: result -6 xferred 1000
Feb 26 11:21:41 server4 kernel: [1365488.690125] blk_update_request: I/O error, dev rbd0, sector 440467768
Feb 26 11:21:41 server4 kernel: [1365488.698154] EXT4-fs warning (device rbd0): htree_dirblock_to_tree:959: inode #13763564: lblock 0: comm mc: error -5 reading directory block
Feb 26 11:21:47 server4 kernel: [1365494.610039] rbd: rbd0: read 1000 at 3482027000 (27000)
Feb 26 11:21:47 server4 kernel: [1365494.610044] rbd: rbd0: result -6 xferred 1000
Feb 26 11:21:47 server4 kernel: [1365494.610047] blk_update_request: I/O error, dev rbd0, sector 440467768
Feb 26 11:21:47 server4 kernel: [1365494.617864] EXT4-fs warning (device rbd0): htree_dirblock_to_tree:959: inode #13763564: lblock 0: comm mc: error -5 reading directory block
Feb 26 11:21:56 server4 kernel: [1365503.442835] rbd: rbd0: read 1000 at 3482027000 (27000)
Feb 26 11:21:56 server4 kernel: [1365503.442840] rbd: rbd0: result -6 xferred 1000
Feb 26 11:21:56 server4 kernel: [1365503.442844] blk_update_request: I/O error, dev rbd0, sector 440467768
Feb 26 11:21:56 server4 kernel: [1365503.450469] EXT4-fs warning (device rbd0): htree_dirblock_to_tree:959: inode #13763564: lblock 0: comm mc: error -5 reading directory block
Feb 26 11:22:05 server4 kernel: [1365512.593962] rbd: rbd0: read 1000 at 348203e000 (3e000)
Feb 26 11:22:05 server4 kernel: [1365512.593969] rbd: rbd0: result -6 xferred 1000
Feb 26 11:22:05 server4 kernel: [1365512.593972] blk_update_request: I/O error, dev rbd0, sector 440467952

This has happened several times already after a cluster rebalance, and only with rbd devices. TGT iSCSI targets seem to be fine.


Files

osdmap.5982 (32.1 KB) osdmap.5982 Ruslan Usifov, 03/02/2016 05:47 PM
osdmap.tar (400 KB) osdmap.tar Ruslan Usifov, 03/03/2016 08:11 AM
osd-logs.tar (310 KB) osd-logs.tar Ruslan Usifov, 03/04/2016 09:15 AM
osdmap.5800-5969.tar.gz (643 KB) osdmap.5800-5969.tar.gz Ruslan Usifov, 03/04/2016 09:33 AM
osdmap.5982-6000.tar.gz (66.3 KB) osdmap.5982-6000.tar.gz Ruslan Usifov, 03/04/2016 09:33 AM
syslog.7.gz (6.5 KB) syslog.7.gz Ruslan Usifov, 03/04/2016 03:00 PM
osdmap.5600-5800.tar.gz (286 KB) osdmap.5600-5800.tar.gz Ruslan Usifov, 03/04/2016 03:05 PM
syslog.27.02.2016.gz (2.83 KB) syslog.27.02.2016.gz Ruslan Usifov, 03/04/2016 03:15 PM

Related issues: 2 (0 open, 2 closed)

Related to Linux kernel client - Bug #19122: pre-jewel "osd rm" incrementals are misinterpreted (kernel client) - Resolved - Ilya Dryomov - 03/01/2017

Has duplicate Linux kernel client - Bug #18807: I/O error on rbd device after adding new OSD to Crush map - Duplicate - Ilya Dryomov - 02/03/2017

Actions #1

Updated by Ilya Dryomov about 8 years ago

Is there anything else in dmesg? What is the general state of the cluster? Any warnings in ceph -s or the OSD logs?

Do you use rbd snapshots and/or clones?

How exactly are you increasing the number of PGs (commands you are issuing)? Are those errors transient or do they persist?

Can you reproduce with "debug ms = 1" set on the OSDs?

We increased pg_count from 512 to 1024 and saw the rbd errors for the first time. The next time, one of the OSDs in our cluster failed due to an HDD failure, which triggered a Ceph rebalance, and we saw rbd I/O errors again. Finally, yesterday we added 9 new OSDs, rebalancing happened again, and as a result we saw rbd I/O errors once more. To recover the rbd device and filesystem we unmount the filesystem, unmap the rbd device, then map it again and mount the filesystem.
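
For reference, the recovery sequence looks roughly like this (the device, pool/image, and mount point names below are placeholders, not the exact ones on our hosts):

# umount /mnt/rbd0
# rbd unmap /dev/rbd0
# rbd map <pool>/<image>
# mount /dev/rbd0 /mnt/rbd0

After re-mapping, the device usually comes back as /dev/rbd0 again if it is the only mapped image.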

We don't use snapshots or clones.

Actions #2

Updated by Ruslan Usifov about 8 years ago

Our dmesg log is clean and doesn't have any suspicious records (only the boot log, that's all). The last time the rbd errors happened, we had launched a massive rsync. Ceph was in a healthy state, as it is now:

    cluster df060192-7369-42dd-bb9c-db18bb6f3355
     health HEALTH_OK
     monmap e5: 6 mons at {ceph1=192.168.30.101:6789/0,ceph2=192.168.30.102:6789/0,ceph3=192.168.30.103:6789/0,ceph4=192.168.30.104:6789/0,ceph5=192.168.30.105:6789/0,ceph6=192.168.30.106:6789/0}
            election epoch 222, quorum 0,1,2,3,4,5 ceph1,ceph2,ceph3,ceph4,ceph5,ceph6
     osdmap e5982: 49 osds: 49 up, 49 in
            flags sortbitwise
      pgmap v1069321: 1024 pgs, 1 pools, 6193 GB data, 1564 kobjects
            12411 GB used, 38762 GB / 51174 GB avail
                1024 active+clean
  client io 6403 B/s wr, 1 op/s

Actions #3

Updated by Ruslan Usifov about 8 years ago

We found the following warnings in the OSD logs when we ran the massive rsync:

2016-02-26 11:06:25.969815 7f16750af700  0 log_channel(cluster) log [WRN] : client.157872 192.168.31.220:0/3211780995 misdirected client.157872.1:795379 pg 2.ebfbb119 to osd.19 not [27,19] in e5982/5982
2016-02-26 11:06:26.046774 7f16750af700  0 log_channel(cluster) log [WRN] : client.157872 192.168.31.220:0/3211780995 misdirected client.157872.1:795381 pg 2.ebfbb119 to osd.19 not [27,19] in e5982/5982
2016-02-26 11:06:26.800276 7f16728aa700  0 log_channel(cluster) log [WRN] : client.157872 192.168.31.220:0/3211780995 misdirected client.157872.1:795390 pg 2.c5547424 to osd.19 not [22,19] in e5982/5982
2016-02-26 11:06:26.907875 7f16750af700  0 log_channel(cluster) log [WRN] : client.157872 192.168.31.220:0/3211780995 misdirected client.157872.1:795393 pg 2.c5547424 to osd.19 not [22,19] in e5982/5982
2016-02-26 11:06:51.353003 7f16728aa700  0 log_channel(cluster) log [WRN] : client.157872 192.168.31.220:0/3211780995 misdirected client.157872.1:795598 pg 2.76468424 to osd.19 not [22,19] in e5982/5982
2016-02-26 11:06:51.558168 7f1669e82700  0 -- 192.168.30.103:6805/1974 >> 192.168.30.101:6825/2555 pipe(0x7f169c13e000 sd=97 :6805 s=0 pgs=0 cs=0 l=0 c=0x7f16aa043b80).accept connect_seq 325 vs existing 325 state standby
2016-02-26 11:06:51.558786 7f1669e82700  0 -- 192.168.30.103:6805/1974 >> 192.168.30.101:6825/2555 pipe(0x7f169c13e000 sd=97 :6805 s=0 pgs=0 cs=0 l=0 c=0x7f16aa043b80).accept connect_seq 326 vs existing 325 state standby
2016-02-26 11:06:52.166995 7f16728aa700  0 log_channel(cluster) log [WRN] : client.157872 192.168.31.220:0/3211780995 misdirected client.157872.1:795601 pg 2.76468424 to osd.19 not [22,19] in e5982/5982
2016-02-26 11:15:17.370983 7f165f45f700  0 -- 192.168.30.103:6805/1974 >> 192.168.30.101:6817/2398 pipe(0x7f169f2de000 sd=211 :6805 s=0 pgs=0 cs=0 l=0 c=0x7f16aa0426e0).accept connect_seq 351 vs existing 351 state standby

192.168.31.220 is the IP of the host where the rbd device is mapped.

Actions #4

Updated by Ilya Dryomov about 8 years ago

  • Subject changed from Rbd read errors after rebalance to misdirected requests on 4.2 during rebalancing
  • Status changed from New to 12
  • Assignee set to Ilya Dryomov
Actions #5

Updated by Ilya Dryomov about 8 years ago

  • Project changed from Ceph to Linux kernel client
Actions #6

Updated by Ilya Dryomov about 8 years ago

  • Project changed from Linux kernel client to Ceph

Hi Ruslan,

Can you do

$ ceph osd getmap 5982 >osdmap.5982

and attach that file?

Actions #7

Updated by Ilya Dryomov about 8 years ago

  • Project changed from Ceph to Linux kernel client
Actions #8

Updated by Ruslan Usifov about 8 years ago

Hello

Done, the result is in the attachment.

Actions #9

Updated by Ilya Dryomov about 8 years ago

Hmm. Can you do the same for each from 5970 to 5981 and attach a .tar with those?
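
Something along these lines should do it (the directory and tarball names are just examples):

$ mkdir osdmaps
$ for e in $(seq 5970 5981); do ceph osd getmap $e > osdmaps/osdmap.$e; done
$ tar cf osdmap.tar osdmaps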

Actions #10

Updated by Ruslan Usifov about 8 years ago

The osdmaps from 5970 to 5981 are in the attachment.

Actions #11

Updated by Ilya Dryomov about 8 years ago

Was the massive rsync you've provided warnings for done during rebalancing or after rebalancing completed? Do you have more instances of these warnings? If so, provide all of them, please.
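
Something like this on each OSD host should collect them (assuming the default log location):

$ grep misdirected /var/log/ceph/ceph-osd.*.log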

Could you also attach more osdmaps? 5982..6000 and, say, 5800..5969?

Actions #12

Updated by Ruslan Usifov about 8 years ago

We waited until the rebalance was done, then launched rsync.

And yes, of course, the misdirected warnings don't happen on only one node.

We've attached all the logs we collected that contain the misdirected WRN records.

I should also mention that we have multiple rbd devices; the rbd device we are talking about is mapped on 192.168.31.220.

OSD maps 5982..6000 and 5800..5969 are in the corresponding attachments.

We should also say that after the situation described in this issue we had another disk failure, so the osdmaps in the attachments may be off and may not show the full picture.

Actions #13

Updated by Ilya Dryomov about 8 years ago

This seems to be more serious than I thought. Could you send over as many osdmaps going back from 5800 as possible (they are periodically trimmed, but I think you should be able to get down to ~5600)? Another thing that could help is the full syslog for Feb 26 +/- a day.

Actions #14

Updated by Ruslan Usifov about 8 years ago

We've attached the syslog from 192.168.31.220 for Feb 26, 2016.

We dumped the osdmaps, but maps only exist starting from epoch 5732.

Actions #15

Updated by Ruslan Usifov about 8 years ago

Additionally attached: the syslog for 27-02-2016.

Actions #16

Updated by Ilya Dryomov about 8 years ago

What about a syslog for the night of Feb 26 - the attached one starts at ~7 o'clock? I need to see syslogs for the entire 26th from both 31.220 and 30.4.

Actions #17

Updated by Ruslan Usifov about 8 years ago

Sorry, but the older logs have already been rotated and we no longer have them.

Actions #18

Updated by Ilya Dryomov about 8 years ago

It's not immediately obvious what the problem is - there are a few different possibilities and not enough information to go on. Would you be willing to try to reproduce this with logging enabled? Ideally, I would like you to install a custom kernel (.deb package built on our gitbuilders) on 31.220 and 30.4 (or at least one of them) and enable more verbose logging on the OSD side. If installing a custom kernel is something you can't do, verbose OSD logs and some limited logs from the kernel you've got installed might still help...

Actions #19

Updated by Ruslan Usifov about 8 years ago

We will try! But where can we get the custom kernels, and what is the reproduction scenario? Setting the weight of some OSD to 0, or something else?

Actions #20

Updated by Ilya Dryomov about 8 years ago

If I had a reproducer, I wouldn't be asking you to reproduce ;) I would suggest trying to replicate what you described above as closely as possible; changing weights to trigger rebalancing is a good idea. I tried those things on my local cluster but didn't see a single misdirected request - probably because my cluster is too small or I'm doing something different from what you did.
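
For example (the osd id here is only an illustration), marking an OSD out and then back in should trigger data movement in both directions:

# ceph osd out 19
  (wait for rebalancing to finish, then run the rsync workload)
# ceph osd in 19
  (wait again, run the workload again)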

Would you be open to trying to replicate one of the occurrences (increasing PG count, OSD failure, adding OSDs) on that same cluster without installing a custom kernel, but with logging enabled? "debug ms = 1" for MONs and OSDs and

# modprobe rbd
# echo 'file osdmap.c +pt' >/sys/kernel/debug/dynamic_debug/control
# echo 'file osd_client.c +pt' >/sys/kernel/debug/dynamic_debug/control
# echo 'format get_osd -pt' >/sys/kernel/debug/dynamic_debug/control
# echo 'format put_osd -pt' >/sys/kernel/debug/dynamic_debug/control

on the kernel client boxes? That might be just enough and we won't have to bother with a custom kernel.
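
If it helps, "debug ms = 1" can usually be turned on at runtime with injectargs instead of editing ceph.conf and restarting (the exact invocation may vary slightly between releases; mon.ceph1 below is just an example id):

# ceph tell 'osd.*' injectargs '--debug-ms 1'
# ceph tell mon.ceph1 injectargs '--debug-ms 1'
  (repeat for each monitor)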

Actions #21

Updated by Ilya Dryomov almost 8 years ago

  • Category set to rbd
  • Status changed from 12 to Need More Info
  • Priority changed from High to Low
  • Tags deleted (rbd)

The OSD client has been rewritten in 4.7.

Actions #22

Updated by Ilya Dryomov almost 8 years ago

  • Category changed from rbd to libceph
Actions #23

Updated by Ilya Dryomov almost 8 years ago

  • Status changed from Need More Info to In Progress
  • Priority changed from Low to Urgent

OK, so this is not fixed in 4.7 (yet). After two more reports, I went through all the logs again and managed to reproduce at least one scenario.

Actions #24

Updated by Ilya Dryomov almost 8 years ago

  • Status changed from In Progress to Fix Under Review
Actions #25

Updated by Ruslan Usifov almost 8 years ago

Hello

Our HDDs are not very new and may fail quite often, so what do you need from us to help you? The debug output consumes a lot of space, so we can't keep it enabled all the time.

Actions #26

Updated by Ilya Dryomov almost 8 years ago

Nothing - I think I fixed it. The patch just needs to be reviewed and will be merged ASAP, hopefully into the final 4.7 kernel.

Actions #27

Updated by Ruslan Usifov almost 8 years ago

Perhaps a stupid question: is there any chance we could try this new rbd driver on kernel 4.4? We use Ubuntu 14.04, and it is difficult for us to use unstable kernels.

Actions #28

Updated by Ilya Dryomov almost 8 years ago

Not stupid at all. The fix is set to be backported to 3.15+; it might take a bit more time for it to propagate to Ubuntu kernels, but it'll get there eventually.

Actions #29

Updated by Ilya Dryomov almost 8 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #30

Updated by Ilya Dryomov almost 8 years ago

libceph: apply new_state before new_up_client on incrementals

merged into 4.7-final. Additional backport for 3.10-3.14 submitted.

Actions #31

Updated by Ruslan Usifov over 7 years ago

Is there any way to know when the backport will hit the 4.4 Ubuntu kernel?

Actions #32

Updated by Ilya Dryomov over 7 years ago

A couple of weeks from now? It's up to the Ubuntu kernel team.

Actions #33

Updated by Ilya Dryomov over 7 years ago

  • Status changed from Pending Backport to Resolved

In 3.14.75, 3.18.39, 4.1.30, 4.4.17 and 4.6.6.

Actions #34

Updated by Ilya Dryomov about 7 years ago

In 3.16.39.

Actions #35

Updated by Ilya Dryomov about 7 years ago

  • Has duplicate Bug #18807: I/O error on rbd device after adding new OSD to Crush map added
Actions #36

Updated by Ilya Dryomov about 7 years ago

  • Related to Bug #19122: pre-jewel "osd rm" incrementals are misinterpreted (kernel client) added