Bug #18751

hammer client generated misdirected op against jewel cluster

Added by Sage Weil 4 months ago. Updated about 2 months ago.

Status: Pending Backport
Priority: Urgent
Assignee: -
Category: -
Target version: -
Start date: 01/31/2017
Due date:
% Done: 0%
Source:
Tags:
Backport: kraken,jewel,hammer
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Release:
Needs Doc: No

Description

A user on #ceph reported corruption in 2 VMs. The CRUSH tunables were set to firefly.

The ceph cluster log shows misdirected ops:

2017-01-30 16:14:13.982137 osd.38 172.16.4.113:6804/3209 2411 : cluster [INF] 2.9d6 starting backfill to osd.30 from (0'0,0'0] MAX to 10520'1536861
2017-01-30 16:14:13.983909 osd.43 172.16.4.114:6806/3226 4054 : cluster [INF] 2.ae2 starting backfill to osd.30 from (0'0,0'0] MAX to 10520'2939297
2017-01-30 16:14:21.414895 mon.0 172.16.4.101:6789/0 2915930 : cluster [INF] pgmap v40007933: 14304 pgs: 1 active+clean+scrubbing, 1 active+recovery_wait+degraded+remapped, 1 activating+degraded, 17 active+degraded, 17 active+remapped+backfilling, 26 active+undersized+degraded+remapped+backfilling, 235 active+remapped+wait_backfill, 2 active+recovering+degraded, 937 active+undersized+degraded+remapped+wait_backfill, 3 active+remapped, 44 active+recovery_wait+degraded, 94 active+undersized+degraded+remapped, 12926 active+clean; 6701 GB data, 18776 GB used, 104 TB / 122 TB avail; 23241 kB/s rd, 51162 kB/s wr, 4281 op/s; 204337/4060808 objects degraded (5.032%); 365813/4060808 objects misplaced (9.008%)
2017-01-30 16:14:13.248319 osd.79 172.16.4.113:6808/3215 3416 : cluster [WRN] client.35260614 172.16.5.1:0/2077966938 misdirected client.35260614.0:142350068 pg 2.b3e06cc3 to osd.79 in e10523, client e10523 pg 2.cc3 features 55169095164739583
2017-01-30 16:14:13.375920 osd.79 172.16.4.113:6808/3215 3417 : cluster [WRN] client.35260614 172.16.5.1:0/2077966938 misdirected client.35260614.0:142350076 pg 2.b3e06cc3 to osd.79 in e10523, client e10523 pg 2.cc3 features 55169095164739583
2017-01-30 16:14:13.382900 osd.79 172.16.4.113:6808/3215 3418 : cluster [WRN] client.35260614 172.16.5.1:0/2077966938 misdirected client.35260614.0:142350079 pg 2.b3e06cc3 to osd.79 in e10523, client e10523 pg 2.cc3 features 55169095164739583
2017-01-30 16:14:13.390264 osd.79 172.16.4.113:6808/3215 3419 : cluster [WRN] client.35260614 172.16.5.1:0/2077966938 misdirected client.35260614.0:142350086 pg 2.b3e06cc3 to osd.79 in e10523, client e10523 pg 2.cc3 features 55169095164739583
2017-01-30 16:14:13.464428 osd.79 172.16.4.113:6808/3215 3420 : cluster [WRN] client.35260614 172.16.5.1:0/2077966938 misdirected client.35260614.0:142350088 pg 2.b3e06cc3 to osd.79 in e10523, client e10523 pg 2.cc3 features 55169095164739583
2017-01-30 16:14:13.959772 osd.73 172.16.4.111:6804/3540 3574 : cluster [INF] 2.12c starting backfill to osd.30 from (0'0,0'0] MAX to 10520'2415637
2017-01-30 16:14:13.960795 osd.73 172.16.4.111:6804/3540 3575 : cluster [INF] 2.f1b starting backfill to osd.30 from (0'0,0'0] MAX to 10521'588369
2017-01-30 16:14:13.960869 osd.73 172.16.4.111:6804/3540 3576 : cluster [INF] 2.2cb starting backfill to osd.30 from (0'0,0'0] MAX to 10520'6091890

The VMs got ENXIO, which turned into EIO.

It is unclear why the OSD considered the op misdirected: per the warnings above, the OSD and the client were both at map epoch e10523.
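The warnings above can be checked mechanically. The sketch below parses one `misdirected` line and folds the first `pg` value down to a PG number. It rests on several assumptions that the ticket does not state: that the first `pg` value is the raw 32-bit placement seed sent by the client and the second is the PG the OSD computed, that pool 2 has `pg_num` 4096 (the log only shows it is larger than 0xf1b, since pg 2.f1b exists), and that the fold follows the same masking rule as Ceph's `ceph_stable_mod`. The helper names `parse_misdirected` and `fold_seed` are hypothetical.

```python
import re

# Matches the "misdirected" warnings as they appear in the cluster log above.
MISDIRECTED = re.compile(
    r"(?P<osd>osd\.\d+) \S+ \d+ : cluster \[WRN\] "
    r"(?P<client>client\.\d+) \S+ misdirected (?P<reqid>\S+) "
    r"pg (?P<pool>\d+)\.(?P<seed>[0-9a-f]+) to osd\.\d+ "
    r"in e(?P<osd_epoch>\d+), client e(?P<client_epoch>\d+) "
    r"pg \d+\.(?P<ps>[0-9a-f]+)"
)


def parse_misdirected(line):
    """Return the fields of one misdirected-op warning, or None if no match."""
    m = MISDIRECTED.search(line)
    if m is None:
        return None
    return {
        "osd": m.group("osd"),
        "client": m.group("client"),
        "reqid": m.group("reqid"),
        "pool": int(m.group("pool")),
        "seed": int(m.group("seed"), 16),   # assumed: raw placement seed
        "ps": int(m.group("ps"), 16),       # assumed: PG computed by the OSD
        "osd_epoch": int(m.group("osd_epoch")),
        "client_epoch": int(m.group("client_epoch")),
    }


def ceph_stable_mod(x, b, bmask):
    """Same masking rule as Ceph's ceph_stable_mod(x, b, bmask)."""
    if (x & bmask) < b:
        return x & bmask
    return x & (bmask >> 1)


def fold_seed(seed, pg_num):
    """Fold a raw seed to a PG number for a pool with pg_num PGs."""
    bmask = (1 << (pg_num - 1).bit_length()) - 1
    return ceph_stable_mod(seed, pg_num, bmask)


ASSUMED_PG_NUM = 4096  # assumption: the ticket does not give pool 2's pg_num

line = (
    "2017-01-30 16:14:13.248319 osd.79 172.16.4.113:6808/3215 3416 : "
    "cluster [WRN] client.35260614 172.16.5.1:0/2077966938 misdirected "
    "client.35260614.0:142350068 pg 2.b3e06cc3 to osd.79 in e10523, "
    "client e10523 pg 2.cc3 features 55169095164739583"
)
op = parse_misdirected(line)
print(op["osd_epoch"] == op["client_epoch"])          # True
print(hex(fold_seed(op["seed"], ASSUMED_PG_NUM)))     # 0xcc3
```

Under those assumptions the seed folds to the PG the OSD reports (2.cc3) and both sides are at the same epoch, so the log lines alone do not explain the misdirection.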


Related issues

Copied to Backport #18812: jewel: hammer client generated misdirected op against jewel cluster Resolved
Copied to Backport #18813: hammer: hammer client generated misdirected op against jewel cluster In Progress
Copied to Backport #19622: kraken: hammer client generated misdirected op against jewel cluster Resolved

History

#1 Updated by Sage Weil 4 months ago

  • Backport set to jewel,hammer

The workaround is to not return ENXIO at all and instead let the IO appear to be hung. If we're lucky, the objecter will resend to another OSD and life will go on without any interruption. In the worst case, the VM sees a hung IO instead of an EIO.
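The difference between the two behaviours can be illustrated with a toy model. This is not Ceph code: `handle_op`, `client_submit`, the OSD numbers, and the idea that the client simply tries the next target are all hypothetical simplifications of what the OSD and objecter actually do.

```python
import errno


def handle_op(osd_id, primary, reply_enxio):
    """Toy OSD: 0 on success, -ENXIO if replying to a misdirected op,
    or None when the misdirected op is silently dropped."""
    if osd_id == primary:
        return 0
    if reply_enxio:
        return -errno.ENXIO
    return None  # no reply: from the client's side the IO is still pending


def client_submit(targets, primary, reply_enxio):
    """Toy client: send to each target in turn, modelling a resend after
    the op went unanswered. Returns the first reply, or None if still hung."""
    for osd_id in targets:
        result = handle_op(osd_id, primary, reply_enxio)
        if result is None:
            continue  # op stays queued; try the next target
        return result
    return None


# Hypothetical scenario: first send goes to osd 79, the resend to osd 12,
# and osd 12 is the primary.
old = client_submit([79, 12], primary=12, reply_enxio=True)
new = client_submit([79, 12], primary=12, reply_enxio=False)
hung = client_submit([79], primary=12, reply_enxio=False)
print(old == -errno.ENXIO, new, hung)  # True 0 None
```

With the error reply, the first misdirected send fails the IO immediately. With the drop, the same sequence completes once the op reaches the right OSD, and the worst case is an IO that stays pending.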

#2 Updated by Sage Weil 4 months ago

  • Status changed from Need More Info to Need Review

#3 Updated by Sage Weil 4 months ago

  • Description updated (diff)

#5 Updated by Yuri Weinstein 4 months ago

  • Status changed from Need Review to Pending Backport

#6 Updated by Nathan Cutler 4 months ago

  • Copied to Backport #18812: jewel: hammer client generated misdirected op against jewel cluster added

#7 Updated by Nathan Cutler 4 months ago

  • Copied to Backport #18813: hammer: hammer client generated misdirected op against jewel cluster added

#8 Updated by Nathan Cutler about 2 months ago

  • Backport changed from jewel,hammer to kraken,jewel,hammer

#9 Updated by Nathan Cutler about 2 months ago

  • Copied to Backport #19622: kraken: hammer client generated misdirected op against jewel cluster added
