Bug #19119

closed

pre-jewel "osd rm" incrementals are misinterpreted

Added by Ilya Dryomov about 7 years ago. Updated almost 7 years ago.

Status: Resolved
Priority: Immediate
Assignee:
Category: OSDMap
Target version: -
% Done: 0%
Source:
Tags:
Backport: kraken,jewel
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I have a bunch of misdirected requests from a recent kernel client to a hammer cluster, triggered by osd rm:

2017-02-27 15:37:56.976845 osd.190 10.115.1.133:6808/3914 97 : cluster [WRN] client.9450549 10.115.1.35:0/1493770383 misdirected client.9450549.1:1379645861 pg 2.ec640804 to osd.190 in e241865, client e241865 pg 2.804 features 288863570635346
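
For triaging many such warnings, the fields of interest can be pulled out with a regular expression. This is a minimal sketch: the pattern is inferred from the single warning quoted above, not from any official log grammar, and `parse_misdirected` is a hypothetical helper name.

```python
import re

# Pattern inferred from the one "misdirected" warning quoted in this report;
# it is an assumption, not an official description of the log format.
MISDIRECTED_RE = re.compile(
    r"misdirected (?P<request>\S+) pg (?P<raw_pg>\d+\.[0-9a-f]+) "
    r"to osd\.(?P<osd>\d+) in e(?P<osd_epoch>\d+), "
    r"client e(?P<client_epoch>\d+) pg (?P<pg>\d+\.[0-9a-f]+)"
)


def parse_misdirected(line):
    """Return the warning's fields as a dict, or None if the line does not match."""
    m = MISDIRECTED_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    for key in ("osd", "osd_epoch", "client_epoch"):
        fields[key] = int(fields[key])
    return fields


line = (
    "2017-02-27 15:37:56.976845 osd.190 10.115.1.133:6808/3914 97 : cluster [WRN] "
    "client.9450549 10.115.1.35:0/1493770383 misdirected client.9450549.1:1379645861 "
    "pg 2.ec640804 to osd.190 in e241865, client e241865 pg 2.804 features 288863570635346"
)
info = parse_misdirected(line)
print(info)
```

Note that the OSD and the client report the same epoch here, so the disagreement is about how the map for that epoch was built, not about a stale map.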

e241864 -> e241865 incremental:

{
    "epoch": 241865,
    "fsid": "9e3e9015-f626-4a44-83f7-0a939ef7ec02",
    "modified": "2017-02-27 11:07:56.497658",
    "new_pool_max": -1,
    "new_flags": -1,
    "new_max_osd": -1,
    "new_pools": [],
    "new_pool_names": [],
    "old_pools": [],
    "new_up_osds": [],
    "new_weight": [],
    "osd_state_xor": [
        {
            "osd": 204,
            "state_xor": [
                "autoout",
                "exists" 
            ]
        },
        [],
        [],
        [],
        [],
        [],
        [],
        [],
        [],
        [
            {
                "osd": 204,
                "uuid": "00000000-0000-0000-0000-000000000000" 
            }
        ],
        {},
        []
    ]

On master:

$ bin/osdmaptool --test-map-pg 2.ec640804 /tmp/map-241864.bin 
bin/osdmaptool: osdmap file '/tmp/map-241864.bin'
 parsed '2.ec640804' -> 2.ec640804
2.ec640804 raw ([197,201,1], p197) up ([197,201,1], p197) acting ([197,201,1], p197)
$ bin/osdmaptool --test-map-pg 2.ec640804 /tmp/map-241865.bin 
bin/osdmaptool: osdmap file '/tmp/map-241865.bin'
 parsed '2.ec640804' -> 2.ec640804
2.ec640804 raw ([197,201,1], p197) up ([197,201,1], p197) acting ([197,201,1], p197)

but (with osdmaptool patched to accept and apply incrementals):

$ bin/osdmaptool --test-map-pg 2.ec640804 /tmp/map-241864.bin /tmp/inc-241865.bin 
bin/osdmaptool: osdmap file '/tmp/map-241864.bin'
bin/osdmaptool: incremental file '/tmp/inc-241865.bin'
 parsed '2.ec640804' -> 2.ec640804
2.ec640804 raw ([190,1], p190) up ([190,1], p190) acting ([190,1], p190)

which is where the misdirected request was sent.

On hammer:

$ ./osdmaptool --test-map-pg 2.ec640804 /tmp/map-241864.bin
./osdmaptool: osdmap file '/tmp/map-241864.bin'
 parsed '2.ec640804' -> 2.ec640804
2.ec640804 raw ([197,201,1], p197) up ([197,201,1], p197) acting ([197,201,1], p197)
$ ./osdmaptool --test-map-pg 2.ec640804 /tmp/map-241865.bin
./osdmaptool: osdmap file '/tmp/map-241865.bin'
 parsed '2.ec640804' -> 2.ec640804
2.ec640804 raw ([197,201,1], p197) up ([197,201,1], p197) acting ([197,201,1], p197)

and (same osdmaptool patch):

$ ./osdmaptool --test-map-pg 2.ec640804 /tmp/map-241864.bin /tmp/inc-241865.bin
./osdmaptool: osdmap file '/tmp/map-241864.bin'
./osdmaptool: incremental file '/tmp/inc-241865.bin'
 parsed '2.ec640804' -> 2.ec640804
2.ec640804 raw ([197,201,1], p197) up ([197,201,1], p197) acting ([197,201,1], p197)
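
The comparison above (full map versus previous map plus incremental) can be scripted by parsing the mapping line that `osdmaptool --test-map-pg` prints. A minimal sketch, assuming the output format shown in this report; `parse_mapping` is a hypothetical helper and the two sample lines are copied from the runs on master.

```python
import re

# Matches the "raw (...) up (...) acting (...)" line as it appears in this report.
MAPPING_RE = re.compile(
    r"raw \(\[(?P<raw>[\d,]*)\], p(?P<raw_p>-?\d+)\) "
    r"up \(\[(?P<up>[\d,]*)\], p(?P<up_p>-?\d+)\) "
    r"acting \(\[(?P<acting>[\d,]*)\], p(?P<acting_p>-?\d+)\)"
)


def _osds(text):
    """Turn '197,201,1' into [197, 201, 1]; an empty string gives []."""
    return [int(x) for x in text.split(",") if x]


def parse_mapping(line):
    """Extract the raw/up/acting sets and the acting primary from one output line."""
    m = MAPPING_RE.search(line)
    if m is None:
        raise ValueError("not a --test-map-pg mapping line: %r" % line)
    return {
        "raw": _osds(m.group("raw")),
        "up": _osds(m.group("up")),
        "acting": _osds(m.group("acting")),
        "acting_primary": int(m.group("acting_p")),
    }


# Full map for e241865 versus e241864 + incremental, both as printed on master.
full = parse_mapping(
    "2.ec640804 raw ([197,201,1], p197) up ([197,201,1], p197) acting ([197,201,1], p197)"
)
with_inc = parse_mapping(
    "2.ec640804 raw ([190,1], p190) up ([190,1], p190) acting ([190,1], p190)"
)
if full != with_inc:
    print("mismatch: full map ->", full["acting"], "incremental ->", with_inc["acting"])
```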

Files

osdmaptool.diff (2.08 KB), Ilya Dryomov, 03/01/2017 06:45 PM

Related issues 3 (0 open, 3 closed)

Related to Ceph - Bug #13988: new OSD re-using old OSD id fails to boot (Resolved, Loïc Dachary, 12/05/2015)
Copied to Ceph - Backport #19209: kraken: pre-jewel "osd rm" incrementals are misinterpreted (Resolved, Shinobu Kinjo)
Copied to Ceph - Backport #19210: jewel: pre-jewel "osd rm" incrementals are misinterpreted (Resolved, Shinobu Kinjo)
Actions #1

Updated by Ilya Dryomov about 7 years ago

It looks like Sage's commit in https://github.com/ceph/ceph/pull/6900 is the culprit. The "set weight to 1" behavior from that commit was carried over into the kernel client in 4.7.

https://github.com/idryomov/ceph/commits/wip-osd-rm-incremental fixes it for me, but I'm not sure what to do with all the jewel and kraken maps...
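
A toy model of the disagreement, for illustration only. It assumes, based on the comment above, that newer code resets an OSD's weight to "in" when an incremental clears its "exists" bit, while pre-jewel code only XORs the state bits and leaves the weight alone. `Osd` and `apply_state_xor` are hypothetical names, not Ceph APIs; the flag and weight constants are meant to mirror Ceph's definitions but should be treated as assumptions, as should the starting weight of 0 (out) for osd.204.

```python
from dataclasses import dataclass

# Assumed to mirror Ceph's OSD state bits and weight values.
CEPH_OSD_EXISTS = 1 << 0
CEPH_OSD_UP = 1 << 1
CEPH_OSD_AUTOOUT = 1 << 2
CEPH_OSD_IN = 0x10000
CEPH_OSD_OUT = 0


@dataclass
class Osd:
    state: int
    weight: int


def apply_state_xor(osd, xor, reset_weight_on_destroy):
    """Apply one osd_state_xor entry under either interpretation.

    reset_weight_on_destroy=False models the pre-jewel behavior (assumed):
    just flip the bits. True models the newer behavior (assumed): clearing
    "exists" wipes the state and sets the weight back to "in".
    """
    destroyed = bool(osd.state & CEPH_OSD_EXISTS) and bool(xor & CEPH_OSD_EXISTS)
    if destroyed and reset_weight_on_destroy:
        return Osd(state=0, weight=CEPH_OSD_IN)
    return Osd(state=osd.state ^ xor, weight=osd.weight)


# osd.204 before "osd rm": exists, auto-marked out (weight assumed to be 0).
before = Osd(state=CEPH_OSD_EXISTS | CEPH_OSD_AUTOOUT, weight=CEPH_OSD_OUT)
xor = CEPH_OSD_EXISTS | CEPH_OSD_AUTOOUT  # the state_xor from the incremental

old_view = apply_state_xor(before, xor, reset_weight_on_destroy=False)
new_view = apply_state_xor(before, xor, reset_weight_on_destroy=True)
print("pre-jewel weight:", hex(old_view.weight))
print("newer weight:    ", hex(new_view.weight))
```

Both interpretations end with the same state bits, but the weights differ, and since weights feed into CRUSH, the two sides can compute different placements for the same epoch.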

Actions #2

Updated by Ilya Dryomov about 7 years ago

  • Related to Bug #13988: new OSD re-using old OSD id fails to boot added
Actions #3

Updated by Sage Weil about 7 years ago

  • Status changed from New to 12
  • Priority changed from Urgent to Immediate
Actions #4

Updated by Ilya Dryomov about 7 years ago

  • Status changed from 12 to Fix Under Review
  • Backport set to kraken,jewel
Actions #5

Updated by Ilya Dryomov about 7 years ago

  • Subject changed from hammer "osd rm" incrementals are misinterpreted to pre-jewel "osd rm" incrementals are misinterpreted
Actions #6

Updated by Ilya Dryomov about 7 years ago

Actions #7

Updated by Sage Weil about 7 years ago

  • Subject changed from pre-jewel "osd rm" incrementals are misinterpreted to hammer "osd rm" incrementals are misinterpreted
  • Description updated (diff)
  • Status changed from Fix Under Review to 12
Actions #8

Updated by Ilya Dryomov about 7 years ago

  • Subject changed from hammer "osd rm" incrementals are misinterpreted to pre-jewel "osd rm" incrementals are misinterpreted
  • Description updated (diff)
Actions #9

Updated by Ilya Dryomov about 7 years ago

  • Status changed from 12 to Pending Backport
Actions #10

Updated by Jan Fajerski about 7 years ago

  • Copied to Backport #19209: kraken: pre-jewel "osd rm" incrementals are misinterpreted added
Actions #11

Updated by Jan Fajerski about 7 years ago

  • Copied to Backport #19210: jewel: pre-jewel "osd rm" incrementals are misinterpreted added
Actions #12

Updated by Nathan Cutler almost 7 years ago

  • Status changed from Pending Backport to Resolved