Project

General

Profile

Bug #19744

mon: pg_create not re-sent after primary osd changed

Added by Sage Weil 6 months ago. Updated 6 months ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
Start date:
04/21/2017
Due date:
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Release:
Needs Doc:
No

Description

This happened during an upgrade test, but I think the issue has nothing to do with upgrades.

pg_create arrives while osd is starting (pool create and osd restart race! this is probably why teh upgrade suite triggered it):

2017-04-20 20:35:45.300933 7fef1a23e700  1 -- 172.21.15.67:6811/13837 <== mon.0 172.21.15.47:6789/0 10 ==== osd_pg_create(e6 3.2:6 3.4:6 3.7:6) v3 ==== 235+0+0 (1372823504 0 0) 0x7fef36d83200 con 0x7fef36c4a000
2017-04-20 20:35:45.300946 7fef1a23e700 20 osd.1 5 OSD::ms_dispatch: osd_pg_create(e6 3.2:6 3.4:6 3.7:6) v3
2017-04-20 20:35:45.300950 7fef1a23e700 10 osd.1 5 do_waiters -- start
2017-04-20 20:35:45.300952 7fef1a23e700 10 osd.1 5 do_waiters -- finish
2017-04-20 20:35:45.300954 7fef1a23e700 20 osd.1 5 _dispatch 0x7fef36d83200 osd_pg_create(e6 3.2:6 3.4:6 3.7:6) v3
2017-04-20 20:35:45.300969 7fef1a23e700 10 osd.1 5 handle_pg_create osd_pg_create(e6 3.2:6 3.4:6 3.7:6) v3
2017-04-20 20:35:45.300974 7fef1a23e700 15 osd.1 5 require_same_or_newer_map 6 (i am 5) 0x7fef36d83200
2017-04-20 20:35:45.300976 7fef1a23e700  7 osd.1 5 waiting for newer map epoch 6 > my 5 with 0x7fef36d83200

the osd goes active..

2017-04-20 20:35:47.446319 7fef1f17e700  1 osd.1 8 state: booting -> active

and then we reprocess teh message

2017-04-20 20:35:47.456488 7fef1923c700 10 osd.1 8 handle_pg_create osd_pg_create(e6 3.2:6 3.4:6 3.7:6) v3
2017-04-20 20:35:47.456494 7fef1923c700 15 osd.1 8 require_same_or_newer_map 6 (i am 8) 0x7fef36d83200
2017-04-20 20:35:47.456496 7fef1923c700  7 osd.1 8 from pre-up epoch 6 < 8

History

#1 Updated by Sage Weil 6 months ago

I think the require_self_aliveness() can be ignored in the pg_create case. That's the simplest fix.

ALternatively, subscriptions could be reset when the osd goes active, but i don't think the mon protocol lets you do that. :(

#2 Updated by Sage Weil 6 months ago

Actually, wait: I think the mon should recalculate the pg mapping when the osd goes "up" in 8 and resend the create message?

2017-04-20 20:35:47.291125 7fe759cb4700 10 mon.b@0(leader).osd e7 encode_pending e 8
2017-04-20 20:35:47.291129 7fe759cb4700  1 mon.b@0(leader).osd e7 encode_pending skipping prime_pg_temp; mapping job 0x7fe76ca716c0 is prior epoch 7
2017-04-20 20:35:47.291163 7fe759cb4700  2 mon.b@0(leader).osd e7  osd.1 UP 172.21.15.67:6811/13837
2017-04-20 20:35:47.291174 7fe759cb4700  2 mon.b@0(leader).osd e7  osd.2 UP 172.21.15.47:6800/2023
2017-04-20 20:35:47.291177 7fe759cb4700  2 mon.b@0(leader).osd e7  osd.3 UP 172.21.15.47:6806/2088
2017-04-20 20:35:47.291188 7fe759cb4700 10 mon.b@0(leader).osd e7 encode_pending encoding without feature SERVER_LUMINOUS
2017-04-20 20:35:47.291191 7fe759cb4700 10 mon.b@0(leader).osd e7 encode_pending encoding without feature SERVER_KRAKEN | MSG_ADDR2
2017-04-20 20:35:47.291192 7fe759cb4700 10 mon.b@0(leader).osd e7 encode_pending encoding full map with 575862587619852283
2017-04-20 20:35:47.291259 7fe759cb4700 20 mon.b@0(leader).osd e7  full_crc 2470623810 inc_crc 77687141
2017-04-20 20:35:47.291282 7fe759cb4700 10 mon.b@0(leader).osd e7 scan_for_creating_pgs no change in pool 0 replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 16 pgp_num 16 last_change 1 flags hashpspool stripe_width 0
2017-04-20 20:35:47.291292 7fe759cb4700 10 mon.b@0(leader).osd e7 scan_for_creating_pgs no change in pool 1 replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 3 flags hashpspool stripe_width 0
2017-04-20 20:35:47.291296 7fe759cb4700 10 mon.b@0(leader).osd e7 scan_for_creating_pgs no change in pool 2 replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 5 flags hashpspool crash_replay_interval 5 stripe_width 0
2017-04-20 20:35:47.291300 7fe759cb4700 10 mon.b@0(leader).osd e7 scan_for_creating_pgs no change in pool 3 replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 6 flags hashpspool stripe_width 0
2017-04-20 20:35:47.291304 7fe759cb4700  7 mon.b@0(leader).osd e7 encode_pending in the middle of upgrading,  trimming pending creating_pgs using pgmap
...
2017-04-20 20:35:47.435749 7fe75c4b9700 10 update_creating_pgs to 8 pgs, osdmap epoch 8
2017-04-20 20:35:47.435763 7fe75c4b9700 20 update_creating_pgs  3.0  acting_primary: 0 -> 3 acting: [0] -> [3,0] up_primary: 0 -> 3 up: [0] -> [3,0]
2017-04-20 20:35:47.435772 7fe75c4b9700 20 update_creating_pgs  3.1  acting_primary: 0 -> 0 acting: [0] -> [0,2] up_primary: 0 -> 0 up: [0] -> [0,2]
2017-04-20 20:35:47.435781 7fe75c4b9700 20 update_creating_pgs  3.2  acting_primary: -1 -> 1 acting: [] -> [1,3] up_primary: -1 -> 1 up: [] -> [1,3]
2017-04-20 20:35:47.435789 7fe75c4b9700 20 update_creating_pgs  3.3  acting_primary: 0 -> 0 acting: [0] -> [0,2] up_primary: 0 -> 0 up: [0] -> [0,2]
2017-04-20 20:35:47.435799 7fe75c4b9700 20 update_creating_pgs  3.4  acting_primary: -1 -> 1 acting: [] -> [1,2] up_primary: -1 -> 1 up: [] -> [1,2]
2017-04-20 20:35:47.435804 7fe75c4b9700 20 update_creating_pgs  3.5  acting_primary: 0 -> 2 acting: [0] -> [2,0] up_primary: 0 -> 2 up: [0] -> [2,0]
2017-04-20 20:35:47.435810 7fe75c4b9700 20 update_creating_pgs  3.6  acting_primary: -1 -> 2 acting: [] -> [2,1] up_primary: -1 -> 2 up: [] -> [2,1]
2017-04-20 20:35:47.435818 7fe75c4b9700 20 update_creating_pgs  3.7  acting_primary: -1 -> 1 acting: [] -> [1,3] up_primary: -1 -> 1 up: [] -> [1,3]
2017-04-20 20:35:47.435824 7fe75c4b9700 10 update_creating_pgs 8 pgs changed primary

.. but nothing is sent.

/a/sage-2017-04-20_18:45:07-upgrade:jewel-x-master---basic-smithi/1049789

#3 Updated by Sage Weil 6 months ago

  • Assignee set to Kefu Chai

#4 Updated by Greg Farnum 6 months ago

This is definitely on the mon. If the OSD ignored the epoch mappings it could create a PG which was created by somebody else in epoch 7!

#5 Updated by Kefu Chai 6 months ago

  • Subject changed from osd: pg_create ignored to mon: pg_create not re-sent after primary osd changed

#6 Updated by Kefu Chai 6 months ago

  • Status changed from New to Need Review

#7 Updated by Sage Weil 6 months ago

  • Status changed from Need Review to Resolved

Also available in: Atom PDF