Bug #19787

closed

mon: send wrong pg create epoch

Added by Sage Weil almost 7 years ago. Updated almost 7 years ago.

Status: Resolved
Priority: Immediate
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2017-04-26 18:00:43.023153 7f32887f0700 10 mon.c@2(peon).osd e8 update_creating_pgs will instruct osd.2 to create 3.e

but later
2017-04-26 18:00:45.695850 7f32887f0700 20 mon.c@2(peon).osd e11 send_pg_creates will create 3.d at 11
2017-04-26 18:00:45.695852 7f32887f0700 20 mon.c@2(peon).osd e11 send_pg_creates will create 3.e at 11
2017-04-26 18:00:45.695853 7f32887f0700 20 mon.c@2(peon).osd e11 send_pg_creates will create 3.f at 11
2017-04-26 18:00:45.695850 7f32867ec700 10 mon.c@2(peon).paxosservice(mgr 1..1) post_refresh
2017-04-26 18:00:45.695854 7f32887f0700  1 -- 172.21.15.65:6790/0 --> 172.21.15.158:6804/21815 -- osd_pg_create(e11 3.0:11 3.1:11 3.2:11 3.3:11 3.4:11 3.5:11 3.6:11 3.7:11 3.8:11 3.c:11 3.d:11 3.e:11 3.f:11) v3 -- 0x7f329d6be900 con 0

/a/sage-2017-04-26_17:39:56-upgrade:jewel-x-wip-past-intervals---basic-smithi/1070261

(but ignore the osd behavior. in this branch the osd makes strong assertions about past_intervals and ultimately crashes because of the bad creation epoch.)
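
One way to spot this kind of mismatch in a mon log is to pull the per-PG epochs out of the osd_pg_create(...) payload and compare them with the osdmap epoch at which update_creating_pgs first mentioned each PG. The following is a minimal sketch, assuming the log line formats shown above. The function names are hypothetical and not part of Ceph, and using the "first seen" epoch as a stand-in for the expected creation epoch is an assumption made only for illustration.

```python
import re

# e.g. "mon.c@2(peon).osd e8 update_creating_pgs will instruct osd.2 to create 3.e"
INSTRUCT_RE = re.compile(
    r"\.osd e(\d+) update_creating_pgs will instruct osd\.\d+ to create (\S+)"
)
# e.g. "osd_pg_create(e11 3.0:11 3.1:11) v3"
CREATE_RE = re.compile(r"osd_pg_create\(e(\d+)((?: \S+:\d+)*)\)")


def first_seen_epochs(lines):
    """Map each pgid to the first osdmap epoch in which the mon mentioned it."""
    first = {}
    for line in lines:
        m = INSTRUCT_RE.search(line)
        if m:
            first.setdefault(m.group(2), int(m.group(1)))
    return first


def sent_epochs(line):
    """Map each pgid in one osd_pg_create message to the epoch sent with it."""
    m = CREATE_RE.search(line)
    if not m:
        return {}
    out = {}
    for item in m.group(2).split():
        pgid, epoch = item.rsplit(":", 1)
        out[pgid] = int(epoch)
    return out


def mismatches(lines):
    """Return {pgid: (first_seen_epoch, sent_epoch)} where the two differ."""
    first = first_seen_epochs(lines)
    bad = {}
    for line in lines:
        for pgid, epoch in sent_epochs(line).items():
            if pgid in first and epoch != first[pgid]:
                bad[pgid] = (first[pgid], epoch)
    return bad


if __name__ == "__main__":
    log = [
        "2017-04-26 18:00:43.023153 7f32887f0700 10 mon.c@2(peon).osd e8 "
        "update_creating_pgs will instruct osd.2 to create 3.e",
        "2017-04-26 18:00:45.695854 7f32887f0700  1 -- 172.21.15.65:6790/0 --> "
        "172.21.15.158:6804/21815 -- osd_pg_create(e11 3.d:11 3.e:11 3.f:11) v3",
    ]
    print(mismatches(log))  # {'3.e': (8, 11)}
```

Only PGs that appear in both kinds of line are compared, so a truncated log yields fewer findings rather than false ones.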

Actions #1

Updated by Kefu Chai almost 7 years ago

  • Assignee set to Kefu Chai

Actions #2

Updated by Kefu Chai almost 7 years ago

2017-04-26 18:00:43.023136 7f32887f0700 10 osdmap epoch 8 mapping took 0.000090 seconds
2017-04-26 18:00:43.023138 7f32887f0700 10 mon.c@2(peon).osd e8 update_creating_pgs will instruct osd.1 to create 3.0
2017-04-26 18:00:43.023142 7f32887f0700 10 mon.c@2(peon).osd e8 update_creating_pgs will instruct osd.0 to create 3.1
2017-04-26 18:00:43.023143 7f32887f0700 10 mon.c@2(peon).osd e8 update_creating_pgs will instruct osd.3 to create 3.2
2017-04-26 18:00:43.023145 7f32887f0700 10 mon.c@2(peon).osd e8 update_creating_pgs will instruct osd.0 to create 3.3
2017-04-26 18:00:43.023146 7f32887f0700 10 mon.c@2(peon).osd e8 update_creating_pgs will instruct osd.1 to create 3.4
2017-04-26 18:00:43.023146 7f32887f0700 10 mon.c@2(peon).osd e8 update_creating_pgs will instruct osd.2 to create 3.5
2017-04-26 18:00:43.023148 7f32887f0700 10 mon.c@2(peon).osd e8 update_creating_pgs will instruct osd.0 to create 3.6
2017-04-26 18:00:43.023148 7f32887f0700 10 mon.c@2(peon).osd e8 update_creating_pgs will instruct osd.3 to create 3.7
2017-04-26 18:00:43.023149 7f32887f0700 10 mon.c@2(peon).osd e8 update_creating_pgs will instruct osd.3 to create 3.8
2017-04-26 18:00:43.023150 7f32887f0700 10 mon.c@2(peon).osd e8 update_creating_pgs will instruct osd.3 to create 3.9
2017-04-26 18:00:43.023151 7f32887f0700 10 mon.c@2(peon).osd e8 update_creating_pgs will instruct osd.3 to create 3.a
2017-04-26 18:00:43.023152 7f32887f0700 10 mon.c@2(peon).osd e8 update_creating_pgs will instruct osd.3 to create 3.b
2017-04-26 18:00:43.023152 7f32887f0700 10 mon.c@2(peon).osd e8 update_creating_pgs will instruct osd.0 to create 3.c
2017-04-26 18:00:43.023153 7f32887f0700 10 mon.c@2(peon).osd e8 update_creating_pgs will instruct osd.3 to create 3.d
2017-04-26 18:00:43.023153 7f32887f0700 10 mon.c@2(peon).osd e8 update_creating_pgs will instruct osd.2 to create 3.e
2017-04-26 18:00:43.023155 7f32887f0700 10 mon.c@2(peon).osd e8 update_creating_pgs will instruct osd.0 to create 3.f
..
2017-04-26 18:00:43.332882 7f32887f0700 10 osdmap epoch 9 mapping took 0.000142 seconds
2017-04-26 18:00:43.332887 7f32887f0700 10 mon.c@2(peon).osd e9 update_creating_pgs will instruct osd.1 to create 3.0
..
2017-04-26 18:00:44.602310 7f3287fef700 20 mon.c@2(peon).osd e10 update_creating_pgs 3.0  acting_primary:1 -> 3
2017-04-26 18:00:44.602319 7f3287fef700 10 mon.c@2(peon).osd e10 update_creating_pgs will instruct osd.3 to create 3.0
..
2017-04-26 18:00:44.607188 7f32867ec700 20 mon.c@2(peon).osd e10 send_pg_creates will create 3.0 at 10
..
2017-04-26 18:00:44.607208 7f32867ec700  1 -- 172.21.15.65:6790/0 --> 172.21.15.65:6807/12716 -- osd_pg_create(e10 3.0:10 3.9:8 3.a:8 3.c:10 3.d:8) v3 -- 0x7f329d5b4240 con 0
..
2017-04-26 18:00:44.719579 7f32867ec700 20 mon.c@2(peon).pg v14  refreshing pg 3.0 0:0 creating
...
2017-04-26 18:00:45.695762 7f32887f0700 10 osdmap epoch 11 mapping took 0.000100 seconds
2017-04-26 18:00:45.695765 7f32887f0700 20 mon.c@2(peon).osd e11 update_creating_pgs 3.0  acting_primary:3 -> 0
..
2017-04-26 18:00:45.695767 7f32887f0700 10 mon.c@2(peon).osd e11 update_creating_pgs will instruct osd.0 to create 3.0
..
2017-04-26 18:00:45.695837 7f32887f0700 20 mon.c@2(peon).osd e11 send_pg_creates will create 3.0 at 11

So by the time mon.c sent the pg creates to osd.0, osd.0 had just taken over from osd.3 as the primary of pg 3.0 (in epoch 11), which is why the pg create epoch was 11. I think the mon's behavior is expected.

Actions #3

Updated by Sage Weil almost 7 years ago

iirc 'created' is supposed to be the epoch the pg is logically created in (when the pool is created or pg_num is increased to trigger the split).

maybe you're thinking of the other epoch value, which is similar to same_primary_since, and which is/was used to keep track of whether the pg_create message has been sent to the current primary osd?
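
The distinction between the two epochs can be sketched as a record that keeps both values: `created`, fixed when the PG is logically created, and a second epoch that moves whenever the acting primary changes. This is an illustrative model only. The class, field, and method names are hypothetical and do not mirror the actual monitor data structures, and epoch 8 is used as an assumed creation epoch for pg 3.0.

```python
from dataclasses import dataclass


@dataclass
class CreatingPG:
    """Hypothetical record for a PG that is still being created."""

    pgid: str
    created: int        # epoch the PG was logically created; never changes
    primary: int        # current acting primary OSD id
    primary_since: int  # epoch at which the current primary took over

    def on_new_map(self, epoch: int, new_primary: int) -> None:
        """Track a primary change; the creation epoch is left untouched."""
        if new_primary != self.primary:
            self.primary = new_primary
            self.primary_since = epoch

    def create_entry(self) -> str:
        """Format the pgid:epoch pair as it appears in osd_pg_create."""
        return f"{self.pgid}:{self.created}"


if __name__ == "__main__":
    pg = CreatingPG(pgid="3.0", created=8, primary=1, primary_since=8)
    pg.on_new_map(9, 1)   # primary unchanged
    pg.on_new_map(10, 3)  # acting_primary: 1 -> 3
    pg.on_new_map(11, 0)  # acting_primary: 3 -> 0
    print(pg.primary_since)   # 11
    print(pg.create_entry())  # 3.0:8
```

In this model the message to the new primary would carry 3.0:8 rather than 3.0:11, while `primary_since` remains available for deciding whether the create still needs to be resent.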

Actions #4

Updated by Kefu Chai almost 7 years ago

  • Status changed from New to Fix Under Review

Actions #5

Updated by Sage Weil almost 7 years ago

  • Status changed from Fix Under Review to Resolved