Backport #14592

osd crashes when handling a stale pg-create message (hammer)

Added by Kefu Chai about 8 years ago. Updated almost 7 years ago.

Status: Closed
Priority: High
Assignee:
Target version: -
Release: hammer

Description

see http://tracker.ceph.com/issues/13990#note-39.

reproduce steps:

  1. the monitor sends pg-create messages; pg_create.created is set to pool.last_change, which in turn was the OSDMonitor.pending_inc.epoch at the moment the pool last changed. but these PGs fail to be created because
    • some osd is down but not out, or
    • some osd's osd_debug_drop_pg_create_probability is 1.0 (this option is only available in hammer), or
    • the osd has just started up, so it is still waiting for an osdmap
  2. meanwhile, other changes keep updating the osdmap. once the number of osdmap epochs reaches the trim threshold, the monitor starts to trim the old ones
  3. when the OSDs are back in business, they start to process the pg-create messages, which reference old osdmaps that have already been trimmed by both the monitor and the OSD. so when the OSD tries to build the prior set, the referenced maps are missing: assert failure!

a possible fix could be:

the monitor should not trim osdmaps until every pg-create message that references them has been processed by the OSDs.


Related issues

Related to Ceph - Bug #13990: Hammer (0.94.3) OSD does not delete old OSD Maps in a timely fashion (maybe at all?) Resolved 12/05/2015

History

#1 Updated by Kefu Chai about 8 years ago

  • Related to Bug #13990: Hammer (0.94.3) OSD does not delete old OSD Maps in a timely fashion (maybe at all?) added

#2 Updated by Samuel Just about 8 years ago

  • Priority changed from Normal to Urgent

#3 Updated by Samuel Just about 8 years ago

  • Assignee set to Samuel Just

#4 Updated by Samuel Just about 8 years ago

Creating teuthology reproducer.

#5 Updated by Sage Weil about 8 years ago

  • Status changed from 12 to Fix Under Review

#6 Updated by Sage Weil about 8 years ago

  • Status changed from Fix Under Review to 12

oops, wrong bug

#7 Updated by Sage Weil about 8 years ago

  • Priority changed from Urgent to Immediate

#8 Updated by Samuel Just about 8 years ago

  • Assignee changed from Samuel Just to Kefu Chai

I haven't actually made any progress on this. Kefu, can you implement the fix where the mon doesn't trim maps for creating pgs after the created epoch?

#9 Updated by Kefu Chai about 8 years ago

  • Description updated (diff)

#10 Updated by Kefu Chai about 8 years ago

  • Description updated (diff)

#11 Updated by Kefu Chai almost 8 years ago

version_t OSDMonitor::get_trim_to()
{
  if (mon->pgmon()->is_readable() &&
      mon->pgmon()->pg_map.creating_pgs.empty()) {
    // ...
  }
  return 0;
}

so if the monitor has one or more PGs being created, it won't trim any maps.
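The guard above boils down to a small decision function. A minimal stand-alone sketch (the name safe_trim_to and its parameters are illustrative, not Ceph's API):

```cpp
#include <cstddef>

// Illustrative model of the get_trim_to() guard: report an epoch to trim to
// only when the PG map is readable and no PGs are mid-create; returning 0
// means "trim nothing", so pg-create messages can still find their maps.
unsigned safe_trim_to(bool pgmap_readable, std::size_t creating_pgs,
                      unsigned candidate_epoch) {
    if (pgmap_readable && creating_pgs == 0)
        return candidate_epoch;  // safe: no pending create references old maps
    return 0;
}
```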

2016-01-14 12:13:35.447833 7f2d404c4700 10 osd.28 336 handle_pg_create osd_pg_create(pg0.5,1@2016-01-12 09:59:56.672414; pg0.19,1@2016-01-12 09:59:56.672426; pg0.2e,1@2016-01-12 09:59:56.672444; pg0.38,1@2016-01-12 09:59:56.672450; pg0.3b,1@2016-01-12 09:59:56.672451; ) v3
2016-01-14 12:13:35.447846 7f2d404c4700 15 osd.28 336 require_same_or_newer_map 336 (i am 336) 0x4f05440
2016-01-14 12:13:35.447860 7f2d404c4700 20 osd.28 336 mkpg 0.5 e1@2016-01-12 09:59:56.672414
2016-01-14 12:13:35.447870 7f2d2c49c700 20 osd.28 336 agent_entry empty queue
2016-01-14 12:13:35.447917 7f2d404c4700 15 osd.28 336 project_pg_history 0.5 from 1 to 336, start ec=1 les/c 0/1 0/0/0
2016-01-14 12:13:35.448022 7f2d404c4700 15 osd.28 336 project_pg_history 0.5 acting|up changed in 336 from []/[] -1/-1 -> [28]/[28] 28/28
2016-01-14 12:13:35.448026 7f2d404c4700 15 osd.28 336 project_pg_history 0.5 up changed in 336 from [] -1 -> [28] 28
2016-01-14 12:13:35.448029 7f2d404c4700 15 osd.28 336 project_pg_history 0.5 primary changed in 336
2016-01-14 12:13:35.448163 7f2d404c4700 15 osd.28 336 project_pg_history end ec=1 les/c 0/1 336/336/336
2016-01-14 12:13:35.448167 7f2d404c4700 15 osd.28 336 calc_priors_during 0.5 [1,336)

all the PGs being created belong to the "rbd" pool; they did not have any acting osd in the previous epoch.

#12 Updated by Kefu Chai almost 8 years ago

the fix for #15241 might also apply to this issue, but the epoch of the stale pg-create message from the monitor is not printed anywhere, so I can hardly tell whether message->epoch is less than osdmap.epoch. we are sure, though, that "history.same_primary_since" is always the current osd epoch.

and the strange thing is that the monitor had already removed osdmaps [1,10]:

132> 2016-01-13 15:25:53.816846 7f9b482dc700 1 -- 10.3.20.166:6800/2874 <== mon.2 10.3.20.169:6789/0 5 ==== osd_map(188..188 src has 11..188) v3 ==== 24386+0+0 (4212435560 0 0) 0x4560d80 con 0x43c78c0

if pg0.5 was not created before, and its primary changed from -1 to osd.28, its MOSDPGCreate::mkpg[pg].created should be pool.get_last_change(), which in turn is the epoch at which the inc map was applied and the "rbd" pool was created.

but these PGs were never created, so how come the monitor trimmed the osdmaps? no idea.

#13 Updated by Sage Weil almost 8 years ago

  • Subject changed from osd crashes when handling a stale pg-create message to osd crashes when handling a stale pg-create message (hammer)
  • Priority changed from Immediate to Urgent

#14 Updated by Kefu Chai almost 8 years ago

  • Release set to hammer

#15 Updated by Nathan Cutler over 7 years ago

  • Tracker changed from Bug to Backport

Moving to backport tracker because it will presumably be a hammer-only change.

#16 Updated by Samuel Just over 7 years ago

  • Priority changed from Urgent to High

#17 Updated by Sage Weil almost 7 years ago

  • Status changed from 12 to Closed

not worth fixing. this code has been rewritten twice now anyway.
