Bug #4675

mon: pg creations don't get queued on mon startup

Added by Sage Weil almost 6 years ago. Updated almost 6 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
Monitor
Target version:
Start date:
04/06/2013
Due date:
% Done:

0%

Source:
Q/A
Tags:
Backport:
cuttlefish, bobtail
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

PGMonitor::send_pg_creates() also divvies up pg creations among the current OSDs they map to. This happens from update_from_paxos(), and presumably also when osdmaps update. On startup it happens from preinit() -> init_paxos(), which calls PGMonitor::update_from_paxos() before the OSDMonitor's, which means the OSDMap is not yet loaded and everything maps to no OSD. Until there is an osdmap update, a reconnecting OSD will fail to see the creations queued for it.
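To make the ordering concrete, here is a minimal, self-contained C++ sketch of that flow. ToyOSDMap, ToyPGMonitor, and map_creates() are invented stand-ins, not real Ceph classes; map_creates() plays the role of the divvying step in send_pg_creates().

// Hypothetical, simplified model of the startup ordering described above.
#include <cstdio>
#include <map>
#include <vector>

struct ToyOSDMap {
  unsigned epoch = 0;                        // stays 0 until the OSDMonitor has loaded
  std::vector<int> map_pg(int pgid) const {
    if (epoch == 0)
      return {};                             // no map loaded: the pg maps to no OSD
    return {pgid % 3};                       // stand-in for a real CRUSH mapping
  }
};

struct ToyPGMonitor {
  std::map<int, std::vector<int>> creating;  // pgid -> OSDs the create is queued for
  void map_creates(const ToyOSDMap &osdmap, const std::vector<int> &pgs) {
    for (int pg : pgs)
      creating[pg] = osdmap.map_pg(pg);      // with epoch 0, every entry stays empty
  }
};

int main() {
  ToyOSDMap osdmap;                          // OSDMonitor state not loaded yet
  ToyPGMonitor pgmon;
  pgmon.map_creates(osdmap, {1, 2, 3});      // PGMonitor runs first and maps the creates
  osdmap.epoch = 42;                         // OSDMap loads afterwards: too late
  for (const auto &c : pgmon.creating)
    std::printf("pg %d queued for %zu osds\n", c.first, c.second.size());
}

Because the toy PGMonitor runs before the toy OSDMap has a nonzero epoch, every creating pg ends up queued for zero OSDs, which is the state a reconnecting OSD then sees.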

The fix is probably to break the divvying logic out of send_pg_creates() into its own step, and then ensure that step is also called at some appropriate point during startup.

The result is that a pool creation that races with a mon restart will hang. Either some other path also gets into this state or there is a different bug, since this was triggered by the job below (OSD thrashing only). In any case, after that hang, restarting the mon put it into this buggy state, so it should be fixed regardless.

ubuntu@teuthology:/a/sage-2013-04-06_09:10:56-rados-wip-osd-throttle-testing-basic/9833$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: b0bb70d12c365872547f10d185bf88eba3ed6083
machine_type: plana
nuke-on-error: true
overrides:
  ceph:
    conf:
      global:
        ms inject delay max: 1
        ms inject delay probability: 0.005
        ms inject delay type: osd
        ms inject socket failures: 2500
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
    fs: ext4
    log-whitelist:
    - slow request
    sha1: aca0aea1bfbafba9cab1b2c693760b824bd82d30
  s3tests:
    branch: master
  workunit:
    sha1: aca0aea1bfbafba9cab1b2c693760b824bd82d30
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    timeout: 1200
- ceph-fuse: null
- workunit:
    clients:
      client.0:
      - rados/test.sh


Related issues

Related to Ceph - Bug #4813: pgs stuck creating (Resolved, 04/25/2013)

Associated revisions

Revision a2fe0137 (diff)
Added by Sage Weil almost 6 years ago

mon: remap creating pgs on startup

After Monitor::init_paxos() has loaded all of the PaxosService state,
we should then map creating pgs to osds. This ensures we do so after the
osdmap has been loaded and the pgs actually map somewhere meaningful.

Fixes: #4675
Signed-off-by: Sage Weil <>
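
In terms of the toy types from the sketch in the description, the shape of this change is roughly the following (an illustration, not the actual Ceph code path):

// Continuing the toy model: defer the mapping step until every service has
// loaded its state, so the OSDMap epoch is no longer 0 when creates are mapped.
void toy_init_paxos(ToyOSDMap &osdmap, ToyPGMonitor &pgmon,
                    const std::vector<int> &creating_pgs) {
  osdmap.epoch = 42;                        // 1. all PaxosService state loads first
  pgmon.map_creates(osdmap, creating_pgs);  // 2. only then remap the creating pgs
}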

History

#1 Updated by Sage Weil almost 6 years ago

  • Status changed from New to Need Review
  • Priority changed from High to Urgent

wip-mon-pg

#2 Updated by Ian Colle almost 6 years ago

  • Assignee set to Greg Farnum

Greg - can you please review this wip branch?

#3 Updated by Greg Farnum almost 6 years ago

  • Status changed from Need Review to Need More Info

Okay, I've looked at the patches and I've looked at the bug description, and I can't tell what the problem is here. The effective change from the patches is to queue PG creates less frequently. Each monitor calls update_from_paxos() when it boots or wins an election, and when the OSDMonitor updates, it tells the PGMonitor to check the map, which calculates these mappings again. So it should all be fine with or without the patches (which would not have any impact on the stuck-creating PGs that I see when I go look at the teuthology archive).
Unfortunately there are no logs that I can find, but I see that there were 8 PGs creating and it manages one of them; do we know this is a pool create and not a split? Why do we believe the issue is with the monitors?

#4 Updated by Greg Farnum almost 6 years ago

  • Assignee changed from Greg Farnum to Sage Weil

#5 Updated by Sage Weil almost 6 years ago

  • Status changed from Need More Info to Need Review
  • Assignee changed from Sage Weil to Greg Farnum

the problem is that update_from_paxos() is called on startup when the osdmap isn't loaded yet, so it remaps everything to [] and no creates are queued. then the osdmap does get loaded, the mon starts up, and osds reconnect... but if there are no further osdmap updates, the creates never get recalculated with a non-broken value. unfortunately it's a difficult case to reproduce; i only saw it once, with the hung qa run last week. easy fix though.

the quick fix is just the if (get_epoch() != 0) checks in the second patch. the first patch can wait... eventually we'll want to be more explicit about when we recalculate the mapping and when we send, although as you say that's not needed for cuttlefish.
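
Roughly, in terms of the toy types from the sketch in the description (not the actual patch), that guard amounts to:

// Toy version of the epoch-0 guard: skip remapping while the OSDMap is still
// unloaded, so a stale empty mapping never replaces a later correct one.
void guarded_map_creates(ToyPGMonitor &pgmon, const ToyOSDMap &osdmap,
                         const std::vector<int> &pgs) {
  if (osdmap.epoch != 0)                    // the "get_epoch() != 0" style check
    pgmon.map_creates(osdmap, pgs);
}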

#6 Updated by Greg Farnum almost 6 years ago

Okay, but an OSD booting creates a new OSD Map, which will lead to PGMonitor::check_pg_map(), which will lead to send_pg_creates(). I do see that we won't actually calculate them again when the OSDMap initially loads since we'll have seen it previously, so that assumption of mine wasn't quite right, but as soon as we have an OSD boot we're good.
So that explains thrashing monitors but not thrashing OSDs. Okay, I see it now. But this won't actually fix that race either: the only callers of send_pg_creates() are PGMonitor::update_from_paxos() and PGMonitor::check_osd_map(). The second is called whenever the OSD Map changes and the PGMap hasn't seen it before, but that's true for the same cases with and without these patches, and PGMonitor::update_from_paxos() is also going to happen at the same times with or without them. So I'm still not seeing how these do anything. I just pushed a wip-4675-model branch that might handle this better; check it out?
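
As a rough model of the ordinary recovery path described here (reusing the toy types from the sketch in the description; only check_osd_map() and send_pg_creates() are real function names):

// A new OSDMap epoch (e.g. an OSD booting) triggers a remap of the creating
// pgs, which masks the startup bug -- unless no new epoch arrives after the
// mon restart.
void toy_check_osd_map(ToyPGMonitor &pgmon, const ToyOSDMap &osdmap,
                       unsigned &last_seen_epoch, const std::vector<int> &pgs) {
  if (osdmap.epoch > last_seen_epoch) {     // only act on epochs the PGMap hasn't seen
    last_seen_epoch = osdmap.epoch;
    pgmon.map_creates(osdmap, pgs);         // stands in for send_pg_creates()
  }
}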

Also I think the actual referenced hang here is a different issue than this race.

#7 Updated by Greg Farnum almost 6 years ago

  • Status changed from Need Review to In Progress

#8 Updated by Greg Farnum almost 6 years ago

  • Priority changed from Urgent to High

Also also, pretty sure a rare race with an easy workaround is not an urgent bug. :)

#9 Updated by Sage Weil almost 6 years ago

yep, not urgent. i'll take a look later. thanks!

#10 Updated by Greg Farnum almost 6 years ago

  • Assignee changed from Greg Farnum to Sage Weil

Giving this back since you're no longer on vacation.

#11 Updated by Sage Weil almost 6 years ago

  • Backport set to cuttlefish, bobtail

#12 Updated by Sage Weil almost 6 years ago

  • Status changed from In Progress to Need Review
  • Priority changed from High to Urgent

pushed updated wip-mon-pg

#13 Updated by Sage Weil almost 6 years ago

  • Status changed from Need Review to Resolved

merged the fix for the mon restart case. 6a5be251df0e14ec66fb868ff6a6ef6e08d539c6

there is likely still a similar bug lurking, though, that can trigger when the mon hasn't been restarted. see #4813
