Bug #4675


mon: pg creations don't get queued on mon startup

Added by Sage Weil about 11 years ago. Updated almost 11 years ago.

Status: Resolved
Priority: Urgent
Category: Monitor
% Done: 0%
Source: Q/A
Backport: cuttlefish, bobtail
Severity: 3 - minor

Description

PGMonitor::send_pg_creates() also divvies up pg creations among the current OSDs they map to. This happens from update_from_paxos(), and presumably also when osdmaps update. On startup, it happens from preinit() -> init_paxos(), which calls PGMonitor::update_from_paxos() before the OSDMonitor's, which means the OSDMap is not loaded and everything maps to no OSD. Until there is an osdmap update, a reconnecting OSD will fail to see the creations queued for it.
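
A standalone toy (not Ceph code) that sketches the ordering problem: if the mapping pass runs while the OSDMap is still empty, every creating PG maps to no OSD, nothing is queued, and nothing re-runs the pass until the next osdmap update.

#include <cstdio>
#include <map>
#include <vector>

struct OSDMap {
  unsigned epoch = 0;               // epoch 0 == OSDMap not loaded yet
  int num_osds = 0;
  std::vector<int> pg_to_osds(int pg) const {
    if (num_osds == 0)
      return {};                    // everything maps to no OSD
    return { pg % num_osds };       // stand-in for real placement
  }
};

struct PGMonitor {
  std::map<int, std::vector<int>> queued;   // osd -> creating pgs
  void map_pg_creates(const OSDMap &osdmap,
                      const std::vector<int> &creating) {
    queued.clear();
    for (int pg : creating)
      for (int osd : osdmap.pg_to_osds(pg))
        queued[osd].push_back(pg);
  }
};

int main() {
  OSDMap osdmap;                          // startup: not loaded (epoch 0)
  PGMonitor pgmon;
  std::vector<int> creating = {0, 1, 2};  // pgs waiting to be created

  pgmon.map_pg_creates(osdmap, creating); // runs too early
  std::printf("osds with queued creates: %zu\n",
              pgmon.queued.size());       // prints 0: creates lost

  osdmap.epoch = 1;                       // OSDMap loads afterwards...
  osdmap.num_osds = 3;
  // ...but nothing re-runs map_pg_creates() until the next osdmap
  // update, so a reconnecting OSD sees no creates queued for it.
  return 0;
}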

The fix is probably to break the divvying step out of send_pg_creates(), and then ensure that it is called at some other point during startup.
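
Reusing the toy types above, the split might look roughly like this; the subclass and method names are illustrative, not the real Ceph API:

// Illustrative only: divvying stays in map_pg_creates() (above), while
// send_pg_creates() just transmits what was already queued. Startup can
// then be: load OSDMap -> map_pg_creates() -> send_pg_creates().
struct PGMonitorSplit : PGMonitor {
  void send_pg_creates() const {
    for (const auto &q : queued)
      std::printf("send %zu pg create(s) to osd.%d\n",
                  q.second.size(), q.first);
  }
};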

The result is that a pool creation that races with a mon restart will hang. Some other path also gets into this state, or there is a different bug, since it was triggered by the job below (osd thrashing only). In any case, after that hang, restarting the mon got into this buggy state, so it should be fixed regardless.

ubuntu@teuthology:/a/sage-2013-04-06_09:10:56-rados-wip-osd-throttle-testing-basic/9833$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: b0bb70d12c365872547f10d185bf88eba3ed6083
machine_type: plana
nuke-on-error: true
overrides:
  ceph:
    conf:
      global:
        ms inject delay max: 1
        ms inject delay probability: 0.005
        ms inject delay type: osd
        ms inject socket failures: 2500
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
    fs: ext4
    log-whitelist:
    - slow request
    sha1: aca0aea1bfbafba9cab1b2c693760b824bd82d30
  s3tests:
    branch: master
  workunit:
    sha1: aca0aea1bfbafba9cab1b2c693760b824bd82d30
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    timeout: 1200
- ceph-fuse: null
- workunit:
    clients:
      client.0:
      - rados/test.sh


Related issues: 1 (0 open, 1 closed)

Related to Ceph - Bug #4813: pgs stuck creating (Resolved, Samuel Just, 04/25/2013)

Actions #1

Updated by Sage Weil about 11 years ago

  • Status changed from New to Fix Under Review
  • Priority changed from High to Urgent

wip-mon-pg

Actions #2

Updated by Ian Colle about 11 years ago

  • Assignee set to Greg Farnum

Greg - can you please review this wip branch?

Actions #3

Updated by Greg Farnum about 11 years ago

  • Status changed from Fix Under Review to Need More Info

Okay, I've looked at the patches and I've looked at the bug description, and I can't tell what the problem is here. The effective change from the patches is to queue PG creates less frequently. Each monitor calls update_from_paxos() when it boots or wins an election, and when the OSDMonitor updates, it tells the PGMonitor to check the map, which calculates these mappings again. So it should all be fine with or without the patches (which would not have any impact on the stuck-creating PGs that I see when I go look at the teuthology archive).
Unfortunately there are no logs that I can find, but I see that there were 8 PGs creating and it manages one of them; do we know this is a pool create and not a split? Why do we believe the issue is with the monitors?

Actions #4

Updated by Greg Farnum about 11 years ago

  • Assignee changed from Greg Farnum to Sage Weil
Actions #5

Updated by Sage Weil about 11 years ago

  • Status changed from Need More Info to Fix Under Review
  • Assignee changed from Sage Weil to Greg Farnum

the problem is that update_from_paxos() is called on startup when the osdmap isn't loaded yet, so it remaps everything to [] and no creates are queued. then the osdmap does get loaded, the mon starts up, osds reconnect.. but if there are no further osdmap updates, the creates never get recalculated with a non-broken value. unfortunately it's a difficult case to reproduce; i only saw it once with the hung qa run last week. easy fix though.

the quick fix is just the if get_epoch() != 0 checks in the second patch. the first patch can wait.. eventually we'll want to be more explicit about when we recalc the mapping and when we send, although as you say that's not needed for cuttlefish.
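
In terms of the toy from the description, the guard would look roughly like this; that the real patch gates in exactly this spot is an assumption:

// Assumed shape of the get_epoch() != 0 guard, applied to the toy types
// from the description (not the actual patch): skip the remap while the
// OSDMap hasn't been loaded, so existing queued creates aren't clobbered.
void map_pg_creates_guarded(PGMonitor &pgmon, const OSDMap &osdmap,
                            const std::vector<int> &creating) {
  if (osdmap.epoch == 0)
    return;  // osdmap not loaded yet: leave the queued creates alone
  pgmon.map_pg_creates(osdmap, creating);
}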

Actions #6

Updated by Greg Farnum about 11 years ago

Okay, but an OSD booting creates a new osdmap, which leads to PGMonitor::check_osd_map(), which leads to send_pg_creates(). I do see that we won't actually calculate the mappings again when the OSDMap initially loads, since we'll have seen it previously, so that assumption of mine wasn't quite right, but as soon as an OSD boots we're good.
So that explains thrashing monitors, but not thrashing OSDs. Okay, I see it now. But this won't actually fix that race either: the only callers of send_pg_creates() are PGMonitor::update_from_paxos() and PGMonitor::check_osd_map(). The second is called whenever the osdmap changes and the PGMap hasn't seen it before, but that's true for the same cases with and without these patches. PGMonitor::update_from_paxos() is also going to happen at the same times with or without these patches. So I'm still not seeing how these do anything. Just pushed a wip-4675-model that might handle this better; check it out?
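
For reference, the two call paths into send_pg_creates() described above, sketched as an outline:

  PGMonitor::update_from_paxos()             (mon boot, election win)
      -> send_pg_creates()

  new osdmap committed (e.g. an OSD boots)
      -> PGMonitor::check_osd_map()
          -> send_pg_creates()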

Also I think the actual referenced hang here is a different issue than this race.

Actions #7

Updated by Greg Farnum about 11 years ago

  • Status changed from Fix Under Review to In Progress
Actions #8

Updated by Greg Farnum about 11 years ago

  • Priority changed from Urgent to High

Also also, pretty sure a rare race with an easy workaround is not an urgent bug. :)

Actions #9

Updated by Sage Weil about 11 years ago

yep, not urgent. i'll take a look later. thanks!

Actions #10

Updated by Greg Farnum almost 11 years ago

  • Assignee changed from Greg Farnum to Sage Weil

Giving this back since you're no longer on vacation.

Actions #11

Updated by Sage Weil almost 11 years ago

  • Backport set to cuttlefish, bobtail
Actions #12

Updated by Sage Weil almost 11 years ago

  • Status changed from In Progress to Fix Under Review
  • Priority changed from High to Urgent

pushed updated wip-mon-pg

Actions #13

Updated by Sage Weil almost 11 years ago

  • Status changed from Fix Under Review to Resolved

merged the fix for the mon restart case. 6a5be251df0e14ec66fb868ff6a6ef6e08d539c6

there is likely still a similar bug lurking, though, that can trigger when the mon hasn't been restarted. see #4813
