Bug #9321

closed

pgmap updates from OSDMap can be delayed indefinitely

Added by Greg Farnum over 9 years ago. Updated over 9 years ago.

Status: Resolved
Priority: High
Category: Monitor
Target version: -
% Done: 0%
Source: Support
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We saw a customer cluster in which a full OSD had been removed from the OSDMap, but after almost two hours that change had still not propagated to the pgmap's list of full OSDs. The monitor logs show that every time the PGMonitor tried to run update_from_osdmap, either the osdmap was unreadable or the pgmap was unwriteable.

After discussion with Joao and Sage, we think it's safe in our current implementation to simply drop the is_readable() check on the OSDMonitor in this case: we might see an out-of-date map, but we won't see an invalid one. In the future we'll need to always provide a stable, readable map for situations like this.
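
To illustrate why requiring both conditions at once can starve the update, here is a toy Python model. It is not Ceph code: the classes, fields, and the alternating readable/writeable schedule are assumptions made up for illustration, and only the names update_from_osdmap and is_readable come from the report above. It shows a worst-case interleaving in which the two conditions never hold at the same moment, and how dropping the readability requirement lets the update through as soon as the pgmap is writeable.

```python
from dataclasses import dataclass, field


@dataclass
class ToyOSDMonitor:
    """Hypothetical stand-in for the OSD monitor's last committed state."""
    epoch: int = 1
    full_osds: set = field(default_factory=set)
    readable: bool = True  # modelled as False while a proposal is in flight


@dataclass
class ToyPGMonitor:
    """Hypothetical stand-in for the PG monitor and its pgmap."""
    full_osds: set = field(default_factory=set)
    writeable: bool = True
    last_osdmap_epoch: int = 0

    def update_from_osdmap(self, osdmon, require_readable):
        # Old behaviour (assumed): bail out unless the osdmap is readable.
        if require_readable and not osdmon.readable:
            return False
        # The pgmap must still be writeable in both variants.
        if not self.writeable:
            return False
        # Copy the committed state; it may be stale but is never invalid.
        self.full_osds = set(osdmon.full_osds)
        self.last_osdmap_epoch = osdmon.epoch
        return True


def simulate(require_readable, ticks=6):
    """Return the first tick on which the update succeeds, or None."""
    # The full OSD (id 7, arbitrary) is already gone from the osdmap,
    # but the pgmap still lists it as full.
    osdmon = ToyOSDMonitor(epoch=2, full_osds=set())
    pgmon = ToyPGMonitor(full_osds={7})
    for tick in range(ticks):
        # Assumed worst case: the two conditions never coincide.
        osdmon.readable = (tick % 2 == 0)
        pgmon.writeable = (tick % 2 == 1)
        if pgmon.update_from_osdmap(osdmon, require_readable):
            return tick
    return None


if __name__ == "__main__":
    print("with is_readable check:", simulate(require_readable=True))
    print("without is_readable check:", simulate(require_readable=False))
```

In this model the update with the readability check never succeeds, however many ticks are simulated, while the variant without it succeeds on the first tick where the pgmap is writeable. The model says nothing about how often such an interleaving occurs on a real cluster.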


Related issues 1 (0 open, 1 closed)

Related to Ceph - Bug #9794: vstart.sh crashes MON with --paxos-propose-interval=0.01 and one MDS (Resolved, Joao Eduardo Luis, 10/15/2014)

Actions #1

Updated by Greg Farnum over 9 years ago

  • Subject changed from OSDMap updates from pgmap can be delayed indefinitely to pgmap updates from OSDMap can be delayed indefinitely
Actions #2

Updated by Greg Farnum over 9 years ago

I should also note that I suspect this condition might have been exacerbated by our full map handling. We probably had every daemon/client in the cluster trying to get map updates from the monitor (increasing load), and maybe the OSDs would have been reporting more frequently? I say this because the updates seemed to be happening just fine before the full flag got set.

Actions #3

Updated by Sage Weil over 9 years ago

  • Assignee set to Joao Eduardo Luis
Actions #4

Updated by Joao Eduardo Luis over 9 years ago

  • Status changed from New to In Progress
Actions #5

Updated by Samuel Just over 9 years ago

  • Status changed from In Progress to Resolved