Bug #9321

pgmap updates from OSDMap can be delayed indefinitely

Added by Greg Farnum almost 5 years ago. Updated over 4 years ago.

Status:
Resolved
Priority:
High
Category:
Monitor
Target version:
-
Start date:
09/02/2014
Due date:
% Done:
0%

Source:
Support
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

We saw a customer cluster in which a full OSD had been removed from the OSDMap, but after almost two hours that change had still not propagated to the pgmap's list of full OSDs. The monitor logs show that every time the PGMonitor tried to run update_from_osdmap, either the osdmap was unreadable or the pgmap was unwriteable, so the update never ran.

After discussion with Joao and Sage, we think it's safe in our current implementation to simply drop the is_readable() check on the OSDMonitor in this case: we might see an out-of-date map, but we won't see an invalid one. In the future we'll need to always provide a stable, readable map for situations like this.
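
To illustrate the starvation described above, here is a minimal toy model in Python. It is not Ceph code: the classes `ToyOSDMonitor` and `ToyPGMonitor`, the `require_readable` flag, and the alternating proposal schedule in `simulate` are all assumptions made for illustration. Only the names `update_from_osdmap`, `is_readable`, and the readable/writeable conditions come from the report.

```python
from dataclasses import dataclass


@dataclass
class ToyOSDMonitor:
    """Hypothetical stand-in for the OSD monitor service (not Ceph's API)."""
    committed_full_osds: frozenset = frozenset()  # last committed, stable state
    proposing: bool = False

    def is_readable(self) -> bool:
        # Strict rule assumed here: unreadable while a proposal is in flight.
        return not self.proposing


@dataclass
class ToyPGMonitor:
    """Hypothetical stand-in for the PG monitor service (not Ceph's API)."""
    full_osds: frozenset = frozenset()
    proposing: bool = False

    def is_writeable(self) -> bool:
        return not self.proposing

    def update_from_osdmap(self, osdmon: ToyOSDMonitor, require_readable: bool) -> bool:
        """Copy the full-OSD list from the osdmap; return True if it ran."""
        if require_readable and not osdmon.is_readable():
            return False  # osdmap busy: skip this attempt
        if not self.is_writeable():
            return False  # pgmap busy: skip this attempt
        # Reads the last committed state, which may be stale but is never invalid.
        self.full_osds = osdmon.committed_full_osds
        return True


def simulate(require_readable: bool, ticks: int = 10):
    """Return the first tick at which the update succeeds, or None."""
    osdmon = ToyOSDMonitor(committed_full_osds=frozenset())  # full OSD already removed
    pgmon = ToyPGMonitor(full_osds=frozenset({3}))           # pgmap still lists osd.3
    for tick in range(ticks):
        # Assumed worst case: the two services are never idle at the same time.
        osdmon.proposing = (tick % 2 == 0)
        pgmon.proposing = (tick % 2 == 1)
        if pgmon.update_from_osdmap(osdmon, require_readable):
            return tick
    return None
```

Under this assumed schedule, `simulate(require_readable=True)` never finds a tick where both conditions hold, while `simulate(require_readable=False)` succeeds as soon as the pgmap is writeable.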


Related issues

Related to Ceph - Bug #9794: vstart.sh crashes MON with --paxos-propose-interval=0.01 and one MDS (Resolved, 10/15/2014)

Associated revisions

Revision 06fc39c8 (diff)
Added by Joao Eduardo Luis almost 5 years ago

mon: PaxosService: can be readable even if proposing

As long as we have a stable version in memory that is lower or equal to
the version we want.

Fixes: #9321
Fixes: #9322

Signed-off-by: Joao Eduardo Luis <>
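
As a rough sketch of the idea in this commit message, the Python snippet below contrasts a strict readability predicate with a relaxed one. It is one reading of the message, not the actual PaxosService code: the `ServiceState` type, both function names, and the exact conditions are assumptions. The interpretation taken here is that a read of version `ver` can be served while a proposal is in flight, provided the last committed in-memory version already covers `ver`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceState:
    """Hypothetical snapshot of a Paxos-backed service (illustrative only)."""
    last_committed: int  # newest stable version held in memory
    proposing: bool      # True while a new version is being proposed


def is_readable_strict(state: ServiceState, ver: int = 0) -> bool:
    # Assumed old behaviour: any in-flight proposal blocks all reads.
    return (not state.proposing
            and state.last_committed > 0
            and ver <= state.last_committed)


def is_readable_relaxed(state: ServiceState, ver: int = 0) -> bool:
    # Assumed new behaviour: a proposal does not block reads of versions
    # that are already committed and stable in memory.
    return state.last_committed > 0 and ver <= state.last_committed
```

For example, with `ServiceState(last_committed=5, proposing=True)`, the strict predicate rejects a read of version 5, whereas the relaxed one allows it and still rejects version 6, which has not been committed yet.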

History

#1 Updated by Greg Farnum almost 5 years ago

  • Subject changed from OSDMap updates from pgmap can be delayed indefinitely to pgmap updates from OSDMap can be delayed indefinitely

#2 Updated by Greg Farnum almost 5 years ago

I should also note that I suspect this condition might have been exacerbated by our full map handling. We probably had every daemon/client in the cluster trying to get map updates from the monitor (increasing load), and maybe the OSDs would have been reporting more frequently? I say this because the updates seemed to be happening just fine before the full flag got set.

#3 Updated by Sage Weil almost 5 years ago

  • Assignee set to Joao Eduardo Luis

#4 Updated by Joao Eduardo Luis almost 5 years ago

  • Status changed from New to In Progress

#5 Updated by Samuel Just over 4 years ago

  • Status changed from In Progress to Resolved
