Bug #9419
Closed
dumpling->firefly upgrade, sending setallochint?
100%
Description
Crash on dumpling osds with bad op 39 when the first osd is upgraded to firefly, setallochint.
Updated by Samuel Just over 9 years ago
client rbd (firefly) --with setallochint--> primary (firefly) --with setallochint--> replica (dumpling) crash
Updated by Samuel Just over 9 years ago
The problem here appears to be that the user upgraded the clients before the osds were fully upgraded. librbd sends the setallochint unconditionally; old osds will respond with ENOTSUPP. The bug here would be that the primary supported the op and the replicas didn't. The primary probably should have returned ENOTSUPP.
Updated by Samuel Just over 9 years ago
- Assignee set to David Zafman
Two steps:
1) During GetInfo, for actingbackfill peers, build up a feature set which is the intersection of the feature sets of all of the peers as we receive messages from them.
2) Use this feature set in the setallochint handler in do_osd_ops to return ENOTSUPP if any peer does not understand it.
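The two steps above can be sketched roughly as follows. This is a minimal Python illustration, not the actual OSD code: the bit values, `common_features`, and `handle_setallochint` are hypothetical names invented for the example, and the error code follows the EOPNOTSUPP spelling used later in this thread.

```python
import errno
from functools import reduce

# Hypothetical bit positions, for illustration only; not Ceph's real values.
FEATURE_ALLOC_HINT = 1 << 0
FEATURE_OTHER = 1 << 1


def common_features(own_features, peer_features):
    """Step 1: intersect our own mask with each peer's mask (bitwise AND)."""
    return reduce(lambda acc, f: acc & f, peer_features, own_features)


def handle_setallochint(acting_features):
    """Step 2: refuse the op unless every peer advertised the feature."""
    if not acting_features & FEATURE_ALLOC_HINT:
        return -errno.EOPNOTSUPP
    return 0
```

With a firefly primary advertising both bits and one dumpling replica advertising only `FEATURE_OTHER`, the intersection loses `FEATURE_ALLOC_HINT`, so the handler rejects the op instead of forwarding it.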
Updated by Loïc Dachary over 9 years ago
What happens if
- all OSDs in a PG support setallochint
- one secondary OSD goes down
- the secondary is replaced by an OSD that does not support setallochint
Updated by David Zafman over 9 years ago
- Status changed from 7 to Fix Under Review
Updated by David Zafman over 9 years ago
Peering happens on any change of PG configuration, so a fresh set of feature bits is collected from the peers each time. If not all peers support the feature, EOPNOTSUPP is returned to the client and no messages are sent to any secondaries.
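To make the replacement scenario concrete, here is a small Python model of that behavior. It is only a sketch under assumptions: the `PG` class, its `peer` and `setallochint` methods, the OSD names, and the bit value are all hypothetical and do not mirror the real OSD code.

```python
import errno

ALLOC_HINT = 1 << 0  # hypothetical feature bit, for illustration only


class PG:
    def __init__(self, own_features):
        self.own_features = own_features
        self.peer_features = own_features
        self.replicas = []
        self.sent = []  # (replica, op) pairs forwarded to secondaries

    def peer(self, replica_features):
        """Run on every acting-set change: rebuild the mask from scratch."""
        mask = self.own_features
        for features in replica_features.values():
            mask &= features
        self.peer_features = mask
        self.replicas = list(replica_features)

    def setallochint(self):
        if not self.peer_features & ALLOC_HINT:
            # Reject before anything is forwarded to a secondary.
            return -errno.EOPNOTSUPP
        for replica in self.replicas:
            self.sent.append((replica, "setallochint"))
        return 0
```

In this model, a PG whose replicas all advertise `ALLOC_HINT` accepts the op; once one replica is replaced by an OSD without the bit and `peer` runs again, the op is rejected and nothing further is appended to `sent`.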
Updated by Loïc Dachary over 9 years ago
Thanks for explaining. Since alloc hint is optional it does not matter if it is activated and deactivated later.
Updated by David Zafman over 9 years ago
Notes on using feature bits already present. The problem is that CEPH_FEATURE_MSGR_KEEPALIVE2 was backported, so we'd have to check CEPH_FEATURE_OSD_POOLRESEND, but that is over 2 months' worth of changes later. For maintainability I'd rather have a feature bit dedicated to the feature being checked for.
f825624f (Sage Weil 2014-01-29 19:47:21 -0800 125) CEPH_FEATURE_OSD_PRIMARY_AFFINITY
64568023 (Ilya Dryomov 2014-02-21 16:34:13 +0200 Added HINT CODE (v0.78)
d747d79f (Sage Weil 2014-03-27 21:09:13 -0700 126) CEPH_FEATURE_MSGR_KEEPALIVE2 (v0.79)
45e79a17 (Sage Weil 2014-05-08 10:50:51 -0700 54) CEPH_FEATURE_OSD_POOLRESEND (v0.81)
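The trade-off in that timeline can be modeled with a toy example. Everything below is an assumption made for illustration: the bit values are not Ceph's real ones, and the three OSD entries are hypothetical stand-ins for "an older release carrying the KEEPALIVE2 backport", "a v0.79 build", and "a v0.81 build".

```python
# Hypothetical bit positions, for illustration only; not Ceph's real values.
KEEPALIVE2 = 1 << 0  # backported, so older OSDs may advertise it too
POOLRESEND = 1 << 1  # first appears in v0.81

# Hypothetical OSDs: name -> (advertised features, has the hint code)
OSDS = {
    "older+backport": (KEEPALIVE2, False),
    "v0.79": (KEEPALIVE2, True),
    "v0.81": (KEEPALIVE2 | POOLRESEND, True),
}


def proxy_is_safe(proxy_bit, osds):
    """True if every OSD advertising proxy_bit really has the hint code."""
    return all(has_code for feats, has_code in osds.values() if feats & proxy_bit)


def missed_by_proxy(proxy_bit, osds):
    """OSDs that have the hint code but would be treated as lacking it."""
    return sorted(
        name for name, (feats, has_code) in osds.items()
        if has_code and not feats & proxy_bit
    )
```

In this model, using `KEEPALIVE2` as the proxy is unsafe because the backported OSD advertises it without having the hint code, while `POOLRESEND` is safe but ties the check to an unrelated, later feature and excludes the v0.79 build. A dedicated bit avoids both problems.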
Updated by David Zafman over 9 years ago
- Source changed from Support to Development
Updated by David Zafman over 9 years ago
- Source changed from Development to Support
Updated by Samuel Just over 9 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Samuel Just over 9 years ago
Next step is to add a test for this to the upgrade suites.
Updated by Samuel Just over 9 years ago
- Assignee changed from David Zafman to Yuri Weinstein
Updated by Samuel Just over 9 years ago
- Status changed from Pending Backport to 12
Updated by Yuri Weinstein over 9 years ago
- Status changed from 12 to 7
This is done and a new test case was added - PR https://github.com/ceph/ceph-qa-suite/pull/198