Project

General

Profile

Bug #16592

Jewel: monitor asserts on "mon/MDSMonitor.cc: 2796: FAILED assert(info.state == MDSMap::STATE_STANDBY)"

Added by Greg Farnum almost 3 years ago. Updated almost 2 years ago.

Status:
Need More Info
Priority:
High
Assignee:
-
Category:
Correctness/Safety
Target version:
-
Start date:
11/09/2016
Due date:
% Done:

100%

Source:
Community (user)
Tags:
Backport:
jewel
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDSMonitor
Labels (FS):
Pull request ID:

Description

We've seen a few reports on the ceph-user mailing lists of the latest jewel.

2016-07-03 17:12:55.195553 7f828388e700 -1 mon/MDSMonitor.cc: In function 'bool MDSMonitor::maybe_promote_standby(std::shared_ptr<Filesystem>)' thread 7f828388e700 time 2016-07-03 17:12:55.193360
mon/MDSMonitor.cc: 2796: FAILED assert(info.state == MDSMap::STATE_STANDBY)

 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x82) [0x556e001f1e12]
 2: (MDSMonitor::maybe_promote_standby(std::shared_ptr<Filesystem>)+0x97f) [0x556dffede53f]
 3: (MDSMonitor::tick()+0x3b6) [0x556dffee0866]
 4: (MDSMonitor::on_active()+0x28) [0x556dffed9038]
 5: (PaxosService::_active()+0x66a) [0x556dffe5968a]
 6: (Context::complete(int)+0x9) [0x556dffe249a9]
 7: (void finish_contexts<Context>(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0xac) [0x556dffe2ba7c]
 8: (Paxos::finish_round()+0xd0) [0x556dffe50460]
 9: (Paxos::handle_last(std::shared_ptr<MonOpRequest>)+0x103d) [0x556dffe51acd]
 10: (Paxos::dispatch(std::shared_ptr<MonOpRequest>)+0x38c) [0x556dffe5254c]
 11: (Monitor::dispatch_op(std::shared_ptr<MonOpRequest>)+0xd3b) [0x556dffe2245b]
 12: (Monitor::_ms_dispatch(Message*)+0x581) [0x556dffe22b91]
 13: (Monitor::ms_dispatch(Message*)+0x23) [0x556dffe41393]
 14: (DispatchQueue::entry()+0x7ba) [0x556e002e722a]
 15: (DispatchQueue::DispatchThread::entry()+0xd) [0x556e001d62cd]
 16: (()+0x74a4) [0x7f8290f904a4]
 17: (clone()+0x6d) [0x7f828f29298d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

mdsmap.bin (1.09 KB) alexander walker, 11/11/2016 06:46 AM


Related issues

Duplicated by fs - Bug #17837: ceph-mon crashed after upgrade from hammer 0.94.7 to jewel 10.2.3 Resolved 11/09/2016

History

#1 Updated by Patrick Donnelly almost 3 years ago

  • Assignee set to Patrick Donnelly

#2 Updated by Patrick Donnelly almost 3 years ago

Should note that this is maybe related to: http://tracker.ceph.com/issues/15591

#3 Updated by Patrick Donnelly almost 3 years ago

So, rambling brain dump of my current thoughts on this:

I haven't been able to reproduce this problem. There are two known instances of
this bug in 10.2.2 upgrades after [1]: [2] and [3].

Dzianis upgraded from Infernalis to 10.2.2. Bill upgraded from 10.2.0 to
10.2.2. So far I've assumed that they are afflicted by the same bug but
obviously that can't be known until we can reproduce this.

So far it is clear that the problem in both cases is that standby_daemons [4]
has a MDS in it that is not in standby. The code is sprinkled with assertions
that check this invariant (ideally, we would enforce that on mutations of
standby_daemons). Both users are hitting this assertion (in different places).

I have tried naive exercises doing live/dead upgrades from v9.2.1 to v10.2.2
with mon/mds in various upgrade states. For example:

o Kill all mds and upgrade. Then upgrade mons.
o Kill lead mds and lead mon, upgrade. Wait. Upgrade the rest.
o Use "mds standby replay" / "mds standby for rank" in various configurations during upgrade.

I wasn't able to hit the assertion. (In most cases, I was trying to get
standby_daemons of a v10.2.2 mon to violate its "info->state ==
MDS_STATE_STANDBY" invariant through updates from other monitors.)

In v10.2.0+, a standby daemon's state may be changed in these places:

o [5] (removed in v10.2.2)
o [6] (removed in v10.2.2)
o [7] <-- strong candidate for bug?
o [8] and [9] (fs command, probably unrelated)
o [10] <-- strong candidate for bug?
o [11] which is only called from [12], so must be standby.
o [11] <-- received bad standby_daemon from other v10.2.0+ mon. Possible to hit on v10.2.2 if [5] or [6] are exercised on a v10.2.0 daemon!

I have tried to reproduce using [7] but I have not found a way to have a daemon
with rank MDS_RANK_NONE that is not also MDS_STATE_STANDBY.

That leaves [10] which I have done some code reading to see how to get a beacon
sent which causes that code path to run which modifies a standby daemon. So far
my thoughts on that are a v10.2.0 mds daemon asking to be
STANDBY_REPLAY_ONESHOT [13] which is not handled correctly (??) by a v10.2.2
mon as STANDBY_REPLAY_ONESHOT is removed since [15]. I do not really think this
is likely as STANDBY_REPLAY_ONESHOT is not really used according to [15] and
wouldn't really affect a live upgrade.

The other possibility is a v10.2.0 MDS wanting to be state
MDS_STATE_STANDBY_REPLAY in [13]. This possibility is handled in v10.2.2 in
[14]. However, I suspect it may be possible during live upgrade from v10.2.0
to v10.2.2 for an older monitor to change standby_daemons in the [5] and [6]
code paths
. In this way, a bad standby_daemons map could be sent from an
older v10.2.0 monitor to a v10.2.2 monitor. [For this possibility, upgrading is
not actually necessary as v10.2.0 should fail eventually too with an
assertion?]

Anyway, I think at this point I'm spinning my wheels here (although, going
through the MDSMonitor/FSMap/MDSMap code was very useful). Maybe someone else
has an idea here.

BTW, I will soon submit a PR which adds more assertions in the code paths which
may violate the standby_daemons invariant so we can catch this problem earlier.

[1] http://tracker.ceph.com/issues/15591:
[2] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-June/011033.html
[3] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-July/011098.html
[4] standby_daemons is a map of standbys maintained in the FSMap. The FSMap is
a new structure that contains one or more MDSMaps (one for each file system, we
can now have more than one). A standby is no longer associated with a specific
file system and may be used on demand.
[5] https://github.com/ceph/ceph/blob/3a9fba20ec743699b69bd0181dd6c54dc01c64b9/src/mon/MDSMonitor.cc#L617-L623
[6] https://github.com/ceph/ceph/blob/3a9fba20ec743699b69bd0181dd6c54dc01c64b9/src/mon/MDSMonitor.cc#L639-L645
[7] https://github.com/ceph/ceph/blob/3a9fba20ec743699b69bd0181dd6c54dc01c64b9/src/mon/MDSMonitor.cc#L702-L705
[8] https://github.com/ceph/ceph/blob/3a9fba20ec743699b69bd0181dd6c54dc01c64b9/src/mon/MDSMonitor.cc#L2232-L2234
[9] https://github.com/ceph/ceph/blob/3a9fba20ec743699b69bd0181dd6c54dc01c64b9/src/mon/MDSMonitor.cc#L2250-L2252
[10] https://github.com/ceph/ceph/blob/3a9fba20ec743699b69bd0181dd6c54dc01c64b9/src/mds/FSMap.cc#L410-L414
[11] https://github.com/ceph/ceph/blob/3a9fba20ec743699b69bd0181dd6c54dc01c64b9/src/mds/FSMap.cc#L769-L774
[12] https://github.com/ceph/ceph/blob/3a9fba20ec743699b69bd0181dd6c54dc01c64b9/src/mon/MDSMonitor.cc#L523-L533
[13] https://github.com/ceph/ceph/blob/3a9fba20ec743699b69bd0181dd6c54dc01c64b9/src/mds/MDSDaemon.cc#L980-L986
[14] https://github.com/ceph/ceph/blob/45107e21c568dd033c2f0a3107dec8f0b0e58374/src/messages/MMDSBeacon.h#L240-L245 (v10.2.2!)
[15] https://github.com/ceph/ceph/commit/02e3edd93c0f4ef6e0d11df1f35187f74c7ea2ff

#5 Updated by Nathan Cutler almost 3 years ago

  • Backport set to jewel

#6 Updated by Denis kaganovich almost 3 years ago

I make (maybe wrong, but no way back) one-shot upgrade: stop all client, stop all ceph daemons (mds,osd,mon) and run all again upgraded. Now (after bypath assertion patching, but while not pull 10316) I have all working, but with strange effect (may be even not related to this bug):

  1. ceph mds stat
    e5682: 1/1/1 up {0=c=up:standby-replay}, 1 up:standby

or same but up:active. But all 3 mds started: active, standby-reply & standby. "mds stat" show only one of non-standby mds's - active or standby-replay (max_mds=1). IMHO there are mds state misconfiguration too.

#7 Updated by Patrick Donnelly almost 3 years ago

Dzianis reported that he upgraded to 10.2.2 without ever upgrading to 10.2.0 (and downgrading after, if that's even possible):

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-July/011603.html

So this would indicate that http://tracker.ceph.com/issues/15591 is a different problem.

#8 Updated by Greg Farnum almost 3 years ago

  • Category set to Correctness/Safety
  • Component(FS) MDSMonitor added

#9 Updated by John Spray almost 3 years ago

Denis: you are seeing http://tracker.ceph.com/issues/15705, unrelated to this ticket.

#10 Updated by John Spray almost 3 years ago

It is interesting that we're hitting this in maybe_promote_standby and not in sanity(). Sanity gets called after paxos decode and before encode, so the fact that we're not hitting it there means that the bad state is probably originating while handling a beacon.

Case 7 does indeed look like a strong candidate.

Older daemons would set STANDBY_REPLAY here, but that's filtered out in MMDSBeacon::decode_payload and set back to STANDBY. The ONESHOT case is possible but really unlikely that the users did this and didn't mention it. However, we should whitelist the acceptable states during these updates (in addition to the assertion that Patrick already added to modify_daemon).

Another potential way for a bad state to make it through to this point would be if the epoch checking in preprocess_beacon was wrong: this got more complicated with multi-filesystem support (see effective_epoch in preprocess_beacon). I don't see any bug there though.

Opened https://github.com/ceph/ceph/pull/10428 to validate state transitions (and drop beacons rather than asserting out when something goes bad).

#11 Updated by Greg Farnum almost 3 years ago

  • Status changed from New to Need More Info
  • Priority changed from Urgent to Normal

Moving this down and setting Need More Info based on Patrick's investigation and the new asserts; let me know if that was the wrong move Patrick.

#12 Updated by Patrick Donnelly over 2 years ago

  • Duplicated by Bug #17837: ceph-mon crashed after upgrade from hammer 0.94.7 to jewel 10.2.3 added

#13 Updated by alexander walker over 2 years ago

I've created the ticket #17837.
Here ist output from "ceph mds dump --format=json-pretty"

{
    "epoch": 440,
    "flags": 0,
    "created": "2016-03-11 15:24:45.516358",
    "modified": "2016-11-09 14:31:22.696300",
    "tableserver": 0,
    "root": 0,
    "session_timeout": 60,
    "session_autoclose": 300,
    "max_file_size": 1099511627776,
    "last_failure": 395,
    "last_failure_osd_epoch": 2465,
    "compat": {
        "compat": {},
        "ro_compat": {},
        "incompat": {
            "feature_1": "base v0.20",
            "feature_2": "client writeable ranges",
            "feature_3": "default file layouts on dirs",
            "feature_4": "dir inode in separate object",
            "feature_5": "mds uses versioned encoding",
            "feature_6": "dirfrag is stored in omap",
            "feature_8": "no anchor table" 
        }
    },
    "max_mds": 1,
    "in": [
        0
    ],
    "up": {
        "mds_0": 5854219
    },
    "failed": [],
    "stopped": [],
    "info": {
        "gid_5854102": {
            "gid": 5854102,
            "name": "ceph2.aditosoftware.local",
            "rank": -1,
            "incarnation": 0,
            "state": "up:standby",
            "state_seq": 1,
            "addr": "192.168.49.102:6800\/1261",
            "standby_for_rank": -1,
            "standby_for_name": "",
            "export_targets": []
        },
        "gid_5854219": {
            "gid": 5854219,
            "name": "ceph1.aditosoftware.local",
            "rank": 0,
            "incarnation": 41,
            "state": "up:active",
            "state_seq": 111157,
            "addr": "192.168.49.101:6800\/1287",
            "standby_for_rank": -1,
            "standby_for_name": "",
            "export_targets": []
        },
        "gid_6000903": {
            "gid": 6000903,
            "name": "ceph3.aditosoftware.local",
            "rank": -1,
            "incarnation": 0,
            "state": "up:standby",
            "state_seq": 1,
            "addr": "192.168.49.103:6800\/1231",
            "standby_for_rank": -1,
            "standby_for_name": "",
            "export_targets": []
        }
    },
    "data_pools": [
        1
    ],
    "metadata_pool": 2,
    "enabled": true,
    "fs_name": "cephfs_fs" 
}

And I've attached the output from "ceph mds getmap > mdsmap.bin"

#14 Updated by Patrick Donnelly almost 2 years ago

  • Assignee deleted (Patrick Donnelly)

Also available in: Atom PDF