Bug #16592: Jewel: monitor asserts on "mon/MDSMonitor.cc: 2796: FAILED assert(info.state == MDSMap::STATE_STANDBY)" - CephFS - Ceph

Actions

Copy link

Bug #16592

open

Jewel: monitor asserts on "mon/MDSMonitor.cc: 2796: FAILED assert(info.state == MDSMap::STATE_STANDBY)"

Added by Greg Farnum almost 8 years ago. Updated over 6 years ago.

Status:

Need More Info

Priority:

High

Assignee:

Category:

Correctness/Safety

Target version:

% Done:

100%

Source:

Community (user)

Tags:

Backport:

jewel

Regression:

Severity:

3 - minor

Reviewed:

Affected Versions:

ceph-qa-suite:

Component(FS):

MDSMonitor

Labels (FS):

Pull request ID:

Crash signature (v1):

Crash signature (v2):

Description

We've seen a few reports on the ceph-user mailing lists of the latest jewel.

2016-07-03 17:12:55.195553 7f828388e700 -1 mon/MDSMonitor.cc: In function 'bool MDSMonitor::maybe_promote_standby(std::shared_ptr<Filesystem>)' thread 7f828388e700 time 2016-07-03 17:12:55.193360
mon/MDSMonitor.cc: 2796: FAILED assert(info.state == MDSMap::STATE_STANDBY)

 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x82) [0x556e001f1e12]
 2: (MDSMonitor::maybe_promote_standby(std::shared_ptr<Filesystem>)+0x97f) [0x556dffede53f]
 3: (MDSMonitor::tick()+0x3b6) [0x556dffee0866]
 4: (MDSMonitor::on_active()+0x28) [0x556dffed9038]
 5: (PaxosService::_active()+0x66a) [0x556dffe5968a]
 6: (Context::complete(int)+0x9) [0x556dffe249a9]
 7: (void finish_contexts<Context>(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0xac) [0x556dffe2ba7c]
 8: (Paxos::finish_round()+0xd0) [0x556dffe50460]
 9: (Paxos::handle_last(std::shared_ptr<MonOpRequest>)+0x103d) [0x556dffe51acd]
 10: (Paxos::dispatch(std::shared_ptr<MonOpRequest>)+0x38c) [0x556dffe5254c]
 11: (Monitor::dispatch_op(std::shared_ptr<MonOpRequest>)+0xd3b) [0x556dffe2245b]
 12: (Monitor::_ms_dispatch(Message*)+0x581) [0x556dffe22b91]
 13: (Monitor::ms_dispatch(Message*)+0x23) [0x556dffe41393]
 14: (DispatchQueue::entry()+0x7ba) [0x556e002e722a]
 15: (DispatchQueue::DispatchThread::entry()+0xd) [0x556e001d62cd]
 16: (()+0x74a4) [0x7f8290f904a4]
 17: (clone()+0x6d) [0x7f828f29298d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Files

mdsmap.bin (1.09 KB) mdsmap.bin

alexander walker, 11/11/2016 06:46 AM

Related issues 1 (0 open — 1 closed)

Actions

Copy link

Updated by Patrick Donnelly almost 8 years ago

Assignee set to Patrick Donnelly

Actions

Copy link

Updated by Patrick Donnelly almost 8 years ago

Should note that this is maybe related to: http://tracker.ceph.com/issues/15591

Actions

Copy link

Updated by Patrick Donnelly almost 8 years ago

So, rambling brain dump of my current thoughts on this:

I haven't been able to reproduce this problem. There are two known instances of
this bug in 10.2.2 upgrades after [1]: [2] and [3].

Dzianis upgraded from Infernalis to 10.2.2. Bill upgraded from 10.2.0 to
10.2.2. So far I've assumed that they are afflicted by the same bug but
obviously that can't be known until we can reproduce this.

So far it is clear that the problem in both cases is that standby_daemons [4]
has a MDS in it that is not in standby. The code is sprinkled with assertions
that check this invariant (ideally, we would enforce that on mutations of
standby_daemons). Both users are hitting this assertion (in different places).

I have tried naive exercises doing live/dead upgrades from v9.2.1 to v10.2.2
with mon/mds in various upgrade states. For example:

o Kill all mds and upgrade. Then upgrade mons.
o Kill lead mds and lead mon, upgrade. Wait. Upgrade the rest.
o Use "mds standby replay" / "mds standby for rank" in various configurations during upgrade.

I wasn't able to hit the assertion. (In most cases, I was trying to get
standby_daemons of a v10.2.2 mon to violate its "info->state ==
MDS_STATE_STANDBY" invariant through updates from other monitors.)

In v10.2.0+, a standby daemon's state may be changed in these places:

o [5] (removed in v10.2.2)
o [6] (removed in v10.2.2)
o [7] <-- strong candidate for bug?
o [8] and [9] (fs command, probably unrelated)
o [10] <-- strong candidate for bug?
o [11] which is only called from [12], so must be standby.
o [11] <-- received bad standby_daemon from other v10.2.0+ mon. Possible to hit on v10.2.2 if [5] or [6] are exercised on a v10.2.0 daemon!

I have tried to reproduce using [7] but I have not found a way to have a daemon
with rank MDS_RANK_NONE that is not also MDS_STATE_STANDBY.

That leaves [10] which I have done some code reading to see how to get a beacon
sent which causes that code path to run which modifies a standby daemon. So far
my thoughts on that are a v10.2.0 mds daemon asking to be
STANDBY_REPLAY_ONESHOT [13] which is not handled correctly (??) by a v10.2.2
mon as STANDBY_REPLAY_ONESHOT is removed since [15]. I do not really think this
is likely as STANDBY_REPLAY_ONESHOT is not really used according to [15] and
wouldn't really affect a live upgrade.

The other possibility is a v10.2.0 MDS wanting to be state
MDS_STATE_STANDBY_REPLAY in [13]. This possibility is handled in v10.2.2 in
[14]. However, I suspect it may be possible during live upgrade from v10.2.0
to v10.2.2 for an older monitor to change standby_daemons in the [5] and [6]
code paths. In this way, a bad standby_daemons map could be sent from an
older v10.2.0 monitor to a v10.2.2 monitor. [For this possibility, upgrading is
not actually necessary as v10.2.0 should fail eventually too with an
assertion?]

Anyway, I think at this point I'm spinning my wheels here (although, going
through the MDSMonitor/FSMap/MDSMap code was very useful). Maybe someone else
has an idea here.

BTW, I will soon submit a PR which adds more assertions in the code paths which
may violate the standby_daemons invariant so we can catch this problem earlier.

Actions

Copy link

Updated by Patrick Donnelly almost 8 years ago

PR for added assertions: https://github.com/ceph/ceph/pull/10316

Actions

Copy link

Updated by Nathan Cutler almost 8 years ago

Backport set to jewel

Actions

Copy link

Updated by Denis kaganovich almost 8 years ago

I make (maybe wrong, but no way back) one-shot upgrade: stop all client, stop all ceph daemons (mds,osd,mon) and run all again upgraded. Now (after bypath assertion patching, but while not pull 10316) I have all working, but with strange effect (may be even not related to this bug):

ceph mds stat
e5682: 1/1/1 up {0=c=up:standby-replay}, 1 up:standby

or same but up:active. But all 3 mds started: active, standby-reply & standby. "mds stat" show only one of non-standby mds's - active or standby-replay (max_mds=1). IMHO there are mds state misconfiguration too.

Actions

Copy link

Updated by Patrick Donnelly almost 8 years ago

Dzianis reported that he upgraded to 10.2.2 without ever upgrading to 10.2.0 (and downgrading after, if that's even possible):

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-July/011603.html

So this would indicate that http://tracker.ceph.com/issues/15591 is a different problem.

Actions

Copy link

Updated by Greg Farnum almost 8 years ago

Category set to Correctness/Safety
Component(FS) MDSMonitor added

Actions

Copy link

Updated by John Spray almost 8 years ago

Denis: you are seeing http://tracker.ceph.com/issues/15705, unrelated to this ticket.

Actions

Copy link

#10

Updated by John Spray almost 8 years ago

It is interesting that we're hitting this in maybe_promote_standby and not in sanity(). Sanity gets called after paxos decode and before encode, so the fact that we're not hitting it there means that the bad state is probably originating while handling a beacon.

Case 7 does indeed look like a strong candidate.

Older daemons would set STANDBY_REPLAY here, but that's filtered out in MMDSBeacon::decode_payload and set back to STANDBY. The ONESHOT case is possible but really unlikely that the users did this and didn't mention it. However, we should whitelist the acceptable states during these updates (in addition to the assertion that Patrick already added to modify_daemon).

Another potential way for a bad state to make it through to this point would be if the epoch checking in preprocess_beacon was wrong: this got more complicated with multi-filesystem support (see effective_epoch in preprocess_beacon). I don't see any bug there though.

Opened https://github.com/ceph/ceph/pull/10428 to validate state transitions (and drop beacons rather than asserting out when something goes bad).

Actions

Copy link

#11

Updated by Greg Farnum over 7 years ago

Status changed from New to Need More Info
Priority changed from Urgent to Normal

Moving this down and setting Need More Info based on Patrick's investigation and the new asserts; let me know if that was the wrong move Patrick.

Actions

Copy link

#12

Updated by Patrick Donnelly over 7 years ago

Has duplicate Bug #17837: ceph-mon crashed after upgrade from hammer 0.94.7 to jewel 10.2.3 added

Actions

Copy link

#13

Updated by alexander walker over 7 years ago

File mdsmap.bin mdsmap.bin added

I've created the ticket #17837.
Here ist output from "ceph mds dump --format=json-pretty"

{
    "epoch": 440,
    "flags": 0,
    "created": "2016-03-11 15:24:45.516358",
    "modified": "2016-11-09 14:31:22.696300",
    "tableserver": 0,
    "root": 0,
    "session_timeout": 60,
    "session_autoclose": 300,
    "max_file_size": 1099511627776,
    "last_failure": 395,
    "last_failure_osd_epoch": 2465,
    "compat": {
        "compat": {},
        "ro_compat": {},
        "incompat": {
            "feature_1": "base v0.20",
            "feature_2": "client writeable ranges",
            "feature_3": "default file layouts on dirs",
            "feature_4": "dir inode in separate object",
            "feature_5": "mds uses versioned encoding",
            "feature_6": "dirfrag is stored in omap",
            "feature_8": "no anchor table" 
        }
    },
    "max_mds": 1,
    "in": [
        0
    ],
    "up": {
        "mds_0": 5854219
    },
    "failed": [],
    "stopped": [],
    "info": {
        "gid_5854102": {
            "gid": 5854102,
            "name": "ceph2.aditosoftware.local",
            "rank": -1,
            "incarnation": 0,
            "state": "up:standby",
            "state_seq": 1,
            "addr": "192.168.49.102:6800\/1261",
            "standby_for_rank": -1,
            "standby_for_name": "",
            "export_targets": []
        },
        "gid_5854219": {
            "gid": 5854219,
            "name": "ceph1.aditosoftware.local",
            "rank": 0,
            "incarnation": 41,
            "state": "up:active",
            "state_seq": 111157,
            "addr": "192.168.49.101:6800\/1287",
            "standby_for_rank": -1,
            "standby_for_name": "",
            "export_targets": []
        },
        "gid_6000903": {
            "gid": 6000903,
            "name": "ceph3.aditosoftware.local",
            "rank": -1,
            "incarnation": 0,
            "state": "up:standby",
            "state_seq": 1,
            "addr": "192.168.49.103:6800\/1231",
            "standby_for_rank": -1,
            "standby_for_name": "",
            "export_targets": []
        }
    },
    "data_pools": [
        1
    ],
    "metadata_pool": 2,
    "enabled": true,
    "fs_name": "cephfs_fs" 
}

And I've attached the output from "ceph mds getmap > mdsmap.bin"

Actions

Copy link

#14

Updated by Patrick Donnelly over 6 years ago

Assignee deleted (~~Patrick Donnelly~~)

Actions

Copy link

Also available in: Atom PDF

Project

General

Profile

Ceph » CephFS

Custom queries

Bug #16592

Jewel: monitor asserts on "mon/MDSMonitor.cc: 2796: FAILED assert(info.state == MDSMap::STATE_STANDBY)"

Updated by Patrick Donnelly almost 8 years ago

Updated by Patrick Donnelly almost 8 years ago

Updated by Patrick Donnelly almost 8 years ago

Updated by Patrick Donnelly almost 8 years ago

Updated by Nathan Cutler almost 8 years ago

Updated by Denis kaganovich almost 8 years ago

Updated by Patrick Donnelly almost 8 years ago

Updated by Greg Farnum almost 8 years ago

Updated by John Spray almost 8 years ago

Updated by John Spray almost 8 years ago

Updated by Greg Farnum over 7 years ago

Updated by Patrick Donnelly over 7 years ago

Updated by alexander walker over 7 years ago

Updated by Patrick Donnelly over 6 years ago