Bug #43365

closed

Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signedspan

Added by Alex Walender over 4 years ago. Updated almost 4 years ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): Monitor
Pull request ID:
Crash signature (v1):

6a617f9d477ab8df2d068af0768ff741c68adabcc5c1ecb5dd3e9872d613c943
dacbff55030f3d0837e58d8f4961441b6902d5750b0e1579682df5650c33d44d
ba2236bee8cb7fa239e26b19bf4ec3fcd2245c1a3ea0decd85d6e183c5476f91
a1d05027be7f31340a919c0c79a91d7be71d547503fa54fdf0b74448ccd3b7f7
f7126ff3d59617640b56f9aaaa317a56b8827fc299ba37f1e727b8ac13ce0e1a
cf2864eb1281dffc3340730dc2caae163b4c0170132bcbd3dcbd6147d8f29fa8

Crash signature (v2):

Description

Thanks to the automatic warning about recent crashes in 14.2.5, we noticed that our monitors crash frequently (roughly once a day) and at seemingly random times.

```
{
"crash_id": "2019-12-18_10:56:08.666582Z_408b4d27-47e2-43bf-849e-87b62a9c6084",
"timestamp": "2019-12-18 10:56:08.666582Z",
"process_name": "ceph-mon",
"entity_name": "mon.weser",
"ceph_version": "14.2.5",
"utsname_hostname": "weser",
"utsname_sysname": "Linux",
"utsname_release": "5.0.0-27-generic",
"utsname_version": "#28~18.04.1-Ubuntu SMP Thu Aug 22 03:00:32 UTC 2019",
"utsname_machine": "x86_64",
"os_name": "Ubuntu",
"os_id": "ubuntu",
"os_version_id": "18.04",
"os_version": "18.04.3 LTS (Bionic Beaver)",
"assert_condition": "z >= signedspan::zero()",
"assert_func": "ceph::time_detail::timespan ceph::to_timespan(ceph::time_detail::signedspan)",
"assert_file": "/build/ceph-14.2.5/src/common/ceph_time.h",
"assert_line": 485,
"assert_thread_name": "fn_monstore",
"assert_msg": "/build/ceph-14.2.5/src/common/ceph_time.h: In function 'ceph::time_detail::timespan ceph::to_timespan(ceph::time_detail::signedspan)' thread 7fc3198ac700 time 2019-12-18 11:56:08.662665\n/build/ceph-14.2.5/src/common/ceph_time.h: 485: FAILED ceph_assert(z >= signedspan::zero())\n",
"backtrace": [
"(()+0x12890) [0x7fc3284fc890]",
"(gsignal()+0xc7) [0x7fc3275f4e97]",
"(abort()+0x141) [0x7fc3275f6801]",
"(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a3) [0x7fc3296a0287]",
"(()+0x28140e) [0x7fc3296a040e]",
"(Paxos::begin(ceph::buffer::v14_2_0::list&)+0xcd2) [0x5571b4b2f412]",
"(Paxos::propose_pending()+0x127) [0x5571b4b305f7]",
"(Paxos::finish_round()+0x50a) [0x5571b4b30e1a]",
"(Paxos::commit_finish()+0x5fc) [0x5571b4b32d6c]",
"(C_Committed::finish(int)+0x34) [0x5571b4b36d54]",
"(Context::complete(int)+0x9) [0x5571b4a6d359]",
"(MonitorDBStore::C_DoTransaction::finish(int)+0x94) [0x5571b4b36ac4]",
"(Context::complete(int)+0x9) [0x5571b4a6d359]",
"(Finisher::finisher_thread_entry()+0x17f) [0x7fc32972b79f]",
"(()+0x76db) [0x7fc3284f16db]",
"(clone()+0x3f) [0x7fc3276d788f]"
]
}
```
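
For readers unfamiliar with the check, the failing condition can be sketched in isolation. This is a minimal standalone illustration and not the Ceph source: the two type aliases are stand-ins for `ceph::time_detail::signedspan` and `timespan`, and the function names `to_timespan_checked` and `to_timespan_clamped` are hypothetical. It assumes the span handed to `to_timespan` was negative (for example an `end - start` difference where the clock appeared to move backwards), which is what the assertion message suggests but which this report does not confirm. The clamping variant is only one conceivable way to tolerate such input, not necessarily the fix that was applied.

```cpp
#include <chrono>
#include <cstdint>

// Stand-ins (assumptions) for ceph::time_detail::signedspan / timespan:
// a signed and an unsigned nanosecond duration.
using signedspan = std::chrono::duration<std::int64_t, std::nano>;
using timespan = std::chrono::duration<std::uint64_t, std::nano>;

// Hypothetical strict conversion. It rejects negative spans, which is the
// case where the crash above reports
// "FAILED ceph_assert(z >= signedspan::zero())".
// It returns false instead of aborting so callers can observe the failure.
bool to_timespan_checked(signedspan z, timespan& out) {
  if (z < signedspan::zero()) {
    return false;  // negative span: cannot be represented as unsigned
  }
  out = std::chrono::duration_cast<timespan>(z);
  return true;
}

// Hypothetical tolerant conversion: clamp negative spans to zero rather
// than failing. Shown only as one possible mitigation.
timespan to_timespan_clamped(signedspan z) {
  if (z < signedspan::zero()) {
    return timespan::zero();
  }
  return std::chrono::duration_cast<timespan>(z);
}
```

With these definitions, `to_timespan_checked(signedspan{-250}, out)` returns `false`, while `to_timespan_clamped(signedspan{-250})` yields a zero-length span.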

The crashes seem to occur at random. I have not been able to trigger them deliberately.

Since 14.2.5, we have also had problems with the interaction between mon and mgr, described here:
https://tracker.ceph.com/issues/43364

I am not sure whether that issue is related to these monitor crashes.


Related issues 2 (0 open, 2 closed)

Related to Ceph - Bug #44078: centos 8.1: ceph-mon: assertion "z >= signedspan::zero()" failed in ceph::to_timespan( [Duplicate]

Copied to RADOS - Backport #44486: nautilus: Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signedspan [Resolved, Nathan Cutler]
