Bug #49809
1 out of 3 mon crashed in MonitorDBStore::get_synchronizer
Status: open
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): Monitor
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
We experienced a crash of a single mon (one of three). We observed no other issues on the machine or in the cluster.
I attached the ceph-mon.log to this ticket.
ceph crash info:
{
  "os_version_id": "18.04",
  "utsname_release": "4.15.0-136-generic",
  "os_name": "Ubuntu",
  "entity_name": "mon.ctrl-01",
  "timestamp": "2021-03-14 18:00:28.686018Z",
  "process_name": "ceph-mon",
  "utsname_machine": "x86_64",
  "utsname_sysname": "Linux",
  "os_version": "18.04.5 LTS (Bionic Beaver)",
  "os_id": "ubuntu",
  "utsname_version": "#140-Ubuntu SMP Thu Jan 28 05:20:47 UTC 2021",
  "backtrace": [
    "(()+0x12980) [0x7f1c0cfbe980]",
    "(MonitorDBStore::get_synchronizer(std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >&, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&)+0x3f) [0x555e110c756f]",
    "(Monitor::_scrub(ScrubResult*, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >*, int*)+0xe9) [0x555e11089379]",
    "(Monitor::scrub()+0x298) [0x555e11092658]",
    "(Monitor::scrub_start()+0x137) [0x555e11092847]",
    "(C_MonContext::finish(int)+0x39) [0x555e1106a899]",
    "(Context::complete(int)+0x9) [0x555e110adc69]",
    "(SafeTimer::timer_thread()+0x190) [0x7f1c0e1dbd20]",
    "(SafeTimerThread::entry()+0xd) [0x7f1c0e1dd5ed]",
    "(()+0x76db) [0x7f1c0cfb36db]",
    "(clone()+0x3f) [0x7f1c0c19971f]"
  ],
  "utsname_hostname": "ctrl-01",
  "crash_id": "2021-03-14_18:00:28.686018Z_274d6160-d49e-4395-9d8c-64d4dcad53f5",
  "ceph_version": "14.2.16"
}
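The backtrace is easier to read once offsets and template arguments are stripped. The following Python sketch is a hypothetical helper (`frame_functions` and `FRAME_RE` are illustrative names, not part of Ceph) that assumes every frame has the "(SYMBOL+0xOFFSET) [0xADDRESS]" shape seen above and lists one function per frame, innermost first. To keep it short, the long template argument lists of the get_synchronizer and _scrub frames are abbreviated to "(...)" in the embedded copy. Read bottom-up, the frames show the crash happened on a SafeTimer thread, in a timer-driven monitor scrub: Monitor::scrub_start -> Monitor::scrub -> Monitor::_scrub -> MonitorDBStore::get_synchronizer.

```python
import json
import re

# Copy of the relevant fields of the `ceph crash info` output above.
# NOTE: the template argument lists of the get_synchronizer and _scrub frames
# are abbreviated to "(...)" here; all other frames are verbatim.
CRASH_INFO_JSON = r"""
{
  "entity_name": "mon.ctrl-01",
  "ceph_version": "14.2.16",
  "backtrace": [
    "(()+0x12980) [0x7f1c0cfbe980]",
    "(MonitorDBStore::get_synchronizer(...)+0x3f) [0x555e110c756f]",
    "(Monitor::_scrub(...)+0xe9) [0x555e11089379]",
    "(Monitor::scrub()+0x298) [0x555e11092658]",
    "(Monitor::scrub_start()+0x137) [0x555e11092847]",
    "(C_MonContext::finish(int)+0x39) [0x555e1106a899]",
    "(Context::complete(int)+0x9) [0x555e110adc69]",
    "(SafeTimer::timer_thread()+0x190) [0x7f1c0e1dbd20]",
    "(SafeTimerThread::entry()+0xd) [0x7f1c0e1dd5ed]",
    "(()+0x76db) [0x7f1c0cfb36db]",
    "(clone()+0x3f) [0x7f1c0c19971f]"
  ]
}
"""

# Assumed frame shape: "(SYMBOL+0xOFFSET) [0xADDRESS]". The greedy ".*"
# makes "+0x" match its last occurrence, i.e. the offset suffix.
FRAME_RE = re.compile(
    r"^\((?P<sym>.*)\+0x(?P<off>[0-9a-f]+)\) \[0x(?P<addr>[0-9a-f]+)\]$"
)


def frame_functions(backtrace):
    """Return one function name per frame, innermost frame first.

    Frames with no resolved symbol ("()") become "<unresolved>"; lines that
    do not match the assumed shape become "<unparsed>". The name is cut at
    the first "(", so symbols like "operator()" would be truncated (none
    occur in this backtrace).
    """
    names = []
    for frame in backtrace:
        m = FRAME_RE.match(frame)
        if m is None:
            names.append("<unparsed>")
            continue
        name = m.group("sym").split("(", 1)[0]
        names.append(name or "<unresolved>")
    return names


info = json.loads(CRASH_INFO_JSON)
for depth, name in enumerate(frame_functions(info["backtrace"])):
    print(f"#{depth:<2} {name}")
```

Frame #1 is MonitorDBStore::get_synchronizer and the outermost resolved frames are the SafeTimer thread entry points, matching the reading above.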
ceph.log reported:
2021-03-14 18:00:00.000161 mon.ctrl-01 (mon.0) 224915 : cluster [INF] overall HEALTH_OK
2021-03-14 18:00:27.513970 mon.ctrl-03 (mon.2) 146401 : cluster [INF] mon.ctrl-03 calling monitor election
2021-03-14 18:00:27.516169 mon.ctrl-02 (mon.1) 122690 : cluster [INF] mon.ctrl-02 calling monitor election
2021-03-14 18:00:39.412761 mon.ctrl-02 (mon.1) 122691 : cluster [INF] mon.ctrl-02 is new leader, mons ctrl-02,ctrl-03 in quorum (ranks 1,2)
2021-03-14 18:00:39.423504 mon.ctrl-02 (mon.1) 122696 : cluster [WRN] Health check failed: 1/3 mons down, quorum ctrl-02,ctrl-03 (MON_DOWN)
2021-03-14 18:00:39.431667 mon.ctrl-02 (mon.1) 122697 : cluster [WRN] overall HEALTH_WARN 1/3 mons down, quorum ctrl-02,ctrl-03
2021-03-14 18:00:40.733430 mon.ctrl-01 (mon.0) 1 : cluster [INF] mon.ctrl-01 calling monitor election
2021-03-14 18:00:40.742270 mon.ctrl-01 (mon.0) 2 : cluster [INF] mon.ctrl-01 calling monitor election
2021-03-14 18:00:45.745709 mon.ctrl-01 (mon.0) 3 : cluster [INF] mon.ctrl-01 is new leader, mons ctrl-01,ctrl-03 in quorum (ranks 0,2)
2021-03-14 18:00:51.930256 mon.ctrl-02 (mon.1) 122710 : cluster [INF] mon.ctrl-02 calling monitor election
2021-03-14 18:00:51.934337 mon.ctrl-01 (mon.0) 4 : cluster [WRN] overall HEALTH_WARN 1/3 mons down, quorum ctrl-02,ctrl-03
2021-03-14 18:00:51.934422 mon.ctrl-01 (mon.0) 5 : cluster [INF] mon.ctrl-01 calling monitor election
2021-03-14 18:00:51.960175 mon.ctrl-01 (mon.0) 6 : cluster [INF] mon.ctrl-01 is new leader, mons ctrl-01,ctrl-02,ctrl-03 in quorum (ranks 0,1,2)
2021-03-14 18:00:51.970642 mon.ctrl-01 (mon.0) 11 : cluster [INF] Health check cleared: MON_DOWN (was: 1/3 mons down, quorum ctrl-02,ctrl-03)
2021-03-14 18:00:51.970670 mon.ctrl-01 (mon.0) 12 : cluster [INF] Cluster is now healthy
2021-03-14 18:00:51.979150 mon.ctrl-01 (mon.0) 13 : cluster [INF] overall HEALTH_OK
2021-03-14 18:03:47.551678 mon.ctrl-01 (mon.0) 49 : cluster [WRN] Health check failed: 1 daemons have recently crashed (RECENT_CRASH)
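To summarize the ceph.log excerpt, the following Python sketch parses the lines quoted above and reports the quorum changes, how long MON_DOWN was raised, and any per-daemon sequence number that goes backwards. The line layout encoded in `LINE_RE` ("date time entity (rank) seq : channel [level] message") is an assumption inferred from these lines only, not from a Ceph format specification, and the helper names are hypothetical. A sequence reset is only a hint: the drop from 224915 to 1 for mon.ctrl-01 appears consistent with the daemon having restarted after the crash.

```python
import re
from datetime import datetime

# The cluster log lines quoted above, verbatim.
CLUSTER_LOG = """\
2021-03-14 18:00:00.000161 mon.ctrl-01 (mon.0) 224915 : cluster [INF] overall HEALTH_OK
2021-03-14 18:00:27.513970 mon.ctrl-03 (mon.2) 146401 : cluster [INF] mon.ctrl-03 calling monitor election
2021-03-14 18:00:27.516169 mon.ctrl-02 (mon.1) 122690 : cluster [INF] mon.ctrl-02 calling monitor election
2021-03-14 18:00:39.412761 mon.ctrl-02 (mon.1) 122691 : cluster [INF] mon.ctrl-02 is new leader, mons ctrl-02,ctrl-03 in quorum (ranks 1,2)
2021-03-14 18:00:39.423504 mon.ctrl-02 (mon.1) 122696 : cluster [WRN] Health check failed: 1/3 mons down, quorum ctrl-02,ctrl-03 (MON_DOWN)
2021-03-14 18:00:39.431667 mon.ctrl-02 (mon.1) 122697 : cluster [WRN] overall HEALTH_WARN 1/3 mons down, quorum ctrl-02,ctrl-03
2021-03-14 18:00:40.733430 mon.ctrl-01 (mon.0) 1 : cluster [INF] mon.ctrl-01 calling monitor election
2021-03-14 18:00:40.742270 mon.ctrl-01 (mon.0) 2 : cluster [INF] mon.ctrl-01 calling monitor election
2021-03-14 18:00:45.745709 mon.ctrl-01 (mon.0) 3 : cluster [INF] mon.ctrl-01 is new leader, mons ctrl-01,ctrl-03 in quorum (ranks 0,2)
2021-03-14 18:00:51.930256 mon.ctrl-02 (mon.1) 122710 : cluster [INF] mon.ctrl-02 calling monitor election
2021-03-14 18:00:51.934337 mon.ctrl-01 (mon.0) 4 : cluster [WRN] overall HEALTH_WARN 1/3 mons down, quorum ctrl-02,ctrl-03
2021-03-14 18:00:51.934422 mon.ctrl-01 (mon.0) 5 : cluster [INF] mon.ctrl-01 calling monitor election
2021-03-14 18:00:51.960175 mon.ctrl-01 (mon.0) 6 : cluster [INF] mon.ctrl-01 is new leader, mons ctrl-01,ctrl-02,ctrl-03 in quorum (ranks 0,1,2)
2021-03-14 18:00:51.970642 mon.ctrl-01 (mon.0) 11 : cluster [INF] Health check cleared: MON_DOWN (was: 1/3 mons down, quorum ctrl-02,ctrl-03)
2021-03-14 18:00:51.970670 mon.ctrl-01 (mon.0) 12 : cluster [INF] Cluster is now healthy
2021-03-14 18:00:51.979150 mon.ctrl-01 (mon.0) 13 : cluster [INF] overall HEALTH_OK
2021-03-14 18:03:47.551678 mon.ctrl-01 (mon.0) 49 : cluster [WRN] Health check failed: 1 daemons have recently crashed (RECENT_CRASH)
"""

# Assumed layout, inferred from the lines above only.
LINE_RE = re.compile(
    r"^(?P<ts>\S+ \S+) (?P<entity>\S+) \((?P<rank>[^)]+)\) (?P<seq>\d+) : "
    r"(?P<channel>\S+) \[(?P<level>[A-Z]+)\] (?P<msg>.*)$"
)
LEADER_RE = re.compile(r"^(\S+) is new leader, mons (\S+) in quorum")


def parse(text):
    """Return (timestamp, entity, seq, message) for each matching line."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f")
            events.append((ts, m["entity"], int(m["seq"]), m["msg"]))
    return events


events = parse(CLUSTER_LOG)

# Quorum changes as (time, leader, members), in log order.
quorums = []
for ts, _entity, _seq, msg in events:
    m = LEADER_RE.match(msg)
    if m:
        quorums.append((ts.time().isoformat(), m[1], m[2]))

# Time between MON_DOWN being raised and cleared.
raised = next(ts for ts, _e, _s, msg in events
              if msg.startswith("Health check failed") and "(MON_DOWN)" in msg)
cleared = next(ts for ts, _e, _s, msg in events
               if msg.startswith("Health check cleared: MON_DOWN"))
mon_down_seconds = (cleared - raised).total_seconds()

# A per-entity sequence number that goes backwards (hint of a restart).
last_seq, seq_resets = {}, []
for ts, entity, seq, _msg in events:
    if entity in last_seq and seq < last_seq[entity]:
        seq_resets.append((entity, last_seq[entity], seq, ts.time().isoformat()))
    last_seq[entity] = seq

for q in quorums:
    print("quorum:", *q)
print(f"MON_DOWN raised for {mon_down_seconds:.6f} s")
for r in seq_resets:
    print("sequence reset:", *r)
```

For this excerpt, MON_DOWN was raised from 18:00:39.423504 to 18:00:51.970642, about 12.5 seconds, and the only sequence reset is mon.ctrl-01 going from 224915 to 1 at 18:00:40.733430.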