Bug #59207

closed

Mgr failed to start normally after a node was restarted

Added by huazhong chen about 1 year ago. Updated about 1 year ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Category: ceph-mgr
Target version:
% Done: 100%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Ceph version: v16.2.7

OS (e.g. from /etc/os-release): ubuntu:20.04
Kernel (e.g. uname -a): Linux pro-k8s-ceph12 5.4.0-125-generic # 141-Ubuntu SMP Wed Aug 10 13:42:03 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Cloud provider or hardware configuration:
Rook version (use rook version inside of a Rook Pod): v1.8.3
Storage backend version (e.g. for ceph do ceph -v): ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)
Kubernetes version (use kubectl version): 1.23.2

'''

{
"crash_id": "2023-03-29T08:50:09.492296Z_8f4f378f-e3cb-479c-8eca-bb1ab756ce46",
"timestamp": "2023-03-29T08:50:09.492296Z",
"process_name": "ceph-mgr",
"entity_name": "mgr.a",
"ceph_version": "16.2.7",
"utsname_hostname": "rook-ceph-mgr-a-64f9b69798-c5k4g",
"utsname_sysname": "Linux",
"utsname_release": "5.4.0-125-generic",
"utsname_version": "#141-Ubuntu SMP Wed Aug 10 13:42:03 UTC 2022",
"utsname_machine": "x86_64",
"os_name": "CentOS Linux",
"os_id": "centos",
"os_version_id": "8",
"os_version": "8",
"backtrace": [
"/lib64/libpthread.so.0(+0x12c20) [0x7f82c5622c20]",
"/lib64/libpython3.6m.so.1.0(+0x1100c2) [0x7f82cf7750c2]",
"(PyFormatter::dump_pyobject(std::basic_string_view<char, std::char_traits<char> >, _object*)+0x61) [0x561178017d91]",
"(OSDMap::dump(ceph::Formatter*) const+0x6d7) [0x7f82c6c35157]",
"ceph-mgr(+0x270402) [0x561178029402]",
"/lib64/libpython3.6m.so.1.0(+0x19d5f1) [0x7f82cf8025f1]",
"_PyEval_EvalFrameDefault()",
"/lib64/libpython3.6m.so.1.0(+0x179e28) [0x7f82cf7dee28]",
"/lib64/libpython3.6m.so.1.0(+0x19d397) [0x7f82cf802397]",
"_PyEval_EvalFrameDefault()",
"_PyFunction_FastCallDict()",
"_PyObject_FastCallDict()",
"/lib64/libpython3.6m.so.1.0(+0x10dda0) [0x7f82cf772da0]",
"/lib64/libpython3.6m.so.1.0(+0x188bc2) [0x7f82cf7edbc2]",
"_PyObject_FastCallKeywords()",
"/lib64/libpython3.6m.so.1.0(+0x19d4d6) [0x7f82cf8024d6]",
"_PyEval_EvalFrameDefault()",
"/lib64/libpython3.6m.so.1.0(+0x179e28) [0x7f82cf7dee28]",
"/lib64/libpython3.6m.so.1.0(+0x19d397) [0x7f82cf802397]",
"_PyEval_EvalFrameDefault()",
"/lib64/libpython3.6m.so.1.0(+0xfa516) [0x7f82cf75f516]",
"_PyFunction_FastCallDict()",
"_PyObject_FastCallDict()",
"/lib64/libpython3.6m.so.1.0(+0x10dda0) [0x7f82cf772da0]",
"_PyObject_FastCallDict()",
"PyObject_CallMethod()",
"(PyModuleRunner::serve()+0x66) [0x5611780274a6]",
"(PyModuleRunner::PyModuleRunnerThread::entry()+0x1e5) [0x561178027d85]",
"/lib64/libpthread.so.0(+0x817a) [0x7f82c561817a]",
"clone()"
]
}

'''
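When triaging a report like the one above, it can help to separate frames that carry a symbol name from frames that are only an object plus offset. The following is a minimal sketch, not part of the original report: `is_symbolized` is a hypothetical helper, and its "(+0x" check is an assumed heuristic based only on the frame shapes visible in this backtrace. The frames are copied from the crash metadata above.

```python
import json

# A few frames copied from the crash report above; a real report would be
# loaded from the full crash metadata instead of this inline excerpt.
crash_meta = json.loads("""
{
  "entity_name": "mgr.a",
  "backtrace": [
    "/lib64/libpthread.so.0(+0x12c20) [0x7f82c5622c20]",
    "(OSDMap::dump(ceph::Formatter*) const+0x6d7) [0x7f82c6c35157]",
    "ceph-mgr(+0x270402) [0x561178029402]",
    "_PyEval_EvalFrameDefault()",
    "(PyModuleRunner::serve()+0x66) [0x5611780274a6]"
  ]
}
""")


def is_symbolized(frame: str) -> bool:
    # Assumed heuristic: unresolved frames look like "object(+0xOFFSET) [addr]",
    # so the literal "(+0x" marks a frame with no symbol name.
    return "(+0x" not in frame


# Keep only frames that name a function, in original (innermost-first) order.
symbolized = [f for f in crash_meta["backtrace"] if is_symbolized(f)]

for frame in symbolized:
    print(frame)
```

In this excerpt the first named frame is `OSDMap::dump`, which matches the full backtrace, where `PyFormatter::dump_pyobject` called from `OSDMap::dump` sits directly below the faulting libpython frame.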

-26> 2023-03-29T16:50:09.484+0800 7f801f9a5700 4 mgr[py] Starting devicehealth
-25> 2023-03-29T16:50:09.484+0800 7f801f9a5700 4 mgr[py] Starting iostat
-24> 2023-03-29T16:50:09.484+0800 7f801f9a5700 4 mgr[py] Starting nfs
-23> 2023-03-29T16:50:09.484+0800 7f801f9a5700 4 mgr[py] Starting orchestrator
-22> 2023-03-29T16:50:09.484+0800 7f801f9a5700 4 mgr[py] Starting pg_autoscaler
-21> 2023-03-29T16:50:09.484+0800 7f801f9a5700 4 mgr[py] Starting progress
-20> 2023-03-29T16:50:09.484+0800 7f801f9a5700 4 mgr[py] Starting prometheus
-19> 2023-03-29T16:50:09.484+0800 7f801f9a5700 4 mgr[py] Starting rbd_support
-18> 2023-03-29T16:50:09.484+0800 7f801f9a5700 4 mgr[py] Starting restful
-17> 2023-03-29T16:50:09.484+0800 7f801f9a5700 4 mgr[py] Starting status
-16> 2023-03-29T16:50:09.484+0800 7f801f9a5700 4 mgr[py] Starting telemetry
-15> 2023-03-29T16:50:09.484+0800 7f801f9a5700 4 mgr[py] Starting volumes
-14> 2023-03-29T16:50:09.484+0800 7f801f9a5700 5 asok(0x56117a8b2000) register_command dump_osd_network hook 0x56118213a170
-13> 2023-03-29T16:50:09.484+0800 7f801f9a5700 5 asok(0x56117a8b2000) register_command mgr_status hook 0x56118216d800
-12> 2023-03-29T16:50:09.484+0800 7f801f9a5700 4 mgr init Complete.
-11> 2023-03-29T16:50:09.484+0800 7f801f9a5700 4 mgr send_beacon going active, including 444 commands in beacon
-10> 2023-03-29T16:50:09.484+0800 7f801f9a5700 10 monclient: _send_mon_message to mon.l at v2:172.188.59.154:3300/0
-9> 2023-03-29T16:50:09.484+0800 7f801f9a5700 0 [balancer DEBUG root] setting log level based on debug_mgr: INFO (2/5)
-8> 2023-03-29T16:50:09.484+0800 7f801f9a5700 1 mgr load Constructed class from module: balancer
-7> 2023-03-29T16:50:09.484+0800 7f801f9a5700 4 mgr operator() Starting thread for balancer
-6> 2023-03-29T16:50:09.484+0800 7f81e29eb700 4 mgr entry Entering thread for balancer
-5> 2023-03-29T16:50:09.484+0800 7f81e29eb700 0 [balancer INFO root] Starting
-4> 2023-03-29T16:50:09.484+0800 7f801f9a5700 0 [crash DEBUG root] setting log level based on debug_mgr: INFO (2/5)
-3> 2023-03-29T16:50:09.484+0800 7f801f9a5700 1 mgr load Constructed class from module: crash
-2> 2023-03-29T16:50:09.484+0800 7f801f9a5700 4 mgr operator() Starting thread for crash
-1> 2023-03-29T16:50:09.484+0800 7f81da1ea700 4 mgr entry Entering thread for crash
0> 2023-03-29T16:50:09.492+0800 7f81e29eb700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f81e29eb700 thread_name:balancer

ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)
1: /lib64/libpthread.so.0(+0x12c20) [0x7f82c5622c20]
2: /lib64/libpython3.6m.so.1.0(+0x1100c2) [0x7f82cf7750c2]
3: (PyFormatter::dump_pyobject(std::basic_string_view<char, std::char_traits<char> >, _object*)+0x61) [0x561178017d91]
4: (OSDMap::dump(ceph::Formatter*) const+0x6d7) [0x7f82c6c35157]
5: ceph-mgr(+0x270402) [0x561178029402]
6: /lib64/libpython3.6m.so.1.0(+0x19d5f1) [0x7f82cf8025f1]
7: _PyEval_EvalFrameDefault()
8: /lib64/libpython3.6m.so.1.0(+0x179e28) [0x7f82cf7dee28]
9: /lib64/libpython3.6m.so.1.0(+0x19d397) [0x7f82cf802397]
10: _PyEval_EvalFrameDefault()
11: _PyFunction_FastCallDict()
12: _PyObject_FastCallDict()
13: /lib64/libpython3.6m.so.1.0(+0x10dda0) [0x7f82cf772da0]
14: /lib64/libpython3.6m.so.1.0(+0x188bc2) [0x7f82cf7edbc2]
15: _PyObject_FastCallKeywords()
16: /lib64/libpython3.6m.so.1.0(+0x19d4d6) [0x7f82cf8024d6]
17: _PyEval_EvalFrameDefault()
18: /lib64/libpython3.6m.so.1.0(+0x179e28) [0x7f82cf7dee28]
19: /lib64/libpython3.6m.so.1.0(+0x19d397) [0x7f82cf802397]
20: _PyEval_EvalFrameDefault()
21: /lib64/libpython3.6m.so.1.0(+0xfa516) [0x7f82cf75f516]
22: _PyFunction_FastCallDict()
23: _PyObject_FastCallDict()
24: /lib64/libpython3.6m.so.1.0(+0x10dda0) [0x7f82cf772da0]
25: _PyObject_FastCallDict()
26: PyObject_CallMethod()
27: (PyModuleRunner::serve()+0x66) [0x5611780274a6]
28: (PyModuleRunner::PyModuleRunnerThread::entry()+0x1e5) [0x561178027d85]
29: /lib64/libpthread.so.0(+0x817a) [0x7f82c561817a]
30: clone()
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_mirror
0/ 5 rbd_replay
0/ 5 rbd_pwl
0/ 5 journaler
0/ 5 objectcacher
0/ 5 immutable_obj_cache
0/ 5 client
1/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 journal
0/ 0 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 1 reserver
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/ 5 rgw_sync
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 compressor
1/ 5 bluestore
1/ 5 bluefs
1/ 3 bdev
1/ 5 kstore
4/ 5 rocksdb
4/ 5 leveldb
4/ 5 memdb
1/ 5 fuse
2/ 5 mgr
1/ 5 mgrc
1/ 5 dpdk
1/ 5 eventtrace
1/ 5 prioritycache
0/ 5 test
0/ 5 cephfs_mirror
0/ 5 cephsqlite
2/-2 (syslog threshold)
99/99 (stderr threshold)
--- pthread ID / name mapping for recent threads ---
140188262749952 / mgr-fin
140195686950656 / crash
140195829561088 / balancer
140199461922560 / safe_timer
140199470315264 / io_context_pool
140199478707968 / io_context_pool
140199495493376 / ms_dispatch
140199537456896 / admin_socket
140199545849600 / msgr-worker-2
140199554242304 / msgr-worker-1
140199562635008 / msgr-worker-0
140199809774848 / ceph-mgr
max_recent 10000
max_new 10000
log_file /var/lib/ceph/crash/2023-03-29T08:50:09.492296Z_8f4f378f-e3cb-479c-8eca-bb1ab756ce46/log
--- end dump of recent events ---
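A note for cross-referencing the dump: the pthread IDs in the mapping above are decimal, while the log lines and the "Caught signal" line print thread ids in hex. The sketch below (the helper name `to_log_tid` is hypothetical, and only two entries from the mapping are included) converts one form to the other, so that the thread named in the segfault line can be matched to its entry in the mapping.

```python
# Two entries copied from the "pthread ID / name mapping" in the dump above.
THREADS = {
    140195686950656: "crash",
    140195829561088: "balancer",
}


def to_log_tid(pthread_id: int) -> str:
    # Log lines print the thread id as lowercase hex without a "0x" prefix.
    return format(pthread_id, "x")


# Index the mapping by the hex form used in the log lines.
by_hex = {to_log_tid(tid): name for tid, name in THREADS.items()}

# Thread id taken from the "Caught signal (Segmentation fault)" line.
print(by_hex["7f81e29eb700"])
```

The lookup resolves `7f81e29eb700` to the `balancer` entry, consistent with the `thread_name:balancer` field in the log.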

Actions #1

Updated by huazhong chen about 1 year ago

The issue is closed.

Actions #2

Updated by Konstantin Shalygin about 1 year ago

  • Status changed from New to Can't reproduce
  • % Done changed from 0 to 100
Actions #3

Updated by Ilya Dryomov about 1 year ago

  • Target version changed from v16.2.12 to v16.2.13