Bug #42680
crash in thread 7f6a445ee700 thread_name:devicehealth
Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Hi,
It seems ceph-mgr crashes in relation to device health; I think it may be related to the health-metrics scraping done by the devicehealth module.
-18> 2019-11-07 12:42:25.884 7f6a47df5700 4 mgr.server handle_report from 0x555590146900 mon,sandrider
-17> 2019-11-07 12:42:25.885 7f6a47df5700 4 mgr.server maybe_ready initial report from osd 5
-16> 2019-11-07 12:42:25.886 7f6a47df5700 4 mgr.server maybe_ready still waiting for 4 osds to report in before PGMap is ready
-15> 2019-11-07 12:42:25.886 7f6a47df5700 4 mgr.server handle_report from 0x555590018000 osd,4
-14> 2019-11-07 12:42:25.887 7f6a47df5700 4 mgr.server handle_report from 0x555590149600 osd,3
-13> 2019-11-07 12:42:25.888 7f6a47df5700 4 mgr.server handle_report from 0x555590681b00 osd,0
-12> 2019-11-07 12:42:25.889 7f6a47df5700 4 mgr.server handle_report from 0x555590681680 osd,1
-11> 2019-11-07 12:42:25.893 7f6a47df5700 4 mgr.server maybe_ready initial report from osd 4
-10> 2019-11-07 12:42:25.893 7f6a47df5700 4 mgr.server maybe_ready still waiting for 3 osds to report in before PGMap is ready
-9> 2019-11-07 12:42:25.893 7f6a47df5700 4 mgr.server maybe_ready initial report from osd 3
-8> 2019-11-07 12:42:25.893 7f6a47df5700 4 mgr.server maybe_ready still waiting for 2 osds to report in before PGMap is ready
-7> 2019-11-07 12:42:25.894 7f6a47df5700 4 mgr.server maybe_ready initial report from osd 0
-6> 2019-11-07 12:42:25.894 7f6a47df5700 4 mgr.server maybe_ready still waiting for 1 osds to report in before PGMap is ready
-5> 2019-11-07 12:42:25.894 7f6a47df5700 4 mgr.server maybe_ready initial report from osd 1
-4> 2019-11-07 12:42:25.894 7f6a47df5700 4 mgr.server maybe_ready all osds have reported, sending PG state to mon
-3> 2019-11-07 12:42:25.895 7f6a47df5700 0 log_channel(cluster) log [DBG] : pgmap v2: 225 pgs: 225 active+clean; 90 GiB data, 144 GiB used, 1.6 TiB / 1.7 TiB avail
-2> 2019-11-07 12:42:25.895 7f6a47df5700 10 monclient: _send_mon_message to mon.naib at v2:[2001:470:71:68d:e3f5:39b2:1578:f7ae]:3300/0
-1> 2019-11-07 12:42:25.899 7f6a33c97700 3 client.4425728 may_lookup 0x55558f990580 = 0
0> 2019-11-07 12:42:26.262 7f6a445ee700 -1 *** Caught signal (Segmentation fault) **
in thread 7f6a445ee700 thread_name:devicehealth

ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)
1: (()+0x14950) [0x7f6a5b80a950]
2: (rados_write_op_omap_set()+0x164) [0x7f6a4f8c0074]
3: (()+0xb8c21) [0x7f6a4fa8ac21]
4: (()+0x17d842) [0x7f6a5bb93842]
5: (PyVectorcall_Call()+0x70) [0x7f6a5bb42df0]
6: (()+0x452d6) [0x7f6a4fa172d6]
7: (()+0x8bbc9) [0x7f6a4fa5dbc9]
8: (_PyObject_MakeTpCall()+0x230) [0x7f6a5bb3d440]
9: (()+0xd187e) [0x7f6a5bae787e]
10: (_PyEval_EvalFrameDefault()+0x5199) [0x7f6a5bba64f9]
11: (_PyFunction_Vectorcall()+0xfa) [0x7f6a5bb6ff0a]
12: (_PyEval_EvalFrameDefault()+0x852) [0x7f6a5bba1bb2]
13: (_PyFunction_Vectorcall()+0xfa) [0x7f6a5bb6ff0a]
14: (_PyEval_EvalFrameDefault()+0x852) [0x7f6a5bba1bb2]
15: (_PyFunction_Vectorcall()+0xfa) [0x7f6a5bb6ff0a]
16: (()+0x175eab) [0x7f6a5bb8beab]
17: (()+0x127d14) [0x7f6a5bb3dd14]
18: (()+0x1fc6cf) [0x7f6a5bc126cf]
19: (PyObject_CallMethod()+0xc0) [0x7f6a5bc1d770]
20: (PyModuleRunner::serve()+0x66) [0x55558a35da76]
21: (PyModuleRunner::PyModuleRunnerThread::entry()+0x1dd) [0x55558a35e31d]
22: (()+0x94e2) [0x7f6a5b7ff4e2]
23: (clone()+0x43) [0x7f6a5b3ca623]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   9/ 9 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 0 ms
   9/ 9 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 1 reserver
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 rgw_sync
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   4/ 5 memdb
   1/ 5 kinetic
   1/ 5 fuse
   1/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent 10000
  max_new 1000
  log_file /var/log/ceph/ceph-mgr.naib.log
--- end dump of recent events ---
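Frame 2 of the backtrace is rados_write_op_omap_set(), reached from Python through PyModuleRunner::serve() (frame 20), so the segfault happens while the devicehealth module is writing scraped metrics into an object's omap through the python-rados bindings. Below is a minimal sketch of that code path; the pool name device_health_metrics (the pool the module uses in Nautilus), the object name, and the omap key/value are illustrative assumptions, not values taken from this crash:

    import rados

    # Connect using the standard config file location (an assumption here).
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx('device_health_metrics')
        try:
            # The binding's set_omap() is what ends up in
            # rados_write_op_omap_set(), frame 2 of the backtrace.
            with rados.WriteOpCtx() as op:
                ioctx.set_omap(op, ('2019-11-07T12:42:26',), (b'{"smart": "..."}',))
                ioctx.operate_write_op(op, 'DEVICE_ID')  # one object per device id
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()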
This is on Fedora Rawhide, package version ceph-mgr-14.2.4-1.fc32.x86_64. The cluster has 6 OSDs on SATA HDDs (no SSD, no NVMe).
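One possible way to confirm the scraping theory (a suggestion based on the Nautilus docs, not something verified on this cluster) would be to disable devicehealth monitoring and see whether the crashes stop:

    ceph device monitoring off
    # equivalently, via the module option:
    ceph config set mgr mgr/devicehealth/enable_monitoring false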
May be related to Bug #42578.