Bug #36244
mgr crash when handle_report updating existing DaemonState for rgw
% Done: 0%
Source:
Tags:
Backport: mimic,luminous
Regression: No
Severity: 3 - minor
Reviewed:
Description
I use multisite with two zones, each zone with one rgw. After I added one more rgw to each zone, all of the mgr daemons crashed. After restarting the mgr service, the mgr daemons still crash and no longer work.
ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)
-39> 2018-09-28 10:37:46.772706 7f3dde666700 1 -- 100.97.8.131:6800/1265072 <== osd.22 100.97.8.124:6804/20375 4 ==== mgrreport(osd.22 +0-0 packed 742 osd_metrics=1) v5 ==== 784+0+0 (1233950104 0 0) 0x5652d3c24100 con 0x5652d37d9000
-38> 2018-09-28 10:37:46.772714 7f3dde666700 4 mgr.server handle_report from 0x5652d37d9000 osd,22
-37> 2018-09-28 10:37:46.772716 7f3dde666700 20 mgr.server handle_report updating existing DaemonState for osd,22
-36> 2018-09-28 10:37:46.772713 7f3deb39c700 5 -- 100.97.8.131:6800/1265072 >> 100.97.8.124:6810/21991 conn(0x5652d3817800 :6800 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=739 cs=1 l=1). rx osd.25 seq 4 0x5652d3c243c0 mgrreport(osd.25 +0-0 packed 742 osd_metrics=1) v5
-35> 2018-09-28 10:37:46.772718 7f3dde666700 20 mgr update loading 0 new types, 0 old types, had 129 types, got 742 bytes of data
-34> 2018-09-28 10:37:46.772753 7f3dde666700 1 -- 100.97.8.131:6800/1265072 <== osd.25 100.97.8.124:6810/21991 4 ==== mgrreport(osd.25 +0-0 packed 742 osd_metrics=1) v5 ==== 784+0+0 (2973115570 0 0) 0x5652d3c243c0 con 0x5652d3817800
-33> 2018-09-28 10:37:46.772761 7f3dde666700 4 mgr.server handle_report from 0x5652d3817800 osd,25
-32> 2018-09-28 10:37:46.772763 7f3dde666700 20 mgr.server handle_report updating existing DaemonState for osd,25
-31> 2018-09-28 10:37:46.772765 7f3dde666700 20 mgr update loading 0 new types, 0 old types, had 129 types, got 742 bytes of data
-30> 2018-09-28 10:37:46.773302 7f3deb39c700 5 -- 100.97.8.131:6800/1265072 >> 100.97.8.124:6804/20375 conn(0x5652d37d9000 :6800 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=763 cs=1 l=1). rx osd.22 seq 5 0x5652d3c24680 pg_stats(50 pgs tid 0 v 0) v1
-29> 2018-09-28 10:37:46.773376 7f3dde666700 1 -- 100.97.8.131:6800/1265072 <== osd.22 100.97.8.124:6804/20375 5 ==== pg_stats(50 pgs tid 0 v 0) v1 ==== 30008+0+0 (12111768 0 0) 0x5652d3c24680 con 0x5652d37d9000
-28> 2018-09-28 10:37:46.773468 7f3deb39c700 5 -- 100.97.8.131:6800/1265072 >> 100.97.8.123:6816/21859 conn(0x5652d384d800 :6800 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=1335 cs=1 l=1). rx osd.18 seq 4 0x5652d3c17440 mgrreport(osd.18 +0-0 packed 742 osd_metrics=1) v5
-27> 2018-09-28 10:37:46.773487 7f3dde666700 1 -- 100.97.8.131:6800/1265072 <== osd.18 100.97.8.123:6816/21859 4 ==== mgrreport(osd.18 +0-0 packed 742 osd_metrics=1) v5 ==== 784+0+0 (156505875 0 0) 0x5652d3c17440 con 0x5652d384d800
-26> 2018-09-28 10:37:46.773498 7f3dde666700 4 mgr.server handle_report from 0x5652d384d800 osd,18
-25> 2018-09-28 10:37:46.773512 7f3dde666700 20 mgr.server handle_report updating existing DaemonState for osd,18
-24> 2018-09-28 10:37:46.773515 7f3dde666700 20 mgr update loading 0 new types, 0 old types, had 129 types, got 742 bytes of data
-23> 2018-09-28 10:37:46.773604 7f3deb39c700 5 -- 100.97.8.131:6800/1265072 >> 100.97.8.123:6816/21859 conn(0x5652d384d800 :6800 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=1335 cs=1 l=1). rx osd.18 seq 5 0x5652d3c17700 pg_stats(14 pgs tid 0 v 0) v1
-22> 2018-09-28 10:37:46.773621 7f3dde666700 1 -- 100.97.8.131:6800/1265072 <== osd.18 100.97.8.123:6816/21859 5 ==== pg_stats(14 pgs tid 0 v 0) v1 ==== 8508+0+0 (948072078 0 0) 0x5652d3c17700 con 0x5652d384d800
-21> 2018-09-28 10:37:46.773704 7f3deb39c700 5 -- 100.97.8.131:6800/1265072 >> 100.97.8.124:6810/21991 conn(0x5652d3817800 :6800 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=739 cs=1 l=1). rx osd.25 seq 5 0x5652d3d28100 pg_stats(45 pgs tid 0 v 0) v1
-20> 2018-09-28 10:37:46.773717 7f3dde666700 1 -- 100.97.8.131:6800/1265072 <== osd.25 100.97.8.124:6810/21991 5 ==== pg_stats(45 pgs tid 0 v 0) v1 ==== 27034+0+0 (1702851695 0 0) 0x5652d3d28100 con 0x5652d3817800
-19> 2018-09-28 10:37:46.773731 7f3deb39c700 5 -- 100.97.8.131:6800/1265072 >> 100.97.8.123:6810/20319 conn(0x5652d38f3000 :6800 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=738 cs=1 l=1). rx osd.15 seq 4 0x5652d3e18100 mgrreport(osd.15 +0-0 packed 742 osd_metrics=1) v5
-18> 2018-09-28 10:37:46.773777 7f3dde666700 1 -- 100.97.8.131:6800/1265072 <== osd.15 100.97.8.123:6810/20319 4 ==== mgrreport(osd.15 +0-0 packed 742 osd_metrics=1) v5 ==== 784+0+0 (2683331780 0 0) 0x5652d3e18100 con 0x5652d38f3000
-17> 2018-09-28 10:37:46.773784 7f3dde666700 4 mgr.server handle_report from 0x5652d38f3000 osd,15
-16> 2018-09-28 10:37:46.773787 7f3dde666700 20 mgr.server handle_report updating existing DaemonState for osd,15
-15> 2018-09-28 10:37:46.773789 7f3dde666700 20 mgr update loading 0 new types, 0 old types, had 129 types, got 742 bytes of data
-14> 2018-09-28 10:37:46.774280 7f3deb39c700 5 -- 100.97.8.131:6800/1265072 >> 100.97.8.123:6810/20319 conn(0x5652d38f3000 :6800 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=738 cs=1 l=1). rx osd.15 seq 5 0x5652d3e183c0 pg_stats(53 pgs tid 0 v 0) v1
-13> 2018-09-28 10:37:46.774304 7f3dde666700 1 -- 100.97.8.131:6800/1265072 <== osd.15 100.97.8.123:6810/20319 5 ==== pg_stats(53 pgs tid 0 v 0) v1 ==== 31806+0+0 (3567068761 0 0) 0x5652d3e183c0 con 0x5652d38f3000
-12> 2018-09-28 10:37:46.774751 7f3deb39c700 5 -- 100.97.8.131:6800/1265072 >> 100.97.8.123:6800/17601 conn(0x5652d383d000 :6800 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=738 cs=1 l=1). rx osd.10 seq 4 0x5652d3b939c0 mgrreport(osd.10 +0-0 packed 742 osd_metrics=1) v5
-11> 2018-09-28 10:37:46.774770 7f3dde666700 1 -- 100.97.8.131:6800/1265072 <== osd.10 100.97.8.123:6800/17601 4 ==== mgrreport(osd.10 +0-0 packed 742 osd_metrics=1) v5 ==== 784+0+0 (1173938277 0 0) 0x5652d3b939c0 con 0x5652d383d000
-10> 2018-09-28 10:37:46.774778 7f3dde666700 4 mgr.server handle_report from 0x5652d383d000 osd,10
-9> 2018-09-28 10:37:46.774780 7f3dde666700 20 mgr.server handle_report updating existing DaemonState for osd,10
-8> 2018-09-28 10:37:46.774782 7f3dde666700 20 mgr update loading 0 new types, 0 old types, had 129 types, got 742 bytes of data
-7> 2018-09-28 10:37:46.775016 7f3deb39c700 5 -- 100.97.8.131:6800/1265072 >> 100.97.8.123:6800/17601 conn(0x5652d383d000 :6800 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=738 cs=1 l=1). rx osd.10 seq 5 0x5652d3b93c80 pg_stats(65 pgs tid 0 v 0) v1
-6> 2018-09-28 10:37:46.775034 7f3dde666700 1 -- 100.97.8.131:6800/1265072 <== osd.10 100.97.8.123:6800/17601 5 ==== pg_stats(65 pgs tid 0 v 0) v1 ==== 38938+0+0 (804497348 0 0) 0x5652d3b93c80 con 0x5652d383d000
-5> 2018-09-28 10:37:46.781888 7f3deb39c700 5 -- 100.97.8.131:6800/1265072 >> 100.97.8.124:0/2791165959 conn(0x5652d39f5800 :6800 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=222 cs=1 l=1). rx client.6381 seq 3 0x5652d409f440 mgrreport(rgw.cn-bj-test2 +0-0 packed 214) v5
-4> 2018-09-28 10:37:46.781946 7f3dde666700 1 -- 100.97.8.131:6800/1265072 <== client.6381 100.97.8.124:0/2791165959 3 ==== mgrreport(rgw.cn-bj-test2 +0-0 packed 214) v5 ==== 253+0+0 (1062603950 0 0) 0x5652d409f440 con 0x5652d39f5800
-3> 2018-09-28 10:37:46.781962 7f3dde666700 4 mgr.server handle_report from 0x5652d39f5800 rgw,cn-bj-test2
-2> 2018-09-28 10:37:46.781966 7f3dde666700 20 mgr.server handle_report updating existing DaemonState for rgw,cn-bj-test2
-1> 2018-09-28 10:37:46.781968 7f3dde666700 20 mgr update loading 0 new types, 0 old types, had 129 types, got 214 bytes of data
0> 2018-09-28 10:37:46.783446 7f3dde666700 -1 *** Caught signal (Aborted) **
in thread 7f3dde666700 thread_name:ms_dispatch
ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)
1: (()+0x3f40c1) [0x5652c9b220c1]
2: (()+0xf6d0) [0x7f3df00026d0]
3: (gsignal()+0x37) [0x7f3def011277]
4: (abort()+0x148) [0x7f3def012968]
5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f3def920ac5]
6: (()+0x5ea36) [0x7f3def91ea36]
7: (()+0x5ea63) [0x7f3def91ea63]
8: (()+0x5ec83) [0x7f3def91ec83]
9: (std::__throw_out_of_range(char const*)+0x77) [0x7f3def973b47]
10: (DaemonPerfCounters::update(MMgrReport*)+0xb6c) [0x5652c99d09ec]
11: (DaemonServer::handle_report(MMgrReport*)+0x243) [0x5652c99d8903]
12: (DaemonServer::ms_dispatch(Message*)+0x47) [0x5652c99e4917]
13: (DispatchQueue::entry()+0x792) [0x5652c9e20cb2]
14: (DispatchQueue::DispatchThread::entry()+0xd) [0x5652c9c0abed]
15: (()+0x7e25) [0x7f3defffae25]
16: (clone()+0x6d) [0x7f3def0d9bad]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_mirror
0/ 5 rbd_replay
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
1/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 journal
0/ 5 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 1 reserver
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 xio
1/ 5 compressor
1/ 5 bluestore
1/ 5 bluefs
1/ 3 bdev
1/ 5 kstore
4/ 5 rocksdb
4/ 5 leveldb
4/ 5 memdb
1/ 5 kinetic
1/ 5 fuse
20/20 mgr
1/ 5 mgrc
1/ 5 dpdk
1/ 5 eventtrace
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph-mgr.JXQ-97-8-131.log
--- end dump of recent events ---
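Reading the backtrace above: frame 10 (DaemonPerfCounters::update) reaches std::__throw_out_of_range in frame 9, nothing on the ms_dispatch thread catches the exception, and the C++ runtime goes through the terminate handler to abort() (frames 5 and 4). The sketch below reproduces that failure mode in isolation. It assumes the exception comes from a std::map::at-style lookup of a key that is not present; the backtrace alone does not confirm which container call is involved. The names CounterTypes, lookup_unchecked, find_counter and unchecked_lookup_throws are hypothetical and are neither the Ceph code nor the fix that was merged for this bug.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a per-daemon table of known perf counter types.
using CounterTypes = std::map<std::string, std::uint64_t>;

// Unchecked lookup: std::map::at throws std::out_of_range when the key is
// missing. If no caller catches it, std::terminate runs and the process
// aborts, which matches the shape of the backtrace in this report.
std::uint64_t lookup_unchecked(const CounterTypes& types,
                               const std::string& path) {
  return types.at(path);
}

// Checked lookup: returns nullptr for a missing key instead of throwing,
// so the caller can skip or log the unknown counter.
const std::uint64_t* find_counter(const CounterTypes& types,
                                  const std::string& path) {
  auto it = types.find(path);
  return it == types.end() ? nullptr : &it->second;
}

// Helper that reports whether the unchecked lookup throws for a given key.
bool unchecked_lookup_throws(const CounterTypes& types,
                             const std::string& path) {
  try {
    lookup_unchecked(types, path);
  } catch (const std::out_of_range&) {
    return true;
  }
  return false;
}
```

With a table that only contains a key such as "rgw.req" (an invented name), lookup_unchecked(types, "rgw.missing") throws, while find_counter(types, "rgw.missing") returns nullptr and leaves the decision to the caller. Whether the real fix validates the report, catches the exception, or avoids the mismatched state in the first place is not stated in this ticket.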
Updated by Mykola Golub over 5 years ago
- Status changed from New to Need More Info
Could you please attach the full mgr log?
Updated by Mykola Golub over 5 years ago
- Status changed from Need More Info to In Progress
- Assignee set to Mykola Golub
Updated by Mykola Golub over 5 years ago
- Status changed from In Progress to Fix Under Review
- Backport set to mimic,luminous
- Pull request ID set to 25534
Updated by Kefu Chai over 5 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Nathan Cutler over 5 years ago
- Copied to Backport #37826: mimic: mgr crash when handle_report updating existing DaemonState for rgw added
Updated by Nathan Cutler over 5 years ago
- Copied to Backport #37827: luminous: mgr crash when handle_report updating existing DaemonState for rgw added
Updated by Lenz Grimmer over 5 years ago
- Related to Bug #24982: mgr: terminate called after throwing an instance of 'std::out_of_range' in DaemonPerfCounters::update added
Updated by Nathan Cutler over 5 years ago
- Status changed from Pending Backport to Resolved