Bug #36244

closed

mgr crash when handle_report updating existing DaemonState for rgw

Added by Diluga Salome over 5 years ago. Updated about 5 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: ceph-mgr
Target version: -
% Done: 0%
Source:
Tags:
Backport: mimic,luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I use multisite with 2 zones, each with 1 rgw. After I added 1 more rgw to each zone, all the mgr daemons crashed. After restarting the mgr service, the mgr still crashes and no longer works.

ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)

-39> 2018-09-28 10:37:46.772706 7f3dde666700  1 -- 100.97.8.131:6800/1265072 <== osd.22 100.97.8.124:6804/20375 4 ==== mgrreport(osd.22 +0-0 packed 742 osd_metrics=1) v5 ==== 784+0+0 (1233950104 0 0) 0x5652d3c24100 con 0x5652d37d9000
   -38> 2018-09-28 10:37:46.772714 7f3dde666700  4 mgr.server handle_report from 0x5652d37d9000 osd,22
   -37> 2018-09-28 10:37:46.772716 7f3dde666700 20 mgr.server handle_report updating existing DaemonState for osd,22
   -36> 2018-09-28 10:37:46.772713 7f3deb39c700  5 -- 100.97.8.131:6800/1265072 >> 100.97.8.124:6810/21991 conn(0x5652d3817800 :6800 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=739 cs=1 l=1). rx osd.25 seq 4 0x5652d3c243c0 mgrreport(osd.25 +0-0 packed 742 osd_metrics=1) v5
   -35> 2018-09-28 10:37:46.772718 7f3dde666700 20 mgr update loading 0 new types, 0 old types, had 129 types, got 742 bytes of data
   -34> 2018-09-28 10:37:46.772753 7f3dde666700  1 -- 100.97.8.131:6800/1265072 <== osd.25 100.97.8.124:6810/21991 4 ==== mgrreport(osd.25 +0-0 packed 742 osd_metrics=1) v5 ==== 784+0+0 (2973115570 0 0) 0x5652d3c243c0 con 0x5652d3817800
   -33> 2018-09-28 10:37:46.772761 7f3dde666700  4 mgr.server handle_report from 0x5652d3817800 osd,25
   -32> 2018-09-28 10:37:46.772763 7f3dde666700 20 mgr.server handle_report updating existing DaemonState for osd,25
   -31> 2018-09-28 10:37:46.772765 7f3dde666700 20 mgr update loading 0 new types, 0 old types, had 129 types, got 742 bytes of data
   -30> 2018-09-28 10:37:46.773302 7f3deb39c700  5 -- 100.97.8.131:6800/1265072 >> 100.97.8.124:6804/20375 conn(0x5652d37d9000 :6800 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=763 cs=1 l=1). rx osd.22 seq 5 0x5652d3c24680 pg_stats(50 pgs tid 0 v 0) v1
   -29> 2018-09-28 10:37:46.773376 7f3dde666700  1 -- 100.97.8.131:6800/1265072 <== osd.22 100.97.8.124:6804/20375 5 ==== pg_stats(50 pgs tid 0 v 0) v1 ==== 30008+0+0 (12111768 0 0) 0x5652d3c24680 con 0x5652d37d9000
   -28> 2018-09-28 10:37:46.773468 7f3deb39c700  5 -- 100.97.8.131:6800/1265072 >> 100.97.8.123:6816/21859 conn(0x5652d384d800 :6800 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=1335 cs=1 l=1). rx osd.18 seq 4 0x5652d3c17440 mgrreport(osd.18 +0-0 packed 742 osd_metrics=1) v5
   -27> 2018-09-28 10:37:46.773487 7f3dde666700  1 -- 100.97.8.131:6800/1265072 <== osd.18 100.97.8.123:6816/21859 4 ==== mgrreport(osd.18 +0-0 packed 742 osd_metrics=1) v5 ==== 784+0+0 (156505875 0 0) 0x5652d3c17440 con 0x5652d384d800
   -26> 2018-09-28 10:37:46.773498 7f3dde666700  4 mgr.server handle_report from 0x5652d384d800 osd,18
   -25> 2018-09-28 10:37:46.773512 7f3dde666700 20 mgr.server handle_report updating existing DaemonState for osd,18
   -24> 2018-09-28 10:37:46.773515 7f3dde666700 20 mgr update loading 0 new types, 0 old types, had 129 types, got 742 bytes of data
   -23> 2018-09-28 10:37:46.773604 7f3deb39c700  5 -- 100.97.8.131:6800/1265072 >> 100.97.8.123:6816/21859 conn(0x5652d384d800 :6800 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=1335 cs=1 l=1). rx osd.18 seq 5 0x5652d3c17700 pg_stats(14 pgs tid 0 v 0) v1
   -22> 2018-09-28 10:37:46.773621 7f3dde666700  1 -- 100.97.8.131:6800/1265072 <== osd.18 100.97.8.123:6816/21859 5 ==== pg_stats(14 pgs tid 0 v 0) v1 ==== 8508+0+0 (948072078 0 0) 0x5652d3c17700 con 0x5652d384d800
   -21> 2018-09-28 10:37:46.773704 7f3deb39c700  5 -- 100.97.8.131:6800/1265072 >> 100.97.8.124:6810/21991 conn(0x5652d3817800 :6800 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=739 cs=1 l=1). rx osd.25 seq 5 0x5652d3d28100 pg_stats(45 pgs tid 0 v 0) v1
   -20> 2018-09-28 10:37:46.773717 7f3dde666700  1 -- 100.97.8.131:6800/1265072 <== osd.25 100.97.8.124:6810/21991 5 ==== pg_stats(45 pgs tid 0 v 0) v1 ==== 27034+0+0 (1702851695 0 0) 0x5652d3d28100 con 0x5652d3817800
   -19> 2018-09-28 10:37:46.773731 7f3deb39c700  5 -- 100.97.8.131:6800/1265072 >> 100.97.8.123:6810/20319 conn(0x5652d38f3000 :6800 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=738 cs=1 l=1). rx osd.15 seq 4 0x5652d3e18100 mgrreport(osd.15 +0-0 packed 742 osd_metrics=1) v5
   -18> 2018-09-28 10:37:46.773777 7f3dde666700  1 -- 100.97.8.131:6800/1265072 <== osd.15 100.97.8.123:6810/20319 4 ==== mgrreport(osd.15 +0-0 packed 742 osd_metrics=1) v5 ==== 784+0+0 (2683331780 0 0) 0x5652d3e18100 con 0x5652d38f3000
   -17> 2018-09-28 10:37:46.773784 7f3dde666700  4 mgr.server handle_report from 0x5652d38f3000 osd,15
   -16> 2018-09-28 10:37:46.773787 7f3dde666700 20 mgr.server handle_report updating existing DaemonState for osd,15
   -15> 2018-09-28 10:37:46.773789 7f3dde666700 20 mgr update loading 0 new types, 0 old types, had 129 types, got 742 bytes of data
   -14> 2018-09-28 10:37:46.774280 7f3deb39c700  5 -- 100.97.8.131:6800/1265072 >> 100.97.8.123:6810/20319 conn(0x5652d38f3000 :6800 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=738 cs=1 l=1). rx osd.15 seq 5 0x5652d3e183c0 pg_stats(53 pgs tid 0 v 0) v1
   -13> 2018-09-28 10:37:46.774304 7f3dde666700  1 -- 100.97.8.131:6800/1265072 <== osd.15 100.97.8.123:6810/20319 5 ==== pg_stats(53 pgs tid 0 v 0) v1 ==== 31806+0+0 (3567068761 0 0) 0x5652d3e183c0 con 0x5652d38f3000
   -12> 2018-09-28 10:37:46.774751 7f3deb39c700  5 -- 100.97.8.131:6800/1265072 >> 100.97.8.123:6800/17601 conn(0x5652d383d000 :6800 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=738 cs=1 l=1). rx osd.10 seq 4 0x5652d3b939c0 mgrreport(osd.10 +0-0 packed 742 osd_metrics=1) v5
   -11> 2018-09-28 10:37:46.774770 7f3dde666700  1 -- 100.97.8.131:6800/1265072 <== osd.10 100.97.8.123:6800/17601 4 ==== mgrreport(osd.10 +0-0 packed 742 osd_metrics=1) v5 ==== 784+0+0 (1173938277 0 0) 0x5652d3b939c0 con 0x5652d383d000
   -10> 2018-09-28 10:37:46.774778 7f3dde666700  4 mgr.server handle_report from 0x5652d383d000 osd,10
    -9> 2018-09-28 10:37:46.774780 7f3dde666700 20 mgr.server handle_report updating existing DaemonState for osd,10
    -8> 2018-09-28 10:37:46.774782 7f3dde666700 20 mgr update loading 0 new types, 0 old types, had 129 types, got 742 bytes of data
    -7> 2018-09-28 10:37:46.775016 7f3deb39c700  5 -- 100.97.8.131:6800/1265072 >> 100.97.8.123:6800/17601 conn(0x5652d383d000 :6800 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=738 cs=1 l=1). rx osd.10 seq 5 0x5652d3b93c80 pg_stats(65 pgs tid 0 v 0) v1
    -6> 2018-09-28 10:37:46.775034 7f3dde666700  1 -- 100.97.8.131:6800/1265072 <== osd.10 100.97.8.123:6800/17601 5 ==== pg_stats(65 pgs tid 0 v 0) v1 ==== 38938+0+0 (804497348 0 0) 0x5652d3b93c80 con 0x5652d383d000
    -5> 2018-09-28 10:37:46.781888 7f3deb39c700  5 -- 100.97.8.131:6800/1265072 >> 100.97.8.124:0/2791165959 conn(0x5652d39f5800 :6800 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=222 cs=1 l=1). rx client.6381 seq 3 0x5652d409f440 mgrreport(rgw.cn-bj-test2 +0-0 packed 214) v5
    -4> 2018-09-28 10:37:46.781946 7f3dde666700  1 -- 100.97.8.131:6800/1265072 <== client.6381 100.97.8.124:0/2791165959 3 ==== mgrreport(rgw.cn-bj-test2 +0-0 packed 214) v5 ==== 253+0+0 (1062603950 0 0) 0x5652d409f440 con 0x5652d39f5800
    -3> 2018-09-28 10:37:46.781962 7f3dde666700  4 mgr.server handle_report from 0x5652d39f5800 rgw,cn-bj-test2
    -2> 2018-09-28 10:37:46.781966 7f3dde666700 20 mgr.server handle_report updating existing DaemonState for rgw,cn-bj-test2
    -1> 2018-09-28 10:37:46.781968 7f3dde666700 20 mgr update loading 0 new types, 0 old types, had 129 types, got 214 bytes of data
     0> 2018-09-28 10:37:46.783446 7f3dde666700 -1 *** Caught signal (Aborted) **
 in thread 7f3dde666700 thread_name:ms_dispatch

 ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)
 1: (()+0x3f40c1) [0x5652c9b220c1]
 2: (()+0xf6d0) [0x7f3df00026d0]
 3: (gsignal()+0x37) [0x7f3def011277]
 4: (abort()+0x148) [0x7f3def012968]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f3def920ac5]
 6: (()+0x5ea36) [0x7f3def91ea36]
 7: (()+0x5ea63) [0x7f3def91ea63]
 8: (()+0x5ec83) [0x7f3def91ec83]
 9: (std::__throw_out_of_range(char const*)+0x77) [0x7f3def973b47]
 10: (DaemonPerfCounters::update(MMgrReport*)+0xb6c) [0x5652c99d09ec]
 11: (DaemonServer::handle_report(MMgrReport*)+0x243) [0x5652c99d8903]
 12: (DaemonServer::ms_dispatch(Message*)+0x47) [0x5652c99e4917]
 13: (DispatchQueue::entry()+0x792) [0x5652c9e20cb2]
 14: (DispatchQueue::DispatchThread::entry()+0xd) [0x5652c9c0abed]
 15: (()+0x7e25) [0x7f3defffae25]
 16: (clone()+0x6d) [0x7f3def0d9bad]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   1/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 1 reserver
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   4/ 5 memdb
   1/ 5 kinetic
   1/ 5 fuse
  20/20 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-mgr.JXQ-97-8-131.log
--- end dump of recent events ---
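
Frames 5 to 10 of the backtrace show `std::__throw_out_of_range` raised inside `DaemonPerfCounters::update` and ending in `abort()`, which is what happens when a `std::out_of_range` exception is never caught. The following is a minimal sketch of that failure pattern, not Ceph's actual code: it assumes the exception comes from a throwing map lookup such as `std::map::at`, and `CounterTypes`, `lookup_unchecked`, `lookup_checked`, and the counter names are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the set of counter types a daemon has declared.
// This is NOT Ceph's data structure; the names are for illustration only.
using CounterTypes = std::map<std::string, std::uint8_t>;

// Unchecked lookup: std::map::at throws std::out_of_range for a missing key.
// If no caller catches it, std::terminate() runs and the process aborts.
std::uint8_t lookup_unchecked(const CounterTypes& types, const std::string& path) {
    return types.at(path);
}

// Checked lookup: reports an unknown counter through the return value
// instead of throwing, so the caller can skip or log it.
bool lookup_checked(const CounterTypes& types, const std::string& path,
                    std::uint8_t& type_out) {
    auto it = types.find(path);
    if (it == types.end()) {
        return false;
    }
    type_out = it->second;
    return true;
}

// Small demonstration of both behaviours.
void demo() {
    const CounterTypes types{{"known.counter", 1}};

    // The unchecked lookup throws for a key that was never declared.
    bool threw = false;
    try {
        lookup_unchecked(types, "unknown.counter");
    } catch (const std::out_of_range&) {
        threw = true;
    }
    assert(threw);

    // The checked lookup succeeds for a known key and fails softly otherwise.
    std::uint8_t type = 0;
    assert(lookup_checked(types, "known.counter", type) && type == 1);
    assert(!lookup_checked(types, "unknown.counter", type));
}
```

In the sketch, the `try`/`catch` in `demo()` is what keeps the process alive. A dispatch thread without such a handler would terminate in the same way as the `ms_dispatch` thread in the dump above.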


Files

mgr-err.log (311 KB), Diluga Salome, 12/25/2018 02:34 AM

Related issues 3 (0 open, 3 closed)

Related to mgr - Bug #24982: mgr: terminate called after throwing an instance of 'std::out_of_range' in DaemonPerfCounters::update (Resolved, Boris Ranto, 07/18/2018)
Copied to mgr - Backport #37826: mimic: mgr crash when handle_report updating existing DaemonState for rgw (Resolved, Ashish Singh)
Copied to mgr - Backport #37827: luminous: mgr crash when handle_report updating existing DaemonState for rgw (Resolved, Ashish Singh)

Actions #1

Updated by Joao Eduardo Luis over 5 years ago

  • Description updated (diff)
Actions #2

Updated by Mykola Golub over 5 years ago

  • Status changed from New to Need More Info

Could you please attach the full mgr log?

Actions #3

Updated by Mykola Golub over 5 years ago

  • Status changed from Need More Info to In Progress
  • Assignee set to Mykola Golub
Actions #4

Updated by Mykola Golub over 5 years ago

  • Status changed from In Progress to Fix Under Review
  • Backport set to mimic,luminous
  • Pull request ID set to 25534
Actions #5

Updated by Diluga Salome over 5 years ago

log file attached!

Actions #6

Updated by Kefu Chai over 5 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #7

Updated by Nathan Cutler over 5 years ago

  • Copied to Backport #37826: mimic: mgr crash when handle_report updating existing DaemonState for rgw added
Actions #8

Updated by Nathan Cutler over 5 years ago

  • Copied to Backport #37827: luminous: mgr crash when handle_report updating existing DaemonState for rgw added
Actions #9

Updated by Lenz Grimmer about 5 years ago

  • Related to Bug #24982: mgr: terminate called after throwing an instance of 'std::out_of_range' in DaemonPerfCounters::update added
Actions #10

Updated by Nathan Cutler about 5 years ago

  • Status changed from Pending Backport to Resolved