Bug #38599

closed

Ceph Mons crash on Glance image delete

Added by Jayanath Dissanayake about 5 years ago. Updated about 5 years ago.

Status: Duplicate
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We had an issue with our cluster where 4 of our 7 mons crashed and could not stay up for longer than 1-2 minutes, so the cluster could not establish quorum. The behavior is similar to https://tracker.ceph.com/issues/18746, but we are not entirely sure it is the same issue.

The Ceph cluster is used by an OpenStack deployment. Ultimately we were able to start the mons after disabling Glance on one of the OpenStack controller nodes. Right before the crash, the following Glance commands were run to create an image from a volume that was in use by a VM at the time:

openstack image create --disk-format qcow2 --volume $VOLUME $IMAGE_NAME
openstack image save $IMAGE_NAME --file $BACKUP_FILE_NAME
openstack image delete $IMAGE_NAME

All of the above except the image delete command succeeded. I think the delete returned a 509 HTTP response, but unfortunately I do not have that output right now.
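For context on the log excerpt that follows: RBD snapshots are self-managed ("unmanaged") pool snaps, and in this Luminous cluster their removal is a pool op processed by the mon. That is the pool_op(delete unmanaged snap ...) message being dispatched in the first log line below, and it matches the pg_pool_t::remove_unmanaged_snap frame in the backtrace. A minimal librados sketch of the same operation, with the pool name "images" and client id "admin" as placeholders not taken from this cluster:

    /* Minimal librados sketch: create and then remove a self-managed snap.
     * The removal is sent to the mon as pool_op(delete unmanaged snap ...),
     * the same kind of message being dispatched when the mon crashed.
     * Pool "images" and client "admin" are placeholders. */
    #include <assert.h>
    #include <rados/librados.h>

    int main(void)
    {
        rados_t cluster;
        rados_ioctx_t io;
        rados_snap_t snap_id;

        assert(rados_create(&cluster, "admin") == 0);
        assert(rados_conf_read_file(cluster, NULL) == 0); /* default ceph.conf search path */
        assert(rados_connect(cluster) == 0);
        assert(rados_ioctx_create(cluster, "images", &io) == 0);

        assert(rados_ioctx_selfmanaged_snap_create(io, &snap_id) == 0);
        assert(rados_ioctx_selfmanaged_snap_remove(io, snap_id) == 0);

        rados_ioctx_destroy(io);
        rados_shutdown(cluster);
        return 0;
    }

(Build with cc -o snaptest snaptest.c -lrados.)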

    -7> 2019-03-02 18:53:27.163112 7fe359f02700  1 -- 192.168.253.203:6789/0 <== client.8204779 192.168.253.32:0/614279813 4 ==== pool_op(delete unmanaged snap pool 116 auid 0 tid 35 name  v0) v4 ==== 65+0+0 (3422141634 0 0) 0x560105794000 con 0x560105758000
    -6> 2019-03-02 18:53:27.163129 7fe356efc700  5 -- 192.168.253.203:6789/0 >> 192.168.253.165:6802/708867 conn(0x560105942800 :6789 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=356689 cs=1 l=1). rx osd.9 seq 5 0x5601054dba00 osd_beacon(pgs [121.77,105.12d,116.5f,85.a4,85.14,85.b1,116.a,85.4d,116.8a,105.177,115.24,105.3f3,105.336,85.60,105.232,108.7d,85.75,108.5f,85.ee,105.65,105.1fc,105.3b9,116.de,105.c3,108.9,85.f4,105.22c,105.60,116.ad,105.39c,108.58,121.6a,116.67,121.3b,116.9f,105.1db,85.74,105.23,105.36b,105.211,116.e3,105.289,105.12e,105.7f,115.1a,121.51,105.35d,116.76,105.392,105.a0,105.3fc,108.20,85.ac,85.eb,105.10b,105.3d5,121.c,105.26a,105.3e3,85.b6,105.1ca,105.3d6,108.39,105.388,116.ec,115.3d,105.3b0,105.145,105.2ac,115.5,116.30,108.28] lec 79197 v79198) v1
    -5> 2019-03-02 18:53:27.163517 7fe35d709700  5 -- 192.168.253.203:6789/0 >> 192.168.253.164:6812/543269 conn(0x560105973800 :6789 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=454471 cs=1 l=1). rx osd.52 seq 4 0x5601056d9a00 mon_subscribe({mgrmap=0+,monmap=15+,osd_pg_creates=0+}) v2
    -4> 2019-03-02 18:53:27.163537 7fe35d709700  5 -- 192.168.253.203:6789/0 >> 192.168.253.164:6812/543269 conn(0x560105973800 :6789 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=454471 cs=1 l=1). rx osd.52 seq 5 0x5601056d4c00 osd_beacon(pgs [121.7b,121.5c,85.ca,105.295,105.2b0,105.1a3,105.327,105.304,105.3e6,105.1e0,105.19b,85.90,105.81,85.1f,105.2bc,119.18,105.1c7,105.2f4,105.21d,105.3af,105.3f2,116.26,116.27,116.c0,85.e7,105.39e,105.44,116.5,105.143,116.1c,116.ae,105.2e6,105.26d,116.b0,105.1b3,105.3cb,105.d2,108.1c,105.2a3,116.75,105.d4,105.279,105.37c,105.a9,105.a3,85.56,105.3a4,85.63,85.c2,105.74,105.31b,116.6d,105.34e,105.212,108.65,105.1fe] lec 79197 v79198) v1
    -3> 2019-03-02 18:53:27.163637 7fe3576fd700  5 -- 192.168.253.203:6789/0 >> 192.168.253.165:6808/526512 conn(0x56010563b000 :6789 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=465203 cs=1 l=1). rx osd.15 seq 4 0x5601056d9c00 mon_subscribe({mgrmap=0+,monmap=15+,osd_pg_creates=0+}) v2
    -2> 2019-03-02 18:53:27.163654 7fe3576fd700  5 -- 192.168.253.203:6789/0 >> 192.168.253.165:6808/526512 conn(0x56010563b000 :6789 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=465203 cs=1 l=1). rx osd.15 seq 5 0x5601056d6e00 osd_beacon(pgs [121.7a,121.3a,121.d,115.2e,105.fe,105.2b6,105.2b2,105.9b,85.8b,119.0,105.272,105.3dd,116.60,105.51,105.372,105.29e,105.350,85.53,105.331,105.28b,85.4,105.9f,116.ff,85.15,121.5,116.8,115.f,108.10,116.7a,105.158,105.213,116.b7,105.238,116.16,105.333,85.fb,116.e8,105.1ed,85.9f,105.31d,105.1b2,105.224,85.8f,108.15,116.d,105.be,85.24,115.21,105.173,105.369,85.f1,105.364,105.236,105.f7,85.92,108.35,119.1f,105.296,105.11e,105.2ca,105.167,105.de,105.334,116.2a,105.3c4,105.3e7,105.40] lec 79197 v79198) v1
    -1> 2019-03-02 18:53:27.163666 7fe3576fd700  5 -- 192.168.253.203:6789/0 >> 192.168.253.165:6808/526512 conn(0x56010563b000 :6789 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=465203 cs=1 l=1). rx osd.15 seq 6 0x5601056d6800 osd_beacon(pgs [] lec 79197 v79198) v1
     0> 2019-03-02 18:53:27.166322 7fe359f02700 -1 *** Caught signal (Aborted) **
 in thread 7fe359f02700 thread_name:ms_dispatch

 ceph version 12.2.5-2-ge988fb6 (e988fb6c5457b9dd6f1e11ce1b5fd58b4c3a828c) luminous (stable)
 1: (()+0x8f53b1) [0x5600fb5d83b1]
 2: (()+0xf6d0) [0x7fe3634736d0]
 3: (gsignal()+0x37) [0x7fe3607ae277]
 4: (abort()+0x148) [0x7fe3607af968]
 5: (std::_Rb_tree_iterator<std::pair<std::string const, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > > > std::_Rb_tree<std::string, std::pair<std::string const, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > >, std::_Select1st<std::pair<std::string const, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > > >, std::less<std::string>, std::allocator<std::pair<std::string const, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > > > >::_M_emplace_hint_unique<std::pair<std::string, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > > >(std::_Rb_tree_const_iterator<std::pair<std::string const, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > > >, std::pair<std::string, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > >&&)+0) [0x5600fb420780]
 6: (pg_pool_t::remove_unmanaged_snap(snapid_t)+0x4d) [0x5600fb40d2dd]
 7: (OSDMonitor::prepare_pool_op(boost::intrusive_ptr<MonOpRequest>)+0xe40) [0x5600fb200f10]
 8: (OSDMonitor::prepare_update(boost::intrusive_ptr<MonOpRequest>)+0x336) [0x5600fb2388c6]
 9: (PaxosService::dispatch(boost::intrusive_ptr<MonOpRequest>)+0xaf8) [0x5600fb1c4858]
 10: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x5af) [0x5600fb0a4fef]
 11: (Monitor::_ms_dispatch(Message*)+0x7eb) [0x5600fb0a65db]
 12: (Monitor::ms_dispatch(Message*)+0x23) [0x5600fb0d2743]
 13: (DispatchQueue::entry()+0x792) [0x5600fb583752]
 14: (DispatchQueue::DispatchThread::entry()+0xd) [0x5600fb37c90d]
 15: (()+0x7e25) [0x7fe36346be25]
 16: (clone()+0x6d) [0x7fe360876bad]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 0 lockdep
   0/ 0 context
   0/ 0 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 0 buffer
   0/ 0 timer
   0/ 0 filer
   0/ 1 striper
   0/ 0 objecter
   0/ 0 rados
   0/ 0 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 0 journaler
   0/ 5 objectcacher
   0/ 0 client
   0/ 0 osd
   0/ 0 optracker
   0/ 0 objclass
   0/ 0 filestore
   0/ 0 journal
   5/ 5 ms
   5/ 5 mon
   0/ 0 monc
   0/ 0 paxos
   0/ 0 tp
   0/ 0 auth
   1/ 5 crypto
   0/ 0 finisher
   1/ 1 reserver
   0/ 0 heartbeatmap
   0/ 0 perfcounter
   0/ 0 rgw
   1/10 civetweb
   1/ 5 javaclient
   0/ 0 asok
   0/ 0 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   4/ 5 memdb
   1/ 5 kinetic
   1/ 5 fuse
   1/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-mon.storage01roc.log
--- end dump of recent events ---

Files

ceph-mon.storage01roc.log (475 KB) ceph-mon.storage01roc.log Jayanath Dissanayake, 03/06/2019 03:25 AM
#1

Updated by Brad Hubbard about 5 years ago

  • Project changed from Ceph to RADOS
  • Category deleted (openstack)
#2

Updated by Greg Farnum about 5 years ago

  • Project changed from RADOS to rbd

It looks like it might be the same bug, unfortunately. :(

If the rbd team thinks this is not, can you guys translate those OpenStack commands into rados ones and kick it back? :)
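For illustration, one plausible translation, assuming the conventional Glance RBD store layout where the image lives in an "images" pool under its UUID with a protected snapshot named "snap" (every name below is an assumption, not verified against this deployment), is a librbd sequence like the following; the snapshot removal is the self-managed snap delete that reaches the mon:

    /* Hedged librbd sketch of what "openstack image delete" plausibly does
     * against an RBD-backed Glance store.  Pool and image names are
     * placeholders for illustration only. */
    #include <assert.h>
    #include <rados/librados.h>
    #include <rbd/librbd.h>

    int main(int argc, char **argv)
    {
        const char *image_id = argc > 1 ? argv[1] : "IMAGE_UUID"; /* placeholder */
        rados_t cluster;
        rados_ioctx_t io;
        rbd_image_t image;

        assert(rados_create(&cluster, "admin") == 0);
        assert(rados_conf_read_file(cluster, NULL) == 0);
        assert(rados_connect(cluster) == 0);
        assert(rados_ioctx_create(cluster, "images", &io) == 0);

        assert(rbd_open(io, image_id, &image, NULL) == 0);
        /* Glance conventionally protects the "snap" snapshot; unprotect it
         * first (this fails with -EBUSY if clones of the snapshot exist). */
        assert(rbd_snap_unprotect(image, "snap") == 0);
        /* Removing the snapshot issues the self-managed snap delete,
         * i.e. the pool op the crashing mon was processing. */
        assert(rbd_snap_remove(image, "snap") == 0);
        assert(rbd_close(image) == 0);
        assert(rbd_remove(io, image_id) == 0);

        rados_ioctx_destroy(io);
        rados_shutdown(cluster);
        return 0;
    }

(Build with cc -o rmimage rmimage.c -lrados -lrbd.)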

#3

Updated by Jason Dillaman about 5 years ago

  • Status changed from New to Duplicate

This looks like a dup of #23915, which was fixed in 12.2.6 (you indicated you were running 12.2.5).
