Bug #16982

closed

OSD crash after upgrade to Jewel: give useful error when trying to commit 4000 maps to a 100MB journal

Added by Wido den Hollander over 7 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Normal
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Regression: No
Severity: 3 - minor

Description

While upgrading from Hammer to Jewel, I'm seeing the following crash on multiple OSDs.

These systems are running Ubuntu 14.04 with the 4.4 kernel backported from Ubuntu 16.04.

Not sure yet what is happening here.

    -3> 2016-08-10 19:41:33.479095 7f20c2fd0700  5 osd.319 89070 heartbeat: osd_stat(738 GB used, 377 GB avail, 1116 GB total, peers []/[] op hist [])
    -2> 2016-08-10 19:41:36.379292 7f20c2fd0700  5 osd.319 89070 heartbeat: osd_stat(738 GB used, 377 GB avail, 1116 GB total, peers []/[] op hist [])
    -1> 2016-08-10 19:41:38.679495 7f20c2fd0700  5 osd.319 89070 heartbeat: osd_stat(738 GB used, 377 GB avail, 1116 GB total, peers []/[] op hist [])
     0> 2016-08-10 19:41:40.686944 7f20d37f1700 -1 common/buffer.cc: In function 'void ceph::buffer::ptr::copy_in(unsigned int, unsigned int, const char*, bool)' thread 7f20d37f1700 time 2016-08-10 19:41:40.682052
common/buffer.cc: 977: FAILED assert(o+l <= _len)

 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x563076f97dab]
 2: (ceph::buffer::ptr::copy_in(unsigned int, unsigned int, char const*, bool)+0x248) [0x563076fa09c8]
 3: (ceph::buffer::list::rebuild(ceph::buffer::ptr&)+0x3c) [0x563076fa0d8c]
 4: (ceph::buffer::list::rebuild_aligned_size_and_memory(unsigned int, unsigned int)+0x1e1) [0x563076fa23e1]
 5: (FileJournal::prepare_entry(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, ceph::buffer::list*)+0x87b) [0x563076d47deb]
 6: (FileStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, std::shared_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x491) [0x563076c688c1]
 7: (ObjectStore::queue_transaction(ObjectStore::Sequencer*, ObjectStore::Transaction&&, Context*, Context*, Context*, std::shared_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x18d) [0x56307698b08d]
 8: (OSD::handle_osd_map(MOSDMap*)+0x1485) [0x56307694c595]
 9: (OSD::_dispatch(Message*)+0x261) [0x563076962291]
 10: (OSD::ms_dispatch(Message*)+0x20f) [0x5630769628cf]
 11: (DispatchQueue::entry()+0x78b) [0x56307705337b]
 12: (DispatchQueue::DispatchThread::entry()+0xd) [0x563076f7447d]
 13: (()+0x8184) [0x7f20e78f9184]
 14: (clone()+0x6d) [0x7f20e5a2537d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 newstore
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   1/ 5 kinetic
   1/ 5 fuse
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.319.log
--- end dump of recent events ---
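The failing assertion, `o+l <= _len` in `ceph::buffer::ptr::copy_in(unsigned int, unsigned int, const char*, bool)`, is a bounds check. It appears to fire because the bytes being copied did not fit in the destination buffer that `buffer::list::rebuild` was filling while `FileJournal::prepare_entry` built the journal entry. Below is a minimal C++ sketch of the same check, written to report an error instead of aborting. `range_fits` and `checked_copy_in` are hypothetical helpers for illustration and are not part of Ceph's `buffer` API.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper, not Ceph's API: true when the byte range
// [off, off + len) lies inside a buffer of buf_len bytes.
// Comparing len against buf_len - off never computes off + len,
// so the check cannot be fooled by 32-bit unsigned wrap-around.
inline bool range_fits(uint32_t off, uint32_t len, uint32_t buf_len) {
  return off <= buf_len && len <= buf_len - off;
}

// Hypothetical checked copy: instead of asserting, it fills `err` with a
// description of the bad range and returns false, leaving dst untouched.
inline bool checked_copy_in(char* dst, uint32_t dst_len, uint32_t off,
                            const char* src, uint32_t len, std::string& err) {
  if (!range_fits(off, len, dst_len)) {
    err = "copy_in: offset " + std::to_string(off) + " + length " +
          std::to_string(len) + " exceeds buffer length " +
          std::to_string(dst_len);
    return false;
  }
  if (len > 0) {
    assert(dst != nullptr && src != nullptr);  // caller bug, not a size error
    std::memcpy(dst + off, src, len);
  }
  return true;
}
```

With a 16-byte buffer, copying 16 bytes at offset 0 succeeds. Copying 16 bytes at offset 8 is rejected with a message naming the offset, length and buffer size.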
2016-08-10 19:41:40.698677 7f20d37f1700 -1 *** Caught signal (Aborted) **
 in thread 7f20d37f1700 thread_name:ms_dispatch

 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
 1: (()+0x8ebb02) [0x563076ea0b02]
 2: (()+0x10330) [0x7f20e7901330]
 3: (gsignal()+0x37) [0x7f20e5961c37]
 4: (abort()+0x148) [0x7f20e5965028]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x265) [0x563076f97f85]
 6: (ceph::buffer::ptr::copy_in(unsigned int, unsigned int, char const*, bool)+0x248) [0x563076fa09c8]
 7: (ceph::buffer::list::rebuild(ceph::buffer::ptr&)+0x3c) [0x563076fa0d8c]
 8: (ceph::buffer::list::rebuild_aligned_size_and_memory(unsigned int, unsigned int)+0x1e1) [0x563076fa23e1]
 9: (FileJournal::prepare_entry(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, ceph::buffer::list*)+0x87b) [0x563076d47deb]
 10: (FileStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, std::shared_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x491) [0x563076c688c1]
 11: (ObjectStore::queue_transaction(ObjectStore::Sequencer*, ObjectStore::Transaction&&, Context*, Context*, Context*, std::shared_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x18d) [0x56307698b08d]
 12: (OSD::handle_osd_map(MOSDMap*)+0x1485) [0x56307694c595]
 13: (OSD::_dispatch(Message*)+0x261) [0x563076962291]
 14: (OSD::ms_dispatch(Message*)+0x20f) [0x5630769628cf]
 15: (DispatchQueue::entry()+0x78b) [0x56307705337b]
 16: (DispatchQueue::DispatchThread::entry()+0xd) [0x563076f7447d]
 17: (()+0x8184) [0x7f20e78f9184]
 18: (clone()+0x6d) [0x7f20e5a2537d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
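According to the title, the root problem is that a single transaction queued from `OSD::handle_osd_map` carried about 4000 maps and was committed to a 100 MB FileStore journal. That only works if the average encoded map is under roughly 25 KB (100 MB / 4000), before any per-entry journal overhead. The following C++ sketch shows the kind of pre-flight check the title asks for: it compares the padded entry size against the journal size and returns a readable message. `entry_fits`, `round_up` and the overhead and alignment parameters are assumptions for illustration, not Ceph code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Round x up to the next multiple of align (align must be non-zero).
inline uint64_t round_up(uint64_t x, uint64_t align) {
  assert(align > 0);
  return (x + align - 1) / align * align;
}

// Hypothetical pre-flight check, not Ceph code: a journal entry consists of
// payload_bytes of encoded transaction data plus overhead_bytes of
// header/footer, padded to align. If the padded entry is larger than the
// journal, describe the problem in *err and return false.
inline bool entry_fits(uint64_t payload_bytes, uint64_t overhead_bytes,
                       uint64_t align, uint64_t journal_bytes,
                       std::string* err) {
  const uint64_t entry = round_up(payload_bytes + overhead_bytes, align);
  if (entry <= journal_bytes) return true;
  if (err != nullptr) {
    *err = "journal entry of " + std::to_string(entry) +
           " bytes cannot fit in a journal of " +
           std::to_string(journal_bytes) +
           " bytes; the journal must be larger than the largest single"
           " transaction";
  }
  return false;
}
```

For example, 4000 maps averaging a made-up 30 KB each come to about 123 MB. This check would reject that batch for a 100 MiB journal and name both sizes in its message. The sketch uses 64-bit sizes, which sidesteps the 32-bit `unsigned int` lengths visible in the `buffer` signatures in the backtrace.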


Files

ceph-osd.136.log.gz (47.9 KB), added by Wido den Hollander, 08/11/2016 06:51 AM

Related issues: 1 (0 open, 1 closed)

Related to Ceph - Bug #17023: OSD failed to subscribe skipped osdmaps after "ceph osd pause" (Resolved, assignee Kefu Chai, 08/10/2016)
