Bug #16982
OSD crash after upgrade to Jewel: give useful error when trying to commit 4000 maps to a 100MB journal
Status:
Closed
% Done:
0%
Source:
Community (user)
Regression:
No
Severity:
3 - minor
Description
While upgrading from Hammer to Jewel, I'm seeing this crash on multiple OSDs.
These systems run Ubuntu 14.04 with the 4.4 kernel backported from Ubuntu 16.04.
I'm not sure yet what is happening here.
-3> 2016-08-10 19:41:33.479095 7f20c2fd0700 5 osd.319 89070 heartbeat: osd_stat(738 GB used, 377 GB avail, 1116 GB total, peers []/[] op hist [])
-2> 2016-08-10 19:41:36.379292 7f20c2fd0700 5 osd.319 89070 heartbeat: osd_stat(738 GB used, 377 GB avail, 1116 GB total, peers []/[] op hist [])
-1> 2016-08-10 19:41:38.679495 7f20c2fd0700 5 osd.319 89070 heartbeat: osd_stat(738 GB used, 377 GB avail, 1116 GB total, peers []/[] op hist [])
0> 2016-08-10 19:41:40.686944 7f20d37f1700 -1 common/buffer.cc: In function 'void ceph::buffer::ptr::copy_in(unsigned int, unsigned int, const char*, bool)' thread 7f20d37f1700 time 2016-08-10 19:41:40.682052
common/buffer.cc: 977: FAILED assert(o+l <= _len)

ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x563076f97dab]
2: (ceph::buffer::ptr::copy_in(unsigned int, unsigned int, char const*, bool)+0x248) [0x563076fa09c8]
3: (ceph::buffer::list::rebuild(ceph::buffer::ptr&)+0x3c) [0x563076fa0d8c]
4: (ceph::buffer::list::rebuild_aligned_size_and_memory(unsigned int, unsigned int)+0x1e1) [0x563076fa23e1]
5: (FileJournal::prepare_entry(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, ceph::buffer::list*)+0x87b) [0x563076d47deb]
6: (FileStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, std::shared_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x491) [0x563076c688c1]
7: (ObjectStore::queue_transaction(ObjectStore::Sequencer*, ObjectStore::Transaction&&, Context*, Context*, Context*, std::shared_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x18d) [0x56307698b08d]
8: (OSD::handle_osd_map(MOSDMap*)+0x1485) [0x56307694c595]
9: (OSD::_dispatch(Message*)+0x261) [0x563076962291]
10: (OSD::ms_dispatch(Message*)+0x20f) [0x5630769628cf]
11: (DispatchQueue::entry()+0x78b) [0x56307705337b]
12: (DispatchQueue::DispatchThread::entry()+0xd) [0x563076f7447d]
13: (()+0x8184) [0x7f20e78f9184]
14: (clone()+0x6d) [0x7f20e5a2537d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
0/ 5 none 0/ 1 lockdep 0/ 1 context 1/ 1 crush
1/ 5 mds 1/ 5 mds_balancer 1/ 5 mds_locker 1/ 5 mds_log 1/ 5 mds_log_expire 1/ 5 mds_migrator
0/ 1 buffer 0/ 1 timer 0/ 1 filer 0/ 1 striper 0/ 1 objecter
0/ 5 rados 0/ 5 rbd 0/ 5 rbd_mirror 0/ 5 rbd_replay 0/ 5 journaler 0/ 5 objectcacher 0/ 5 client
0/ 5 osd 0/ 5 optracker 0/ 5 objclass 1/ 3 filestore 1/ 3 journal 0/ 5 ms
1/ 5 mon 0/10 monc 1/ 5 paxos 0/ 5 tp 1/ 5 auth 1/ 5 crypto 1/ 1 finisher
1/ 5 heartbeatmap 1/ 5 perfcounter 1/ 5 rgw 1/10 civetweb 1/ 5 javaclient 1/ 5 asok 1/ 1 throttle 0/ 0 refs
1/ 5 xio 1/ 5 compressor 1/ 5 newstore 1/ 5 bluestore 1/ 5 bluefs 1/ 3 bdev 1/ 5 kstore
4/ 5 rocksdb 4/ 5 leveldb 1/ 5 kinetic 1/ 5 fuse
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph-osd.319.log
--- end dump of recent events ---

2016-08-10 19:41:40.698677 7f20d37f1700 -1 *** Caught signal (Aborted) **
in thread 7f20d37f1700 thread_name:ms_dispatch

ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
1: (()+0x8ebb02) [0x563076ea0b02]
2: (()+0x10330) [0x7f20e7901330]
3: (gsignal()+0x37) [0x7f20e5961c37]
4: (abort()+0x148) [0x7f20e5965028]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x265) [0x563076f97f85]
6: (ceph::buffer::ptr::copy_in(unsigned int, unsigned int, char const*, bool)+0x248) [0x563076fa09c8]
7: (ceph::buffer::list::rebuild(ceph::buffer::ptr&)+0x3c) [0x563076fa0d8c]
8: (ceph::buffer::list::rebuild_aligned_size_and_memory(unsigned int, unsigned int)+0x1e1) [0x563076fa23e1]
9: (FileJournal::prepare_entry(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, ceph::buffer::list*)+0x87b) [0x563076d47deb]
10: (FileStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, std::shared_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x491) [0x563076c688c1]
11: (ObjectStore::queue_transaction(ObjectStore::Sequencer*, ObjectStore::Transaction&&, Context*, Context*, Context*, std::shared_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x18d) [0x56307698b08d]
12: (OSD::handle_osd_map(MOSDMap*)+0x1485) [0x56307694c595]
13: (OSD::_dispatch(Message*)+0x261) [0x563076962291]
14: (OSD::ms_dispatch(Message*)+0x20f) [0x5630769628cf]
15: (DispatchQueue::entry()+0x78b) [0x56307705337b]
16: (DispatchQueue::DispatchThread::entry()+0xd) [0x563076f7447d]
17: (()+0x8184) [0x7f20e78f9184]
18: (clone()+0x6d) [0x7f20e5a2537d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
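The issue title attributes this crash to an attempt to commit about 4000 maps to a 100MB journal, and the backtrace shows the failed assert in ceph::buffer::ptr::copy_in, reached from FileJournal::prepare_entry while OSD::handle_osd_map queues its transaction. For a sense of scale, treating 100MB as 100 MiB, 4000 maps fit only if they average under about 26 KB each (104,857,600 / 4000 ≈ 26,214 bytes). The C++ sketch below illustrates the kind of up-front check the title asks for: compare the entry size with the journal size and report a readable error instead of asserting deep in the buffer code. It assumes the failure is a single journal entry larger than the journal; `entry_fits`, `max_avg_map_bytes`, and `journal_too_small_error` are hypothetical names, not part of Ceph's FileJournal API.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helpers for illustration only; these are NOT part of
// Ceph's FileJournal API.

// True if an encoded journal entry of entry_bytes can fit in a journal
// with journal_bytes of capacity (ignores headers and alignment padding).
constexpr bool entry_fits(std::uint64_t entry_bytes,
                          std::uint64_t journal_bytes) {
  return entry_bytes <= journal_bytes;
}

// Largest average per-map size (bytes, rounded down) that still lets
// n_maps maps share a single entry in a journal of journal_bytes.
constexpr std::uint64_t max_avg_map_bytes(std::uint64_t journal_bytes,
                                          std::uint64_t n_maps) {
  return n_maps == 0 ? journal_bytes : journal_bytes / n_maps;
}

// Human-readable message to report instead of tripping an assert.
inline std::string journal_too_small_error(std::uint64_t entry_bytes,
                                           std::uint64_t journal_bytes) {
  return "journal entry of " + std::to_string(entry_bytes) +
         " bytes does not fit in a journal of " +
         std::to_string(journal_bytes) +
         " bytes; enlarge the journal or write fewer maps per transaction";
}
```

In this sketch a caller would test `entry_fits` before building the aligned buffer and, when it returns false, log `journal_too_small_error(...)` and stop cleanly. Any per-map sizes plugged in are illustrative; the real encoded size of an OSD map depends on the cluster.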