
Bug #21416

osd/PGLog.cc: 60: FAILED assert(s <= can_rollback_to) after upgrade to luminous

Added by Patrick Fruh about 1 year ago. Updated 3 months ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: OSD
Target version:
Start date: 09/16/2017
Due date:
% Done: 0%
Source: Community (user)
Tags:
Backport: mimic,luminous
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite: fs
Pull request ID:

Description

After upgrading my hosts to luminous, I'm seeing loads of crashes on some of my OSDs, which had no issues before luminous.

ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)

Sep 12 22:43:25 node1.ceph ceph-osd[1686]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.0/rpm/el7/BUILD/ceph-12.2.0/src/osd/PGLog.cc: In function 'void PGLog::IndexedLog::trim(CephContext*, eversion_t, std::set<eversion_t>*, std::set<std::basic_string<char> >*, bool*)' thread 7f920b3cd700 time 2017-09-12 22:43:25.412926
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.0/rpm/el7/BUILD/ceph-12.2.0/src/osd/PGLog.cc: 60: FAILED assert(s <= can_rollback_to)
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x55b57fccc510]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 2: (PGLog::IndexedLog::trim(CephContext*, eversion_t, std::set<eversion_t, std::less<eversion_t>, std::allocator<eversion_t> >*, std::set<std::string, std::less<std::string>, std::allocator<std::string> >*, bool*)+0xbd7) [0x55b57f8719d7]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 3: (PGLog::trim(eversion_t, pg_info_t&)+0xd9) [0x55b57f871b19]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 4: (OSD::handle_pg_trim(boost::intrusive_ptr<OpRequest>)+0x3a8) [0x55b57f740908]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 5: (OSD::dispatch_op(boost::intrusive_ptr<OpRequest>)+0x1b1) [0x55b57f76c121]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 6: (OSD::_dispatch(Message*)+0x3bc) [0x55b57f76cb4c]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 7: (OSD::ms_dispatch(Message*)+0x87) [0x55b57f76ce87]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 8: (DispatchQueue::entry()+0x792) [0x55b57ff42cb2]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 9: (DispatchQueue::DispatchThread::entry()+0xd) [0x55b57fd5775d]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 10: (()+0x7dc5) [0x7f9229d19dc5]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 11: (clone()+0x6d) [0x7f9228e0d76d]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 2017-09-12 22:43:25.416144 7f920b3cd700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.0/rpm/el7/BUILD/ceph-12.2.0/src/osd/PGLog.cc: In function 'void PGLog::IndexedLog::trim(CephContext*, eversion_t, std::set<eversion_t>*, std::set<std::basic_string<char> >*, bool*)' thread 7f920b3cd700 time 2017-09-12 22:43:25.412926
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.0/rpm/el7/BUILD/ceph-12.2.0/src/osd/PGLog.cc: 60: FAILED assert(s <= can_rollback_to)
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x55b57fccc510]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 2: (PGLog::IndexedLog::trim(CephContext*, eversion_t, std::set<eversion_t, std::less<eversion_t>, std::allocator<eversion_t> >*, std::set<std::string, std::less<std::string>, std::allocator<std::string> >*, bool*)+0xbd7) [0x55b57f8719d7]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 3: (PGLog::trim(eversion_t, pg_info_t&)+0xd9) [0x55b57f871b19]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 4: (OSD::handle_pg_trim(boost::intrusive_ptr<OpRequest>)+0x3a8) [0x55b57f740908]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 5: (OSD::dispatch_op(boost::intrusive_ptr<OpRequest>)+0x1b1) [0x55b57f76c121]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 6: (OSD::_dispatch(Message*)+0x3bc) [0x55b57f76cb4c]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 7: (OSD::ms_dispatch(Message*)+0x87) [0x55b57f76ce87]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 8: (DispatchQueue::entry()+0x792) [0x55b57ff42cb2]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 9: (DispatchQueue::DispatchThread::entry()+0xd) [0x55b57fd5775d]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 10: (()+0x7dc5) [0x7f9229d19dc5]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 11: (clone()+0x6d) [0x7f9228e0d76d]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 0> 2017-09-12 22:43:25.416144 7f920b3cd700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.0/rpm/el7/BUILD/ceph-12.2.0/src/osd/PGLog.cc: In function 'void PGLog::IndexedLog::trim(CephContext*, eversion_t, std::set<eversion_t>*, std::set<std::basic_string<char> >*, bool*)' thread 7f920b3cd700 time 2017-09-12 22:43:25.412926
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.0/rpm/el7/BUILD/ceph-12.2.0/src/osd/PGLog.cc: 60: FAILED assert(s <= can_rollback_to)
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x55b57fccc510]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 2: (PGLog::IndexedLog::trim(CephContext*, eversion_t, std::set<eversion_t, std::less<eversion_t>, std::allocator<eversion_t> >*, std::set<std::string, std::less<std::string>, std::allocator<std::string> >*, bool*)+0xbd7) [0x55b57f8719d7]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 3: (PGLog::trim(eversion_t, pg_info_t&)+0xd9) [0x55b57f871b19]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 4: (OSD::handle_pg_trim(boost::intrusive_ptr<OpRequest>)+0x3a8) [0x55b57f740908]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 5: (OSD::dispatch_op(boost::intrusive_ptr<OpRequest>)+0x1b1) [0x55b57f76c121]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 6: (OSD::_dispatch(Message*)+0x3bc) [0x55b57f76cb4c]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 7: (OSD::ms_dispatch(Message*)+0x87) [0x55b57f76ce87]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 8: (DispatchQueue::entry()+0x792) [0x55b57ff42cb2]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 9: (DispatchQueue::DispatchThread::entry()+0xd) [0x55b57fd5775d]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 10: (()+0x7dc5) [0x7f9229d19dc5]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 11: (clone()+0x6d) [0x7f9228e0d76d]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: *** Caught signal (Aborted) **
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: in thread 7f920b3cd700 thread_name:ms_dispatch
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 1: (()+0xa23b21) [0x55b57fc8db21]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 2: (()+0xf370) [0x7f9229d21370]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 3: (gsignal()+0x37) [0x7f9228d4b1d7]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 4: (abort()+0x148) [0x7f9228d4c8c8]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x284) [0x55b57fccc684]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 6: (PGLog::IndexedLog::trim(CephContext*, eversion_t, std::set<eversion_t, std::less<eversion_t>, std::allocator<eversion_t> >*, std::set<std::string, std::less<std::string>, std::allocator<std::string> >*, bool*)+0xbd7) [0x55b57f8719d7]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 7: (PGLog::trim(eversion_t, pg_info_t&)+0xd9) [0x55b57f871b19]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 8: (OSD::handle_pg_trim(boost::intrusive_ptr<OpRequest>)+0x3a8) [0x55b57f740908]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 9: (OSD::dispatch_op(boost::intrusive_ptr<OpRequest>)+0x1b1) [0x55b57f76c121]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 10: (OSD::_dispatch(Message*)+0x3bc) [0x55b57f76cb4c]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 11: (OSD::ms_dispatch(Message*)+0x87) [0x55b57f76ce87]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 12: (DispatchQueue::entry()+0x792) [0x55b57ff42cb2]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 13: (DispatchQueue::DispatchThread::entry()+0xd) [0x55b57fd5775d]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 14: (()+0x7dc5) [0x7f9229d19dc5]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 15: (clone()+0x6d) [0x7f9228e0d76d]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 2017-09-12 22:43:25.441817 7f920b3cd700 -1 *** Caught signal (Aborted) **
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: in thread 7f920b3cd700 thread_name:ms_dispatch
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 1: (()+0xa23b21) [0x55b57fc8db21]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 2: (()+0xf370) [0x7f9229d21370]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 3: (gsignal()+0x37) [0x7f9228d4b1d7]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 4: (abort()+0x148) [0x7f9228d4c8c8]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x284) [0x55b57fccc684]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 6: (PGLog::IndexedLog::trim(CephContext*, eversion_t, std::set<eversion_t, std::less<eversion_t>, std::allocator<eversion_t> >*, std::set<std::string, std::less<std::string>, std::allocator<std::string> >*, bool*)+0xbd7) [0x55b57f8719d7]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 7: (PGLog::trim(eversion_t, pg_info_t&)+0xd9) [0x55b57f871b19]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 8: (OSD::handle_pg_trim(boost::intrusive_ptr<OpRequest>)+0x3a8) [0x55b57f740908]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 9: (OSD::dispatch_op(boost::intrusive_ptr<OpRequest>)+0x1b1) [0x55b57f76c121]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 10: (OSD::_dispatch(Message*)+0x3bc) [0x55b57f76cb4c]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 11: (OSD::ms_dispatch(Message*)+0x87) [0x55b57f76ce87]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 12: (DispatchQueue::entry()+0x792) [0x55b57ff42cb2]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 13: (DispatchQueue::DispatchThread::entry()+0xd) [0x55b57fd5775d]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 14: (()+0x7dc5) [0x7f9229d19dc5]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 15: (clone()+0x6d) [0x7f9228e0d76d]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 0> 2017-09-12 22:43:25.441817 7f920b3cd700 -1 *** Caught signal (Aborted) **
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: in thread 7f920b3cd700 thread_name:ms_dispatch
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 1: (()+0xa23b21) [0x55b57fc8db21]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 2: (()+0xf370) [0x7f9229d21370]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 3: (gsignal()+0x37) [0x7f9228d4b1d7]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 4: (abort()+0x148) [0x7f9228d4c8c8]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x284) [0x55b57fccc684]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 6: (PGLog::IndexedLog::trim(CephContext*, eversion_t, std::set<eversion_t, std::less<eversion_t>, std::allocator<eversion_t> >*, std::set<std::string, std::less<std::string>, std::allocator<std::string> >*, bool*)+0xbd7) [0x55b57f8719d7]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 7: (PGLog::trim(eversion_t, pg_info_t&)+0xd9) [0x55b57f871b19]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 8: (OSD::handle_pg_trim(boost::intrusive_ptr<OpRequest>)+0x3a8) [0x55b57f740908]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 9: (OSD::dispatch_op(boost::intrusive_ptr<OpRequest>)+0x1b1) [0x55b57f76c121]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 10: (OSD::_dispatch(Message*)+0x3bc) [0x55b57f76cb4c]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 11: (OSD::ms_dispatch(Message*)+0x87) [0x55b57f76ce87]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 12: (DispatchQueue::entry()+0x792) [0x55b57ff42cb2]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 13: (DispatchQueue::DispatchThread::entry()+0xd) [0x55b57fd5775d]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 14: (()+0x7dc5) [0x7f9229d19dc5]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: 15: (clone()+0x6d) [0x7f9228e0d76d]
Sep 12 22:43:25 node1.ceph ceph-osd[1686]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Sep 12 22:43:25 node1.ceph systemd[1]: ceph-osd@1.service: main process exited, code=killed, status=6/ABRT
Sep 12 22:43:25 node1.ceph systemd[1]: Unit ceph-osd@1.service entered failed state.
Sep 12 22:43:25 node1.ceph systemd[1]: ceph-osd@1.service failed.
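As the NOTE lines in the trace say, the anonymous frames such as `(()+0xa23b21)` can only be decoded against the exact binary that produced them. A minimal sketch of how one might do that with binutils (the binary path here is an assumption; the matching debuginfo package may be needed for source-line annotations):

```shell
# Path assumed for illustration: use the ceph-osd binary that produced the trace.
BIN=/usr/bin/ceph-osd

if [ -x "$BIN" ]; then
  # Full annotated disassembly, as suggested by the NOTE line in the log.
  objdump -rdS "$BIN" > ceph-osd.asm

  # Or resolve a single anonymous frame, e.g. "(()+0xa23b21)",
  # by feeding its offset to addr2line (-C demangles, -f prints the function).
  addr2line -Cfe "$BIN" 0xa23b21
fi
```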

I've attached a fairly long, continuous log from one of my OSDs showing the crashes that started with the upgrade to luminous.
I've even migrated this OSD from filestore to bluestore, and it segfaults (with different errors) there as well.

You can see the crashes happening in the threads ms_dispatch, bstore_kv_sync, ceph-osd, rocksdb:bg0, tp_osd_tp, and tp_peering.
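The repeated FAILED assert in the first trace guards a pg-log invariant: the log may only be trimmed up to entries whose side effects are no longer rollback candidates. As a rough illustration only (this is not the real Ceph code; `eversion` below is a hypothetical simplified stand-in for Ceph's `eversion_t`), the check at PGLog.cc:60 amounts to:

```cpp
#include <cassert>
#include <tuple>

// Illustrative stand-in for ceph's eversion_t: an (epoch, version) pair
// ordered lexicographically. Names and layout are assumptions.
struct eversion {
    unsigned epoch = 0;
    unsigned version = 0;
};

inline bool operator<=(const eversion& a, const eversion& b) {
    return std::tie(a.epoch, a.version) <= std::tie(b.epoch, b.version);
}

// Sketch of the invariant behind "FAILED assert(s <= can_rollback_to)":
// a request to trim the log up to s must not point past the newest entry
// that could still be rolled back. The crash above means it did.
void trim_to(eversion s, eversion can_rollback_to) {
    assert(s <= can_rollback_to);
    // ... actual trimming of log entries elided ...
}
```

The crash reported here is this assertion firing while the OSD handled a pg-trim message after the upgrade, rather than a memory fault in the trim code itself.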

Sep 16 12:02:56 node1.ceph systemd[1]: Starting Ceph object storage daemon osd.1...
Sep 16 12:02:56 node1.ceph systemd[1]: Started Ceph object storage daemon osd.1.
Sep 16 12:02:56 node1.ceph ceph-osd[13875]: 2017-09-16 12:02:56.257560 7fe478ee1d00 -1 Public network was set, but cluster network was not set
Sep 16 12:02:56 node1.ceph ceph-osd[13875]: 2017-09-16 12:02:56.257566 7fe478ee1d00 -1     Using public network also for cluster network
Sep 16 12:02:56 node1.ceph ceph-osd[13875]: starting osd.1 at - osd_data /var/lib/ceph/osd/ceph-1 /var/lib/ceph/osd/ceph-1/journal
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: *** Caught signal (Segmentation fault) **
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: in thread 7fe478ee1d00 thread_name:ceph-osd
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 1: (()+0xa23b21) [0x55c8ae72eb21]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 2: (()+0xf5e0) [0x7fe4762f75e0]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 3: (()+0x1cdff) [0x7fe478ac4dff]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 4: (rocksdb::BlockBasedTable::PutDataBlockToCache(rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::Cache*, rocksdb::Cache*, rocksdb::ReadOptions const&, rocksdb::ImmutableCFOptions const&, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, rocksdb::Block*, unsigned int, rocksdb::Slice const&, unsigned long, bool, rocksdb::Cache::Priority)+0xd6) [0x55c8aea9d5e6]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 5: (rocksdb::BlockBasedTable::MaybeLoadDataBlockToCache(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::Slice, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, bool)+0x3dc) [0x55c8aea9e67c]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 6: (rocksdb::BlockBasedTable::NewDataBlockIterator(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::BlockIter*, bool, rocksdb::Status)+0x127) [0x55c8aea9e8d7]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 7: (rocksdb::BlockBasedTable::BlockEntryIteratorState::NewSecondaryIterator(rocksdb::Slice const&)+0x89) [0x55c8aeaa6ef9]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 8: (()+0xdbd596) [0x55c8aeac8596]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 9: (()+0xdbd82c) [0x55c8aeac882c]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 10: (()+0xdbd8a6) [0x55c8aeac88a6]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 11: (rocksdb::MergingIterator::Next()+0x24d) [0x55c8aeab053d]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 12: (rocksdb::DBIter::Next()+0xa6) [0x55c8aeb2f5b6]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 13: (RocksDBStore::RocksDBWholeSpaceIteratorImpl::next()+0x9a) [0x55c8ae684faa]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 14: (BitmapFreelistManager::enumerate_next(unsigned long*, unsigned long*)+0xd7) [0x55c8ae6dbbd7]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 15: (BlueStore::_open_alloc()+0x1dd) [0x55c8ae5dbecd]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 16: (BlueStore::_mount(bool)+0x443) [0x55c8ae648b83]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 17: (OSD::init()+0x3ba) [0x55c8ae211eaa]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 18: (main()+0x2def) [0x55c8ae11981f]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 19: (__libc_start_main()+0xf5) [0x7fe47530cc05]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 20: (()+0x4acb56) [0x55c8ae1b7b56]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 2017-09-16 12:02:57.122423 7fe478ee1d00 -1 *** Caught signal (Segmentation fault) **
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: in thread 7fe478ee1d00 thread_name:ceph-osd
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 1: (()+0xa23b21) [0x55c8ae72eb21]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 2: (()+0xf5e0) [0x7fe4762f75e0]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 3: (()+0x1cdff) [0x7fe478ac4dff]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 4: (rocksdb::BlockBasedTable::PutDataBlockToCache(rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::Cache*, rocksdb::Cache*, rocksdb::ReadOptions const&, rocksdb::ImmutableCFOptions const&, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, rocksdb::Block*, unsigned int, rocksdb::Slice const&, unsigned long, bool, rocksdb::Cache::Priority)+0xd6) [0x55c8aea9d5e6]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 5: (rocksdb::BlockBasedTable::MaybeLoadDataBlockToCache(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::Slice, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, bool)+0x3dc) [0x55c8aea9e67c]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 6: (rocksdb::BlockBasedTable::NewDataBlockIterator(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::BlockIter*, bool, rocksdb::Status)+0x127) [0x55c8aea9e8d7]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 7: (rocksdb::BlockBasedTable::BlockEntryIteratorState::NewSecondaryIterator(rocksdb::Slice const&)+0x89) [0x55c8aeaa6ef9]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 8: (()+0xdbd596) [0x55c8aeac8596]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 9: (()+0xdbd82c) [0x55c8aeac882c]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 10: (()+0xdbd8a6) [0x55c8aeac88a6]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 11: (rocksdb::MergingIterator::Next()+0x24d) [0x55c8aeab053d]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 12: (rocksdb::DBIter::Next()+0xa6) [0x55c8aeb2f5b6]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 13: (RocksDBStore::RocksDBWholeSpaceIteratorImpl::next()+0x9a) [0x55c8ae684faa]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 14: (BitmapFreelistManager::enumerate_next(unsigned long*, unsigned long*)+0xd7) [0x55c8ae6dbbd7]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 15: (BlueStore::_open_alloc()+0x1dd) [0x55c8ae5dbecd]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 16: (BlueStore::_mount(bool)+0x443) [0x55c8ae648b83]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 17: (OSD::init()+0x3ba) [0x55c8ae211eaa]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 18: (main()+0x2def) [0x55c8ae11981f]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 19: (__libc_start_main()+0xf5) [0x7fe47530cc05]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 20: (()+0x4acb56) [0x55c8ae1b7b56]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: -222> 2017-09-16 12:02:56.257560 7fe478ee1d00 -1 Public network was set, but cluster network was not set
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: -221> 2017-09-16 12:02:56.257566 7fe478ee1d00 -1     Using public network also for cluster network
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 0> 2017-09-16 12:02:57.122423 7fe478ee1d00 -1 *** Caught signal (Segmentation fault) **
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: in thread 7fe478ee1d00 thread_name:ceph-osd
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 1: (()+0xa23b21) [0x55c8ae72eb21]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 2: (()+0xf5e0) [0x7fe4762f75e0]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 3: (()+0x1cdff) [0x7fe478ac4dff]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 4: (rocksdb::BlockBasedTable::PutDataBlockToCache(rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::Cache*, rocksdb::Cache*, rocksdb::ReadOptions const&, rocksdb::ImmutableCFOptions const&, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, rocksdb::Block*, unsigned int, rocksdb::Slice const&, unsigned long, bool, rocksdb::Cache::Priority)+0xd6) [0x55c8aea9d5e6]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 5: (rocksdb::BlockBasedTable::MaybeLoadDataBlockToCache(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::Slice, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, bool)+0x3dc) [0x55c8aea9e67c]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 6: (rocksdb::BlockBasedTable::NewDataBlockIterator(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::BlockIter*, bool, rocksdb::Status)+0x127) [0x55c8aea9e8d7]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 7: (rocksdb::BlockBasedTable::BlockEntryIteratorState::NewSecondaryIterator(rocksdb::Slice const&)+0x89) [0x55c8aeaa6ef9]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 8: (()+0xdbd596) [0x55c8aeac8596]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 9: (()+0xdbd82c) [0x55c8aeac882c]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 10: (()+0xdbd8a6) [0x55c8aeac88a6]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 11: (rocksdb::MergingIterator::Next()+0x24d) [0x55c8aeab053d]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 12: (rocksdb::DBIter::Next()+0xa6) [0x55c8aeb2f5b6]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 13: (RocksDBStore::RocksDBWholeSpaceIteratorImpl::next()+0x9a) [0x55c8ae684faa]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 14: (BitmapFreelistManager::enumerate_next(unsigned long*, unsigned long*)+0xd7) [0x55c8ae6dbbd7]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 15: (BlueStore::_open_alloc()+0x1dd) [0x55c8ae5dbecd]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 16: (BlueStore::_mount(bool)+0x443) [0x55c8ae648b83]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 17: (OSD::init()+0x3ba) [0x55c8ae211eaa]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 18: (main()+0x2def) [0x55c8ae11981f]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 19: (__libc_start_main()+0xf5) [0x7fe47530cc05]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: 20: (()+0x4acb56) [0x55c8ae1b7b56]
Sep 16 12:02:57 node1.ceph ceph-osd[13875]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Sep 16 12:02:57 node1.ceph systemd[1]: ceph-osd@1.service: main process exited, code=killed, status=11/SEGV
Sep 16 12:02:57 node1.ceph systemd[1]: Unit ceph-osd@1.service entered failed state.
Sep 16 12:02:57 node1.ceph systemd[1]: ceph-osd@1.service failed.
Sep 16 12:03:17 node1.ceph systemd[1]: ceph-osd@1.service holdoff time over, scheduling restart.
Sep 16 12:03:17 node1.ceph systemd[1]: Starting Ceph object storage daemon osd.1...
Sep 16 12:03:17 node1.ceph systemd[1]: Started Ceph object storage daemon osd.1.
Sep 16 12:03:17 node1.ceph ceph-osd[14310]: 2017-09-16 12:03:17.519046 7f5ae3163d00 -1 Public network was set, but cluster network was not set
Sep 16 12:03:17 node1.ceph ceph-osd[14310]: 2017-09-16 12:03:17.519053 7f5ae3163d00 -1     Using public network also for cluster network
Sep 16 12:03:17 node1.ceph ceph-osd[14310]: starting osd.1 at - osd_data /var/lib/ceph/osd/ceph-1 /var/lib/ceph/osd/ceph-1/journal
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 2017-09-16 12:04:32.154790 7f5ae3163d00 -1 osd.1 19911 log_to_monitors {default=true}
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: *** Caught signal (Segmentation fault) **
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: in thread 7f5ac1ff6700 thread_name:tp_peering
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 1: (()+0xa23b21) [0x56095c70db21]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 2: (()+0xf5e0) [0x7f5ae05795e0]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 3: (()+0x1cdff) [0x7f5ae2d46dff]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 4: (rocksdb::BlockBasedTable::PutDataBlockToCache(rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::Cache*, rocksdb::Cache*, rocksdb::ReadOptions const&, rocksdb::ImmutableCFOptions const&, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, rocksdb::Block*, unsigned int, rocksdb::Slice const&, unsigned long, bool, rocksdb::Cache::Priority)+0xd6) [0x56095ca7c5e6]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 5: (rocksdb::BlockBasedTable::MaybeLoadDataBlockToCache(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::Slice, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, bool)+0x3dc) [0x56095ca7d67c]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 6: (rocksdb::BlockBasedTable::NewDataBlockIterator(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::BlockIter*, bool, rocksdb::Status)+0x127) [0x56095ca7d8d7]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 7: (rocksdb::BlockBasedTable::BlockEntryIteratorState::NewSecondaryIterator(rocksdb::Slice const&)+0x89) [0x56095ca85ef9]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 8: (()+0xdbd596) [0x56095caa7596]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 9: (()+0xdbdb5d) [0x56095caa7b5d]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 10: (()+0xdbdb6f) [0x56095caa7b6f]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 11: (rocksdb::MergingIterator::Seek(rocksdb::Slice const&)+0xce) [0x56095ca8fc8e]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 12: (rocksdb::DBIter::Seek(rocksdb::Slice const&)+0x174) [0x56095cb0f0e4]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 13: (RocksDBStore::RocksDBWholeSpaceIteratorImpl::lower_bound(std::string const&, std::string const&)+0xa2) [0x56095c664962]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 14: (BlueStore::_omap_rmkey_range(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, std::string const&, std::string const&)+0x111) [0x56095c5c6c91]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 15: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x175e) [0x56095c62332e]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 16: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x3a0) [0x56095c6240a0]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 17: (ObjectStore::queue_transaction(ObjectStore::Sequencer*, ObjectStore::Transaction&&, Context*, Context*, Context*, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x171) [0x56095c226631]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 18: (OSD::dispatch_context_transaction(PG::RecoveryCtx&, PG*, ThreadPool::TPHandle*)+0x76) [0x56095c1b3456]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 19: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x3bb) [0x56095c1ddf3b]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 20: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x17) [0x56095c23f087]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 21: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa8e) [0x56095c7530fe]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 22: (ThreadPool::WorkThread::entry()+0x10) [0x56095c753fe0]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 23: (()+0x7e25) [0x7f5ae0571e25]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 24: (clone()+0x6d) [0x7f5adf66534d]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 2017-09-16 12:04:32.798442 7f5ac1ff6700 -1 *** Caught signal (Segmentation fault) **
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: in thread 7f5ac1ff6700 thread_name:tp_peering
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 1: (()+0xa23b21) [0x56095c70db21]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 2: (()+0xf5e0) [0x7f5ae05795e0]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 3: (()+0x1cdff) [0x7f5ae2d46dff]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 4: (rocksdb::BlockBasedTable::PutDataBlockToCache(rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::Cache*, rocksdb::Cache*, rocksdb::ReadOptions const&, rocksdb::ImmutableCFOptions const&, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, rocksdb::Block*, unsigned int, rocksdb::Slice const&, unsigned long, bool, rocksdb::Cache::Priority)+0xd6) [0x56095ca7c5e6]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 5: (rocksdb::BlockBasedTable::MaybeLoadDataBlockToCache(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::Slice, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, bool)+0x3dc) [0x56095ca7d67c]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 6: (rocksdb::BlockBasedTable::NewDataBlockIterator(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::BlockIter*, bool, rocksdb::Status)+0x127) [0x56095ca7d8d7]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 7: (rocksdb::BlockBasedTable::BlockEntryIteratorState::NewSecondaryIterator(rocksdb::Slice const&)+0x89) [0x56095ca85ef9]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 8: (()+0xdbd596) [0x56095caa7596]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 9: (()+0xdbdb5d) [0x56095caa7b5d]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 10: (()+0xdbdb6f) [0x56095caa7b6f]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 11: (rocksdb::MergingIterator::Seek(rocksdb::Slice const&)+0xce) [0x56095ca8fc8e]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 12: (rocksdb::DBIter::Seek(rocksdb::Slice const&)+0x174) [0x56095cb0f0e4]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 13: (RocksDBStore::RocksDBWholeSpaceIteratorImpl::lower_bound(std::string const&, std::string const&)+0xa2) [0x56095c664962]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 14: (BlueStore::_omap_rmkey_range(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, std::string const&, std::string const&)+0x111) [0x56095c5c6c91]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 15: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x175e) [0x56095c62332e]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 16: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x3a0) [0x56095c6240a0]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 17: (ObjectStore::queue_transaction(ObjectStore::Sequencer*, ObjectStore::Transaction&&, Context*, Context*, Context*, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x171) [0x56095c226631]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 18: (OSD::dispatch_context_transaction(PG::RecoveryCtx&, PG*, ThreadPool::TPHandle*)+0x76) [0x56095c1b3456]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 19: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x3bb) [0x56095c1ddf3b]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 20: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x17) [0x56095c23f087]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 21: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa8e) [0x56095c7530fe]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 22: (ThreadPool::WorkThread::entry()+0x10) [0x56095c753fe0]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 23: (()+0x7e25) [0x7f5ae0571e25]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 24: (clone()+0x6d) [0x7f5adf66534d]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: -5413> 2017-09-16 12:03:17.519046 7f5ae3163d00 -1 Public network was set, but cluster network was not set
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: -5412> 2017-09-16 12:03:17.519053 7f5ae3163d00 -1     Using public network also for cluster network
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: -4086> 2017-09-16 12:04:32.154790 7f5ae3163d00 -1 osd.1 19911 log_to_monitors {default=true}
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 0> 2017-09-16 12:04:32.798442 7f5ac1ff6700 -1 *** Caught signal (Segmentation fault) **
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: in thread 7f5ac1ff6700 thread_name:tp_peering
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 1: (()+0xa23b21) [0x56095c70db21]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 2: (()+0xf5e0) [0x7f5ae05795e0]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 3: (()+0x1cdff) [0x7f5ae2d46dff]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 4: (rocksdb::BlockBasedTable::PutDataBlockToCache(rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::Cache*, rocksdb::Cache*, rocksdb::ReadOptions const&, rocksdb::ImmutableCFOptions const&, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, rocksdb::Block*, unsigned int, rocksdb::Slice const&, unsigned long, bool, rocksdb::Cache::Priority)+0xd6) [0x56095ca7c5e6]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 5: (rocksdb::BlockBasedTable::MaybeLoadDataBlockToCache(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::Slice, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, bool)+0x3dc) [0x56095ca7d67c]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 6: (rocksdb::BlockBasedTable::NewDataBlockIterator(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::BlockIter*, bool, rocksdb::Status)+0x127) [0x56095ca7d8d7]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 7: (rocksdb::BlockBasedTable::BlockEntryIteratorState::NewSecondaryIterator(rocksdb::Slice const&)+0x89) [0x56095ca85ef9]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 8: (()+0xdbd596) [0x56095caa7596]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 9: (()+0xdbdb5d) [0x56095caa7b5d]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 10: (()+0xdbdb6f) [0x56095caa7b6f]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 11: (rocksdb::MergingIterator::Seek(rocksdb::Slice const&)+0xce) [0x56095ca8fc8e]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 12: (rocksdb::DBIter::Seek(rocksdb::Slice const&)+0x174) [0x56095cb0f0e4]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 13: (RocksDBStore::RocksDBWholeSpaceIteratorImpl::lower_bound(std::string const&, std::string const&)+0xa2) [0x56095c664962]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 14: (BlueStore::_omap_rmkey_range(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, std::string const&, std::string const&)+0x111) [0x56095c5c6c91]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 15: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x175e) [0x56095c62332e]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 16: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x3a0) [0x56095c6240a0]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 17: (ObjectStore::queue_transaction(ObjectStore::Sequencer*, ObjectStore::Transaction&&, Context*, Context*, Context*, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x171) [0x56095c226631]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 18: (OSD::dispatch_context_transaction(PG::RecoveryCtx&, PG*, ThreadPool::TPHandle*)+0x76) [0x56095c1b3456]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 19: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x3bb) [0x56095c1ddf3b]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 20: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x17) [0x56095c23f087]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 21: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa8e) [0x56095c7530fe]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 22: (ThreadPool::WorkThread::entry()+0x10) [0x56095c753fe0]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 23: (()+0x7e25) [0x7f5ae0571e25]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: 24: (clone()+0x6d) [0x7f5adf66534d]
Sep 16 12:04:32 node1.ceph ceph-osd[14310]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Sep 16 12:04:32 node1.ceph systemd[1]: ceph-osd@1.service: main process exited, code=killed, status=11/SEGV
Sep 16 12:04:32 node1.ceph systemd[1]: Unit ceph-osd@1.service entered failed state.
Sep 16 12:04:32 node1.ceph systemd[1]: ceph-osd@1.service failed.

osd.log View (688 KB) Patrick Fruh, 09/16/2017 10:44 AM


Related issues

Copied to Ceph - Backport #35072: luminous: osd/PGLog.cc: 60: FAILED assert(s <= can_rollback_to) after upgrade to luminous Resolved
Copied to Ceph - Backport #35073: mimic: osd/PGLog.cc: 60: FAILED assert(s <= can_rollback_to) after upgrade to luminous Resolved

History

#1 Updated by Patrick Fruh about 1 year ago

Apparently it's only happening on the first 3 of my 41 OSDs (0, 1, 2), all on "node1", since the luminous upgrade.

I checked smartctl and the output is clean, without any errors, so a hardware defect seems unlikely.

#2 Updated by Patrick Fruh about 1 year ago

Those 3 OSDs share a journaling/DB SSD, so that SSD might be the one failing; I'm currently running a SMART self-test on it.

#3 Updated by Patrick Fruh about 1 year ago

The self-tests of all 3 OSDs and of the journal/DB SSD came back OK.
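For anyone retracing this check, a SMART long self-test like the one described can be run with smartctl; the device path below is a placeholder, not the reporter's actual device:

```shell
# Start an extended (long) offline self-test on the journal/DB SSD
# (/dev/sdX is a placeholder device path)
smartctl -t long /dev/sdX

# After the estimated runtime has elapsed, read the self-test log
smartctl -l selftest /dev/sdX

# Overall health verdict plus vendor attributes (reallocated sectors, etc.)
smartctl -H -A /dev/sdX
```

Note that a passing self-test only rules out faults the drive itself can detect; cabling, controller, or firmware issues can still corrupt data silently.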

#4 Updated by Sage Weil about 1 year ago

  • Subject changed from OSD segfaults after luminous upgrade - bluestore and filestore to osd/PGLog.cc: 60: FAILED assert(s <= can_rollback_to) after upgrade to luminous
  • Status changed from New to Verified
  • Priority changed from Normal to Urgent

#5 Updated by Sage Weil about 1 year ago

When you say "upgraded this osd from filestore to bluestore", do you mean that you wiped the device, recreated a bluestore osd in its place, let it recover, and then started seeing the new errors? (The error messages seem totally unrelated... but the fact that a fresh replacement OSD would see new, different problems makes me suspicious of the hardware.)

#6 Updated by Patrick Fruh about 1 year ago

Yes, the errors seemingly started to show up when the OSDs were under high load (recovering themselves or helping other OSDs recover). However, after wiping them again and converting them back to filestore, everything is now working fine, even under high load.

#7 Updated by Patrick Fruh about 1 year ago

Looking at the most recent logs, some OSDs still crashed early yesterday morning (even after going back to filestore and finishing recovery); I don't know whether this error is related:

Sep 20 02:47:57 node1.ceph ceph-osd[7051]: 2017-09-20 02:47:57.075709 7f4591be9700 -1 osd.0 pg_epoch: 23702 pg[7.1f5( v 23702'6584626 (23697'6581556,23702'6584626] local-lis/les=23341/23342 n=3146 ec=104/104 lis/c 23341/23341 les/c/f 23342/23342/0 23312/23341/23341) [0,15,11] r=0 lpr=23341 crt=23702'6584626 lcod 23702'6584625 mlcod 23702'6584625 active+clean+scrubbing] _scan_snaps no head for 7:afaba73b:::rbd_data.68c683238e1f29.00000000000022a1:3d (have MIN)
Sep 20 02:52:10 node1.ceph ceph-osd[7051]: 2017-09-20 02:52:10.185743 7f45903e6700 -1 osd.0 pg_epoch: 23702 pg[7.360( v 23702'8555567 (23697'8552485,23702'8555567] local-lis/les=23431/23432 n=3094 ec=2543/2543 lis/c 23431/23431 les/c/f 23432/23432/0 23312/23431/23431) [0,31,8] r=0 lpr=23431 crt=23702'8555567 lcod 23702'8555566 mlcod 23702'8555566 active+clean+scrubbing+deep] _scan_snaps no head for 7:06d39870:::rbd_data.bd1c92ae8944a.0000000000005648:d2 (have MIN)
Sep 20 02:52:20 node1.ceph ceph-osd[7051]: 2017-09-20 02:52:20.808277 7f45903e6700 -1 osd.0 pg_epoch: 23702 pg[7.360( v 23702'8555581 (23697'8552485,23702'8555581] local-lis/les=23431/23432 n=3094 ec=2543/2543 lis/c 23431/23431 les/c/f 23432/23432/0 23312/23431/23431) [0,31,8] r=0 lpr=23431 crt=23702'8555581 lcod 23702'8555580 mlcod 23702'8555580 active+clean+scrubbing+deep] _scan_snaps no head for 7:06d5926c:::rbd_data.a74f60238e1f29.000000000000112a:cf (have MIN)
Sep 20 02:52:20 node1.ceph ceph-osd[7051]: 2017-09-20 02:52:20.808312 7f45903e6700 -1 osd.0 pg_epoch: 23702 pg[7.360( v 23702'8555581 (23697'8552485,23702'8555581] local-lis/les=23431/23432 n=3094 ec=2543/2543 lis/c 23431/23431 les/c/f 23432/23432/0 23312/23431/23431) [0,31,8] r=0 lpr=23431 crt=23702'8555581 lcod 23702'8555580 mlcod 23702'8555580 active+clean+scrubbing+deep] _scan_snaps no head for 7:06d5926c:::rbd_data.a74f60238e1f29.000000000000112a:9c (have MIN)
Sep 20 02:53:23 node1.ceph ceph-osd[7051]: 2017-09-20 02:53:23.572447 7f45903e6700 -1 osd.0 pg_epoch: 23702 pg[7.360( v 23702'8555677 (23697'8552585,23702'8555677] local-lis/les=23431/23432 n=3094 ec=2543/2543 lis/c 23431/23431 les/c/f 23432/23432/0 23312/23431/23431) [0,31,8] r=0 lpr=23431 crt=23702'8555677 lcod 23702'8555676 mlcod 23702'8555676 active+clean+scrubbing+deep] _scan_snaps no head for 7:06e4dd89:::rbd_data.a74f60238e1f29.0000000000000832:9c (have MIN)
Sep 20 02:53:29 node1.ceph ceph-osd[7051]: 2017-09-20 02:53:29.239197 7f45903e6700 -1 osd.0 pg_epoch: 23702 pg[7.360( v 23702'8555687 (23697'8552685,23702'8555687] local-lis/les=23431/23432 n=3094 ec=2543/2543 lis/c 23431/23431 les/c/f 23432/23432/0 23312/23431/23431) [0,31,8] r=0 lpr=23431 crt=23702'8555687 lcod 23702'8555686 mlcod 23702'8555686 active+clean+scrubbing+deep] _scan_snaps no head for 7:06e69ad6:::rbd_data.68c683238e1f29.0000000000001eb3:3d (have MIN)
Sep 20 02:56:14 node1.ceph ceph-osd[7051]: 2017-09-20 02:56:14.208765 7f4591be9700 -1 osd.0 pg_epoch: 23702 pg[7.3e9( v 23702'5074005 (23702'5070965,23702'5074005] local-lis/les=23393/23394 n=3090 ec=2543/2543 lis/c 23393/23393 les/c/f 23394/23394/0 23312/23393/23393) [0,15,8] r=0 lpr=23393 crt=23702'5074005 lcod 23702'5074004 mlcod 23702'5074004 active+clean+scrubbing] _scan_snaps no head for 7:97c5a729:::rbd_data.68c683238e1f29.0000000000000580:3d (have MIN)
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: *** Caught signal (Segmentation fault) **
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: in thread 7f4591be9700 thread_name:tp_osd_tp
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 1: (()+0xa23b21) [0x55c161f97b21]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 2: (()+0xf5e0) [0x7f45af75e5e0]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 3: (()+0x1cdff) [0x7f45b1f2bdff]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 4: (rocksdb::BlockBasedTable::PutDataBlockToCache(rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::Cache*, rocksdb::Cache*, rocksdb::ReadOptions const&, rocksdb::ImmutableCFOptions const&, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, rocksdb::Block*, unsigned int, rocksdb::Slice const&, unsigned long, bool, rocksdb::Cache::Priority)+0xd6) [0x55c1623065e6]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 5: (rocksdb::BlockBasedTable::MaybeLoadDataBlockToCache(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::Slice, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, bool)+0x3dc) [0x55c16230767c]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 6: (rocksdb::BlockBasedTable::NewDataBlockIterator(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::BlockIter*, bool, rocksdb::Status)+0x127) [0x55c1623078d7]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 7: (rocksdb::BlockBasedTable::BlockEntryIteratorState::NewSecondaryIterator(rocksdb::Slice const&)+0x89) [0x55c16230fef9]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 8: (()+0xdbd596) [0x55c162331596]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 9: (()+0xdbdb5d) [0x55c162331b5d]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 10: (rocksdb::MergingIterator::Seek(rocksdb::Slice const&)+0xce) [0x55c162319c8e]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 11: (rocksdb::DBIter::Seek(rocksdb::Slice const&)+0x174) [0x55c1623990e4]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 12: (RocksDBStore::RocksDBWholeSpaceIteratorImpl::lower_bound(std::string const&, std::string const&)+0xa2) [0x55c161eee962]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 13: (DBObjectMap::DBObjectMapIteratorImpl::lower_bound(std::string const&)+0x47) [0x55c161f11cf7]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 14: (DBObjectMap::scan(std::shared_ptr<DBObjectMap::_Header>, std::set<std::string, std::less<std::string>, std::allocator<std::string> > const&, std::set<std::string, std::less<std::string>, std::allocator<std::string> >*, std::map<std::string, ceph::buffer::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::list> > >*)+0x110) [0x55c161f16530]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 15: (DBObjectMap::get_values(ghobject_t const&, std::set<std::string, std::less<std::string>, std::allocator<std::string> > const&, std::map<std::string, ceph::buffer::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::list> > >*)+0xaf) [0x55c161f18f6f]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 16: (FileStore::omap_get_values(coll_t const&, ghobject_t const&, std::set<std::string, std::less<std::string>, std::allocator<std::string> > const&, std::map<std::string, ceph::buffer::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::list> > >*)+0x18d) [0x55c161d9355d]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 17: (MapCacher::MapCacher<std::string, ceph::buffer::list>::get_keys(std::set<std::string, std::less<std::string>, std::allocator<std::string> > const&, std::map<std::string, ceph::buffer::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::list> > >*)+0x268) [0x55c161c6f1b8]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 18: (SnapMapper::get_snaps(hobject_t const&, SnapMapper::object_snaps*)+0x100) [0x55c161c6bd10]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 19: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t, std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd8) [0x55c161c6c198]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 20: (PG::_scan_snaps(ScrubMap&)+0x70e) [0x55c161b0441e]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 21: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, unsigned int, ThreadPool::TPHandle&)+0x22e) [0x55c161b0532e]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 22: (PG::chunky_scrub(ThreadPool::TPHandle&)+0x3dd) [0x55c161b3157d]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 23: (PG::scrub(unsigned int, ThreadPool::TPHandle&)+0x462) [0x55c161b32f72]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 24: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xfec) [0x55c161a7ecac]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 25: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839) [0x55c161fdc029]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 26: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c161fddfc0]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 27: (()+0x7e25) [0x7f45af756e25]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 28: (clone()+0x6d) [0x7f45ae84a34d]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 2017-09-20 02:56:31.662730 7f4591be9700 -1 *** Caught signal (Segmentation fault) **
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: in thread 7f4591be9700 thread_name:tp_osd_tp
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 1: (()+0xa23b21) [0x55c161f97b21]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 2: (()+0xf5e0) [0x7f45af75e5e0]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 3: (()+0x1cdff) [0x7f45b1f2bdff]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 4: (rocksdb::BlockBasedTable::PutDataBlockToCache(rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::Cache*, rocksdb::Cache*, rocksdb::ReadOptions const&, rocksdb::ImmutableCFOptions const&, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, rocksdb::Block*, unsigned int, rocksdb::Slice const&, unsigned long, bool, rocksdb::Cache::Priority)+0xd6) [0x55c1623065e6]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 5: (rocksdb::BlockBasedTable::MaybeLoadDataBlockToCache(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::Slice, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, bool)+0x3dc) [0x55c16230767c]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 6: (rocksdb::BlockBasedTable::NewDataBlockIterator(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::BlockIter*, bool, rocksdb::Status)+0x127) [0x55c1623078d7]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 7: (rocksdb::BlockBasedTable::BlockEntryIteratorState::NewSecondaryIterator(rocksdb::Slice const&)+0x89) [0x55c16230fef9]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 8: (()+0xdbd596) [0x55c162331596]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 9: (()+0xdbdb5d) [0x55c162331b5d]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 10: (rocksdb::MergingIterator::Seek(rocksdb::Slice const&)+0xce) [0x55c162319c8e]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 11: (rocksdb::DBIter::Seek(rocksdb::Slice const&)+0x174) [0x55c1623990e4]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 12: (RocksDBStore::RocksDBWholeSpaceIteratorImpl::lower_bound(std::string const&, std::string const&)+0xa2) [0x55c161eee962]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 13: (DBObjectMap::DBObjectMapIteratorImpl::lower_bound(std::string const&)+0x47) [0x55c161f11cf7]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 14: (DBObjectMap::scan(std::shared_ptr<DBObjectMap::_Header>, std::set<std::string, std::less<std::string>, std::allocator<std::string> > const&, std::set<std::string, std::less<std::string>, std::allocator<std::string> >*, std::map<std::string, ceph::buffer::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::list> > >*)+0x110) [0x55c161f16530]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 15: (DBObjectMap::get_values(ghobject_t const&, std::set<std::string, std::less<std::string>, std::allocator<std::string> > const&, std::map<std::string, ceph::buffer::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::list> > >*)+0xaf) [0x55c161f18f6f]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 16: (FileStore::omap_get_values(coll_t const&, ghobject_t const&, std::set<std::string, std::less<std::string>, std::allocator<std::string> > const&, std::map<std::string, ceph::buffer::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::list> > >*)+0x18d) [0x55c161d9355d]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 17: (MapCacher::MapCacher<std::string, ceph::buffer::list>::get_keys(std::set<std::string, std::less<std::string>, std::allocator<std::string> > const&, std::map<std::string, ceph::buffer::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::list> > >*)+0x268) [0x55c161c6f1b8]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 18: (SnapMapper::get_snaps(hobject_t const&, SnapMapper::object_snaps*)+0x100) [0x55c161c6bd10]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 19: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t, std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd8) [0x55c161c6c198]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 20: (PG::_scan_snaps(ScrubMap&)+0x70e) [0x55c161b0441e]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 21: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, unsigned int, ThreadPool::TPHandle&)+0x22e) [0x55c161b0532e]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 22: (PG::chunky_scrub(ThreadPool::TPHandle&)+0x3dd) [0x55c161b3157d]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 23: (PG::scrub(unsigned int, ThreadPool::TPHandle&)+0x462) [0x55c161b32f72]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 24: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xfec) [0x55c161a7ecac]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 25: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839) [0x55c161fdc029]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 26: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c161fddfc0]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 27: (()+0x7e25) [0x7f45af756e25]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 28: (clone()+0x6d) [0x7f45ae84a34d]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 0> 2017-09-20 02:56:31.662730 7f4591be9700 -1 *** Caught signal (Segmentation fault) **
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: in thread 7f4591be9700 thread_name:tp_osd_tp
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 1: (()+0xa23b21) [0x55c161f97b21]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 2: (()+0xf5e0) [0x7f45af75e5e0]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 3: (()+0x1cdff) [0x7f45b1f2bdff]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 4: (rocksdb::BlockBasedTable::PutDataBlockToCache(rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::Cache*, rocksdb::Cache*, rocksdb::ReadOptions const&, rocksdb::ImmutableCFOptions const&, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, rocksdb::Block*, unsigned int, rocksdb::Slice const&, unsigned long, bool, rocksdb::Cache::Priority)+0xd6) [0x55c1623065e6]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 5: (rocksdb::BlockBasedTable::MaybeLoadDataBlockToCache(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::Slice, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, bool)+0x3dc) [0x55c16230767c]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 6: (rocksdb::BlockBasedTable::NewDataBlockIterator(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::BlockIter*, bool, rocksdb::Status)+0x127) [0x55c1623078d7]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 7: (rocksdb::BlockBasedTable::BlockEntryIteratorState::NewSecondaryIterator(rocksdb::Slice const&)+0x89) [0x55c16230fef9]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 8: (()+0xdbd596) [0x55c162331596]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 9: (()+0xdbdb5d) [0x55c162331b5d]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 10: (rocksdb::MergingIterator::Seek(rocksdb::Slice const&)+0xce) [0x55c162319c8e]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 11: (rocksdb::DBIter::Seek(rocksdb::Slice const&)+0x174) [0x55c1623990e4]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 12: (RocksDBStore::RocksDBWholeSpaceIteratorImpl::lower_bound(std::string const&, std::string const&)+0xa2) [0x55c161eee962]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 13: (DBObjectMap::DBObjectMapIteratorImpl::lower_bound(std::string const&)+0x47) [0x55c161f11cf7]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 14: (DBObjectMap::scan(std::shared_ptr<DBObjectMap::_Header>, std::set<std::string, std::less<std::string>, std::allocator<std::string> > const&, std::set<std::string, std::less<std::string>, std::allocator<std::string> >*, std::map<std::string, ceph::buffer::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::list> > >*)+0x110) [0x55c161f16530]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 15: (DBObjectMap::get_values(ghobject_t const&, std::set<std::string, std::less<std::string>, std::allocator<std::string> > const&, std::map<std::string, ceph::buffer::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::list> > >*)+0xaf) [0x55c161f18f6f]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 16: (FileStore::omap_get_values(coll_t const&, ghobject_t const&, std::set<std::string, std::less<std::string>, std::allocator<std::string> > const&, std::map<std::string, ceph::buffer::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::list> > >*)+0x18d) [0x55c161d9355d]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 17: (MapCacher::MapCacher<std::string, ceph::buffer::list>::get_keys(std::set<std::string, std::less<std::string>, std::allocator<std::string> > const&, std::map<std::string, ceph::buffer::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::list> > >*)+0x268) [0x55c161c6f1b8]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 18: (SnapMapper::get_snaps(hobject_t const&, SnapMapper::object_snaps*)+0x100) [0x55c161c6bd10]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 19: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t, std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd8) [0x55c161c6c198]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 20: (PG::_scan_snaps(ScrubMap&)+0x70e) [0x55c161b0441e]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 21: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, unsigned int, ThreadPool::TPHandle&)+0x22e) [0x55c161b0532e]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 22: (PG::chunky_scrub(ThreadPool::TPHandle&)+0x3dd) [0x55c161b3157d]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 23: (PG::scrub(unsigned int, ThreadPool::TPHandle&)+0x462) [0x55c161b32f72]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 24: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xfec) [0x55c161a7ecac]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 25: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839) [0x55c161fdc029]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 26: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c161fddfc0]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 27: (()+0x7e25) [0x7f45af756e25]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: 28: (clone()+0x6d) [0x7f45ae84a34d]
Sep 20 02:56:31 node1.ceph ceph-osd[7051]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Sep 20 02:56:31 node1.ceph systemd[1]: ceph-osd@0.service: main process exited, code=killed, status=11/SEGV
Sep 20 02:56:31 node1.ceph systemd[1]: Unit ceph-osd@0.service entered failed state.
Sep 20 02:56:31 node1.ceph systemd[1]: ceph-osd@0.service failed.
Sep 20 02:56:51 node1.ceph systemd[1]: ceph-osd@0.service holdoff time over, scheduling restart.
Sep 20 02:56:51 node1.ceph systemd[1]: Starting Ceph object storage daemon osd.0...
Sep 20 02:56:51 node1.ceph systemd[1]: Started Ceph object storage daemon osd.0.
Sep 20 02:56:52 node1.ceph ceph-osd[12827]: 2017-09-20 02:56:52.058834 7fa52d465d00 -1 Public network was set, but cluster network was not set
Sep 20 02:56:52 node1.ceph ceph-osd[12827]: 2017-09-20 02:56:52.058846 7fa52d465d00 -1     Using public network also for cluster network
Sep 20 02:56:52 node1.ceph ceph-osd[12827]: starting osd.0 at - osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
Sep 20 02:57:14 node1.ceph ceph-osd[12827]: 2017-09-20 02:57:14.864876 7fa52d465d00 -1 osd.0 23702 log_to_monitors {default=true}
Sep 20 03:11:13 node1.ceph ceph-osd[12827]: 2017-09-20 03:11:13.898451 7fa50bbe5700 -1 osd.0 pg_epoch: 23712 pg[8.c7( v 23712'5244794 (23697'5241702,23712'5244794] local-lis/les=23708/23712 n=3379 ec=107/107 lis/c 23708/23708 les/c/f 23712/23712/0 23708/23708/16900) [6,0,34] r=1 lpr=23708 luod=0'0 crt=23712'5244794 lcod 23712'5244793 active] _scan_snaps no head for 8:e306ea48:::rbd_data.9aca26238e1f29.0000000000003619:65 (have MIN)
Sep 20 03:12:00 node1.ceph ceph-osd[12827]: 2017-09-20 03:12:00.823561 7fa50cbe7700 -1 osd.0 pg_epoch: 23712 pg[8.3ea( v 23712'1976155 (23697'1973071,23712'1976155] local-lis/les=23708/23710 n=3413 ec=2548/2548 lis/c 23708/23708 les/c/f 23710/23712/0 23708/23708/16926) [17,32,0] r=2 lpr=23708 luod=0'0 crt=23712'1976155 lcod 23712'1976154 active] _scan_snaps no head for 8:57c8258f:::rbd_data.9aca26238e1f29.000000000000409d:65 (have MIN)
Sep 20 03:14:51 node1.ceph ceph-osd[12827]: 2017-09-20 03:14:51.229050 7fa50cbe7700 -1 osd.0 pg_epoch: 23712 pg[7.322( v 23712'8893405 (23697'8890317,23712'8893405] local-lis/les=23708/23710 n=3025 ec=2543/2543 lis/c 23708/23708 les/c/f 23710/23712/0 23708/23708/16920) [12,0,27] r=1 lpr=23708 luod=0'0 crt=23712'8893405 lcod 23712'8893404 active] _scan_snaps no head for 7:44c420f4:::rbd_data.a6351b238e1f29.000000000002e2c6:9e (have MIN)
Sep 20 03:14:52 node1.ceph ceph-osd[12827]: 2017-09-20 03:14:52.758835 7fa50cbe7700 -1 osd.0 pg_epoch: 23712 pg[7.322( v 23712'8893405 (23697'8890317,23712'8893405] local-lis/les=23708/23710 n=3025 ec=2543/2543 lis/c 23708/23708 les/c/f 23710/23712/0 23708/23708/16920) [12,0,27] r=1 lpr=23708 luod=0'0 crt=23712'8893405 lcod 23712'8893404 active] _scan_snaps no head for 7:44d30056:::rbd_data.bd1c92ae8944a.00000000000027e9:d4 (have MIN)
Sep 20 03:14:52 node1.ceph ceph-osd[12827]: 2017-09-20 03:14:52.758875 7fa50cbe7700 -1 osd.0 pg_epoch: 23712 pg[7.322( v 23712'8893405 (23697'8890317,23712'8893405] local-lis/les=23708/23710 n=3025 ec=2543/2543 lis/c 23708/23708 les/c/f 23710/23712/0 23708/23708/16920) [12,0,27] r=1 lpr=23708 luod=0'0 crt=23712'8893405 lcod 23712'8893404 active] _scan_snaps no head for 7:44d30056:::rbd_data.bd1c92ae8944a.00000000000027e9:d2 (have MIN)
Sep 20 03:14:53 node1.ceph ceph-osd[12827]: 2017-09-20 03:14:53.126656 7fa50cbe7700 -1 osd.0 pg_epoch: 23712 pg[7.322( v 23712'8893405 (23697'8890317,23712'8893405] local-lis/les=23708/23710 n=3025 ec=2543/2543 lis/c 23708/23708 les/c/f 23710/23712/0 23708/23708/16920) [12,0,27] r=1 lpr=23708 luod=0'0 crt=23712'8893405 lcod 23712'8893404 active] _scan_snaps no head for 7:44d64919:::rbd_data.a6351b238e1f29.000000000002ecf5:9e (have MIN)
Sep 20 03:14:53 node1.ceph ceph-osd[12827]: 2017-09-20 03:14:53.246801 7fa50cbe7700 -1 osd.0 pg_epoch: 23712 pg[7.322( v 23712'8893405 (23697'8890317,23712'8893405] local-lis/les=23708/23710 n=3025 ec=2543/2543 lis/c 23708/23708 les/c/f 23710/23712/0 23708/23708/16920) [12,0,27] r=1 lpr=23708 luod=0'0 crt=23712'8893405 lcod 23712'8893404 active] _scan_snaps no head for 7:44d6d538:::rbd_data.bd1c92ae8944a.0000000000000c66:d4 (have MIN)
Sep 20 03:14:53 node1.ceph ceph-osd[12827]: 2017-09-20 03:14:53.246836 7fa50cbe7700 -1 osd.0 pg_epoch: 23712 pg[7.322( v 23712'8893405 (23697'8890317,23712'8893405] local-lis/les=23708/23710 n=3025 ec=2543/2543 lis/c 23708/23708 les/c/f 23710/23712/0 23708/23708/16920) [12,0,27] r=1 lpr=23708 luod=0'0 crt=23712'8893405 lcod 23712'8893404 active] _scan_snaps no head for 7:44d6d538:::rbd_data.bd1c92ae8944a.0000000000000c66:d2 (have MIN)
Sep 20 03:14:54 node1.ceph ceph-osd[12827]: 2017-09-20 03:14:54.765036 7fa50cbe7700 -1 osd.0 pg_epoch: 23712 pg[7.322( v 23712'8893405 (23697'8890317,23712'8893405] local-lis/les=23708/23710 n=3025 ec=2543/2543 lis/c 23708/23708 les/c/f 23710/23712/0 23708/23708/16920) [12,0,27] r=1 lpr=23708 luod=0'0 crt=23712'8893405 lcod 23712'8893404 active] _scan_snaps no head for 7:44e5b763:::rbd_data.b0d052238e1f29.000000000000030a:b1 (have MIN)

#8 Updated by Patrick Fruh about 1 year ago

Looking deeper into it, only my osd.0 crashed, and only some OSDs have this "pg_epoch" spam; the other logs are perfectly clean apart from the startup messages.

#9 Updated by Patrick Fruh about 1 year ago

So, to sum everything up:

  1. I upgraded my 6 hosts with 41 OSDs total from Ceph 10.2.9 to 12.2.0 and from CentOS 7.3 to 7.4
  2. I set the first 3 OSDs on my first host out and let them empty (those 3 OSDs already started showing issues there, but I didn't notice at first; all others were fine, see the first log I posted)
  3. I wiped the 3 OSDs and their shared journaling SSD and set them up again with bluestore, with the SSD as the DB device
  4. As the recovery started, and while other OSDs (apart from the first 3) were recovering, the 3 bluestore OSDs started acting crazy and kept segfaulting regularly (second log I posted; possibly due to the high load caused by the recoveries)
  5. I wiped those 3 OSDs again and reverted them to filestore, again with the SSD as their shared journaling device.
  6. The OSDs are now working without any issues (unlike right after the upgrade, so the "fresh" filestore setup seems to have fixed part of it), except for osd.0, which crashed once after being moved back to filestore (third log I posted)

(Sorry for all the posts, can't seem to edit old ones.)

#10 Updated by Patrick Donnelly 5 months ago

  • Affected Versions v14.0.0 added
  • ceph-qa-suite fs added

Instance in fs testing: /ceph/teuthology-archive/pdonnell-2018-07-20_06:53:09-fs-wip-pdonnell-testing-20180720.044448-testing-basic-smithi/2800613/teuthology.log

#11 Updated by Patrick Donnelly 4 months ago

  • Target version set to v14.0.0
  • Backport set to mimic,luminous

Another: /ceph/teuthology-archive/pdonnell-2018-07-31_23:55:57-fs-wip-pdonnell-testing-20180731.215430-testing-basic-smithi/2845133/teuthology.log

#12 Updated by Patrick Donnelly 4 months ago

/ceph/teuthology-archive/pdonnell-2018-08-02_13:06:29-multimds-wip-pdonnell-testing-20180802.044402-testing-basic-smithi/2852575

with logs/cores

#13 Updated by Sage Weil 4 months ago

The fix is clearly to not trim past can_rollback_to. The other question, though, is why can_rollback_to is so far behind!

for this run, osd.0 is the culprit:
/a/nojha-2018-08-08_23:48:33-multimds-wip-test-crt-distro-basic-smithi/2883705/remote

Writes are coming in at 22'6698...

2018-08-09 09:57:38.338 7f702a907700  1 -- 172.21.15.15:6802/32650 <== osd.4 172.21.15.140:6801/32831 18743 ==== MOSDECSubOpWrite(3.3s2 22/20 ECSubWrite(tid=12345, reqid=client.4408.0:48075, at_version=22'6698, trim_to=22'3267, roll_forward_to=22'3239)) v2 ==== 6456+0+0 (387819236 0 0) 0x55e812ba5000 con 0x55e7a69ad500

Replies are going back at 22'3329:
2018-08-09 09:57:38.509 7f701a890700  1 -- 172.21.15.15:6802/32650 --> 172.21.15.140:6801/32831 -- MOSDECSubOpWriteReply(3.3s0 22/20 ECSubWriteReply(tid=5615, last_complete=22'3329, committed=1, applied=1)) v2 -- 0x55e7e3b1e6c0 con 0
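The guard described above ("do not trim past can_rollback_to") can be sketched as follows. This is a hypothetical simplification, not the actual PGLog::IndexedLog::trim() code: eversion_t is stubbed out with plain members, and clamp_trim_to is an invented helper name used only for illustration.

```cpp
#include <tuple>

// Hypothetical stand-in for Ceph's eversion_t (written epoch'version in logs).
struct eversion_t {
  unsigned epoch = 0;
  unsigned long version = 0;
  bool operator<=(const eversion_t &o) const {
    return std::tie(epoch, version) <= std::tie(o.epoch, o.version);
  }
};

// Sketch of the guard: never trim the log past can_rollback_to, even if the
// caller asked for a later trim point. Trimming beyond can_rollback_to would
// drop entries still needed for rollback, which is exactly what the failed
// assert(s <= can_rollback_to) protects against.
eversion_t clamp_trim_to(eversion_t requested, eversion_t can_rollback_to) {
  return requested <= can_rollback_to ? requested : can_rollback_to;
}
```

With the values from the message above, a requested trim_to of 22'3267 against a roll_forward_to of 22'3239 would be clamped back to 22'3239 instead of tripping the assert.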

#14 Updated by Sage Weil 4 months ago

...and the reason is just that osd.0 is way behind, and the primary is not throttling its work accordingly. Apparently the client workload isn't throttling its submitted work either. This is a long-standing problem we've never come up with a good solution for.

It is, unsurprisingly, the ffsb workload, which is really good at submitting lots of IOs in parallel without any throttling.
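For a sense of scale, the version numbers in the two message traces in comment #13 can be compared directly. Plain arithmetic on the logged values (all in epoch 22); the variable names are illustrative, not Ceph code:

```cpp
// Version numbers taken from the logged messages (all epoch 22).
constexpr long at_version      = 6698; // incoming writes on the primary
constexpr long last_complete   = 3329; // what the lagging replica (osd.0) has completed
constexpr long trim_to         = 3267; // trim point the primary is requesting
constexpr long roll_forward_to = 3239; // effectively can_rollback_to on the replica

// osd.0 is about 3369 log entries behind the primary's write stream...
constexpr long replica_lag = at_version - last_complete;

// ...and the requested trim point already sits past roll_forward_to, which is
// precisely the situation the FAILED assert(s <= can_rollback_to) guards against.
constexpr bool trim_past_rollback = trim_to > roll_forward_to;
```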

#15 Updated by Neha Ojha 4 months ago

  • Status changed from Verified to Testing
  • Assignee set to Neha Ojha

#17 Updated by Sage Weil 4 months ago

  • Status changed from Testing to Pending Backport

#18 Updated by Nathan Cutler 3 months ago

  • Copied to Backport #35072: luminous: osd/PGLog.cc: 60: FAILED assert(s <= can_rollback_to) after upgrade to luminous added

#19 Updated by Nathan Cutler 3 months ago

  • Copied to Backport #35073: mimic: osd/PGLog.cc: 60: FAILED assert(s <= can_rollback_to) after upgrade to luminous added

#20 Updated by Nathan Cutler 3 months ago

Luminous backport is non-trivial; see http://tracker.ceph.com/issues/35072#note-2

#21 Updated by Nathan Cutler 3 months ago

  • Status changed from Pending Backport to Resolved
