Bug #23120

closed

OSDs continuously crash during recovery

Added by Oliver Freyermuth about 6 years ago. Updated over 5 years ago.

Status:
Can't reproduce
Priority:
Normal
Assignee:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I have several OSDs continuously crashing during recovery. This is Luminous 12.2.3.

 ceph version 12.2.3 (2dab17a455c09584f2a85e6b10888337d1ec8949) luminous (stable)
 1: (()+0xa3c591) [0x55b3e5a85591]
 2: (()+0xf5e0) [0x7f8c237ca5e0]
 3: (gsignal()+0x37) [0x7f8c227f31f7]
 4: (abort()+0x148) [0x7f8c227f48e8]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x284) [0x55b3e5ac4664]
 6: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x1487) [0x55b3e5997a27]
 7: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x3a0) [0x55b3e5998a70]
 8: (PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x65) [0x55b3e5708a85]
 9: (ECBackend::handle_sub_write(pg_shard_t, boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&, Context*)+0x631) [0x55b3e5828191]
 10: (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x327) [0x55b3e5838b27]
 11: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x55b3e573d680]
 12: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x59c) [0x55b3e56a900c]
 13: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f9) [0x55b3e552ef29]
 14: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x55b3e57abad7]
 15: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xfce) [0x55b3e555d99e]
 16: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839) [0x55b3e5aca009]
 17: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55b3e5acbfa0]
 18: (()+0x7e25) [0x7f8c237c2e25]
 19: (clone()+0x6d) [0x7f8c228b634d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
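As the note says, the anonymous frames (e.g. `(()+0xa3c591)`) are raw offsets into the executable and need the binary or its disassembly to be interpreted. A minimal sketch of how such an offset could be resolved, assuming binutils and the matching `ceph-debuginfo-12.2.3` package are installed (the `ceph-osd` path below is illustrative, not taken from the report):

```shell
BIN=/usr/bin/ceph-osd    # the crashed executable (assumed install path)
OFF=0xa3c591             # offset from frame 1: "(()+0xa3c591)"
# Map the offset to a function and source line; -C demangles C++ symbols,
# -f prints the enclosing function, -i expands inlined call chains:
addr2line -Cfie "$BIN" "$OFF"
# Full source-annotated disassembly, as suggested in the crash note:
# objdump -rdS "$BIN" > ceph-osd.dis
```

Without the matching debuginfo, `addr2line` prints `??` for the function and source location, so the debuginfo package version must match the crashing build exactly.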

This is using the officially released RPMs.

I've uploaded the logfile of one such OSD as:
ca0a29ae-0993-4faa-be4d-9ba2f7d6f905

The cluster will likely be recreated soon, since the system is now broken anyway, so please let me know quickly if more info is needed.

