osd: pg log hard limit can cause crash during upgrade
During an upgrade from an earlier version, a primary running the new code can send a trim_to value to a replica running the old code that triggers an assert there; specifically, this happens when the pg log is trimmed beyond info.last_complete during backfill.
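The invariant that fails is visible in the trace below: on the old code, PGLog::trim() asserts trim_to <= info.last_complete. A minimal standalone sketch of that invariant (types simplified for illustration; this is not the actual Ceph source):

    #include <cassert>

    // Simplified stand-ins for Ceph's eversion_t and pg_info_t;
    // illustrative only, not the real definitions.
    struct eversion_t {
      unsigned epoch = 0, version = 0;
      bool operator<=(const eversion_t &o) const {
        return epoch < o.epoch || (epoch == o.epoch && version <= o.version);
      }
    };

    struct pg_info_t {
      eversion_t last_complete;  // highest version this OSD has locally completed
    };

    // Analogue of the old-code invariant at PGLog.cc:170: a replica
    // assumes the primary never asks it to trim past last_complete.
    void trim(eversion_t trim_to, const pg_info_t &info) {
      assert(trim_to <= info.last_complete);
      // ... actual log trimming elided ...
    }

    int main() {
      pg_info_t info;
      info.last_complete = {10, 5};
      // A hard-limit-aware primary can send a trim_to beyond what a
      // backfilling old-code replica has completed:
      trim({10, 8}, info);  // aborts, mirroring the FAILED assert below
    }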
This can be triggered by the luminous-x:stress-split suite with a small pg log to force backfill. In this run, upgrading from 12.2.5 (no hard limit) to mimic (hard limit present) with osd_min_pg_log_entries = 1 and osd_max_pg_log_entries = 2 to force more backfilling (see the ceph.conf fragment after the trace), we hit this assert:
    /builddir/build/BUILD/ceph-12.2.5/src/osd/PGLog.cc: 170: FAILED assert(trim_to <= info.last_complete)
    ceph version 12.2.5-42.0.TEST.bz1636267.el7cp (559ef7e0c955a21506efea93cfccafcf153e74b7) luminous (stable)
    1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7f64c7427db0]
    2: (PGLog::trim(eversion_t, pg_info_t&)+0x26f) [0x7f64c6fc4b8f]
    3: (PG::append_log(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, ObjectStore::Transaction&, bool)+0x36d) [0x7f64c6f4f23d]
    4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, boost::optional<pg_hit_set_history_t> const&, eversion_t const&, eversion_t const&, bool, ObjectStore::Transaction&)+0x74) [0x7f64c7066344]
    5: (ECBackend::handle_sub_write(pg_shard_t, boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&, Context*)+0x31a) [0x7f64c718eaea]
    6: (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x327) [0x7f64c719f6e7]
    7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x7f64c709fdb0]
    8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x59c) [0x7f64c700b2cc]
    9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f9) [0x7f64c6e8e9c9]
    10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x7f64c7111ea7]
    11: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xfce) [0x7f64c6ebdace]
    12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839) [0x7f64c742d8c9]
    13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f64c742f860]
    14: (()+0x7dc5) [0x7f64c41dddc5]
    15: (clone()+0x6d) [0x7f64c32d276d]
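For reference, the log-length settings used in that run, as a ceph.conf fragment:

    [osd]
    osd_min_pg_log_entries = 1
    osd_max_pg_log_entries = 2

Keeping the log this short guarantees that replicas fall off the log and must backfill, which is the window in which the out-of-bounds trim_to can be sent.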
We can avoid this by adding an osdmap flag that enables the hard limit only once all OSDs report a pg log hard limit feature bit, similar to how we handled recovery deletes.
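A rough sketch of that gating, modeled on the recovery deletes approach; the names below (FEATURE_PGLOG_HARDLIMIT, OSDMAP_FLAG_PGLOG_HARDLIMIT) are illustrative placeholders, not the actual Ceph identifiers:

    #include <cstdint>

    // Hypothetical bit values; the real identifiers and layout differ.
    constexpr uint64_t FEATURE_PGLOG_HARDLIMIT = 1ULL << 56;    // per-OSD feature bit
    constexpr uint32_t OSDMAP_FLAG_PGLOG_HARDLIMIT = 1u << 20;  // cluster-wide osdmap flag

    struct OSDMapSketch {
      uint32_t flags = 0;
      uint64_t common_osd_features = 0;  // AND of all up OSDs' feature bits

      // The monitor refuses to set the flag until every OSD reports the
      // feature bit, i.e. until all OSDs run hard-limit-aware code.
      bool can_enable_hardlimit() const {
        return common_osd_features & FEATURE_PGLOG_HARDLIMIT;
      }
      bool hardlimit_enabled() const {
        return flags & OSDMAP_FLAG_PGLOG_HARDLIMIT;
      }
    };

    // On the primary: only trim aggressively when the whole cluster has
    // opted in, so an old-code replica is never sent a trim_to past its
    // last_complete.
    bool use_hard_limit(const OSDMapSketch &map) {
      return map.hardlimit_enabled();
    }

    int main() {
      OSDMapSketch map;
      map.common_osd_features = FEATURE_PGLOG_HARDLIMIT;  // all OSDs upgraded
      if (map.can_enable_hardlimit()) {
        map.flags |= OSDMAP_FLAG_PGLOG_HARDLIMIT;  // e.g. via a mon command
      }
      return use_hard_limit(map) ? 0 : 1;
    }

The effect is that a new-code primary keeps the old, conservative trim behavior until the flag is set, and the flag can only be set once no old-code OSD remains to receive an out-of-bounds trim_to.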
A workaround for users is to upgrade all OSDs to a version with the pg log hard limit and restart them, or to only upgrade while all PGs are active+clean.
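For the second option, a rough pre-flight check might look like this (exact output varies by release):

    # every PG should report active+clean before starting the upgrade
    ceph pg stat
    # after upgrading, confirm no daemon is left on the old version
    ceph versions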
#6 Updated by Neha Ojha about 1 month ago
Quoting my reply to ceph-devel for reference:
"Nathan, I don't think we want to revert it for 13.2.2.
This is because the pg log hard limit feature currently doesn't seem
to work well in a partial upgrade, recovery/backfill scenario. So,
even if we do revert it in 13.2.3, this still leaves us a chance of
going into a split scenario, where some osds in the field, running
13.2.2(with hard limit code) and others on 13.2.3(without the code),
may encounter http://tracker.ceph.com/issues/36686.
Therefore, users who have succesfully upgraded to 13.2.2, shouldn't be
at any risk.
For users trying to upgrade to a version >= 13.2.2, I am going to make
a note of this issue and add the suggested workaround in Pending
Release Notes for mimic."