Bug #39625

ceph daemon mon.a config set mon_health_to_clog false cause leader mon assert

Added by huang jun 4 months ago. Updated about 1 month ago.

Status:
Pending Backport
Priority:
Normal
Assignee:
-
Category:
Monitor
Target version:
-
Start date:
05/08/2019
Due date:
% Done:
0%
Source:
Tags:
Backport:
nautilus, mimic, luminous
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

As the subject describes, running 'ceph daemon mon.a config set mon_health_to_clog false'
causes the leader mon (mon.a) to fail an assert:

2019-05-08 13:53:39.360 7ff36eda3700 10 mon.a@0(leader) e1 handle_conf_change mon_health_to_clog
2019-05-08 13:53:39.389 7ff36eda3700 -1 /usr/src/ceph/src/common/Timer.cc: In function 'bool SafeTimer::cancel_event(Context*)' thread 7ff36eda3700 time 2019-05-08 13:53:39.360803
/usr/src/ceph/src/common/Timer.cc: 153: FAILED ceph_assert(lock.is_locked())

 ceph version 14.2.0-84-g63ccc22 (63ccc22b2f38b56daaae28eb2906914840a714d9) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1aa) [0x7ff3787992a6]
 2: (()+0x1295528) [0x7ff378799528]
 3: (SafeTimer::cancel_event(Context*)+0x4c) [0x7ff37874a836]
 4: (Monitor::health_tick_stop()+0x10c) [0x7ff382919ca2]
 5: (Monitor::health_events_cleanup()+0x18) [0x7ff38291a3d8]
 6: (Monitor::health_to_clog_update_conf(std::set<std::string, std::less<std::string>, std::allocator<std::string> > const&)+0x165) [0x7ff38291a58d]
 7: (Monitor::handle_conf_change(ConfigProxy const&, std::set<std::string, std::less<std::string>, std::allocator<std::string> > const&)+0x86e) [0x7ff382902780]
 8: (ConfigProxy::call_observers(std::map<ceph::md_config_obs_impl<ConfigProxy>*, std::set<std::string, std::less<std::string>, std::allocator<std::string> >, std::less<ceph::md_config_obs_impl<ConfigProxy>*>, std::allocator<std::pair<ceph::md_config_obs_impl<ConfigProxy>* const, std::set<std::string, std::less<std::string>, std::allocator<std::string> > > > >&)+0xb3) [0x7ff3787bd17f]
 9: (ConfigProxy::apply_changes(std::ostream*)+0xad) [0x7ff3787bebc7]
 10: (CephContext::do_command(std::basic_string_view<char, std::char_traits<char> >, std::map<std::string, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > >, std::less<void>, std::allocator<std::pair<std::string const, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > > > > > const&, std::basic_string_view<char, std::char_traits<char> >, ceph::buffer::v14_2_0::list*)+0x162f) [0x7ff3787b7175]
 11: (CephContextHook::call(std::basic_string_view<char, std::char_traits<char> >, std::map<std::string, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > >, std::less<void>, std::allocator<std::pair<std::string const, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > > > > > const&, std::basic_string_view<char, std::char_traits<char> >, ceph::buffer::v14_2_0::list&)+0x5b) [0x7ff3787c0999]
 12: (AdminSocket::do_accept()+0xfa2) [0x7ff37878d578]
 13: (AdminSocket::entry()+0x2d7) [0x7ff37878bfef]
 14: (_ZSt13__invoke_implIvM11AdminSocketDoFvvEPS0_JEET_St21__invoke_memfun_derefOT0_OT1_DpOT2_()+0x67) [0x7ff37879416b]
 15: (_ZSt8__invokeIM11AdminSocketDoFvvEJPS0_EENSt15__invoke_resultIT_JDpT0_EE4typeEOS5_DpOS6_()+0x3f) [0x7ff378792ef0]
 16: (_ZNSt6thread8_InvokerISt5tupleIJM11AdminSocketDoFvvEPS2_EEE9_M_invokeIJLm0ELm1EEEEDTcl8__invokespcl10_S_declvalIXT_EEEEESt12_Index_tupleIJXspT_EEE()+0x43) [0x7ff378797c21]
 17: (_ZNSt6thread8_InvokerISt5tupleIJM11AdminSocketDoFvvEPS2_EEEclEv()+0x1d) [0x7ff378797bd7]
 18: (_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJM11AdminSocketDoFvvEPS3_EEEEE6_M_runEv()+0x1c) [0x7ff378797bb6]
 19: (()+0x1b031ef) [0x7ff3790071ef]
 20: (()+0x7e25) [0x7ff374283e25]
 21: (clone()+0x6d) [0x7ff37314cbad]

The likely cause:
handle_conf_change needs to hold the mon lock when it handles 'mon_health_to_clog*' and 'mon_scrub_interval' changes,
because those config options reset a timer on the leader mon, and the timer asserts that the mon lock is held.
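
The failure mode can be illustrated with a small standalone sketch. This is not Ceph code and not the patch from the pull request: TrackedMutex, ToyTimer, and both on_conf_change_* functions are hypothetical names made up for illustration, and the sketch throws an exception where the real SafeTimer::cancel_event() aborts via ceph_assert(lock.is_locked()).

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <stdexcept>

// Hypothetical stand-in for a mutex that can report whether it is held,
// loosely modelled on the lock.is_locked() check seen in the backtrace.
class TrackedMutex {
 public:
  void lock() { m_.lock(); locked_ = true; }
  void unlock() { locked_ = false; m_.unlock(); }
  bool is_locked() const { return locked_; }
 private:
  std::mutex m_;
  std::atomic<bool> locked_{false};
};

// Hypothetical toy timer: every operation requires the caller to hold
// the lock passed in at construction.
class ToyTimer {
 public:
  explicit ToyTimer(TrackedMutex& l) : lock_(l) {}

  void schedule() {
    require_lock();
    scheduled_ = true;
  }

  // Returns true if an event was pending and has been cancelled.
  bool cancel_event() {
    require_lock();
    bool was_scheduled = scheduled_;
    scheduled_ = false;
    return was_scheduled;
  }

 private:
  void require_lock() const {
    // The real code aborts the process here; the sketch throws instead.
    if (!lock_.is_locked())
      throw std::logic_error("timer lock not held");
  }
  TrackedMutex& lock_;
  bool scheduled_ = false;
};

// Config-change path that touches the timer WITHOUT the lock
// (analogous to the call chain arriving from the admin socket thread).
bool on_conf_change_unlocked(ToyTimer& timer) {
  return timer.cancel_event();
}

// Config-change path that takes the lock before touching the timer.
bool on_conf_change_locked(TrackedMutex& lock, ToyTimer& timer) {
  std::lock_guard<TrackedMutex> guard(lock);
  return timer.cancel_event();
}
```

In this sketch, on_conf_change_unlocked() throws because nothing holds the lock, while on_conf_change_locked() cancels the pending event and returns normally. Taking the lock directly in the handler is only one possible approach; deferring the timer reset to a thread that already holds the lock would be another.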


Related issues

Copied to Ceph - Backport #40541: luminous: ceph daemon mon.a config set mon_health_to_clog false cause leader mon assert (New)
Copied to Ceph - Backport #40542: nautilus: ceph daemon mon.a config set mon_health_to_clog false cause leader mon assert (Resolved)
Copied to Ceph - Backport #41287: mimic: ceph daemon mon.a config set mon_health_to_clog false cause leader mon assert (In Progress)

History

#1 Updated by Kefu Chai 4 months ago

  • Status changed from New to Need Review
  • Backport set to nautilus, luminous
  • Pull request ID set to 28018

#2 Updated by Kefu Chai 3 months ago

  • Status changed from Need Review to Pending Backport

#3 Updated by Nathan Cutler 3 months ago

  • Copied to Backport #40541: luminous: ceph daemon mon.a config set mon_health_to_clog false cause leader mon assert added

#4 Updated by Nathan Cutler 3 months ago

  • Copied to Backport #40542: nautilus: ceph daemon mon.a config set mon_health_to_clog false cause leader mon assert added

#5 Updated by Neha Ojha about 2 months ago

Kefu, why don't we need to backport this to mimic?

#6 Updated by Kefu Chai about 1 month ago

  • Backport changed from nautilus, luminous to nautilus, mimic, luminous

#7 Updated by Nathan Cutler about 1 month ago

  • Copied to Backport #41287: mimic: ceph daemon mon.a config set mon_health_to_clog false cause leader mon assert added
