Bug #38372
closed
segfault in "AuthMonitor::increase_max_global_id()"
% Done: 0%
Regression: No
Severity: 3 - minor
Description
2019-02-18T20:26:29.795 INFO:tasks.ceph.mon.a.smithi029.stderr:*** Caught signal (Segmentation fault) **
2019-02-18T20:26:29.796 INFO:tasks.ceph.mon.a.smithi029.stderr: in thread 7fc7f92ef700 thread_name:msgr-worker-2
2019-02-18T20:26:29.816 INFO:tasks.ceph.mon.a.smithi029.stderr: ceph version 14.0.1-3843-g1438970 (1438970519ae8035fabaeb26444462672c92c7cc) nautilus (dev)
2019-02-18T20:26:29.817 INFO:tasks.ceph.mon.a.smithi029.stderr: 1: (()+0x12890) [0x7fc806ea8890]
2019-02-18T20:26:29.817 INFO:tasks.ceph.mon.a.smithi029.stderr: 2: (ceph::buffer::list::list(ceph::buffer::list&&)+0x5c) [0x7fc808380c3c]
2019-02-18T20:26:29.817 INFO:tasks.ceph.mon.a.smithi029.stderr: 3: (void std::vector<AuthMonitor::Incremental, std::allocator<AuthMonitor::Incremental> >::_M_realloc_insert<AuthMonitor::Incremental const&>(__gnu_cxx::__normal_iterator<AuthMonitor::Incremental*, std::vector<AuthMonitor::Incremental, std::allocator<AuthMonitor::Incremental> > >, AuthMonitor::Incremental const&)+0x124) [0x560c8e2665b4]
2019-02-18T20:26:29.817 INFO:tasks.ceph.mon.a.smithi029.stderr: 4: (AuthMonitor::increase_max_global_id()+0x152) [0x560c8e255f32]
2019-02-18T20:26:29.817 INFO:tasks.ceph.mon.a.smithi029.stderr: 5: (AuthMonitor::assign_global_id(bool)+0x104) [0x560c8e2586d4]
2019-02-18T20:26:29.817 INFO:tasks.ceph.mon.a.smithi029.stderr: 6: (Monitor::handle_auth_request(Connection*, AuthConnectionMeta*, bool, unsigned int, ceph::buffer::list const&, ceph::buffer::list*)+0xb59) [0x560c8e1f4169]
2019-02-18T20:26:29.817 INFO:tasks.ceph.mon.a.smithi029.stderr: 7: (ProtocolV2::_handle_auth_request(ceph::buffer::list&, bool)+0xd8) [0x7fc808344e48]
2019-02-18T20:26:29.818 INFO:tasks.ceph.mon.a.smithi029.stderr: 8: (ProtocolV2::handle_auth_request(char*, unsigned int)+0x6b8) [0x7fc808345e18]
2019-02-18T20:26:29.818 INFO:tasks.ceph.mon.a.smithi029.stderr: 9: (ProtocolV2::handle_frame_payload(char*, int)+0x5f1) [0x7fc80834eda1]
2019-02-18T20:26:29.818 INFO:tasks.ceph.mon.a.smithi029.stderr: 10: (ProtocolV2::run_continuation(Ct<ProtocolV2>*)+0x3c) [0x7fc8083391ac]
2019-02-18T20:26:29.818 INFO:tasks.ceph.mon.a.smithi029.stderr: 11: (AsyncConnection::process()+0x186) [0x7fc808302886]
2019-02-18T20:26:29.818 INFO:tasks.ceph.mon.a.smithi029.stderr: 12: (EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0xa2d) [0x7fc80835554d]
2019-02-18T20:26:29.818 INFO:tasks.ceph.mon.a.smithi029.stderr: 13: (()+0x58f058) [0x7fc80835a058]
2019-02-18T20:26:29.819 INFO:tasks.ceph.mon.a.smithi029.stderr: 14: (()+0xbe733) [0x7fc8069c2733]
2019-02-18T20:26:29.819 INFO:tasks.ceph.mon.a.smithi029.stderr: 15: (()+0x76db) [0x7fc806e9d6db]
2019-02-18T20:26:29.819 INFO:tasks.ceph.mon.a.smithi029.stderr: 16: (clone()+0x3f) [0x7fc80607e88f]
2019-02-18T20:26:29.819 INFO:tasks.ceph.mon.a.smithi029.stderr:2019-02-18 20:26:29.796 7fc7f92ef700 -1 *** Caught signal (Segmentation fault) **
2019-02-18T20:26:29.819 INFO:tasks.ceph.mon.a.smithi029.stderr: in thread 7fc7f92ef700 thread_name:msgr-worker-2
Updated by Jason Dillaman about 5 years ago
Updated by Greg Farnum about 5 years ago
- Assignee set to Sage Weil
Sage just rewrote part of this and I see it's under the ProtocolV2 stack, so giving it to him...
Updated by Greg Farnum about 5 years ago
I see the last log lines from the bad thread are:
-85> 2019-02-18 20:26:29.792 7fc7f92ef700 1 --2- [v2:172.21.15.29:3300/0,v1:172.21.15.29:6789/0] >> conn(0x560c92b7d600 0x560c92ba9b00 :-1 s=ACCEPTING pgs=0 cs=0 l=0)._handle_peer_banner_payload supported=0 required=0
-77> 2019-02-18 20:26:29.792 7fc7f92ef700 10 mon.a@0(leader) e1 handle_auth_request con 0x560c92b7d600 (start) method 2 payload 22
-73> 2019-02-18 20:26:29.792 7fc7f92ef700 10 mon.a@0(leader).auth v2 AuthMonitor::assign_global_id mon=0/1 last_allocated=9223 max_global_id=14096
-72> 2019-02-18 20:26:29.792 7fc7f92ef700 10 mon.a@0(leader).auth v2 next_global_id should be 9224
-71> 2019-02-18 20:26:29.792 7fc7f92ef700 10 mon.a@0(leader).auth v2 increasing max_global_id to 14096
Updated by Sage Weil about 5 years ago
- Status changed from New to In Progress
- Priority changed from High to Urgent
The auth methods in Monitor.cc are protected by auth_lock, but AuthMonitor's assign_global_id() is under the normal mon->lock.
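The lock mismatch described above can be sketched in a few lines of C++. This is only an illustration under stated assumptions, not the code from the PR: ToyMonitor, Incremental, tick() and the step argument are hypothetical stand-ins, and the sketch shows just one way to keep the pending vector consistent, namely taking the lock that guards it even when the caller already holds auth_lock.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical stand-in for AuthMonitor::Incremental; not the real Ceph type.
struct Incremental {
  uint64_t max_global_id;
};

// Hypothetical toy monitor with two independent locks.
struct ToyMonitor {
  std::mutex mon_lock;   // stands in for mon->lock
  std::mutex auth_lock;  // stands in for the auth_lock used in Monitor.cc
  std::vector<Incremental> pending;  // state that mon_lock is meant to guard
  uint64_t max_global_id = 0;

  // Takes the lock that actually guards `pending` before touching it.
  // Without this, a push_back from a messenger thread could reallocate
  // the vector while a monitor thread is using it.
  void increase_max_global_id(uint64_t step) {
    std::lock_guard<std::mutex> l(mon_lock);
    max_global_id += step;
    pending.push_back(Incremental{max_global_id});
  }

  // Messenger-thread entry point: auth_lock alone does not cover `pending`.
  void handle_auth_request(uint64_t step) {
    std::lock_guard<std::mutex> l(auth_lock);
    increase_max_global_id(step);
  }

  // Monitor-thread entry point (assumed name).
  void tick(uint64_t step) { increase_max_global_id(step); }
};
```

With this shape, handle_auth_request() and tick() can be called from different threads, because every access to pending goes through mon_lock. The lock order is always auth_lock then mon_lock, so the two paths cannot deadlock against each other.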
Updated by Sage Weil about 5 years ago
- Status changed from In Progress to Fix Under Review
Updated by Sage Weil about 5 years ago
Actually, the original crash here was slightly different than I thought: the old assign_global_id() was passed false from handle_auth_request(), but it didn't take that into account when calling increase_max_global_id().
That said, the new code in the PR is better because it removes all of the other accesses to monmap and other members that were made without the mon lock.
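A minimal sketch of the ignored-flag problem, assuming a simplified model: ToyAuthMonitor, kPrealloc, running_low() and the return-0-on-exhaustion convention are hypothetical and chosen for illustration. Only the starting values 9223 and 14096 come from the log above. The buggy variant raises the maximum regardless of the flag, while the fixed variant does so only when the caller allows it.

```cpp
#include <cassert>
#include <cstdint>

// Assumed preallocation batch size; illustrative only.
constexpr uint64_t kPrealloc = 10000;

// Hypothetical model of the global-id allocator; not the real AuthMonitor.
struct ToyAuthMonitor {
  uint64_t last_allocated = 9223;   // value seen in the log
  uint64_t max_global_id = 14096;   // value seen in the log
  int increase_calls = 0;

  // In the real monitor this modifies pending state that needs the mon lock.
  void increase_max_global_id() {
    ++increase_calls;
    max_global_id += kPrealloc;
  }

  // Assumed low-water check: fewer than half a batch of ids remain.
  bool running_low() const {
    return last_allocated + 1 + kPrealloc / 2 >= max_global_id;
  }

  // Shape of the bug: the flag is accepted but never consulted.
  uint64_t assign_global_id_buggy(bool should_increase_max) {
    (void)should_increase_max;
    if (running_low()) increase_max_global_id();
    return ++last_allocated;
  }

  // Fixed shape: only raise the maximum when the caller permits it.
  uint64_t assign_global_id_fixed(bool should_increase_max) {
    if (should_increase_max && running_low()) increase_max_global_id();
    if (last_allocated + 1 >= max_global_id) return 0;  // exhausted; caller retries
    return ++last_allocated;
  }
};
```

Calling assign_global_id_buggy(false) on a fresh ToyAuthMonitor still bumps increase_calls, which mirrors the crash path in the stack trace. assign_global_id_fixed(false) hands out 9224 and leaves max_global_id untouched.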
Updated by Sage Weil about 5 years ago
- Has duplicate Bug #38333: mon crash in AuthMonitor::Incremental::encode buffer code added
Updated by Sage Weil about 5 years ago
- Status changed from Fix Under Review to Resolved
Updated by Sage Weil about 5 years ago
- Has duplicate Bug #38425: mon: segmentation fault in AuthMonitor::create_pending added