Bug #38372

closed

segfault in "AuthMonitor::increase_max_global_id()"

Added by Jason Dillaman about 5 years ago. Updated about 5 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2019-02-18T20:26:29.795 INFO:tasks.ceph.mon.a.smithi029.stderr:*** Caught signal (Segmentation fault) **
2019-02-18T20:26:29.796 INFO:tasks.ceph.mon.a.smithi029.stderr: in thread 7fc7f92ef700 thread_name:msgr-worker-2
2019-02-18T20:26:29.816 INFO:tasks.ceph.mon.a.smithi029.stderr: ceph version 14.0.1-3843-g1438970 (1438970519ae8035fabaeb26444462672c92c7cc) nautilus (dev)
2019-02-18T20:26:29.817 INFO:tasks.ceph.mon.a.smithi029.stderr: 1: (()+0x12890) [0x7fc806ea8890]
2019-02-18T20:26:29.817 INFO:tasks.ceph.mon.a.smithi029.stderr: 2: (ceph::buffer::list::list(ceph::buffer::list&&)+0x5c) [0x7fc808380c3c]
2019-02-18T20:26:29.817 INFO:tasks.ceph.mon.a.smithi029.stderr: 3: (void std::vector<AuthMonitor::Incremental, std::allocator<AuthMonitor::Incremental> >::_M_realloc_insert<AuthMonitor::Incremental const&>(__gnu_cxx::__normal_iterator<AuthMonitor::Incremental*, std::vector<AuthMonitor::Incremental, std::allocator<AuthMonitor::Incremental> > >, AuthMonitor::Incremental const&)+0x124) [0x560c8e2665b4]
2019-02-18T20:26:29.817 INFO:tasks.ceph.mon.a.smithi029.stderr: 4: (AuthMonitor::increase_max_global_id()+0x152) [0x560c8e255f32]
2019-02-18T20:26:29.817 INFO:tasks.ceph.mon.a.smithi029.stderr: 5: (AuthMonitor::assign_global_id(bool)+0x104) [0x560c8e2586d4]
2019-02-18T20:26:29.817 INFO:tasks.ceph.mon.a.smithi029.stderr: 6: (Monitor::handle_auth_request(Connection*, AuthConnectionMeta*, bool, unsigned int, ceph::buffer::list const&, ceph::buffer::list*)+0xb59) [0x560c8e1f4169]
2019-02-18T20:26:29.817 INFO:tasks.ceph.mon.a.smithi029.stderr: 7: (ProtocolV2::_handle_auth_request(ceph::buffer::list&, bool)+0xd8) [0x7fc808344e48]
2019-02-18T20:26:29.818 INFO:tasks.ceph.mon.a.smithi029.stderr: 8: (ProtocolV2::handle_auth_request(char*, unsigned int)+0x6b8) [0x7fc808345e18]
2019-02-18T20:26:29.818 INFO:tasks.ceph.mon.a.smithi029.stderr: 9: (ProtocolV2::handle_frame_payload(char*, int)+0x5f1) [0x7fc80834eda1]
2019-02-18T20:26:29.818 INFO:tasks.ceph.mon.a.smithi029.stderr: 10: (ProtocolV2::run_continuation(Ct<ProtocolV2>*)+0x3c) [0x7fc8083391ac]
2019-02-18T20:26:29.818 INFO:tasks.ceph.mon.a.smithi029.stderr: 11: (AsyncConnection::process()+0x186) [0x7fc808302886]
2019-02-18T20:26:29.818 INFO:tasks.ceph.mon.a.smithi029.stderr: 12: (EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0xa2d) [0x7fc80835554d]
2019-02-18T20:26:29.818 INFO:tasks.ceph.mon.a.smithi029.stderr: 13: (()+0x58f058) [0x7fc80835a058]
2019-02-18T20:26:29.819 INFO:tasks.ceph.mon.a.smithi029.stderr: 14: (()+0xbe733) [0x7fc8069c2733]
2019-02-18T20:26:29.819 INFO:tasks.ceph.mon.a.smithi029.stderr: 15: (()+0x76db) [0x7fc806e9d6db]
2019-02-18T20:26:29.819 INFO:tasks.ceph.mon.a.smithi029.stderr: 16: (clone()+0x3f) [0x7fc80607e88f]
2019-02-18T20:26:29.819 INFO:tasks.ceph.mon.a.smithi029.stderr:2019-02-18 20:26:29.796 7fc7f92ef700 -1 *** Caught signal (Segmentation fault) **
2019-02-18T20:26:29.819 INFO:tasks.ceph.mon.a.smithi029.stderr: in thread 7fc7f92ef700 thread_name:msgr-worker-2

http://qa-proxy.ceph.com/teuthology/jdillaman-2019-02-18_14:48:01-rbd-wip-jd-testing-distro-basic-smithi/3607760/teuthology.log


Related issues 2 (0 open, 2 closed)

Has duplicate RADOS - Bug #38333: mon crash in AuthMonitor::Incremental::encode buffer code (Duplicate, 02/15/2019)
Has duplicate RADOS - Bug #38425: mon: segmentation fault in AuthMonitor::create_pending (Duplicate)

Actions #2

Updated by Greg Farnum about 5 years ago

  • Assignee set to Sage Weil

Sage just rewrote part of this and I see it's under the ProtocolV2 stack, so giving it to him...

Actions #3

Updated by Greg Farnum about 5 years ago

I see the last log lines from the bad thread are

   -85> 2019-02-18 20:26:29.792 7fc7f92ef700  1 --2- [v2:172.21.15.29:3300/0,v1:172.21.15.29:6789/0] >>  conn(0x560c92b7d600 0x560c92ba9b00 :-1 s=ACCEPTING pgs=0 cs=0 l=0)._handle_peer_banner_payload supported=0 required=0
   -77> 2019-02-18 20:26:29.792 7fc7f92ef700 10 mon.a@0(leader) e1 handle_auth_request con 0x560c92b7d600 (start) method 2 payload 22
   -73> 2019-02-18 20:26:29.792 7fc7f92ef700 10 mon.a@0(leader).auth v2 AuthMonitor::assign_global_id mon=0/1 last_allocated=9223 max_global_id=14096
   -72> 2019-02-18 20:26:29.792 7fc7f92ef700 10 mon.a@0(leader).auth v2 next_global_id should be 9224
   -71> 2019-02-18 20:26:29.792 7fc7f92ef700 10 mon.a@0(leader).auth v2 increasing max_global_id to 14096
Actions #4

Updated by Sage Weil about 5 years ago

  • Status changed from New to In Progress
  • Priority changed from High to Urgent

The auth methods in Monitor.cc are protected by auth_lock, but AuthMonitor's assign_global_id() is under the normal mon->lock.
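
To illustrate why the lock mismatch matters: a vector that grows through push_back can reallocate, so every writer has to hold the same lock while it appends. The following is a minimal sketch of that rule, not Ceph code. PendingQueue, add(), and run_writers() are hypothetical names made up for this example, and the int payload stands in for whatever is actually queued.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for shared pending state; not taken from Ceph.
struct PendingQueue {
  std::mutex lock;           // the one lock that guards `pending`
  std::vector<int> pending;  // push_back may reallocate the storage

  // Every writer goes through the same lock before touching the vector.
  void add(int v) {
    std::lock_guard<std::mutex> g(lock);
    pending.push_back(v);
  }

  std::size_t size() {
    std::lock_guard<std::mutex> g(lock);
    return pending.size();
  }
};

// Start `threads` writers that each append `per_thread` entries,
// wait for all of them, and report the final number of entries.
std::size_t run_writers(PendingQueue& q, int threads, int per_thread) {
  assert(threads >= 0 && per_thread >= 0);
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&q, per_thread] {
      for (int i = 0; i < per_thread; ++i) q.add(i);
    });
  }
  for (auto& w : workers) w.join();
  return q.size();
}
```

If one of the writers took a different mutex instead (the auth_lock versus mon->lock situation described above), two push_back calls could overlap and one of them could move elements out of storage that the other is still using, which would be consistent with a crash inside _M_realloc_insert.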

Actions #5

Updated by Sage Weil about 5 years ago

  • Status changed from In Progress to Fix Under Review
Actions #6

Updated by Sage Weil about 5 years ago

Actually, the original crash here was slightly different than I thought: the old assign_global_id() was passed false from handle_auth_request(), but it didn't take that into account when calling increase_max_global_id().

That said, the new code in the PR is better because we remove all of the other accesses to monmap and other members that were made without holding the mon lock.
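
The flag handling described in this comment can be sketched roughly as follows. This is a toy model under stated assumptions, not the code from the PR: GlobalIdAllocator, the step parameter, and the pending_increases vector are hypothetical, and the real monitor decides when to raise the ceiling with more than a simple "next > max" check.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy allocator; the method names mirror the discussion only.
struct GlobalIdAllocator {
  uint64_t last_allocated = 0;
  uint64_t max_global_id = 0;
  std::vector<uint64_t> pending_increases;  // stands in for queued Incrementals

  // Mutates pending state, so the caller is assumed to hold the lock
  // that guards it.
  void increase_max_global_id(uint64_t step) {
    max_global_id += step;
    pending_increases.push_back(max_global_id);
  }

  // Returns 0 when no id is available without raising the ceiling and the
  // caller said it is not allowed to raise it.
  uint64_t assign_global_id(bool should_increase_max, uint64_t step = 100) {
    uint64_t next = last_allocated + 1;
    if (next > max_global_id) {
      if (!should_increase_max) {
        return 0;  // honor the flag: leave pending state untouched
      }
      increase_max_global_id(step);
    }
    last_allocated = next;
    return next;
  }
};
```

In this sketch a caller on the messenger thread would pass false and treat a 0 result as "retry later under the proper lock", so the pending vector is only ever modified by callers that are allowed to modify it.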

Actions #7

Updated by Sage Weil about 5 years ago

  • Has duplicate Bug #38333: mon crash in AuthMonitor::Incremental::encode buffer code added
Actions #8

Updated by Sage Weil about 5 years ago

  • Status changed from Fix Under Review to Resolved
Actions #9

Updated by Sage Weil about 5 years ago

  • Has duplicate Bug #38425: mon: segmentation fault in AuthMonitor::create_pending added