Bug #13826

closed

segfault from PrebufferedStreambuf::overflow

Added by Peter Gervai over 8 years ago. Updated about 8 years ago.

Status:
Duplicate
Priority:
Urgent
Assignee:
-
Category:
Monitor
Target version:
-
% Done:
0%


Description

Granted, I may have done something to the mons on my test cluster (I tried to add new mons to the cluster and visibly failed), but the result is not satisfactory by any means:

...
2015-11-18 17:05:46.478259 7fffed8ac700 10 mon.hermod@0(leader) e3 ms_verify_authorizer 10.5.10.13:6789/0 mon protocol 2
2015-11-18 17:05:46.478519 7fffed8ac700 0 cephx: verify_authorizer could not decrypt ticket info: error: NSS AES final round failed: -8190
[New Thread 0x7fffed7ab700 (LWP 2257)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffed8ac700 (LWP 2256)]
0x00007ffff5e69e99 in std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
(gdb) bt
#0 0x00007ffff5e69e99 in std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1 0x00007ffff5e6ab0b in std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2 0x00007ffff5e6abb0 in std::string::reserve(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x00007ffff5e6b025 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00005555559ba66f in PrebufferedStreambuf::overflow(int) ()
#5 0x00007ffff5e4ad65 in std::basic_streambuf<char, std::char_traits<char> >::xsputn(char const*, long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x00007ffff5e42316 in std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7 0x0000555555b9b704 in Pipe::_pipe_prefix(std::ostream&) const ()
#8 0x0000555555baeb16 in Pipe::reader() ()
#9 0x0000555555bb7edd in Pipe::Reader::entry() ()
#10 0x00007ffff70740a4 in start_thread (arg=0x7fffed8ac700) at pthread_create.c:309
#11 0x00007ffff55d104d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

The result is that the mon cannot be started. I'm new to Ceph, so it will probably take quite a while before I stop reproducing this again and again. :)
If there isn't enough info, feel free to close; I hope I won't be in a position to reproduce it later.


Related issues 1 (0 open, 1 closed)

Related to Ceph - Bug #14958: PK11_DestroyContext() is called twice if PK11_DigestFinal() fails (Resolved, 03/03/2016)

Actions #1

Updated by Peter Gervai over 8 years ago

ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)

Actions #2

Updated by Nathan Cutler over 8 years ago

  • Tracker changed from Tasks to Bug
  • Project changed from Stable releases to Ceph
  • Category set to Monitor
Actions #3

Updated by Peter Gervai over 8 years ago

I believe I have found the culprit.
I tried to add 2 new monitors. One was fine - same environment as the sole initial one.
The other, however, was in a container with an older kernel, which seems to have determined the installed version of ceph, around 0.80 or so. This old mon more or less accepted the connection from the new (main) mon and corrupted it badly, and the 9.2.0 mon choked.
I shut down the "old" mon and the other one started just fine (apart from heavy unhealthiness).

Actions #4

Updated by Peter Gervai over 8 years ago

But not that. As a newbie I tried to figure out how to use ceph-deploy to add a new mon to a running cluster. It seems it doesn't work. When I follow the manual procedure for adding a mon it works; when I try various combinations of ceph-deploy commands it fails. I'll ask around how it is supposed to work, since the docs are either silent on the topic or the relevant part is impossible to find.

Actions #6

Updated by Samuel Just over 8 years ago

  • Priority changed from Normal to Urgent
Actions #7

Updated by Samuel Just about 8 years ago

  • Is duplicate of Bug #14821: OSD segfault in ms_get_authorizer -- hammer ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff) added
Actions #8

Updated by Samuel Just about 8 years ago

  • Status changed from New to Duplicate
Actions #9

Updated by Samuel Just about 8 years ago

  • Status changed from Duplicate to 12
Actions #10

Updated by Samuel Just about 8 years ago

  • Is duplicate of deleted (Bug #14821: OSD segfault in ms_get_authorizer -- hammer ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff))
Actions #11

Updated by Samuel Just about 8 years ago

Ok, bug seems to be in Monitor::ms_verify_authorizer. It treats the return value from cephx_verify_authorizer as an int rather than a bool and fails to notice the cases where the authorization fails.
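
To make the failure mode described above concrete, here is a minimal sketch of the int-versus-bool pattern. It is not the Ceph source: verify_authorizer_stub, is_valid_buggy, is_valid_fixed and check_examples are hypothetical names invented for illustration, and the `ret >= 0` test is an assumption about what the faulty check looked like.

```cpp
#include <cassert>

// Hypothetical stand-in for a verifier that reports success as a bool:
// true when the ticket decrypts, false when it does not.
inline bool verify_authorizer_stub(bool ticket_decrypts) {
  return ticket_decrypts;
}

// Assumed buggy pattern: the bool result is stored in an int and then
// tested like an errno-style return code (negative means failure).
inline bool is_valid_buggy(bool ticket_decrypts) {
  int ret = verify_authorizer_stub(ticket_decrypts);  // false becomes 0
  return ret >= 0;  // 0 and 1 both pass, so failure is never noticed
}

// Corrected pattern: use the bool result directly.
inline bool is_valid_fixed(bool ticket_decrypts) {
  return verify_authorizer_stub(ticket_decrypts);
}

// Exercises both variants with a failing and a succeeding verification.
inline void check_examples() {
  assert(is_valid_buggy(false));   // failed verification wrongly accepted
  assert(!is_valid_fixed(false));  // failed verification rejected
  assert(is_valid_fixed(true));    // successful verification accepted
}
```

With a check like the buggy one, a connection whose ticket could not be decrypted would still be marked valid, which is consistent with the "could not decrypt ticket info" log line appearing right before the crash.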

Actions #12

Updated by Samuel Just about 8 years ago

  • Related to Bug #13525: mon: should not set isvalid = true when cephx_verify_authorizer return false added
Actions #13

Updated by Samuel Just about 8 years ago

I guess commit 9085c820491f25a04ec02accc5098c1ab9b57311 fixed it.

Actions #14

Updated by Nathan Cutler about 8 years ago

  • Related to Backport #13589: infernalis: mon: should not set isvalid = true when cephx_verify_authorizer return false added
Actions #15

Updated by Nathan Cutler about 8 years ago

The infernalis backport is http://tracker.ceph.com/issues/13589 - https://github.com/ceph/ceph/pull/6392. It is included in the 9.2.1 release.

Actions #16

Updated by Nathan Cutler about 8 years ago

@Peter - can you upgrade to 9.2.1 and try to reproduce again?

Actions #17

Updated by Kefu Chai about 8 years ago

  • Status changed from 12 to Duplicate
Actions #18

Updated by Kefu Chai about 8 years ago

  • Related to deleted (Bug #13525: mon: should not set isvalid = true when cephx_verify_authorizer return false)
Actions #19

Updated by Kefu Chai about 8 years ago

  • Is duplicate of Bug #13525: mon: should not set isvalid = true when cephx_verify_authorizer return false added
Actions #20

Updated by Peter Gervai about 8 years ago

Nathan Cutler wrote:

@Peter - can you upgrade to 9.2.1 and try to reproduce again?

Most likely I cannot, since it requires rapid, random mon adds and removes between various versions and possibly an occasional config screwup: unknown steps in unknown directions. I'll try to see whether the test VMs are still around someday, but don't wait for me.

Actions #21

Updated by Brad Hubbard about 8 years ago

Simple reproducer.

$ ceph-deploy new boxenX boxenY
$ ceph-deploy mon create-initial

Move to boxenY

$ sudo service ceph stop
$ sudo rm -rf --one-file-system /var/lib/ceph/*
$ sudo rm -rf --one-file-system /etc/ceph/*
$ ceph-deploy new boxenY
$ ceph-deploy mon create-initial

You will get either the crash seen here, or the crash seen in http://tracker.ceph.com/issues/13527 or both (first 13527, then this one after enabling debug logging).

This crash is fixed by commit e9e05333ac7c64758bf14d80f6179e001c0fdbfd from https://github.com/ceph/ceph/pull/6698 so I think we need to backport it to infernalis and hammer.

Actions #22

Updated by Brad Hubbard about 8 years ago

  • Is duplicate of deleted (Bug #13525: mon: should not set isvalid = true when cephx_verify_authorizer return false)
Actions #23

Updated by Brad Hubbard about 8 years ago

  • Related to deleted (Backport #13589: infernalis: mon: should not set isvalid = true when cephx_verify_authorizer return false)
Actions #24

Updated by Brad Hubbard about 8 years ago

  • Related to Backport #14957: infernalis: segfault from PrebufferedStreambuf::overflow added
Actions #25

Updated by Brad Hubbard about 8 years ago

  • Related to Backport #14956: hammer: segfault from PrebufferedStreambuf::overflow added
Actions #26

Updated by Loïc Dachary about 8 years ago

  • Related to deleted (Backport #14957: infernalis: segfault from PrebufferedStreambuf::overflow)
Actions #27

Updated by Loïc Dachary about 8 years ago

  • Copied to Backport #14957: infernalis: segfault from PrebufferedStreambuf::overflow added
Actions #28

Updated by Loïc Dachary about 8 years ago

  • Related to deleted (Backport #14956: hammer: segfault from PrebufferedStreambuf::overflow)
Actions #29

Updated by Loïc Dachary about 8 years ago

  • Copied to Backport #14956: hammer: segfault from PrebufferedStreambuf::overflow added
Actions #30

Updated by Nathan Cutler about 8 years ago

  • Copied to deleted (Backport #14957: infernalis: segfault from PrebufferedStreambuf::overflow)
Actions #31

Updated by Nathan Cutler about 8 years ago

  • Copied to deleted (Backport #14956: hammer: segfault from PrebufferedStreambuf::overflow)
Actions #32

Updated by Nathan Cutler about 8 years ago

I created a tracker issue for PR#6698 - it is http://tracker.ceph.com/issues/14958 and it has been staged for backport.

Actions #33

Updated by Nathan Cutler about 8 years ago

  • Related to Bug #14958: PK11_DestroyContext() is called twice if PK11_DigestFinal() fails added