segfault from PrebufferedStreambuf::overflow
Granted, I may have done something to the mons on my test cluster (I tried to add new mons to the cluster and visibly failed), but the result is not satisfactory by any means:
2015-11-18 17:05:46.478259 7fffed8ac700 10 mon.hermod@0(leader) e3 ms_verify_authorizer 10.5.10.13:6789/0 mon protocol 2
2015-11-18 17:05:46.478519 7fffed8ac700 0 cephx: verify_authorizer could not decrypt ticket info: error: NSS AES final round failed: -8190
[New Thread 0x7fffed7ab700 (LWP 2257)]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffed8ac700 (LWP 2256)]
0x00007ffff5e69e99 in std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#0 0x00007ffff5e69e99 in std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1 0x00007ffff5e6ab0b in std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2 0x00007ffff5e6abb0 in std::string::reserve(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x00007ffff5e6b025 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00005555559ba66f in PrebufferedStreambuf::overflow(int) ()
#5 0x00007ffff5e4ad65 in std::basic_streambuf<char, std::char_traits<char> >::xsputn(char const*, long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x00007ffff5e42316 in std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7 0x0000555555b9b704 in Pipe::_pipe_prefix(std::ostream&) const ()
#8 0x0000555555baeb16 in Pipe::reader() ()
#9 0x0000555555bb7edd in Pipe::Reader::entry() ()
#10 0x00007ffff70740a4 in start_thread (arg=0x7fffed8ac700) at pthread_create.c:309
#11 0x00007ffff55d104d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
The result is that the mon cannot be started. I'm new to ceph, so it will probably take quite a while before I stop reproducing this again and again. :)
If there isn't enough info, feel free to close; I hope I won't be in a position to reproduce it later.
#3 Updated by Peter Gervai over 4 years ago
I believe I have found the culprit.
I tried to add 2 new monitors. One was fine - same environment as the sole initial one.
The other, however, was in a container with an older kernel, which seems to have determined the installed version of ceph, which was around 0.80 or so. This old mon more or less accepted the connection from the new (main) mon and screwed it up big time, and the 9.2.0 mon choked.
I shut down the "old" mon and the other one started just fine (apart from heavy unhealthiness).
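The version mismatch described above could be caught before adding a mon by comparing the output of `ceph --version` on each host. A minimal sketch, assuming the output has the form `ceph version X.Y.Z (<hash>)`; the helper names `ceph_version_number` and `same_release` and the host name in the usage comment are hypothetical, and treating equal major.minor as "same release" is a simplification, not a rule taken from the ceph docs:

```shell
#!/bin/sh
# Extract the X.Y.Z version number from a `ceph --version` style line.
# Assumed input format: "ceph version 9.2.0 (<commit hash>)".
ceph_version_number() {
    printf '%s\n' "$1" | awk '{ print $3 }'
}

# Succeed (exit status 0) only if both lines report the same major.minor.
# Hypothetical helper for illustration; not part of ceph or ceph-deploy.
same_release() {
    a=$(ceph_version_number "$1" | cut -d. -f1,2)
    b=$(ceph_version_number "$2" | cut -d. -f1,2)
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example usage before adding a mon (boxenY is a placeholder host name):
#   local_v=$(ceph --version)
#   remote_v=$(ssh boxenY ceph --version)
#   same_release "$local_v" "$remote_v" || echo "ceph version mismatch" >&2
```

With the versions mentioned in this report, a 9.2.0 host and a 0.80-era host would be flagged as a mismatch.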
#4 Updated by Peter Gervai over 4 years ago
But not that. As a newbie I tried to figure out how to use ceph-deploy to add a new mon to a running cluster. It seems it doesn't work. When I follow the manual procedure for adding a mon it works; when I try various combinations of ceph-deploy it fails. I'll try to ask around how it is supposed to work, since the docs are either silent on the topic or the answer is impossible to find.
#20 Updated by Peter Gervai over 4 years ago
Nathan Cutler wrote:
@Peter - can you upgrade to 9.2.1 and try to reproduce again?
Most probably I cannot, since it requires rapid random mon adds and removes between various versions and possibly an occasional screwup of the config; unknown steps in unknown directions. I'll try to see whether the test VMs are still around someday, but don't wait for me.
#21 Updated by Brad Hubbard over 4 years ago
$ ceph-deploy new boxenX boxenY
$ ceph-deploy mon create-initial
Move to boxenY
$ sudo service ceph stop
$ sudo rm -rf --one-file-system /var/lib/ceph/*
$ sudo rm -rf --one-file-system /etc/ceph/*
$ ceph-deploy new boxenY
$ ceph-deploy mon create-initial
You will get either the crash seen here, or the crash seen in http://tracker.ceph.com/issues/13527 or both (first 13527, then this one after enabling debug logging).
This crash is fixed by commit e9e05333ac7c64758bf14d80f6179e001c0fdbfd from https://github.com/ceph/ceph/pull/6698, so I think we need to backport it to infernalis and hammer.