Actions
Bug #18629
openosd: unclear error when authentication with monitors fails
Status:
New
Priority:
Normal
Assignee:
-
Category:
Administration/Usability
Target version:
-
% Done:
0%
Source:
Development
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Seen on a cluster of Linode VMs (16 osds, 6 fail with this error). Here's the backtrace:
(gdb) bt #0 0x00007fe58e85423b in raise () from /lib64/libpthread.so.0 #1 0x00005609af8ad9a4 in reraise_fatal (signum=6) at /usr/src/debug/ceph-11.1.0-6893-g5db320c/src/global/signal_handler.cc:74 #2 handle_fatal_signal (signum=6) at /usr/src/debug/ceph-11.1.0-6893-g5db320c/src/global/signal_handler.cc:138 #3 <signal handler called> #4 0x00007fe58d45c1d7 in raise () from /lib64/libc.so.6 #5 0x00007fe58d45d8c8 in abort () from /lib64/libc.so.6 #6 0x00007fe58fd0bf24 in ceph::__ceph_assert_fail (assertion=assertion@entry=0x7fe590031b8e "m_workers.empty()", file=file@entry=0x7fe590031a38 "/mnt/jenkins/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/11.1.0-6893-g5db320c/rpm/el7/BUILD/ceph-11.1.0-6893-g5db320c/s"..., line=line@entry=46, func=func@entry=0x7fe590031e60 <ceph::HeartbeatMap::~HeartbeatMap()::__PRETTY_FUNCTION__> "ceph::HeartbeatMap::~HeartbeatMap()") at /usr/src/debug/ceph-11.1.0-6893-g5db320c/src/common/assert.cc:77 #7 0x00007fe58feb25ec in ceph::HeartbeatMap::~HeartbeatMap (this=0x5609b1660240, __in_chrg=<optimized out>) at /usr/src/debug/ceph-11.1.0-6893-g5db320c/src/common/HeartbeatMap.cc:46 #8 0x00007fe58fe9b27f in CephContext::~CephContext (this=0x5609b1748000, __in_chrg=<optimized out>) at /usr/src/debug/ceph-11.1.0-6893-g5db320c/src/common/ceph_context.cc:611 #9 0x00007fe58fe9b5fc in CephContext::put (this=0x5609b1748000) at /usr/src/debug/ceph-11.1.0-6893-g5db320c/src/common/ceph_context.cc:652 #10 0x00005609af8a7c95 in intrusive_ptr_release (cct=<optimized out>) at /usr/src/debug/ceph-11.1.0-6893-g5db320c/src/global/global_init.cc:341 #11 0x00005609af33092a in ~intrusive_ptr (this=0x7ffd9a0533e0, __in_chrg=<optimized out>) at /usr/src/debug/ceph-11.1.0-6893-g5db320c/build/boost/include/boost/smart_ptr/intrusive_ptr.hpp:97 #12 main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/ceph-11.1.0-6893-g5db320c/src/ceph_osd.cc:113
Here's the log:
2017-01-20 15:05:18.729353 7fe5996acc40 0 set uid:gid to 167:167 (ceph:ceph) 2017-01-20 15:05:18.730446 7fe5996acc40 0 ceph version 11.1.0-6893-g5db320c (5db320ce93a335015bf5e2b77ff234e44c764b9a), process ceph-osd, pid 12603 2017-01-20 15:05:18.738306 7fe5996acc40 0 pidfile_write: ignore empty --pid-file 2017-01-20 15:05:18.796123 7fe5996acc40 0 load: jerasure load: lrc load: isa 2017-01-20 15:05:18.800945 7fe5996acc40 0 filestore(/var/lib/ceph/osd/ceph-4) backend xfs (magic 0x58465342) 2017-01-20 15:05:18.802324 7fe5996acc40 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option 2017-01-20 15:05:18.802338 7fe5996acc40 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option 2017-01-20 15:05:18.802339 7fe5996acc40 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features: splice() is disabled via 'filestore splice' config option 2017-01-20 15:05:18.806902 7fe5996acc40 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features: syncfs(2) syscall fully supported (by glibc and kernel) 2017-01-20 15:05:18.806976 7fe5996acc40 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_feature: extsize is disabled by conf 2017-01-20 15:05:18.810474 7fe5996acc40 0 filestore(/var/lib/ceph/osd/ceph-4) start omap initiation 2017-01-20 15:05:18.826942 7fe5996acc40 1 leveldb: Recovering log #1793 2017-01-20 15:05:18.899648 7fe5996acc40 1 leveldb: Level-0 table #1796: started 2017-01-20 15:05:18.949737 7fe5996acc40 1 leveldb: Level-0 table #1796: 865556 bytes OK 2017-01-20 15:05:18.952992 7fe5996acc40 1 leveldb: Delete type=3 #8 2017-01-20 15:05:18.953413 7fe5996acc40 1 leveldb: Delete type=0 #1793 2017-01-20 15:05:18.954736 7fe5996acc40 0 filestore(/var/lib/ceph/osd/ceph-4) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled 2017-01-20 15:05:19.019568 7fe5996acc40 1 journal _open /var/lib/ceph/osd/ceph-4/journal fd 27: 104857600 bytes, block size 4096 bytes, directio = 1, aio = 1 2017-01-20 15:05:19.021536 7fe5996acc40 1 journal _open /var/lib/ceph/osd/ceph-4/journal fd 27: 104857600 bytes, block size 4096 bytes, directio = 1, aio = 1 2017-01-20 15:05:19.030159 7fe5996acc40 1 filestore(/var/lib/ceph/osd/ceph-4) upgrade 2017-01-20 15:05:19.058455 7fe5996acc40 0 _get_class not permitted to load lua 2017-01-20 15:05:19.080692 7fe5996acc40 0 <cls> /mnt/jenkins/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/11.1.0-6893-g5db320c/rpm/el7/BUILD/ceph-11.1.0-6893-g5db320c/src/cls/hello/cls_hello.cc:296: loading cls_hello 2017-01-20 15:05:19.084272 7fe5996acc40 0 _get_class not permitted to load kvs 2017-01-20 15:05:19.084620 7fe5996acc40 0 <cls> /mnt/jenkins/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/11.1.0-6893-g5db320c/rpm/el7/BUILD/ceph-11.1.0-6893-g5db320c/src/cls/cephfs/cls_cephfs.cc:198: loading cephfs 2017-01-20 15:05:19.087462 7fe5996acc40 0 osd.4 12 crush map has features 2200130813952, adjusting msgr requires for clients 2017-01-20 15:05:19.088100 7fe5996acc40 0 osd.4 12 crush map has features 2200130813952 was 8705, adjusting msgr requires for mons 2017-01-20 15:05:19.088273 7fe5996acc40 0 osd.4 12 crush map has features 2200130813952, adjusting msgr requires for osds 2017-01-20 15:05:22.298089 7fe5996acc40 0 osd.4 12 load_pgs 2017-01-20 15:05:31.265103 7fe5996acc40 0 osd.4 12 load_pgs opened 551 pgs 2017-01-20 15:05:31.265564 7fe5996acc40 0 osd.4 12 using 1 op queue with priority op cut off at 64. 2017-01-20 15:05:31.267875 7fe5996acc40 -1 osd.4 12 log_to_monitors {default=true} 2017-01-20 15:05:31.278199 7fe5996acc40 1 journal close /var/lib/ceph/osd/ceph-4/journal 2017-01-20 15:05:31.348586 7fe5996acc40 -1 ^[[0;31m ** ERROR: osd init failed: (1) Operation not permitted^[[0m 2017-01-20 15:05:31.354780 7fe5996acc40 -1 /mnt/jenkins/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/11.1.0-6893-g5db320c/rpm/el7/BUILD/ceph-11.1.0-6893-g5db320c/src/common/HeartbeatMap.cc: In function 'ceph::HeartbeatMap::~HeartbeatMap()' thread 7fe5996acc40 time 2017-01-20 15:05:31.350426 /mnt/jenkins/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/11.1.0-6893-g5db320c/rpm/el7/BUILD/ceph-11.1.0-6893-g5db320c/src/common/HeartbeatMap.cc: 46: FAILED assert(m_workers.empty()) ceph version 11.1.0-6893-g5db320c (5db320ce93a335015bf5e2b77ff234e44c764b9a) 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7fe58fd0bdb0] 2: (ceph::HeartbeatMap::~HeartbeatMap()+0x13c) [0x7fe58feb25ec] 3: (CephContext::~CephContext()+0x37f) [0x7fe58fe9b27f] 4: (CephContext::put()+0x17c) [0x7fe58fe9b5fc] 5: (main()+0xc4a) [0x5609af33092a] 6: (__libc_start_main()+0xf5) [0x7fe58d448b35] 7: (()+0x2c3a69) [0x5609af3aaa69] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Log attached.
Files
Updated by Josh Durgin over 7 years ago
- Project changed from rgw to Ceph
- Subject changed from osd: m_workers.empty() assertion failure in HeartbeatMap to osd: unclear error when authentication with monitors fails
The full log includes:
-34> 2017-01-20 15:05:19.147939 7f274da9b700 1 -- 192.168.132.144:6800/12683 <== mon.1 192.168.211.143:6789/0 3 ==== auth_reply(proto 2 -1 (1) Operation not permitted) v1 ==== 24+0+0 (1263862845 0 0) 0x555da4b698c0 con 0x555da4797000
This should be visible in a more admin-friendly message about auth failing, at the error log level.
Updated by Greg Farnum almost 7 years ago
- Project changed from Ceph to RADOS
- Category set to Administration/Usability
- Component(RADOS) OSD added
Actions