Bug #38329
OSD crashes in get_str_map while creating with ceph-volume
Description
see https://bugzilla.redhat.com/show_bug.cgi?id=1661583
# ceph-volume lvm prepare --data /dev/sdd
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 7faf689b-b1dd-4f5b-8d9a-dcb063949dda
Running command: /usr/sbin/vgcreate --force --yes ceph-db3893ef-93db-4b3f-a80e-11cca7911ba1 /dev/sdd
stdout: Physical volume "/dev/sdd" successfully created.
stdout: Volume group "ceph-db3893ef-93db-4b3f-a80e-11cca7911ba1" successfully created
Running command: /usr/sbin/lvcreate --yes -l 100%FREE -n osd-block-7faf689b-b1dd-4f5b-8d9a-dcb063949dda ceph-db3893ef-93db-4b3f-a80e-11cca7911ba1
stdout: Logical volume "osd-block-7faf689b-b1dd-4f5b-8d9a-dcb063949dda" created.
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-4
Running command: /usr/sbin/restorecon /var/lib/ceph/osd/ceph-4
Running command: /bin/chown -h ceph:ceph /dev/ceph-db3893ef-93db-4b3f-a80e-11cca7911ba1/osd-block-7faf689b-b1dd-4f5b-8d9a-dcb063949dda
Running command: /bin/chown -R ceph:ceph /dev/dm-0
Running command: /bin/ln -s /dev/ceph-db3893ef-93db-4b3f-a80e-11cca7911ba1/osd-block-7faf689b-b1dd-4f5b-8d9a-dcb063949dda /var/lib/ceph/osd/ceph-4/block
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-4/activate.monmap
stderr: /bin/ceph:128: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
import rados
got monmap epoch 8
Running command: /bin/ceph-authtool /var/lib/ceph/osd/ceph-4/keyring --create-keyring --name osd.4 --add-key AQBi/BxcgL4tNRAA1ncksjAiwRFwsCZXvLbgAw==
stdout: creating /var/lib/ceph/osd/ceph-4/keyring
stdout: added entity osd.4 auth auth(key=AQBi/BxcgL4tNRAA1ncksjAiwRFwsCZXvLbgAw== with 0 caps)
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-4/keyring
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-4/
Running command: /bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 4 --monmap /var/lib/ceph/osd/ceph-4/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-4/ --osd-uuid 7faf689b-b1dd-4f5b-8d9a-dcb063949dda --setuser ceph --setgroup ceph
stdout: /usr/include/c++/8/bits/basic_string.h:1048: std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::const_reference std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::operator[](std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::size_type) const [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>; std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::const_reference = const char&; std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::size_type = long unsigned int]: Assertion '__pos <= size()' failed.
stderr: 2018-12-21 15:46:00.788 7fe6fa91a740 -1 bluestore(/var/lib/ceph/osd/ceph-4/) _read_fsid unparsable uuid
stderr: *** Caught signal (Aborted) **
stderr: in thread 7fe6fa91a740 thread_name:ceph-osd
stderr: ceph version 14.0.1 (5f51cd286b747b1729006a5b98fb08b1b646237a) nautilus (dev)
stderr: 1: (()+0x13030) [0x7fe6fb05e030]
stderr: 2: (gsignal()+0x10f) [0x7fe6fab6800f]
stderr: 3: (abort()+0x127) [0x7fe6fab52895]
stderr: 4: (trim(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x1d0) [0x555d1a6f8220]
stderr: 5: (get_str_map(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >*, char const*)+0x200) [0x555d1a6f85d0]
stderr: 6: (BlueStore::_open_db(bool, bool)+0x12de) [0x555d1a3f033e]
stderr: 7: (BlueStore::mkfs()+0x102f) [0x555d1a4473ef]
stderr: 8: (OSD::mkfs(CephContext*, ObjectStore*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, uuid_d, int)+0x174) [0x555d19f6aa54]
stderr: 9: (main()+0x15b9) [0x555d19e73259]
stderr: 10: (__libc_start_main()+0xf3) [0x7fe6fab53ee3]
stderr: 11: (_start()+0x2e) [0x555d19f4a84e]
stderr: 2018-12-21 15:46:01.590 7fe6fa91a740 -1 *** Caught signal (Aborted) **
stderr: in thread 7fe6fa91a740 thread_name:ceph-osd
stderr: ceph version 14.0.1 (5f51cd286b747b1729006a5b98fb08b1b646237a) nautilus (dev)
stderr: 1: (()+0x13030) [0x7fe6fb05e030]
stderr: 2: (gsignal()+0x10f) [0x7fe6fab6800f]
stderr: 3: (abort()+0x127) [0x7fe6fab52895]
stderr: 4: (trim(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x1d0) [0x555d1a6f8220]
stderr: 5: (get_str_map(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >*, char const*)+0x200) [0x555d1a6f85d0]
stderr: 6: (BlueStore::_open_db(bool, bool)+0x12de) [0x555d1a3f033e]
stderr: 7: (BlueStore::mkfs()+0x102f) [0x555d1a4473ef]
stderr: 8: (OSD::mkfs(CephContext*, ObjectStore*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, uuid_d, int)+0x174) [0x555d19f6aa54]
stderr: 9: (main()+0x15b9) [0x555d19e73259]
stderr: 10: (__libc_start_main()+0xf3) [0x7fe6fab53ee3]
stderr: 11: (_start()+0x2e) [0x555d19f4a84e]
stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
stderr: -15> 2018-12-21 15:46:00.788 7fe6fa91a740 -1 bluestore(/var/lib/ceph/osd/ceph-4/) _read_fsid unparsable uuid
stderr: 0> 2018-12-21 15:46:01.590 7fe6fa91a740 -1 *** Caught signal (Aborted) **
stderr: in thread 7fe6fa91a740 thread_name:ceph-osd
stderr: ceph version 14.0.1 (5f51cd286b747b1729006a5b98fb08b1b646237a) nautilus (dev)
stderr: 1: (()+0x13030) [0x7fe6fb05e030]
stderr: 2: (gsignal()+0x10f) [0x7fe6fab6800f]
stderr: 3: (abort()+0x127) [0x7fe6fab52895]
stderr: 4: (trim(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x1d0) [0x555d1a6f8220]
stderr: 5: (get_str_map(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >*, char const*)+0x200) [0x555d1a6f85d0]
stderr: 6: (BlueStore::_open_db(bool, bool)+0x12de) [0x555d1a3f033e]
stderr: 7: (BlueStore::mkfs()+0x102f) [0x555d1a4473ef]
stderr: 8: (OSD::mkfs(CephContext*, ObjectStore*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, uuid_d, int)+0x174) [0x555d19f6aa54]
stderr: 9: (main()+0x15b9) [0x555d19e73259]
stderr: 10: (__libc_start_main()+0xf3) [0x7fe6fab53ee3]
stderr: 11: (_start()+0x2e) [0x555d19f4a84e]
stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--> Was unable to complete a new OSD, will rollback changes
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.4 --yes-i-really-mean-it
stderr: /bin/ceph:128: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
import rados
purged osd.4
--> RuntimeError: Command failed with exit code 250: /bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 4 --monmap /var/lib/ceph/osd/ceph-4/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-4/ --osd-uuid 7faf689b-b1dd-4f5b-8d9a-dcb063949dda --setuser ceph --setgroup ceph
Version-Release number of selected component (if applicable):
ceph-osd-14.0.1-2.fc30.x86_64
Related issues
History
#1 Updated by Nathan Cutler about 5 years ago
- Project changed from Ceph to ceph-volume
- Subject changed from ceph-disk crashes while making OSD to ceph-volume crashes while making OSD
- Category deleted (OSD)
#2 Updated by Alfredo Deza about 5 years ago
- Project changed from ceph-volume to bluestore
- Subject changed from ceph-volume crashes while making OSD to OSD crashes while creating with ceph-volume
- Description updated (diff)
Changing back to the Ceph tracker; as far as I can see, this is not a crash in ceph-volume, nor is it specific to ceph-volume.
#3 Updated by Sage Weil about 5 years ago
- have any options been customized?
- what version is this? 14.0.1-2.fc30 is a random dev checkpoint commit from master from October. If this is what's in the downstream Fedora repo, we should get it removed ASAP!
#4 Updated by Tomasz Torcz about 5 years ago
(original reporter here)
I have the following customisations in ceph.conf:
osd scrub load threshold = 1.5
# peer with who? 0-osd, 1-host
osd crush chooseleaf = 0
As for the version: looking at the Fedora build system, this snapshot (+gcc9 fixes) is what's going to be in the released version of Fedora 30 in 2 months. Kaleb (the reporter) is one of the ceph maintainers in Fedora.
#5 Updated by Nathan Cutler about 5 years ago
- Related to Bug #38144: nautilus: 14.0.1 build fails in fedora rawhide mass rebuild w/ gcc/g++ 9 added
#6 Updated by Nathan Cutler about 5 years ago
Added related-to link to #38144 where the GCC 9 FTBFS is being discussed. A patch has been proposed there, but it includes changes to a submodule (SPDK/DPDK) so is not straightforward to implement as a PR.
Getting the GCC 9 issue fixed would be good from an openSUSE Tumbleweed perspective as well.
#7 Updated by Sage Weil about 5 years ago
- Subject changed from OSD crashes while creating with ceph-volume to OSD crashes in get_str_map while creating with ceph-volume
- Status changed from New to Fix Under Review
- Priority changed from Normal to Urgent
- Backport set to luminous,mimic
Reproduced this and got a core.
I think the problem is an empty string passed to trim() in str_map.cc. Fix here: https://github.com/ceph/ceph/pull/26698
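For readers following along, here is a minimal sketch of a trim helper that tolerates empty input. It is not the code from str_map.cc or from the linked PR; `trim_safe` is a hypothetical name used only for illustration. The point it shows: indexing a std::string past its size trips the `__pos <= size()` assertion seen in the log when libstdc++ assertions are enabled, so one way to stay safe is to let the library search for the boundaries and return early when there is nothing to keep.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not Ceph's implementation): strip leading and
// trailing whitespace without ever indexing an empty string.
std::string trim_safe(const std::string& str) {
  static const char* const kWhitespace = " \t\n\r\f\v";

  // find_first_not_of returns npos for an empty or all-whitespace string,
  // so that case is handled before any position arithmetic happens.
  const std::size_t first = str.find_first_not_of(kWhitespace);
  if (first == std::string::npos) {
    return std::string();
  }

  // At least one non-whitespace character exists, so last is valid here.
  const std::size_t last = str.find_last_not_of(kWhitespace);
  assert(last != std::string::npos && last >= first);

  return str.substr(first, last - first + 1);
}
```

With this shape, `trim_safe("")` and `trim_safe("   ")` both return an empty string instead of reading out of bounds.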
#8 Updated by Sage Weil about 5 years ago
- Status changed from Fix Under Review to Pending Backport
#9 Updated by Nathan Cutler about 5 years ago
- Copied to Backport #38586: luminous: OSD crashes in get_str_map while creating with ceph-volume added
#10 Updated by Nathan Cutler about 5 years ago
- Copied to Backport #38587: mimic: OSD crashes in get_str_map while creating with ceph-volume added
#11 Updated by Kaleb KEITHLEY about 5 years ago
FYI and FWIW, Boris Ranto put 14.0.1 into F30/rawhide. It's sort of Standard Operating Procedure (SOP) to put early releases into rawhide; if Boris hadn't done it, I probably would have eventually.
Once something is in, it's nearly impossible to remove except by updating to a newer version.
And ceph-14 in f30 has already been updated to 14.1.0, and will be updated again to 14.1.1 or 14.2.0 once one of those becomes available.
#12 Updated by Nathan Cutler almost 5 years ago
- Status changed from Pending Backport to Resolved