Bug #38329

OSD crashes in get_str_map while creating with ceph-volume

Added by Kaleb KEITHLEY 9 months ago. Updated 7 months ago.

Status: Resolved
Priority: Urgent
Assignee: -
Target version:
Start date: 02/15/2019
Due date:
% Done: 0%
Source: Community (user)
Tags:
Backport: luminous,mimic
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

see https://bugzilla.redhat.com/show_bug.cgi?id=1661583

# ceph-volume lvm prepare --data /dev/sdd           
Running command: /bin/ceph-authtool --gen-print-key                                                                                                                           
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 7faf689b-b1dd-4f5b-8d9a-dcb063949dda     
Running command: /usr/sbin/vgcreate --force --yes ceph-db3893ef-93db-4b3f-a80e-11cca7911ba1 /dev/sdd                                                                          
 stdout: Physical volume "/dev/sdd" successfully created.                                                                                                                     
 stdout: Volume group "ceph-db3893ef-93db-4b3f-a80e-11cca7911ba1" successfully created
Running command: /usr/sbin/lvcreate --yes -l 100%FREE -n osd-block-7faf689b-b1dd-4f5b-8d9a-dcb063949dda ceph-db3893ef-93db-4b3f-a80e-11cca7911ba1
 stdout: Logical volume "osd-block-7faf689b-b1dd-4f5b-8d9a-dcb063949dda" created.                                                                                             
Running command: /bin/ceph-authtool --gen-print-key                                                                                                                           
Running command: /bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-4
Running command: /usr/sbin/restorecon /var/lib/ceph/osd/ceph-4
Running command: /bin/chown -h ceph:ceph /dev/ceph-db3893ef-93db-4b3f-a80e-11cca7911ba1/osd-block-7faf689b-b1dd-4f5b-8d9a-dcb063949dda                                        
Running command: /bin/chown -R ceph:ceph /dev/dm-0                                                                                                                            
Running command: /bin/ln -s /dev/ceph-db3893ef-93db-4b3f-a80e-11cca7911ba1/osd-block-7faf689b-b1dd-4f5b-8d9a-dcb063949dda /var/lib/ceph/osd/ceph-4/block                      
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-4/activate.monmap
 stderr: /bin/ceph:128: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  import rados                                                                                                                                                                
got monmap epoch 8
Running command: /bin/ceph-authtool /var/lib/ceph/osd/ceph-4/keyring --create-keyring --name osd.4 --add-key AQBi/BxcgL4tNRAA1ncksjAiwRFwsCZXvLbgAw==                         
 stdout: creating /var/lib/ceph/osd/ceph-4/keyring                                                                                                                            
 stdout: added entity osd.4 auth auth(key=AQBi/BxcgL4tNRAA1ncksjAiwRFwsCZXvLbgAw== with 0 caps)                                                                               
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-4/keyring
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-4/                                         
Running command: /bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 4 --monmap /var/lib/ceph/osd/ceph-4/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-4/ --osd-uuid 7faf689b-b1dd-4f5b-8d9a-dcb063949dda --setuser ceph --setgroup ceph                                                                                      
 stdout: /usr/include/c++/8/bits/basic_string.h:1048: std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::const_reference std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::operator[](std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::size_type) const [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>; std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::const_reference = const char&; std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::size_type = long unsigned int]: Assertion '__pos <= size()' failed.
 stderr: 2018-12-21 15:46:00.788 7fe6fa91a740 -1 bluestore(/var/lib/ceph/osd/ceph-4/) _read_fsid unparsable uuid                                                              
 stderr: *** Caught signal (Aborted) **
 stderr: in thread 7fe6fa91a740 thread_name:ceph-osd                                                                                                                          
 stderr: ceph version 14.0.1 (5f51cd286b747b1729006a5b98fb08b1b646237a) nautilus (dev)                                                                                        
 stderr: 1: (()+0x13030) [0x7fe6fb05e030]                                                                                                                                     
 stderr: 2: (gsignal()+0x10f) [0x7fe6fab6800f]                                                                                                                                
 stderr: 3: (abort()+0x127) [0x7fe6fab52895]                                                                                                                                  
 stderr: 4: (trim(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x1d0) [0x555d1a6f8220]                                             
 stderr: 5: (get_str_map(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >*, char const*)+0x200) [0x555d1a6f85d0]
 stderr: 6: (BlueStore::_open_db(bool, bool)+0x12de) [0x555d1a3f033e]
 stderr: 7: (BlueStore::mkfs()+0x102f) [0x555d1a4473ef]                                                                                                                       
 stderr: 8: (OSD::mkfs(CephContext*, ObjectStore*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, uuid_d, int)+0x174) [0x555d19f6aa54]
 stderr: 9: (main()+0x15b9) [0x555d19e73259]                                                                                                                                  
 stderr: 10: (__libc_start_main()+0xf3) [0x7fe6fab53ee3]                                                                                                                      
 stderr: 11: (_start()+0x2e) [0x555d19f4a84e]                                         
 stderr: 2018-12-21 15:46:01.590 7fe6fa91a740 -1 *** Caught signal (Aborted) **                                                                  
 stderr: in thread 7fe6fa91a740 thread_name:ceph-osd                                                                                                                          
 stderr: ceph version 14.0.1 (5f51cd286b747b1729006a5b98fb08b1b646237a) nautilus (dev)                
 stderr: 1: (()+0x13030) [0x7fe6fb05e030]
 stderr: 2: (gsignal()+0x10f) [0x7fe6fab6800f]                
 stderr: 3: (abort()+0x127) [0x7fe6fab52895]                                                                                                                                  
 stderr: 4: (trim(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x1d0) [0x555d1a6f8220]                                             
 stderr: 5: (get_str_map(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >*, char const*)+0x200) [0x555d1a6f85d0]
 stderr: 6: (BlueStore::_open_db(bool, bool)+0x12de) [0x555d1a3f033e]
 stderr: 7: (BlueStore::mkfs()+0x102f) [0x555d1a4473ef]                                                                                                                       
 stderr: 8: (OSD::mkfs(CephContext*, ObjectStore*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, uuid_d, int)+0x174) [0x555d19f6aa54]
 stderr: 9: (main()+0x15b9) [0x555d19e73259]                                                                                                                                  
 stderr: 10: (__libc_start_main()+0xf3) [0x7fe6fab53ee3]                 
 stderr: 11: (_start()+0x2e) [0x555d19f4a84e]                                                              
 stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.                                                                           
 stderr: -15> 2018-12-21 15:46:00.788 7fe6fa91a740 -1 bluestore(/var/lib/ceph/osd/ceph-4/) _read_fsid unparsable uuid                                                         
 stderr: 0> 2018-12-21 15:46:01.590 7fe6fa91a740 -1 *** Caught signal (Aborted) **                                                                                             
 stderr: in thread 7fe6fa91a740 thread_name:ceph-osd                                                                                                                           
 stderr: ceph version 14.0.1 (5f51cd286b747b1729006a5b98fb08b1b646237a) nautilus (dev)                                                                                         
 stderr: 1: (()+0x13030) [0x7fe6fb05e030]                                                    
 stderr: 2: (gsignal()+0x10f) [0x7fe6fab6800f]                                                                                                                                
 stderr: 3: (abort()+0x127) [0x7fe6fab52895]
 stderr: 4: (trim(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x1d0) [0x555d1a6f8220]                                             
 stderr: 5: (get_str_map(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >*, char const*)+0x200) [0x555d1a6f85d0]
 stderr: 6: (BlueStore::_open_db(bool, bool)+0x12de) [0x555d1a3f033e]                                                                                                         
 stderr: 7: (BlueStore::mkfs()+0x102f) [0x555d1a4473ef]                                                                                                                        
 stderr: 8: (OSD::mkfs(CephContext*, ObjectStore*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, uuid_d, int)+0x174) [0x555d19f6aa54]
 stderr: 9: (main()+0x15b9) [0x555d19e73259]                                                                                                                                   
 stderr: 10: (__libc_start_main()+0xf3) [0x7fe6fab53ee3]                                                                                                                      
 stderr: 11: (_start()+0x2e) [0x555d19f4a84e]                        
 stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.                                                                          
--> Was unable to complete a new OSD, will rollback changes                                                                                                                    
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.4 --yes-i-really-mean-it           
 stderr: /bin/ceph:128: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working   
  import rados                                                                        
purged osd.4                                                                                                                                     
-->  RuntimeError: Command failed with exit code 250: /bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 4 --monmap /var/lib/ceph/osd/ceph-4/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-4/ --osd-uuid 7faf689b-b1dd-4f5b-8d9a-dcb063949dda --setuser ceph --setgroup ceph

Version-Release number of selected component (if applicable):
ceph-osd-14.0.1-2.fc30.x86_64


Related issues

Related to Ceph - Bug #38144: nautilus: 14.0.1 build fails in fedora rawhide mass rebuild w/ gcc/g++ 9 New 02/01/2019
Copied to bluestore - Backport #38586: luminous: OSD crashes in get_str_map while creating with ceph-volume Resolved
Copied to bluestore - Backport #38587: mimic: OSD crashes in get_str_map while creating with ceph-volume Resolved

History

#1 Updated by Nathan Cutler 9 months ago

  • Project changed from Ceph to ceph-volume
  • Subject changed from ceph-disk crashes while making OSD to ceph-volume crashes while making OSD
  • Category deleted (OSD)

#2 Updated by Alfredo Deza 9 months ago

  • Project changed from ceph-volume to bluestore
  • Subject changed from ceph-volume crashes while making OSD to OSD crashes while creating with ceph-volume
  • Description updated (diff)

Changing back to the Ceph tracker; as far as I can see, this is not a crash in ceph-volume, nor is it specific to ceph-volume.

#3 Updated by Sage Weil 9 months ago

- Have any options been customized?
- What version is this? 14.0.1-2.fc30 is a random dev checkpoint commit from master from October. If this is what's in the downstream Fedora repo, we should get it removed ASAP!

#4 Updated by Tomasz Torcz 9 months ago

(original reporter here)
I have the following customisations in ceph.conf:

osd scrub load threshold = 1.5

# peer with who? 0-osd, 1-host
osd crush chooseleaf = 0

As for the version: according to the Fedora build system, this snapshot (plus GCC 9 fixes) is what is going to ship in the released version of Fedora 30 in 2 months. Kaleb (the reporter) is one of the Ceph maintainers in Fedora.

#5 Updated by Nathan Cutler 9 months ago

  • Related to Bug #38144: nautilus: 14.0.1 build fails in fedora rawhide mass rebuild w/ gcc/g++ 9 added

#6 Updated by Nathan Cutler 9 months ago

Added related-to link to #38144 where the GCC 9 FTBFS is being discussed. A patch has been proposed there, but it includes changes to a submodule (SPDK/DPDK) so is not straightforward to implement as a PR.

Getting the GCC 9 issue fixed would be good from an openSUSE Tumbleweed perspective as well.

#7 Updated by Sage Weil 9 months ago

  • Subject changed from OSD crashes while creating with ceph-volume to OSD crashes in get_str_map while creating with ceph-volume
  • Status changed from New to Need Review
  • Priority changed from Normal to Urgent
  • Backport set to luminous,mimic

Reproduced this and got a core.

I think the problem is an empty string passed to trim() in str_map.cc. Fix here: https://github.com/ceph/ceph/pull/26698
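
For illustration only: the assertion in the log ('__pos <= size()' in std::string::operator[]) fires when a string is indexed past its end. That is what would happen if a trim routine computed an end index as size() - 1 on an empty string, since the unsigned subtraction wraps around to a huge value. The sketch below is not the code from str_map.cc and not the patch in the PR above; trim_safe is a hypothetical name. It shows one way to write a trim helper that never indexes the string directly and therefore handles empty input.

```cpp
#include <string>

// Hypothetical helper for illustration; NOT the actual Ceph code or fix.
// Returns a copy of `s` with leading and trailing whitespace removed.
// It relies on find_first_not_of/find_last_not_of, which are well defined
// for empty strings, instead of indexing with operator[].
std::string trim_safe(const std::string& s) {
  static const char* const kWhitespace = " \t\n\r\f\v";

  const std::string::size_type first = s.find_first_not_of(kWhitespace);
  if (first == std::string::npos) {
    // Empty input, or input consisting only of whitespace.
    return std::string();
  }

  // A non-whitespace character exists, so `last` is valid and >= first.
  const std::string::size_type last = s.find_last_not_of(kWhitespace);
  return s.substr(first, last - first + 1);
}
```

With this shape, an empty token reaching the helper yields an empty string rather than an out-of-range access.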

#8 Updated by Sage Weil 9 months ago

  • Status changed from Need Review to Pending Backport

#9 Updated by Nathan Cutler 8 months ago

  • Copied to Backport #38586: luminous: OSD crashes in get_str_map while creating with ceph-volume added

#10 Updated by Nathan Cutler 8 months ago

  • Copied to Backport #38587: mimic: OSD crashes in get_str_map while creating with ceph-volume added

#11 Updated by Kaleb KEITHLEY 8 months ago

FYI and FWIW, Boris Ranto put 14.0.1 into F30/rawhide. It's sort of Standard Operating Procedure (SOP) to put early releases into rawhide; if Boris hadn't done it, I probably would have eventually.

Once something is in, it's nearly impossible to remove except by updating to a newer version.

And ceph-14 in f30 has already been updated to 14.1.0, and will be updated again to 14.1.1 or 14.2.0 once one of those becomes available.

#12 Updated by Nathan Cutler 7 months ago

  • Status changed from Pending Backport to Resolved
