Bug #43766 (open): OSD crash after change of osd_memory_target

Added by Martin Mlynář over 4 years ago. Updated over 2 years ago.

Status: Need More Info
Priority: Normal
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I'm having trouble changing osd_memory_target on my test cluster. I've upgraded the whole cluster from Luminous to Nautilus, and all OSDs are running BlueStore. Because this test lab is short on RAM, I wanted to lower osd_memory_target to save some memory.

# ceph version
ceph version 14.2.6 (f0aa067ac7a02ee46ea48aa26c6e298b5ea272e9) nautilus (stable)

# ceph config set osd osd_memory_target 2147483648
# ceph config dump
WHO   MASK LEVEL    OPTION                    VALUE        RO
  mon      advanced auth_client_required      cephx        *
  mon      advanced auth_cluster_required     cephx        *
  mon      advanced auth_service_required     cephx        *
  mon      advanced mon_allow_pool_delete     true
  mon      advanced mon_max_pg_per_osd        500
  mgr      advanced mgr/balancer/active       true
  mgr      advanced mgr/balancer/mode         crush-compat
  osd      advanced osd_crush_update_on_start true
  osd      advanced osd_max_backfills         4
  osd      basic    osd_memory_target         2147483648

Now none of the OSDs are able to start or restart:

# /usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph --setgroup ceph

LOG /var/log/ceph/ceph-osd.0.log:

min_mon_release 14 (nautilus)
0: [v2:10.0.92.69:3300/0,v1:10.0.92.69:6789/0] mon.testlab-ceph-03
1: [v2:10.0.92.72:3300/0,v1:10.0.92.72:6789/0] mon.testlab-ceph-04
2: [v2:10.0.92.67:3300/0,v1:10.0.92.67:6789/0] mon.testlab-ceph-01
3: [v2:10.0.92.68:3300/0,v1:10.0.92.68:6789/0] mon.testlab-ceph-02

   -54> 2020-01-21 11:45:19.289 7f6aa5d78700  1 monclient:  mon.2 has (v2) addrs [v2:10.0.92.67:3300/0,v1:10.0.92.67:6789/0] but i'm connected to v1:10.0.92.67:6789/0, reconnecting
   -53> 2020-01-21 11:45:19.289 7f6aa5d78700 10 monclient: _reopen_session rank -1
   -52> 2020-01-21 11:45:19.289 7f6aa5d78700 10 monclient(hunting): picked mon.testlab-ceph-01 con 0x563319682880 addr [v2:10.0.92.67:3300/0,v1:10.0.92.67:6789/0]
   -51> 2020-01-21 11:45:19.289 7f6aa5d78700 10 monclient(hunting): picked mon.testlab-ceph-04 con 0x563319682d00 addr [v2:10.0.92.72:3300/0,v1:10.0.92.72:6789/0]
   -50> 2020-01-21 11:45:19.289 7f6aa5d78700 10 monclient(hunting): picked mon.testlab-ceph-02 con 0x563319683180 addr [v2:10.0.92.68:3300/0,v1:10.0.92.68:6789/0]
   -49> 2020-01-21 11:45:19.289 7f6aa5d78700 10 monclient(hunting): start opening mon connection
   -48> 2020-01-21 11:45:19.289 7f6aa5d78700 10 monclient(hunting): start opening mon connection
   -47> 2020-01-21 11:45:19.289 7f6aa5d78700 10 monclient(hunting): start opening mon connection
   -46> 2020-01-21 11:45:19.289 7f6aa5d78700 10 monclient(hunting): _renew_subs
   -45> 2020-01-21 11:45:19.289 7f6aa6d7a700 10 monclient(hunting): get_auth_request con 0x563319682880 auth_method 0
   -44> 2020-01-21 11:45:19.289 7f6aa6d7a700 10 monclient(hunting): get_auth_request method 2 preferred_modes [1,2]
   -43> 2020-01-21 11:45:19.289 7f6aa6d7a700 10 monclient(hunting): _init_auth method 2
   -42> 2020-01-21 11:45:19.289 7f6aa6d7a700 10 monclient(hunting): handle_auth_reply_more payload 9
   -41> 2020-01-21 11:45:19.289 7f6aa6d7a700 10 monclient(hunting): handle_auth_reply_more payload_len 9
   -40> 2020-01-21 11:45:19.289 7f6aa6d7a700 10 monclient(hunting): handle_auth_reply_more responding with 36 bytes
   -39> 2020-01-21 11:45:19.289 7f6aa6579700 10 monclient(hunting): get_auth_request con 0x563319682d00 auth_method 0
   -38> 2020-01-21 11:45:19.289 7f6aa6579700 10 monclient(hunting): get_auth_request method 2 preferred_modes [1,2]
   -37> 2020-01-21 11:45:19.289 7f6aa6579700 10 monclient(hunting): _init_auth method 2
   -36> 2020-01-21 11:45:19.289 7f6aa757b700 10 monclient(hunting): get_auth_request con 0x563319683180 auth_method 0
   -35> 2020-01-21 11:45:19.289 7f6aa757b700 10 monclient(hunting): get_auth_request method 2 preferred_modes [1,2]
   -34> 2020-01-21 11:45:19.289 7f6aa757b700 10 monclient(hunting): _init_auth method 2
   -33> 2020-01-21 11:45:19.289 7f6aa6d7a700 10 monclient(hunting): handle_auth_done global_id 5638238 payload 386
   -32> 2020-01-21 11:45:19.289 7f6aa6d7a700 10 monclient: _finish_hunting 0
   -31> 2020-01-21 11:45:19.289 7f6aa6d7a700  1 monclient: found mon.testlab-ceph-01
   -30> 2020-01-21 11:45:19.289 7f6aa6d7a700 10 monclient: _send_mon_message to mon.testlab-ceph-01 at v2:10.0.92.67:3300/0
   -29> 2020-01-21 11:45:19.289 7f6aa6d7a700 10 monclient: _finish_auth 0
   -28> 2020-01-21 11:45:19.289 7f6aa6d7a700 10 monclient: _check_auth_rotating renewing rotating keys (they expired before 2020-01-21 11:44:49.293059)
   -27> 2020-01-21 11:45:19.289 7f6aa6d7a700 10 monclient: _send_mon_message to mon.testlab-ceph-01 at v2:10.0.92.67:3300/0
   -26> 2020-01-21 11:45:19.289 7f6aa5d78700 10 monclient: handle_monmap mon_map magic: 0 v1
   -25> 2020-01-21 11:45:19.289 7f6aa5d78700 10 monclient:  got monmap 17 from mon.testlab-ceph-01 (according to old e17)
   -24> 2020-01-21 11:45:19.289 7f6aa5d78700 10 monclient: dump:
epoch 17
fsid f42082cc-c35a-44fe-b7ef-c2eb2ff1fe43
last_changed 2020-01-20 10:35:23.579081
created 2018-04-25 17:07:31.881451
min_mon_release 14 (nautilus)
0: [v2:10.0.92.69:3300/0,v1:10.0.92.69:6789/0] mon.testlab-ceph-03
1: [v2:10.0.92.72:3300/0,v1:10.0.92.72:6789/0] mon.testlab-ceph-04
2: [v2:10.0.92.67:3300/0,v1:10.0.92.67:6789/0] mon.testlab-ceph-01
3: [v2:10.0.92.68:3300/0,v1:10.0.92.68:6789/0] mon.testlab-ceph-02

   -23> 2020-01-21 11:45:19.289 7f6aa5d78700 10 monclient: handle_config config(3 keys) v1
   -22> 2020-01-21 11:45:19.289 7f6aa7781c80 10 monclient: get_monmap_and_config success
   -21> 2020-01-21 11:45:19.289 7f6aa7781c80 10 monclient: shutdown
   -20> 2020-01-21 11:45:19.289 7f6aa4575700  4 set_mon_vals no callback set
   -19> 2020-01-21 11:45:19.289 7f6aa5d78700 10 monclient: discarding stray monitor message mon_map magic: 0 v1
   -18> 2020-01-21 11:45:19.289 7f6aa4575700 10 set_mon_vals osd_crush_update_on_start = true
   -17> 2020-01-21 11:45:19.289 7f6aa4575700 10 set_mon_vals osd_max_backfills = 4
   -16> 2020-01-21 11:45:19.289 7f6aa4575700 10 set_mon_vals osd_memory_target = 2147483648
   -15> 2020-01-21 11:45:19.297 7f6aa7781c80  0 set uid:gid to 64045:64045 (ceph:ceph)
   -14> 2020-01-21 11:45:19.297 7f6aa7781c80  0 ceph version 14.2.6 (f0aa067ac7a02ee46ea48aa26c6e298b5ea272e9) nautilus (stable), process ceph-osd, pid 728019
   -13> 2020-01-21 11:45:20.681 7f6aa7781c80  0 pidfile_write: ignore empty --pid-file
   -12> 2020-01-21 11:45:20.685 7f6aa7781c80  5 asok(0x563319688000) init /var/run/ceph/ceph-osd.0.asok
   -11> 2020-01-21 11:45:20.685 7f6aa7781c80  5 asok(0x563319688000) bind_and_listen /var/run/ceph/ceph-osd.0.asok
   -10> 2020-01-21 11:45:20.685 7f6aa7781c80  5 asok(0x563319688000) register_command 0 hook 0x5633196003f0
    -9> 2020-01-21 11:45:20.685 7f6aa7781c80  5 asok(0x563319688000) register_command version hook 0x5633196003f0
    -8> 2020-01-21 11:45:20.685 7f6aa7781c80  5 asok(0x563319688000) register_command git_version hook 0x5633196003f0
    -7> 2020-01-21 11:45:20.685 7f6aa7781c80  5 asok(0x563319688000) register_command help hook 0x563319602220
    -6> 2020-01-21 11:45:20.685 7f6aa7781c80  5 asok(0x563319688000) register_command get_command_descriptions hook 0x563319602260
    -5> 2020-01-21 11:45:20.685 7f6aa4d76700  5 asok(0x563319688000) entry start
    -4> 2020-01-21 11:45:20.685 7f6aa7781c80  5 object store type is bluestore
    -3> 2020-01-21 11:45:20.689 7f6aa7781c80  1 bdev create path /var/lib/ceph/osd/ceph-0/block type kernel
    -2> 2020-01-21 11:45:20.689 7f6aa7781c80  1 bdev(0x56331a2d8000 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block
    -1> 2020-01-21 11:45:20.689 7f6aa7781c80  1 bdev(0x56331a2d8000 /var/lib/ceph/osd/ceph-0/block) open size 2000381018112 (0x1d1c0000000, 1.8 TiB) block_size 4096 (4 KiB) rotational discard not supported
     0> 2020-01-21 11:45:20.693 7f6aa7781c80 -1 *** Caught signal (Aborted) **
 in thread 7f6aa7781c80 thread_name:ceph-osd

 ceph version 14.2.6 (f0aa067ac7a02ee46ea48aa26c6e298b5ea272e9) nautilus (stable)
 1: (()+0x12730) [0x7f6aa8229730]
 2: (gsignal()+0x10b) [0x7f6aa7d0d7bb]
 3: (abort()+0x121) [0x7f6aa7cf8535]
 4: (()+0x8c983) [0x7f6aa80c0983]
 5: (()+0x928c6) [0x7f6aa80c68c6]
 6: (()+0x92901) [0x7f6aa80c6901]
 7: (()+0x92b34) [0x7f6aa80c6b34]
 8: (()+0x5a3f53) [0x56330f0a0f53]
 9: (Option::size_t const md_config_t::get_val<Option::size_t>(ConfigValues const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const+0x81) [0x56330f0a6c91]
 10: (BlueStore::_set_cache_sizes()+0x15a) [0x56330f521d8a]
 11: (BlueStore::_open_bdev(bool)+0x173) [0x56330f524b23]
 12: (BlueStore::get_devices(std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*)+0xef) [0x56330f58b7ef]
 13: (BlueStore::get_numa_node(int*, std::set<int, std::less<int>, std::allocator<int> >*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*)+0x7b) [0x56330f53371b]
 14: (main()+0x2870) [0x56330f06e440]
 15: (__libc_start_main()+0xeb) [0x7f6aa7cfa09b]
 16: (_start()+0x2a) [0x56330f0a0c6a]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   1/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 0 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 1 reserver
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 rgw_sync
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   4/ 5 memdb
   1/ 5 kinetic
   1/ 5 fuse
   1/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
   1/ 5 prioritycache
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.0.log
--- end dump of recent events ---

When I remove this option:

# ceph config rm osd osd_memory_target

The OSD starts without any trouble. I've seen the same behaviour when I put this parameter into /etc/ceph/ceph.conf.

I've been able to compile ceph-osd with debug symbols and step through it with gdb:

   -24> 2020-01-22 13:12:53.614 7f83ed064700  4 set_mon_vals no callback set
   -23> 2020-01-22 13:12:53.614 7f83ee867700 10 monclient: discarding stray monitor message auth_reply(proto 2 0 (0) Success) v1
   -22> 2020-01-22 13:12:53.614 7f83ed064700 10 set_mon_vals osd_crush_update_on_start = true
   -21> 2020-01-22 13:12:53.614 7f83ed064700 10 set_mon_vals osd_max_backfills = 64
   -20> 2020-01-22 13:12:53.614 7f83ed064700 10 set_mon_vals osd_memory_target = 2147483648
   -19> 2020-01-22 13:12:53.614 7f83ed064700 10 set_mon_vals osd_recovery_max_active = 40
   -18> 2020-01-22 13:12:53.614 7f83ed064700 10 set_mon_vals osd_recovery_max_single_start = 1000
   -17> 2020-01-22 13:12:53.614 7f83ed064700 10 set_mon_vals osd_recovery_sleep_hdd = 0.000000
   -16> 2020-01-22 13:12:53.614 7f83ed064700 10 set_mon_vals osd_recovery_sleep_hybrid = 0.000000
   -15> 2020-01-22 13:12:53.627 7f83f0276c40  0 set uid:gid to 64045:64045 (ceph:ceph)
   -14> 2020-01-22 13:12:53.627 7f83f0276c40  0 ceph version 14.2.6 (f0aa067ac7a02ee46ea48aa26c6e298b5ea272e9) nautilus (stable), process ceph-osd, pid 1111622
   -13> 2020-01-22 13:12:53.649 7f83f0276c40  0 pidfile_write: ignore empty --pid-file
   -12> 2020-01-22 13:12:53.657 7f83f0276c40  5 asok(0x5580518fa000) init /var/run/ceph/ceph-osd.6.asok
   -11> 2020-01-22 13:12:53.657 7f83f0276c40  5 asok(0x5580518fa000) bind_and_listen /var/run/ceph/ceph-osd.6.asok
   -10> 2020-01-22 13:12:53.657 7f83f0276c40  5 asok(0x5580518fa000) register_command 0 hook 0x558051872fc0
    -9> 2020-01-22 13:12:53.657 7f83f0276c40  5 asok(0x5580518fa000) register_command version hook 0x558051872fc0
    -8> 2020-01-22 13:12:53.657 7f83f0276c40  5 asok(0x5580518fa000) register_command git_version hook 0x558051872fc0
    -7> 2020-01-22 13:12:53.657 7f83f0276c40  5 asok(0x5580518fa000) register_command help hook 0x558051874220
    -6> 2020-01-22 13:12:53.657 7f83f0276c40  5 asok(0x5580518fa000) register_command get_command_descriptions hook 0x558051874260
    -5> 2020-01-22 13:12:53.657 7f83ed865700  5 asok(0x5580518fa000) entry start
    -4> 2020-01-22 13:12:53.670 7f83f0276c40  5 object store type is bluestore
    -3> 2020-01-22 13:12:53.675 7f83f0276c40  1 bdev create path /var/lib/ceph/osd/ceph-6/block type kernel
    -2> 2020-01-22 13:12:53.675 7f83f0276c40  1 bdev(0x5580518f3f80 /var/lib/ceph/osd/ceph-6/block) open path /var/lib/ceph/osd/ceph-6/block
    -1> 2020-01-22 13:12:53.675 7f83f0276c40  1 bdev(0x5580518f3f80 /var/lib/ceph/osd/ceph-6/block) open size 3000588304384 (0x2baa1000000, 2.7 TiB) block_size 4096 (4 KiB) rotational discard not supported
     0> 2020-01-22 13:12:53.714 7f83f0276c40 -1 *** Caught signal (Aborted) **
 in thread 7f83f0276c40 thread_name:ceph-osd

 ceph version 14.2.6 (f0aa067ac7a02ee46ea48aa26c6e298b5ea272e9) nautilus (stable)
 1: (()+0x2c19654) [0x558045ec6654]
 2: (()+0x12730) [0x7f83f0d1f730]
 3: (gsignal()+0x10b) [0x7f83f08027bb]
 4: (abort()+0x121) [0x7f83f07ed535]
 5: (()+0x8c983) [0x7f83f0bb5983]
 6: (()+0x928c6) [0x7f83f0bbb8c6]
 7: (()+0x92901) [0x7f83f0bbb901]
 8: (()+0x92b34) [0x7f83f0bbbb34]
 9: (void boost::throw_exception<boost::bad_get>(boost::bad_get const&)+0x7b) [0x5580454d5430]
 10: (Option::size_t&& boost::relaxed_get<Option::size_t, boost::blank, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned long, long, double, bool, entity_addr_t, entity_addrvec_t, std::chrono::duration<long, std::ratio<1l, 1l> >, Option::size_t, uuid_d>(boost::variant<boost::blank, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned long, long, double, bool, entity_addr_t, entity_addrvec_t, std::chrono::duration<long, std::ratio<1l, 1l> >, Option::size_t, uuid_d>&&)+0x5b) [0x5580454d6223]
 11: (Option::size_t&& boost::strict_get<Option::size_t, boost::blank, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned long, long, double, bool, entity_addr_t, entity_addrvec_t, std::chrono::duration<long, std::ratio<1l, 1l> >, Option::size_t, uuid_d>(boost::variant<boost::blank, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned long, long, double, bool, entity_addr_t, entity_addrvec_t, std::chrono::duration<long, std::ratio<1l, 1l> >, Option::size_t, uuid_d>&&)+0x20) [0x5580454d4a39]
 12: (Option::size_t&& boost::get<Option::size_t, boost::blank, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned long, long, double, bool, entity_addr_t, entity_addrvec_t, std::chrono::duration<long, std::ratio<1l, 1l> >, Option::size_t, uuid_d>(boost::variant<boost::blank, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned long, long, double, bool, entity_addr_t, entity_addrvec_t, std::chrono::duration<long, std::ratio<1l, 1l> >, Option::size_t, uuid_d>&&)+0x20) [0x5580454d1ed7]
 13: (Option::size_t const md_config_t::get_val<Option::size_t>(ConfigValues const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const+0x48) [0x5580454ce882]
 14: (Option::size_t const ConfigProxy::get_val<Option::size_t>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const+0x58) [0x5580454cb9b8]
 15: (BlueStore::_set_cache_sizes()+0x159) [0x558045ce2213]
 16: (BlueStore::_open_bdev(bool)+0x301) [0x558045ce6be3]
 17: (BlueStore::get_devices(std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*)+0xf9) [0x558045d0f16d]
 18: (BlueStore::get_numa_node(int*, std::set<int, std::less<int>, std::allocator<int> >*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*)+0x79) [0x558045d0eb55]
 19: (main()+0x3aae) [0x5580454c2460]
 20: (__libc_start_main()+0xeb) [0x7f83f07ef09b]
 21: (_start()+0x2a) [0x5580454bda2a]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

In int BlueStore::_set_cache_sizes():

(gdb) n
4116      cache_autotune_interval =
(gdb) n
4117          cct->_conf.get_val<double>("bluestore_cache_autotune_interval");
(gdb) p cache_autotune_interval
$3 = 5
(gdb) n
4118      osd_memory_target = cct->_conf.get_val<Option::size_t>("osd_memory_target");
(gdb) s
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string<std::allocator<char> > (this=0x7fffffffc140, __s=0x555558d26c2f "osd_memory_target", __a=...)
    at /usr/include/c++/8/bits/basic_string.h:515
515          : _M_dataplus(_M_local_data(), __a)
(gdb) n
516          { _M_construct(__s, __s ? __s + traits_type::length(__s) : __s+npos); }
(gdb)
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::bad_get> >'
  what():  boost::bad_get: failed value get using boost::get
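
For context: the boost::bad_get at the end of this session is what the throwing (strict) form of boost::get raises when a boost::variant currently holds a different alternative than the one requested, which matches frames 9-13 of the backtrace above. The snippet below is a minimal stand-alone sketch, not Ceph code: SizeT and ConfigValue are simplified stand-ins for Option::size_t and the value variant visible in frame 10, but it reproduces the same failure mode as get_val<Option::size_t>("osd_memory_target").

// Minimal illustration of the failure mode (stand-in types, not Ceph's):
// a strong wrapper type for sizes coexists with plain integer alternatives
// in a boost::variant, and the strict boost::get<Wrapper> throws if the
// value was stored under the plain integer alternative instead.
#include <boost/blank.hpp>
#include <boost/variant.hpp>
#include <cstdint>
#include <iostream>

struct SizeT { std::uint64_t value; };  // hypothetical stand-in for Option::size_t

using ConfigValue = boost::variant<boost::blank, std::uint64_t, SizeT>;

int main() {
  // Suppose the configuration layer stored osd_memory_target as a plain integer...
  ConfigValue stored = std::uint64_t{2147483648ULL};

  // ...while the reader asks for the wrapper type, the way
  // get_val<Option::size_t>("osd_memory_target") does in _set_cache_sizes().
  try {
    SizeT target = boost::get<SizeT>(stored);  // strict get: throws on a type mismatch
    std::cout << "osd_memory_target = " << target.value << "\n";
  } catch (const boost::bad_get& e) {
    // Without a handler this exception escapes, std::terminate runs, and the
    // process aborts: the "terminate called after throwing ... boost::bad_get" above.
    std::cerr << "boost::bad_get: " << e.what() << "\n";
  }
  return 0;
}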


Files

ceph_mkfs_crash.log (142 KB), added by Christian Rohmann, 07/29/2021 08:54 AM

Related issues 1 (0 open, 1 closed)

Has duplicate: Ceph - Bug #43868: ceph-osd --osd-objectstore bluestore --mkfs crashes (Duplicate)

Actions #1

Updated by Martin Mlynář over 4 years ago

I've been able to reproduce this easily on a cleanly installed virtual server with 1 mon, 1 osd, 1 mgr:

root@ceph:/# cat /etc/debian_version 
10.2
root@ceph:/# apt-cache policy ceph
ceph:
  Installed: 14.2.6-4~bpo10+1
  Candidate: 14.2.6-4~bpo10+1
  Version table:
 *** 14.2.6-4~bpo10+1 100
        100 http://deb.debian.org/debian buster-backports/main amd64 Packages
        100 /var/lib/dpkg/status
     12.2.11+dfsg1-2.1 500
        500 http://deb.debian.org/debian buster/main amd64 Packages

Same behaviour with any setting of osd_memory_target, even with the original 4G value. The OSD crashes at start every time osd_memory_target is present in the config.

Actions #2

Updated by Martin Mlynář over 4 years ago

The problem seems to be in the Debian packages. I tried the same on Ubuntu:

# cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.3 LTS" 

with:

# apt-cache policy ceph
ceph:
  Installed: 14.2.6-1bionic
  Candidate: 14.2.6-1bionic
  Version table:
 *** 14.2.6-1bionic 500
        500 https://download.ceph.com/debian-nautilus bionic/main amd64 Packages
        100 /var/lib/dpkg/status
     12.2.12-0ubuntu0.18.04.4 500
        500 http://cz.archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages
     12.2.12-0ubuntu0.18.04.2 500
        500 http://cz.archive.ubuntu.com/ubuntu bionic-security/main amd64 Packages
     12.2.4-0ubuntu1 500
        500 http://cz.archive.ubuntu.com/ubuntu bionic/main amd64 Packages

and the option works just fine (the OSD starts; I haven't verified that the setting actually takes effect).

Actions #3

Updated by Bernd Zeimetz about 4 years ago

The issue here was actually caused by a patch to make Ceph build with clang. I was confused by the fact that size_t is actually a struct in src/common/options.cc. This is fixed in Debian unstable and will go into backports very soon.

The underlying problem, that lots of TYPE_SIZE options are either broken or can't be used with all supported values, is a different one; I'll open a new bug report for that.

This one can be closed.
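
To illustrate the point about Option::size_t being a distinct struct rather than a plain typedef: with a boost::variant like the one in the backtrace, the alternative a value is stored under is decided by its static C++ type, not by its numeric value, so a size that ends up stored as a plain integer can no longer be read back through the struct alternative. The sketch below uses hypothetical stand-in types (SizeT, Value), not the real options.cc code or the Debian patch.

// Sketch with stand-in types: the variant alternative is chosen by static type,
// so the same number stored as a plain integer is unreadable through the
// wrapper-struct alternative, and vice versa.
#include <boost/blank.hpp>
#include <boost/variant.hpp>
#include <cstdint>
#include <iostream>

struct SizeT { std::uint64_t value; };  // hypothetical stand-in for Option::size_t

using Value = boost::variant<boost::blank, std::uint64_t, SizeT>;

int main() {
  Value as_struct  = SizeT{2147483648ULL};          // held as the SizeT alternative
  Value as_integer = std::uint64_t{2147483648ULL};  // held as the uint64_t alternative

  std::cout << "as_struct.which()  = " << as_struct.which()  << "\n";  // 2 (SizeT)
  std::cout << "as_integer.which() = " << as_integer.which() << "\n";  // 1 (uint64_t)

  // The pointer form of boost::get returns nullptr instead of throwing,
  // which makes the mismatch visible without aborting.
  if (const SizeT* p = boost::get<SizeT>(&as_integer)) {
    std::cout << "read back " << p->value << "\n";
  } else {
    std::cout << "as_integer is not held as SizeT; a strict get<SizeT> would throw boost::bad_get\n";
  }
  return 0;
}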

Actions #4

Updated by Igor Fedotov about 4 years ago

  • Related to Bug #43868: ceph-osd --osd-objectstore bluestore --mkfs crashes added
Actions #5

Updated by Igor Fedotov about 4 years ago

  • Related to deleted (Bug #43868: ceph-osd --osd-objectstore bluestore --mkfs crashes)
Actions #6

Updated by Igor Fedotov about 4 years ago

  • Has duplicate Bug #43868: ceph-osd --osd-objectstore bluestore --mkfs crashes added
Actions #7

Updated by Igor Fedotov about 4 years ago

Bernd Zeimetz wrote:

The issue here was actually caused by a patch to make Ceph build with clang. I was confused by the fact that size_t is actually a struct in src/common/options.cc. This is fixed in Debian unstable and will go into backports very soon.

The underlying problem, that lots of TYPE_SIZE options are either broken or can't be used with all supported values, is a different one; I'll open a new bug report for that.

This one can be closed.

Bernd, for the sake of completeness, could you please add a reference to the new ticket?

Actions #8

Updated by Nathan Cutler almost 4 years ago

  • Status changed from New to Need More Info
Actions #9

Updated by Christian Rohmann over 2 years ago

I was actually going to reply to https://tracker.ceph.com/issues/43868 or even https://tracker.ceph.com/issues/24006, as they are both somewhat about
"_read_fsid unparsable uuid" and crashes on mkfs. But https://tracker.ceph.com/issues/43868 was marked a duplicate of this one ... so here we go:

I just ran into a crash when running

ceph-osd -i 0 --mkfs --osd-objectstore=bluestore --osd-uuid $UID --monmap /tmp/monmap

on Ubuntu Bionic with Ceph 15.2.13-1bionic.
I hope I've attached all the useful information in the file ceph_mkfs_crash.log.

Running the exact same command again worked just fine.

There were also posts to the mailing list describing a similar issue:

Actions #10

Updated by Igor Fedotov over 2 years ago

Christian Rohmann wrote:

I was actually going to reply to https://tracker.ceph.com/issues/43868 or even https://tracker.ceph.com/issues/24006, as they are both somewhat about
"_read_fsid unparsable uuid" and crashes on mkfs. But https://tracker.ceph.com/issues/43868 was marked a duplicate of this one ... so here we go:

I just ran into a crash when running

[...]

on Ubuntu Bionic with Ceph 15.2.13-1bionic.
I hope I've attached all the useful information in the file ceph_mkfs_crash.log.

Running the exact same command again worked just fine.

There were also posts to the mailing list describing a similar issue:

Hi Christian,
wondering if this was run from a container or on bare metal?

Thanks

Actions #11

Updated by Christian Rohmann over 2 years ago

Igor Fedotov wrote:

wondering if this was run from a container or on bare metal?

This was bare metal, no containers involved. And it was only one device, on one of the 4 servers I set up.
I ran the same command again and things were just fine then.
