Support #38707

Ceph OSD Down & Out - can't bring back up - Caught signal (Segmentation fault) in thread ceph-osd

Added by Liam Retrams about 5 years ago. Updated 12 months ago.

Status: Closed
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Tags:
Reviewed:
Affected Versions:
Pull request ID:

Description

I noticed that in my 3-node, 12-OSD cluster (3 OSDs per node), one node has all 3 of its OSDs marked "Down" and "Out". I tried to bring them back "In" and "Up", but they will not start; the log for osd.8 (included below) ends in a segmentation fault.

My setup puts the WAL and block.db on SSD, while the OSD data is on a SATA HDD. Each server has 2 SSDs, and each SSD has 3 partitions: one partition is for the WAL and one is for block.db. The OSD data itself is on the SATA disk.

Any idea what this could be?
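
For reference, the device sizes that the log below reports for osd.8 can be cross-checked with a few lines of Python. This is only a sanity check on the figures, assuming the byte counts and hex values are copied correctly from the "open size" lines; the helper names (`to_gib`, `to_tib`) are made up for this sketch and it says nothing about the cause of the crash.

```python
# Sizes copied from the "open size" lines in the osd.8 log.
BLOCK_DB_BYTES = 5997854720     # /var/lib/ceph/osd/ceph-8/block.db (SSD partition)
BLOCK_BYTES = 6001170317312     # /var/lib/ceph/osd/ceph-8/block (SATA HDD)

# Hex values printed next to the byte counts in the same log lines.
BLOCK_DB_HEX = "165800000"
BLOCK_HEX = "57541a00000"


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return n_bytes / 1024 ** 3


def to_tib(n_bytes: int) -> float:
    """Convert a byte count to TiB (2**40 bytes)."""
    return n_bytes / 1024 ** 4


if __name__ == "__main__":
    # The decimal and hex figures in the log should describe the same size.
    assert int(BLOCK_DB_HEX, 16) == BLOCK_DB_BYTES
    assert int(BLOCK_HEX, 16) == BLOCK_BYTES

    print(f"block.db: {to_gib(BLOCK_DB_BYTES):.2f} GiB")  # log reports 5.59GiB
    print(f"block:    {to_tib(BLOCK_BYTES):.2f} TiB")     # log reports 5.46TiB
```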

2019-03-11 15:43:43.831453 7f18b8892e00  0 set uid:gid to 64045:64045 (ceph:ceph)
2019-03-11 15:43:43.831468 7f18b8892e00  0 ceph version 12.2.10 (fc2b1783e3727b66315cc667af9d663d30fe7ed4) luminous (stable), process ceph-osd, pid 2988913
2019-03-11 15:43:43.836761 7f18b8892e00  0 pidfile_write: ignore empty --pid-file
2019-03-11 15:43:43.844687 7f18b8892e00  0 load: jerasure load: lrc load: isa
2019-03-11 15:43:43.844789 7f18b8892e00  1 bdev create path /var/lib/ceph/osd/ceph-8/block type kernel
2019-03-11 15:43:43.844798 7f18b8892e00  1 bdev(0x563466b4cb40 /var/lib/ceph/osd/ceph-8/block) open path /var/lib/ceph/osd/ceph-8/block
2019-03-11 15:43:43.845001 7f18b8892e00  1 bdev(0x563466b4cb40 /var/lib/ceph/osd/ceph-8/block) open size 6001170317312 (0x57541a00000, 5.46TiB) block_size 4096 (4KiB) rotational
2019-03-11 15:43:43.845283 7f18b8892e00  1 bluestore(/var/lib/ceph/osd/ceph-8) _set_cache_sizes cache_size 1073741824 meta 0.4 kv 0.4 data 0.2
2019-03-11 15:43:43.845299 7f18b8892e00  1 bdev(0x563466b4cb40 /var/lib/ceph/osd/ceph-8/block) close
2019-03-11 15:43:44.169681 7f18b8892e00  1 bluestore(/var/lib/ceph/osd/ceph-8) _mount path /var/lib/ceph/osd/ceph-8
2019-03-11 15:43:44.170038 7f18b8892e00  1 bdev create path /var/lib/ceph/osd/ceph-8/block type kernel
2019-03-11 15:43:44.170043 7f18b8892e00  1 bdev(0x563466b4cd80 /var/lib/ceph/osd/ceph-8/block) open path /var/lib/ceph/osd/ceph-8/block
2019-03-11 15:43:44.170205 7f18b8892e00  1 bdev(0x563466b4cd80 /var/lib/ceph/osd/ceph-8/block) open size 6001170317312 (0x57541a00000, 5.46TiB) block_size 4096 (4KiB) rotational
2019-03-11 15:43:44.170470 7f18b8892e00  1 bluestore(/var/lib/ceph/osd/ceph-8) _set_cache_sizes cache_size 1073741824 meta 0.4 kv 0.4 data 0.2
2019-03-11 15:43:44.170522 7f18b8892e00  1 bdev create path /var/lib/ceph/osd/ceph-8/block.db type kernel
2019-03-11 15:43:44.170526 7f18b8892e00  1 bdev(0x563466b4d200 /var/lib/ceph/osd/ceph-8/block.db) open path /var/lib/ceph/osd/ceph-8/block.db
2019-03-11 15:43:44.170647 7f18b8892e00  1 bdev(0x563466b4d200 /var/lib/ceph/osd/ceph-8/block.db) open size 5997854720 (0x165800000, 5.59GiB) block_size 4096 (4KiB) non-rotational
2019-03-11 15:43:44.170655 7f18b8892e00  1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-8/block.db size 5.59GiB
2019-03-11 15:43:44.172927 7f18b8892e00  1 bdev create path /var/lib/ceph/osd/ceph-8/block type kernel
2019-03-11 15:43:44.172937 7f18b8892e00  1 bdev(0x563466b4d440 /var/lib/ceph/osd/ceph-8/block) open path /var/lib/ceph/osd/ceph-8/block
2019-03-11 15:43:44.173124 7f18b8892e00  1 bdev(0x563466b4d440 /var/lib/ceph/osd/ceph-8/block) open size 6001170317312 (0x57541a00000, 5.46TiB) block_size 4096 (4KiB) rotational
2019-03-11 15:43:44.173136 7f18b8892e00  1 bluefs add_block_device bdev 2 path /var/lib/ceph/osd/ceph-8/block size 5.46TiB
2019-03-11 15:43:44.173171 7f18b8892e00  1 bluefs mount
2019-03-11 15:43:44.178468 7f18b8892e00 -1 *** Caught signal (Segmentation fault) **
 in thread 7f18b8892e00 thread_name:ceph-osd

 ceph version 12.2.10 (fc2b1783e3727b66315cc667af9d663d30fe7ed4) luminous (stable)
 1: (()+0xa56bd4) [0x56345cdc8bd4]
 2: (()+0x110c0) [0x7f18b5e980c0]
 3: (BlueFS::_replay(bool)+0x1616) [0x56345cd7fb96]
 4: (BlueFS::mount()+0x1e1) [0x56345cd82aa1]
 5: (BlueStore::_open_db(bool)+0x1698) [0x56345cc8c6b8]
 6: (BlueStore::_mount(bool)+0x2b4) [0x56345ccc5cf4]
 7: (OSD::init()+0x3e2) [0x56345c813fe2]
 8: (main()+0x3092) [0x56345c71d3c2]
 9: (__libc_start_main()+0xf1) [0x7f18b4e4d2e1]
 10: (_start()+0x2a) [0x56345c7a9f9a]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
   -75> 2019-03-11 15:43:43.826473 7f18b8892e00  5 asok(0x563466baf4a0) register_command perfcounters_dump hook 0x563466b4a1c0
   -74> 2019-03-11 15:43:43.826489 7f18b8892e00  5 asok(0x563466baf4a0) register_command 1 hook 0x563466b4a1c0
   -73> 2019-03-11 15:43:43.826491 7f18b8892e00  5 asok(0x563466baf4a0) register_command perf dump hook 0x563466b4a1c0
   -72> 2019-03-11 15:43:43.826494 7f18b8892e00  5 asok(0x563466baf4a0) register_command perfcounters_schema hook 0x563466b4a1c0
   -71> 2019-03-11 15:43:43.826496 7f18b8892e00  5 asok(0x563466baf4a0) register_command perf histogram dump hook 0x563466b4a1c0
   -70> 2019-03-11 15:43:43.826498 7f18b8892e00  5 asok(0x563466baf4a0) register_command 2 hook 0x563466b4a1c0
   -69> 2019-03-11 15:43:43.826499 7f18b8892e00  5 asok(0x563466baf4a0) register_command perf schema hook 0x563466b4a1c0
   -68> 2019-03-11 15:43:43.826501 7f18b8892e00  5 asok(0x563466baf4a0) register_command perf histogram schema hook 0x563466b4a1c0
   -67> 2019-03-11 15:43:43.826503 7f18b8892e00  5 asok(0x563466baf4a0) register_command perf reset hook 0x563466b4a1c0
   -66> 2019-03-11 15:43:43.826511 7f18b8892e00  5 asok(0x563466baf4a0) register_command config show hook 0x563466b4a1c0
   -65> 2019-03-11 15:43:43.826513 7f18b8892e00  5 asok(0x563466baf4a0) register_command config help hook 0x563466b4a1c0
   -64> 2019-03-11 15:43:43.826516 7f18b8892e00  5 asok(0x563466baf4a0) register_command config set hook 0x563466b4a1c0
   -63> 2019-03-11 15:43:43.826518 7f18b8892e00  5 asok(0x563466baf4a0) register_command config get hook 0x563466b4a1c0
   -62> 2019-03-11 15:43:43.826519 7f18b8892e00  5 asok(0x563466baf4a0) register_command config diff hook 0x563466b4a1c0
   -61> 2019-03-11 15:43:43.826522 7f18b8892e00  5 asok(0x563466baf4a0) register_command config diff get hook 0x563466b4a1c0
   -60> 2019-03-11 15:43:43.826524 7f18b8892e00  5 asok(0x563466baf4a0) register_command log flush hook 0x563466b4a1c0
   -59> 2019-03-11 15:43:43.826526 7f18b8892e00  5 asok(0x563466baf4a0) register_command log dump hook 0x563466b4a1c0
   -58> 2019-03-11 15:43:43.826528 7f18b8892e00  5 asok(0x563466baf4a0) register_command log reopen hook 0x563466b4a1c0
   -57> 2019-03-11 15:43:43.826538 7f18b8892e00  5 asok(0x563466baf4a0) register_command dump_mempools hook 0x563466e5ada8
   -56> 2019-03-11 15:43:43.831453 7f18b8892e00  0 set uid:gid to 64045:64045 (ceph:ceph)
   -55> 2019-03-11 15:43:43.831468 7f18b8892e00  0 ceph version 12.2.10 (fc2b1783e3727b66315cc667af9d663d30fe7ed4) luminous (stable), process ceph-osd, pid 2988913
   -54> 2019-03-11 15:43:43.831501 7f18b8892e00  5 object store type is bluestore
   -53> 2019-03-11 15:43:43.836104 7f18b2aee700  2 Event(0x563466b4c500 nevent=5000 time_id=1).set_owner idx=0 owner=139744053749504
   -52> 2019-03-11 15:43:43.836145 7f18b22ed700  2 Event(0x563466b4c740 nevent=5000 time_id=1).set_owner idx=1 owner=139744045356800
   -51> 2019-03-11 15:43:43.836152 7f18b1aec700  2 Event(0x563466b4c980 nevent=5000 time_id=1).set_owner idx=2 owner=139744036964096
   -50> 2019-03-11 15:43:43.836545 7f18b8892e00  1 -- 172.17.1.54:0/0 learned_addr learned my addr 172.17.1.54:0/0
   -49> 2019-03-11 15:43:43.836554 7f18b8892e00  1 -- 172.17.1.54:6802/2988913 _finish_bind bind my_inst.addr is 172.17.1.54:6802/2988913
   -48> 2019-03-11 15:43:43.836608 7f18b8892e00  1 -- 10.10.10.5:0/0 learned_addr learned my addr 10.10.10.5:0/0
   -47> 2019-03-11 15:43:43.836615 7f18b8892e00  1 -- 10.10.10.5:6802/2988913 _finish_bind bind my_inst.addr is 10.10.10.5:6802/2988913
   -46> 2019-03-11 15:43:43.836682 7f18b8892e00  1 -- 10.10.10.5:0/0 learned_addr learned my addr 10.10.10.5:0/0
   -45> 2019-03-11 15:43:43.836687 7f18b8892e00  1 -- 10.10.10.5:6803/2988913 _finish_bind bind my_inst.addr is 10.10.10.5:6803/2988913
   -44> 2019-03-11 15:43:43.836754 7f18b8892e00  1 -- 172.17.1.54:0/0 learned_addr learned my addr 172.17.1.54:0/0
   -43> 2019-03-11 15:43:43.836759 7f18b8892e00  1 -- 172.17.1.54:6803/2988913 _finish_bind bind my_inst.addr is 172.17.1.54:6803/2988913
   -42> 2019-03-11 15:43:43.836761 7f18b8892e00  0 pidfile_write: ignore empty --pid-file
   -41> 2019-03-11 15:43:43.838350 7f18b8892e00  5 asok(0x563466baf4a0) init /var/run/ceph/ceph-osd.8.asok
   -40> 2019-03-11 15:43:43.838362 7f18b8892e00  5 asok(0x563466baf4a0) bind_and_listen /var/run/ceph/ceph-osd.8.asok
   -39> 2019-03-11 15:43:43.838411 7f18b8892e00  5 asok(0x563466baf4a0) register_command 0 hook 0x563466b481a8
   -38> 2019-03-11 15:43:43.838419 7f18b8892e00  5 asok(0x563466baf4a0) register_command version hook 0x563466b481a8
   -37> 2019-03-11 15:43:43.838424 7f18b8892e00  5 asok(0x563466baf4a0) register_command git_version hook 0x563466b481a8
   -36> 2019-03-11 15:43:43.838429 7f18b8892e00  5 asok(0x563466baf4a0) register_command help hook 0x563466b4a620
   -35> 2019-03-11 15:43:43.838431 7f18b8892e00  5 asok(0x563466baf4a0) register_command get_command_descriptions hook 0x563466b4a630
   -34> 2019-03-11 15:43:43.838488 7f18b031b700  5 asok(0x563466baf4a0) entry start
   -33> 2019-03-11 15:43:43.838497 7f18b8892e00 10 monclient: build_initial_monmap
   -32> 2019-03-11 15:43:43.844687 7f18b8892e00  0 load: jerasure load: lrc load: isa
   -31> 2019-03-11 15:43:43.844745 7f18b8892e00  5 adding auth protocol: none
   -30> 2019-03-11 15:43:43.844750 7f18b8892e00  5 adding auth protocol: none
   -29> 2019-03-11 15:43:43.844789 7f18b8892e00  1 bdev create path /var/lib/ceph/osd/ceph-8/block type kernel
   -28> 2019-03-11 15:43:43.844798 7f18b8892e00  1 bdev(0x563466b4cb40 /var/lib/ceph/osd/ceph-8/block) open path /var/lib/ceph/osd/ceph-8/block
   -27> 2019-03-11 15:43:43.845001 7f18b8892e00  1 bdev(0x563466b4cb40 /var/lib/ceph/osd/ceph-8/block) open size 6001170317312 (0x57541a00000, 5.46TiB) block_size 4096 (4KiB) rotational
   -26> 2019-03-11 15:43:43.845283 7f18b8892e00  1 bluestore(/var/lib/ceph/osd/ceph-8) _set_cache_sizes cache_size 1073741824 meta 0.4 kv 0.4 data 0.2
   -25> 2019-03-11 15:43:43.845299 7f18b8892e00  1 bdev(0x563466b4cb40 /var/lib/ceph/osd/ceph-8/block) close
   -24> 2019-03-11 15:43:44.169462 7f18b8892e00  5 asok(0x563466baf4a0) register_command objecter_requests hook 0x563466b4a6b0
   -23> 2019-03-11 15:43:44.169528 7f18b8892e00  1 -- 172.17.1.54:6802/2988913 start start
   -22> 2019-03-11 15:43:44.169536 7f18b8892e00  1 -- - start start
   -21> 2019-03-11 15:43:44.169537 7f18b8892e00  1 -- - start start
   -20> 2019-03-11 15:43:44.169538 7f18b8892e00  1 -- 172.17.1.54:6803/2988913 start start
   -19> 2019-03-11 15:43:44.169542 7f18b8892e00  1 -- 10.10.10.5:6803/2988913 start start
   -18> 2019-03-11 15:43:44.169544 7f18b8892e00  1 -- 10.10.10.5:6802/2988913 start start
   -17> 2019-03-11 15:43:44.169547 7f18b8892e00  1 -- - start start
   -16> 2019-03-11 15:43:44.169667 7f18b8892e00  2 osd.8 0 init /var/lib/ceph/osd/ceph-8 (looks like hdd)
   -15> 2019-03-11 15:43:44.169673 7f18b8892e00  2 osd.8 0 journal /var/lib/ceph/osd/ceph-8/journal
   -14> 2019-03-11 15:43:44.169681 7f18b8892e00  1 bluestore(/var/lib/ceph/osd/ceph-8) _mount path /var/lib/ceph/osd/ceph-8
   -13> 2019-03-11 15:43:44.170038 7f18b8892e00  1 bdev create path /var/lib/ceph/osd/ceph-8/block type kernel
   -12> 2019-03-11 15:43:44.170043 7f18b8892e00  1 bdev(0x563466b4cd80 /var/lib/ceph/osd/ceph-8/block) open path /var/lib/ceph/osd/ceph-8/block
   -11> 2019-03-11 15:43:44.170205 7f18b8892e00  1 bdev(0x563466b4cd80 /var/lib/ceph/osd/ceph-8/block) open size 6001170317312 (0x57541a00000, 5.46TiB) block_size 4096 (4KiB) rotational
   -10> 2019-03-11 15:43:44.170470 7f18b8892e00  1 bluestore(/var/lib/ceph/osd/ceph-8) _set_cache_sizes cache_size 1073741824 meta 0.4 kv 0.4 data 0.2
    -9> 2019-03-11 15:43:44.170522 7f18b8892e00  1 bdev create path /var/lib/ceph/osd/ceph-8/block.db type kernel
    -8> 2019-03-11 15:43:44.170526 7f18b8892e00  1 bdev(0x563466b4d200 /var/lib/ceph/osd/ceph-8/block.db) open path /var/lib/ceph/osd/ceph-8/block.db
    -7> 2019-03-11 15:43:44.170647 7f18b8892e00  1 bdev(0x563466b4d200 /var/lib/ceph/osd/ceph-8/block.db) open size 5997854720 (0x165800000, 5.59GiB) block_size 4096 (4KiB) non-rotational
    -6> 2019-03-11 15:43:44.170655 7f18b8892e00  1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-8/block.db size 5.59GiB
    -5> 2019-03-11 15:43:44.172927 7f18b8892e00  1 bdev create path /var/lib/ceph/osd/ceph-8/block type kernel
    -4> 2019-03-11 15:43:44.172937 7f18b8892e00  1 bdev(0x563466b4d440 /var/lib/ceph/osd/ceph-8/block) open path /var/lib/ceph/osd/ceph-8/block
    -3> 2019-03-11 15:43:44.173124 7f18b8892e00  1 bdev(0x563466b4d440 /var/lib/ceph/osd/ceph-8/block) open size 6001170317312 (0x57541a00000, 5.46TiB) block_size 4096 (4KiB) rotational
    -2> 2019-03-11 15:43:44.173136 7f18b8892e00  1 bluefs add_block_device bdev 2 path /var/lib/ceph/osd/ceph-8/block size 5.46TiB
    -1> 2019-03-11 15:43:44.173171 7f18b8892e00  1 bluefs mount
     0> 2019-03-11 15:43:44.178468 7f18b8892e00 -1 *** Caught signal (Segmentation fault) **
 in thread 7f18b8892e00 thread_name:ceph-osd

 ceph version 12.2.10 (fc2b1783e3727b66315cc667af9d663d30fe7ed4) luminous (stable)
 1: (()+0xa56bd4) [0x56345cdc8bd4]
 2: (()+0x110c0) [0x7f18b5e980c0]
 3: (BlueFS::_replay(bool)+0x1616) [0x56345cd7fb96]
 4: (BlueFS::mount()+0x1e1) [0x56345cd82aa1]
 5: (BlueStore::_open_db(bool)+0x1698) [0x56345cc8c6b8]
 6: (BlueStore::_mount(bool)+0x2b4) [0x56345ccc5cf4]
 7: (OSD::init()+0x3e2) [0x56345c813fe2]
 8: (main()+0x3092) [0x56345c71d3c2]
 9: (__libc_start_main()+0xf1) [0x7f18b4e4d2e1]
 10: (_start()+0x2a) [0x56345c7a9f9a]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   1/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 1 reserver
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   4/ 5 memdb
   1/ 5 kinetic
   1/ 5 fuse
   1/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.8.log
--- end dump of recent events ---
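
To make the backtrace above easier to read, here is a minimal Python sketch that splits each frame into its index, symbol, offset and address, then reports the first frame with a resolved symbol. The frame text is copied from the log; the regular expression and the names `Frame`, `parse_frames` and `first_resolved` are assumptions made for this illustration and are not part of any Ceph tooling. It only shows where in the call chain the fault surfaced, not why.

```python
import re
from collections import namedtuple

# Frames copied verbatim from the backtrace in the osd.8 log.
BACKTRACE = """\
 1: (()+0xa56bd4) [0x56345cdc8bd4]
 2: (()+0x110c0) [0x7f18b5e980c0]
 3: (BlueFS::_replay(bool)+0x1616) [0x56345cd7fb96]
 4: (BlueFS::mount()+0x1e1) [0x56345cd82aa1]
 5: (BlueStore::_open_db(bool)+0x1698) [0x56345cc8c6b8]
 6: (BlueStore::_mount(bool)+0x2b4) [0x56345ccc5cf4]
 7: (OSD::init()+0x3e2) [0x56345c813fe2]
 8: (main()+0x3092) [0x56345c71d3c2]
 9: (__libc_start_main()+0xf1) [0x7f18b4e4d2e1]
 10: (_start()+0x2a) [0x56345c7a9f9a]
"""

Frame = namedtuple("Frame", "index symbol offset address")

# "<n>: (<symbol>+0x<offset>) [0x<address>]"; an unresolved symbol shows as "()".
FRAME_RE = re.compile(r"^\s*(\d+): \((.*)\+0x([0-9a-f]+)\) \[0x([0-9a-f]+)\]$")


def parse_frames(text):
    """Return a list of Frame tuples for every line that looks like a frame."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            index, symbol, offset, address = match.groups()
            frames.append(Frame(int(index), symbol, int(offset, 16), int(address, 16)))
    return frames


def first_resolved(frames):
    """Return the first frame whose symbol is not the unresolved placeholder "()"."""
    for frame in frames:
        if frame.symbol != "()":
            return frame
    return None


if __name__ == "__main__":
    frames = parse_frames(BACKTRACE)
    top = first_resolved(frames)
    print(f"{len(frames)} frames; first resolved symbol: {top.symbol} (frame {top.index})")
    # Call chain from the outermost caller down to the first resolved frame.
    print(" -> ".join(f.symbol for f in reversed(frames) if f.symbol != "()"))
```

Run against the frames above, the first resolved symbol is `BlueFS::_replay(bool)`, reached through `BlueFS::mount()` from `BlueStore::_open_db(bool)`, which matches the last log line before the fault ("bluefs mount").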

History

#1 Updated by Igor Fedotov 12 months ago

  • Status changed from New to Closed
