Bug #21062 (closed): ceph-osd crashes in bluestore

Added by Artemy Kapitula over 6 years ago. Updated over 4 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source: -
Tags: -
Backport: -
Regression: No
Severity: 2 - major
Reviewed: -
Affected Versions: -
ceph-qa-suite: -
Pull request ID: -
Crash signature (v1): -
Crash signature (v2): -

Description

ceph-osd (luminous) crashes on startup after emulating a power loss (via a magic sysrq reboot):

Aug 22 13:27:47 dpr-2a1702-061-crd systemd1: Started Ceph object storage daemon osd.ceph.36.
Aug 22 13:27:47 dpr-2a1702-061-crd ceph-custom-daemon13792: 2017-08-22 13:27:47.352925 7fe7efb49d80 -1 WARNING: the following dangerous and experimental features are enabled: btrfs
Aug 22 13:27:47 dpr-2a1702-061-crd ceph-custom-daemon13792: 2017-08-22 13:27:47.353097 7fe7efb49d80 -1 WARNING: the following dangerous and experimental features are enabled: btrfs
Aug 22 13:27:47 dpr-2a1702-061-crd ceph-custom-daemon13792: 2017-08-22 13:27:47.353105 7fe7efb49d80 0 set uid:gid to 167:167 (ceph:ceph)
Aug 22 13:27:47 dpr-2a1702-061-crd ceph-custom-daemon13792: 2017-08-22 13:27:47.353114 7fe7efb49d80 0 ceph version 12.1.2-463-gbdaf63e (bdaf63e2951e0d23440c19c3ab075e48cf65132b) luminous (rc), process
Aug 22 13:27:47 dpr-2a1702-061-crd ceph-custom-daemon13792: starting osd.36 at - osd_data /var/lib/ceph/osd/ceph-36 /var/lib/ceph/osd/ceph-36/journal
Aug 22 13:27:47 dpr-2a1702-061-crd ceph-custom-daemon13792: 2017-08-22 13:27:47.373036 7fe7efb49d80 0 pidfile_write: ignore empty --pid-file
Aug 22 13:27:47 dpr-2a1702-061-crd ceph-custom-daemon13792: 2017-08-22 13:27:47.429496 7fe7efb49d80 -1 WARNING: the following dangerous and experimental features are enabled: btrfs
Aug 22 13:27:47 dpr-2a1702-061-crd ceph-custom-daemon13792: 2017-08-22 13:27:47.435158 7fe7efb49d80 0 load: jerasure load: lrc load: isa
Aug 22 13:27:47 dpr-2a1702-061-crd ceph-custom-daemon13792: 2017-08-22 13:27:47.733362 7fe7efb49d80 0 set rocksdb option compaction_readahead_size = 2097152
Aug 22 13:27:47 dpr-2a1702-061-crd ceph-custom-daemon13792: 2017-08-22 13:27:47.733380 7fe7efb49d80 0 set rocksdb option compression = kNoCompression
Aug 22 13:27:47 dpr-2a1702-061-crd ceph-custom-daemon13792: 2017-08-22 13:27:47.733385 7fe7efb49d80 0 set rocksdb option max_write_buffer_number = 4
Aug 22 13:27:47 dpr-2a1702-061-crd ceph-custom-daemon13792: 2017-08-22 13:27:47.733395 7fe7efb49d80 0 set rocksdb option min_write_buffer_number_to_merge = 1
Aug 22 13:27:47 dpr-2a1702-061-crd ceph-custom-daemon13792: 2017-08-22 13:27:47.733399 7fe7efb49d80 0 set rocksdb option recycle_log_file_num = 4
Aug 22 13:27:47 dpr-2a1702-061-crd ceph-custom-daemon13792: 2017-08-22 13:27:47.733402 7fe7efb49d80 0 set rocksdb option writable_file_max_buffer_size = 0
Aug 22 13:27:47 dpr-2a1702-061-crd ceph-custom-daemon13792: 2017-08-22 13:27:47.733406 7fe7efb49d80 0 set rocksdb option write_buffer_size = 268435456
Aug 22 13:27:47 dpr-2a1702-061-crd ceph-custom-daemon13792: 2017-08-22 13:27:47.733430 7fe7efb49d80 0 set rocksdb option compaction_readahead_size = 2097152
Aug 22 13:27:47 dpr-2a1702-061-crd ceph-custom-daemon13792: 2017-08-22 13:27:47.733435 7fe7efb49d80 0 set rocksdb option compression = kNoCompression
Aug 22 13:27:47 dpr-2a1702-061-crd ceph-custom-daemon13792: 2017-08-22 13:27:47.733438 7fe7efb49d80 0 set rocksdb option max_write_buffer_number = 4
Aug 22 13:27:47 dpr-2a1702-061-crd ceph-custom-daemon13792: 2017-08-22 13:27:47.733441 7fe7efb49d80 0 set rocksdb option min_write_buffer_number_to_merge = 1
Aug 22 13:27:47 dpr-2a1702-061-crd ceph-custom-daemon13792: 2017-08-22 13:27:47.733444 7fe7efb49d80 0 set rocksdb option recycle_log_file_num = 4
Aug 22 13:27:47 dpr-2a1702-061-crd ceph-custom-daemon13792: 2017-08-22 13:27:47.733447 7fe7efb49d80 0 set rocksdb option writable_file_max_buffer_size = 0
Aug 22 13:27:47 dpr-2a1702-061-crd ceph-custom-daemon13792: 2017-08-22 13:27:47.733450 7fe7efb49d80 0 set rocksdb option write_buffer_size = 268435456
Aug 22 13:29:28 dpr-2a1702-061-crd ceph-custom-daemon13792: 2017-08-22 13:29:28.193934 7fe7efb49d80 -1 bluefs allocate failed to allocate 0x100000 on bdev 2, dne
Aug 22 13:29:28 dpr-2a1702-061-crd ceph-custom-daemon13792: 2017-08-22 13:29:28.193940 7fe7efb49d80 -1 bluefs _flush_range allocated: 0x0 offset: 0x0 length: 0x1a19
Aug 22 13:29:28 dpr-2a1702-061-crd ceph-custom-daemon13792: 2017-08-22 13:29:28.389892 7fe7efb49d80 -1 rocksdb: Corruption: file is too short to be an sstable
Aug 22 13:29:28 dpr-2a1702-061-crd ceph-custom-daemon13792: 2017-08-22 13:29:28.389909 7fe7efb49d80 -1 bluestore(/var/lib/ceph/osd/ceph-36) _open_db erroring opening db:
Aug 22 13:29:28 dpr-2a1702-061-crd ceph-custom-daemon13792: 2017-08-22 13:29:28.390166 7fe7efb49d80 -1 bluefs _allocate failed to allocate 0x400000 on bdev 2, dne
Aug 22 13:29:28 dpr-2a1702-061-crd ceph-custom-daemon13792: /root/rpmbuild/BUILD/ceph-12.1.2-463-gb
Aug 22 13:29:28 dpr-2a1702-061-crd ceph-custom-daemon13792: /root/rpmbuild/BUILD/ceph-12.1.2-463-gbdaf63e/src/os/bluestore/BlueFS.cc: 1198: FAILED assert(r == 0)
Aug 22 13:29:28 dpr-2a1702-061-crd ceph-custom-daemon13792: ceph version 12.1.2-463-gbdaf63e (bdaf63e2951e0d23440c19c3ab075e48cf65132b) luminous (rc)
Aug 22 13:29:28 dpr-2a1702-061-crd ceph-custom-daemon13792: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x5604fdb166a0]
Aug 22 13:29:28 dpr-2a1702-061-crd ceph-custom-daemon13792: 2: (BlueFS::_compact_log_async(std::unique_lock<std::mutex>&)+0x1831) [0x5604fda9d9c1]
Aug 22 13:29:28 dpr-2a1702-061-crd ceph-custom-daemon13792: 3: (BlueFS::sync_metadata()+0x545) [0x5604fda9e045]
Aug 22 13:29:28 dpr-2a1702-061-crd ceph-custom-daemon13792: 4: (BlueFS::umount()+0x74) [0x5604fda9e234]
Aug 22 13:29:28 dpr-2a1702-061-crd ceph-custom-daemon13792: 5: (BlueStore::_open_db(bool)+0x568) [0x5604fd9c4af8]
Aug 22 13:29:28 dpr-2a1702-061-crd ceph-custom-daemon13792: 6: (BlueStore::_mount(bool)+0x3fe) [0x5604fd9f6d7e]
Aug 22 13:29:28 dpr-2a1702-061-crd ceph-custom-daemon13792: 7: (OSD::init()+0x3ba) [0x5604fd5c202a]
Aug 22 13:29:28 dpr-2a1702-061-crd ceph-custom-daemon13792: 8: (main()+0x2def) [0x5604fd4c97ef]
Aug 22 13:29:28 dpr-2a1702-061-crd ceph-custom-daemon13792: 9: (__libc_start_main()+0xf5) [0x7fe7ec01baf5]
Aug 22 13:29:28 dpr-2a1702-061-crd ceph-custom-daemon13792: 10: (()+0x4abb29) [0x5604fd567b29]
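
For reference, the power-loss emulation mentioned in the description is presumably the standard magic-sysrq immediate reboot; a minimal sketch (the exact commands used are not given in the report):

# Enable the magic sysrq interface if it is not already enabled, then force an
# immediate reboot without syncing or unmounting filesystems -- roughly the
# same effect as pulling power on the node.
echo 1 > /proc/sys/kernel/sysrq
echo b > /proc/sysrq-trigger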

Actions #1

Updated by Artemy Kapitula over 6 years ago

There are two similar OSDs with the same error.

Actions #2

Updated by Sage Weil over 6 years ago

  • Project changed from Ceph to bluestore
  • Category deleted (OSD)
Actions #3

Updated by Sage Weil over 6 years ago

  • Status changed from New to Can't reproduce

This looks like an early luminous rc. We haven't seen these errors since 12.2.z.

Actions #4

Updated by Jie Chen over 4 years ago

I had the same problem.

ceph version 14.2.5 (ad5bd132e1492173c85fda2cc863152730b16a92) nautilus (stable)

Log:

-20> 2019-12-27 06:32:02.935 7f1cc9373dc0  4 rocksdb: [db/db_impl_open.cc:583] Recovering log #110569 mode 0
-19> 2019-12-27 06:32:03.432 7f1cc9373dc0 4 rocksdb: [db/db_impl_open.cc:583] Recovering log #110607 mode 0
-18> 2019-12-27 06:32:04.174 7f1cc9373dc0 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1577428324175337, "job": 1, "event": "recovery_finished"}
-17> 2019-12-27 06:32:04.174 7f1cc9373dc0 1 bluestore(/var/lib/ceph/osd/ceph-4) _open_db opened rocksdb path db options compression=kNoCompression,max_write_buffer_number=4,min_write_buffer_number_to_merge=1,recycle_log_file_num=4,write_buffer_size=268435456,writable_file_max_buffer_size=0,compaction_readahead_size=2097152,max_background_compactions=2
-16> 2019-12-27 06:32:04.200 7f1cc9373dc0 1 get compressor snappy = 0
-15> 2019-12-27 06:32:04.200 7f1cc9373dc0 1 load compressor snappy
-14> 2019-12-27 06:32:04.200 7f1cc9373dc0 1 add compressor snappy 0x55894ececdb0
-13> 2019-12-27 06:32:04.200 7f1cc9373dc0 1 get compressor snappy = 0x55894ececdb0
-12> 2019-12-27 06:32:04.200 7f1cc9373dc0 1 load: compressor snappy loaded and registered
-11> 2019-12-27 06:32:04.200 7f1cc9373dc0 1 get compressor snappy = 0x55894ececdb0
-10> 2019-12-27 06:32:04.200 7f1cc9373dc0 1 freelist init
-9> 2019-12-27 06:32:04.214 7f1cc9373dc0 1 bluestore(/var/lib/ceph/osd/ceph-4) _open_alloc opening allocation metadata
-8> 2019-12-27 06:32:05.347 7f1cc9373dc0 1 bluestore(/var/lib/ceph/osd/ceph-4) _open_alloc loaded 488 GiB in 4297586 extents
-7> 2019-12-27 06:32:05.353 7f1cc9373dc0 4 rocksdb: [db/db_impl.cc:390] Shutdown: canceling all background work
-6> 2019-12-27 06:32:05.369 7f1cc9373dc0 4 rocksdb: [db/db_impl.cc:563] Shutdown complete
-5> 2019-12-27 06:32:05.371 7f1cc9373dc0 1 bluefs umount
-4> 2019-12-27 06:32:05.371 7f1cc9373dc0 1 bluefs _allocate failed to allocate 0x400000 on bdev 1, free 0x300000; fallback to bdev 2
-3> 2019-12-27 06:32:05.371 7f1cc9373dc0 1 bluefs _allocate unable to allocate 0x400000 on bdev 2, free 0xffffffffffffffff; fallback to slow device expander
-2> 2019-12-27 06:32:05.511 7f1cc9373dc0 -1 bluefs _allocate failed to allocate 0x400000 on bdev 1, free 0x300000
-1> 2019-12-27 06:32:05.515 7f1cc9373dc0 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.4/rpm/el7/BUILD/ceph-14.2.4/src/os/bluestore/BlueFS.cc: In function 'void BlueFS::_compact_log_async(std::unique_lock<std::mutex>&)' thread 7f1cc9373dc0 time 2019-12-27 06:32:05.512095
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.4/rpm/el7/BUILD/ceph-14.2.4/src/os/bluestore/BlueFS.cc: 1809: FAILED ceph_assert(r == 0)
ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x558940a18a24]
2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x558940a18bf2]
3: (BlueFS::_compact_log_async(std::unique_lock<std::mutex>&)+0x1c41) [0x5589410290e1]
4: (BlueFS::sync_metadata()+0x215) [0x5589410293d5]
5: (BlueFS::umount()+0xea) [0x5589410296fa]
6: (BlueStore::_close_bluefs()+0xd) [0x558940ec494d]
7: (BlueStore::_open_db_and_around(bool)+0x157) [0x558940f11317]
8: (BlueStore::_mount(bool, bool)+0x6a4) [0x558940f4fc24]
9: (OSD::init()+0x3aa) [0x558940ac2efa]
10: (main()+0x14fa) [0x558940a1d1da]
11: (__libc_start_main()+0xf5) [0x7f1cc4d93505]
12: (()+0x4b2695) [0x558940a52695]
0> 2019-12-27 06:32:05.519 7f1cc9373dc0 -1 *** Caught signal (Aborted) **
in thread 7f1cc9373dc0 thread_name:ceph-osd
ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)
1: (()+0xf5f0) [0x7f1cc5fb35f0]
2: (gsignal()+0x37) [0x7f1cc4da7337]
3: (abort()+0x148) [0x7f1cc4da8a28]
4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x199) [0x558940a18a73]
5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x558940a18bf2]
6: (BlueFS::_compact_log_async(std::unique_lock<std::mutex>&)+0x1c41) [0x5589410290e1]
7: (BlueFS::sync_metadata()+0x215) [0x5589410293d5]
8: (BlueFS::umount()+0xea) [0x5589410296fa]
9: (BlueStore::_close_bluefs()+0xd) [0x558940ec494d]
10: (BlueStore::_open_db_and_around(bool)+0x157) [0x558940f11317]
11: (BlueStore::_mount(bool, bool)+0x6a4) [0x558940f4fc24]
12: (OSD::init()+0x3aa) [0x558940ac2efa]
13: (main()+0x14fa) [0x558940a1d1da]
14: (__libc_start_main()+0xf5) [0x7f1cc4d93505]
15: (()+0x4b2695) [0x558940a52695]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context

Actions #5

Updated by Igor Fedotov over 4 years ago

Jie Chen, in your case it looks like a lack of free space on both the DB and the main device to hold more DB data.
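
(A minimal way to confirm that, as a sketch rather than something from this ticket, assuming the OSD data path from the log, /var/lib/ceph/osd/ceph-4, and that the OSD is stopped:)

# Print the size of, and BlueFS usage on, each underlying device
# (block.db / block / block.wal) for the offline OSD.
ceph-bluestore-tool bluefs-bdev-sizes --path /var/lib/ceph/osd/ceph-4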

Actions #6

Updated by Jie Chen over 4 years ago

Igor Fedotov wrote:

Jie Chen, in your case it looks like a lack of free space on both the DB and the main device to hold more DB data.

Thank you very much, but how can I restore it? The OSD fails to start.
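
One possible way out (a sketch only, not from this ticket, assuming block.db sits on an LVM logical volume whose volume group still has free space; the VG/LV names and OSD path below are illustrative):

# 1. Grow the underlying DB volume (hypothetical VG/LV names).
lvextend -L +10G /dev/ceph-db-vg/osd-4-db

# 2. Let BlueFS pick up the newly added space on the offline OSD, then restart it.
ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-4
systemctl start ceph-osd@4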
