Bug #49110
closed
BlueFS.cc: 1542: FAILED assert(r == 0)
Description
All the SSD based OSDs in my ceph cluster crashed.
The initial error was:
2021-01-31 05:15:21.130636 7f81c7bd6700  1 bluefs add_block_extent bdev 1 0xe8db800000~300000
2021-01-31 05:15:34.251287 7f81c7bd6700  1 bluefs _allocate failed to allocate 0x400000 on bdev 1, free 0x3c0000; fallback to bdev 2
2021-01-31 05:15:34.251309 7f81c7bd6700 -1 bluefs _allocate failed to allocate 0x on bdev 2, dne
2021-01-31 05:15:34.256487 7f81c7bd6700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.13/rpm/el7/BUILD/ceph-12.2.13/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_flush_and_sync_log(std::unique_lock<std::mutex>&, uint64_t, uint64_t)' thread 7f81c7bd6700 time 2021-01-31 05:15:34.251323
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.13/rpm/el7/BUILD/ceph-12.2.13/src/os/bluestore/BlueFS.cc: 1542: FAILED assert(r == 0)
 ceph version 12.2.13 (584a20eb0237c657dc0567da126be145106aa47e) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x562989083ae0]
 2: (BlueFS::_flush_and_sync_log(std::unique_lock<std::mutex>&, unsigned long, unsigned long)+0x1704) [0x562989002804]
 3: (BlueFS::_fsync(BlueFS::FileWriter*, std::unique_lock<std::mutex>&)+0x2a1) [0x5629890032b1]
 4: (BlueRocksWritableFile::Sync()+0x63) [0x56298901d073]
 5: (rocksdb::WritableFileWriter::SyncInternal(bool)+0x149) [0x562989409159]
 6: (rocksdb::WritableFileWriter::Sync(bool)+0xe8) [0x562989409e28]
 7: (rocksdb::DBImpl::WriteToWAL(rocksdb::autovector<rocksdb::WriteThread::Writer*, 8ul> const&, rocksdb::log::Writer*, bool, bool, unsigned long)+0x41a) [0x56298945598a]
 8: (rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool)+0x94b) [0x56298945656b]
 9: (rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x27) [0x562989457537]
 10: (RocksDBStore::submit_transaction_sync(std::shared_ptr<KeyValueDB::TransactionImpl>)+0xcf) [0x562988f9253f]
 11: (BlueStore::_kv_sync_thread()+0x1c6f) [0x562988f267df]
 12: (BlueStore::KVSyncThread::entry()+0xd) [0x562988f6fa6d]
 13: (()+0x7ea5) [0x7f81d8ca6ea5]
 14: (clone()+0x6d) [0x7f81d7d968dd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
A fuller log is in initial-osd.108.cut.txt.
Any recovery attempt fails with:
2021-02-02 12:53:18.400571 7f5da959dec0  1 bdev(0x55670aad6600 /var/lib/ceph/osd/ceph-108/block) open path /var/lib/ceph/osd/ceph-108/block
2021-02-02 12:53:18.400716 7f5da959dec0  1 bdev(0x55670aad6600 /var/lib/ceph/osd/ceph-108/block) open size 1000203091968 (0xe8e0c00000, 932GiB) block_size 4096 (4KiB) non-rotational
2021-02-02 12:53:18.400731 7f5da959dec0  1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-108/block size 932GiB
2021-02-02 12:53:18.400758 7f5da959dec0  1 bluefs mount
2021-02-02 12:53:18.401032 7f5da959dec0  1 bluefs _init_alloc id 1 alloc_size 0x10000 size 0xe8e0c00000
2021-02-02 13:21:09.304110 7f5da959dec0 -1 *** Caught signal (Segmentation fault) **
 in thread 7f5da959dec0 thread_name:ceph-osd
 ceph version 12.2.13 (584a20eb0237c657dc0567da126be145106aa47e) luminous (stable)
 1: (()+0xa76e51) [0x5566ff53ce51]
 2: (()+0xf630) [0x7f5da6a40630]
 3: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::list*, char*)+0x452) [0x5566ff4efcd2]
 4: (BlueFS::_replay(bool)+0x48d) [0x5566ff50378d]
 5: (BlueFS::mount()+0x1d4) [0x5566ff507374]
 6: (BlueStore::_open_db(bool)+0x1857) [0x5566ff415827]
 7: (BlueStore::_mount(bool)+0x2f6) [0x5566ff44acf6]
 8: (OSD::init()+0x472) [0x5566fefe9762]
 9: (main()+0x2f49) [0x5566feeea719]
 10: (__libc_start_main()+0xf5) [0x7f5da5a4c555]
 11: (()+0x4c4383) [0x5566fef8a383]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
A fuller log is in no-recovery-osd.108.txt.
OS is CentOS 7.8.2003
# grep baseurl /etc/yum.repos.d/ceph.repo | head -1
baseurl=http://download.ceph.com/rpm-luminous/el7/$basearch
# uname -a
Linux dch2-ceph-data-013.ceph.intern.net.de 3.10.0-1127.13.1.el7.x86_64 #1 SMP Tue Jun 23 15:46:38 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
An attempt to fix one of the offending OSDs results in:
# ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-109
*** Caught signal (Segmentation fault) **
 in thread 7f7258eb0ec0 thread_name:ceph-bluestore-
 ceph version 12.2.13 (584a20eb0237c657dc0567da126be145106aa47e) luminous (stable)
 1: (()+0x4025e1) [0x557c65a7e5e1]
 2: (()+0xf630) [0x7f724dcb0630]
 3: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::list*, char*)+0x452) [0x557c65851992]
 4: (BlueFS::_replay(bool)+0x48d) [0x557c6586544d]
 5: (BlueFS::mount()+0x1d4) [0x557c65869034]
 6: (BlueStore::_open_db(bool)+0x1857) [0x557c658c06e7]
 7: (BlueStore::_fsck(bool, bool)+0x3c7) [0x557c658f65c7]
 8: (main()+0xf24) [0x557c657a8c24]
 9: (__libc_start_main()+0xf5) [0x7f724c681555]
 10: (()+0x1c444f) [0x557c6584044f]
2021-02-02 17:17:19.152141 7f7258eb0ec0 -1 *** Caught signal (Segmentation fault) **
# gdb ceph-bluestore-tool
(gdb) run fsck --path /var/lib/ceph/osd/ceph-109
...
Program received signal SIGSEGV, Segmentation fault.
0x0000555555729992 in BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::list*, char*) ()
(gdb) bt
#0  0x0000555555729992 in BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::list*, char*) ()
#1  0x000055555573d44d in BlueFS::_replay(bool) ()
#2  0x0000555555741034 in BlueFS::mount() ()
#3  0x00005555557986e7 in BlueStore::_open_db(bool) ()
#4  0x00005555557ce5c7 in BlueStore::_fsck(bool, bool) ()
#5  0x0000555555680c24 in main ()
(gdb) info locals
No symbol table info available.
(gdb)
# ceph osd tree
...
-37         10.91638     host dch2-ceph-data-012
102   sbs    1.81940         osd.102   up        1.00000 1.00000
103   sbs    1.81940         osd.103   up        1.00000 1.00000
104   sbs    1.81940         osd.104   up        1.00000 1.00000
105   sbs    1.81940         osd.105   up        1.00000 1.00000
106   sbs    1.81940         osd.106   up        1.00000 1.00000
107   sbs    1.81940         osd.107   up        1.00000 1.00000
-40          5.45819     host dch2-ceph-data-013
108   gpssd  0.90970         osd.108   down      0       1.00000
109   gpssd  0.90970         osd.109   down      0       1.00000
110   gpssd  0.90970         osd.110   down      0       1.00000
111   gpssd  0.90970         osd.111   down      0       1.00000
112   gpssd  0.90970         osd.112   down      0       1.00000
...
I will disconnect the affected servers from my production cluster tomorrow.
I can then isolate the servers and, if necessary,
reconfigure the network towards a staging cluster.
Files
Updated by Igor Fedotov over 3 years ago
Could you please set debug-bluefs to 20, start the OSD again, and share the relevant OSD log, or at least the last 20000 lines of the log if it's too large?
Updated by Andreas Buschmann over 3 years ago
My first try with
debug bluefs = 20
failed after 15 hours because logrotate killed my ceph-osd process.
Is there a way to speed this up?
That is: is there a way to run ceph-osd at a lower logging level and raise the level after X hours?
I am currently logging 3 GB a minute (through rotatelogs from Apache).
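For what it's worth, the debug level can usually be changed at runtime through the OSD's admin socket, so the daemon could start quiet and be turned up later. A sketch (OSD id from this report; whether the admin socket answers during a long BlueFS replay is not something I have verified):

```shell
# Start the OSD in the foreground with quiet BlueFS logging
ceph-osd -d -i 108 --debug-bluefs 1/1

# Later, from another shell, raise the level without restarting,
# via the OSD's local admin socket:
ceph daemon osd.108 config set debug_bluefs 20

# or through the monitors:
ceph tell osd.108 injectargs '--debug-bluefs 20'
```

This avoids paying the 3 GB/minute logging cost during the phases that are not of interest.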
Anyway, I have deleted the pool which used the broken OSDs, and now my ceph cluster is healthy.
I am still debugging the broken OSDs, and the time pressure is off now.
Updated by Andreas Buschmann about 3 years ago
- File ceph-osd.108.log.00.xz ceph-osd.108.log.00.xz added
- File ceph-osd.108.log.01.xz ceph-osd.108.log.01.xz added
- File ceph-osd.108.log.02.xz ceph-osd.108.log.02.xz added
- File ceph-osd.108.log.03.xz ceph-osd.108.log.03.xz added
- File ceph-osd.108.log.04.xz ceph-osd.108.log.04.xz added
- File ceph-osd.108.log.05.xz ceph-osd.108.log.05.xz added
- File ceph-osd.108.log.06.xz ceph-osd.108.log.06.xz added
- File ceph-osd.108.log.07.xz ceph-osd.108.log.07.xz added
- File ceph-osd.108.log.08.xz ceph-osd.108.log.08.xz added
- File ceph-osd.108.log.09.xz ceph-osd.108.log.09.xz added
I now have a more complete dump.
Beforehand I ran

yum install ceph-debuginfo
yum install httpd-tools httpd

so that I could pipe ceph-osd.108.log through rotatelogs.
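A sketch of such a pipe, assuming the Apache rotatelogs from httpd-tools (the 1.5 GB chunk size matches the uncompressed file sizes mentioned below; the exact invocation used here is an assumption):

```shell
# Run the OSD in the foreground with logging to stderr (-d), and split
# the output into ~1.5GB chunks so logrotate never touches the process
ceph-osd -d -i 108 --debug-bluefs 20 2>&1 \
  | rotatelogs /var/log/ceph/ceph-osd.108.log 1500M
```

rotatelogs appends a suffix to each chunk's filename, so the running daemon is never sent a signal the way logrotate would.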
wc -l ceph-osd.108.log.*
   4365 ceph-osd.108.log.00
   4362 ceph-osd.108.log.01
   4363 ceph-osd.108.log.02
   4360 ceph-osd.108.log.03
   4361 ceph-osd.108.log.04
   4363 ceph-osd.108.log.05
   4362 ceph-osd.108.log.06
   4367 ceph-osd.108.log.07
   4362 ceph-osd.108.log.08
   4360 ceph-osd.108.log.09
    818 ceph-osd.108.log.10
  44443 total
If required, I can upload the logs from the 8 minutes preceding these.
Before compression each of the files *.00 - *.09 was 1.5GB.
Updated by Andreas Buschmann about 3 years ago
- File ceph-osd.108.log.10.xz ceph-osd.108.log.10.xz added
Here is the last file: ceph-osd.108.log.10.xz
Updated by Igor Fedotov about 3 years ago
Given the Ceph version provided and the huge BlueFS log file size:
-3> 2021-02-06 09:39:29.927409 7ff72040dec0 10 bluefs _read h 0x560472780d80 0x4779b0d000~f4000 from file(ino 1 size 0x4779b0c000
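For scale, the ino 1 size in that line decodes to roughly 286 GiB of BlueFS journal:

```shell
# Convert the hex file size from the log line above to bytes and GiB
size_bytes=$((0x4779b0c000))
echo "$size_bytes bytes"                          # 306984304640 bytes
echo "$((size_bytes / 1024 / 1024 / 1024)) GiB"   # 285 GiB
```

Replaying a log that large on mount also explains the 15-hour startup attempts reported above.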
I think you're facing bug(s) covered by
https://tracker.ceph.com/issues/45903
https://github.com/ceph/ceph/pull/35776
and
https://github.com/ceph/ceph/pull/34876
Unfortunately, v12.2.13 doesn't include these fixes; they were backported to the Luminous branch only after v12.2.13 had been released, and no newer Luminous release is available since Luminous is at EOL.
Updated by Igor Fedotov about 3 years ago
- Status changed from New to Triaged
Once a fresher Luminous build is obtained (most likely one you build yourself), you might try the recovery procedure provided at https://github.com/ceph/ceph/pull/35776
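If I read PR 35776 correctly, it adds a `bluefs_replay_recovery` option, so on a patched build the attempt might look like this (option name taken from the PR; a stock v12.2.13 binary will not honor it, and the exact invocation is an assumption, not a tested procedure):

```shell
# On a build containing PR 35776 only: ask BlueFS to try to recover
# lost log extents during replay (hypothetical invocation for this setup)
CEPH_ARGS="--bluefs_replay_recovery=true" \
  ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-108
```

As always with recovery attempts, work on a block-level copy of the device if possible.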
As a permanent fix I would suggest upgrading the cluster...
Updated by Igor Fedotov about 1 year ago
- Status changed from Triaged to Won't Fix