Bug #49110 (closed)

BlueFS.cc: 1542: FAILED assert(r == 0)

Added by Andreas Buschmann about 3 years ago. Updated about 1 year ago.

Status:
Won't Fix
Priority:
Low
Assignee:
-
Target version:
-
% Done:
0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

All of the SSD-based OSDs in my Ceph cluster crashed.

The initial error was:

2021-01-31 05:15:21.130636 7f81c7bd6700  1 bluefs add_block_extent bdev 1 0xe8db800000~300000
2021-01-31 05:15:34.251287 7f81c7bd6700  1 bluefs _allocate failed to allocate 0x400000 on bdev 1, free 0x3c0000; fallback to bdev 2
2021-01-31 05:15:34.251309 7f81c7bd6700 -1 bluefs _allocate failed to allocate 0x on bdev 2, dne
2021-01-31 05:15:34.256487 7f81c7bd6700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.13/rpm/el7/BUILD/ceph-12.2
.13/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_flush_and_sync_log(std::unique_lock<std::mutex>&, uint64_t, uint64_t)' thread 7f81c7bd6700 time 2021-01-31 05:15:34.251323
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.13/rpm/el7/BUILD/ceph-12.2.13/src/os/bluestore/BlueFS.cc: 1542: FAILED assert(r == 0)

 ceph version 12.2.13 (584a20eb0237c657dc0567da126be145106aa47e) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x562989083ae0]
 2: (BlueFS::_flush_and_sync_log(std::unique_lock<std::mutex>&, unsigned long, unsigned long)+0x1704) [0x562989002804]
 3: (BlueFS::_fsync(BlueFS::FileWriter*, std::unique_lock<std::mutex>&)+0x2a1) [0x5629890032b1]
 4: (BlueRocksWritableFile::Sync()+0x63) [0x56298901d073]
 5: (rocksdb::WritableFileWriter::SyncInternal(bool)+0x149) [0x562989409159]
 6: (rocksdb::WritableFileWriter::Sync(bool)+0xe8) [0x562989409e28]
 7: (rocksdb::DBImpl::WriteToWAL(rocksdb::autovector<rocksdb::WriteThread::Writer*, 8ul> const&, rocksdb::log::Writer*, bool, bool, unsigned long)+0x41a) [0x56298945598a]
 8: (rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool)+0x94b) [0x56298945656b]
 9: (rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x27) [0x562989457537]
 10: (RocksDBStore::submit_transaction_sync(std::shared_ptr<KeyValueDB::TransactionImpl>)+0xcf) [0x562988f9253f]
 11: (BlueStore::_kv_sync_thread()+0x1c6f) [0x562988f267df]
 12: (BlueStore::KVSyncThread::entry()+0xd) [0x562988f6fa6d]
 13: (()+0x7ea5) [0x7f81d8ca6ea5]
 14: (clone()+0x6d) [0x7f81d7d968dd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

A fuller log is attached as initial-osd.108.cut.txt.
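
(For reference, the _allocate messages above mean BlueFS failed to allocate 0x400000 on bdev 1, which only had 0x3c0000 free, and the fallback device bdev 2 does not exist. On an OSD that is still running, the BlueFS space counters can be checked through the admin socket; a minimal sketch, assuming osd.108 and that the counter section is named "bluefs":)

# dump the BlueFS perf counters (db_used_bytes, db_total_bytes, ...) of a running
# OSD; this only works while the daemon is up and its admin socket is reachable
ceph daemon osd.108 perf dump | grep -A 20 '"bluefs"'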

Any attempt at recovery fails with:

2021-02-02 12:53:18.400571 7f5da959dec0  1 bdev(0x55670aad6600 /var/lib/ceph/osd/ceph-108/block) open path /var/lib/ceph/osd/ceph-108/block
2021-02-02 12:53:18.400716 7f5da959dec0  1 bdev(0x55670aad6600 /var/lib/ceph/osd/ceph-108/block) open size 1000203091968 (0xe8e0c00000, 932GiB) block_size 4096 (4KiB) non-rotational
2021-02-02 12:53:18.400731 7f5da959dec0  1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-108/block size 932GiB
2021-02-02 12:53:18.400758 7f5da959dec0  1 bluefs mount
2021-02-02 12:53:18.401032 7f5da959dec0  1 bluefs _init_alloc id 1 alloc_size 0x10000 size 0xe8e0c00000
2021-02-02 13:21:09.304110 7f5da959dec0 -1 *** Caught signal (Segmentation fault) **
 in thread 7f5da959dec0 thread_name:ceph-osd

 ceph version 12.2.13 (584a20eb0237c657dc0567da126be145106aa47e) luminous (stable)
 1: (()+0xa76e51) [0x5566ff53ce51]
 2: (()+0xf630) [0x7f5da6a40630]
 3: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::list*, char*)+0x452) [0x5566ff4efcd2]
 4: (BlueFS::_replay(bool)+0x48d) [0x5566ff50378d]
 5: (BlueFS::mount()+0x1d4) [0x5566ff507374]
 6: (BlueStore::_open_db(bool)+0x1857) [0x5566ff415827]
 7: (BlueStore::_mount(bool)+0x2f6) [0x5566ff44acf6]
 8: (OSD::init()+0x472) [0x5566fefe9762]
 9: (main()+0x2f49) [0x5566feeea719]
 10: (__libc_start_main()+0xf5) [0x7f5da5a4c555]
 11: (()+0x4c4383) [0x5566fef8a383]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

A fuller log is attached as no-recovery-osd.108.txt.

The OS is CentOS 7.8.2003:

# grep baseurl /etc/yum.repos.d/ceph.repo | head -1
baseurl=http://download.ceph.com/rpm-luminous/el7/$basearch
# uname -a
Linux dch2-ceph-data-013.ceph.intern.net.de 3.10.0-1127.13.1.el7.x86_64 #1 SMP Tue Jun 23 15:46:38 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

An attempt to fix one of the offending OSDs results in:

# ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-109
*** Caught signal (Segmentation fault) **
 in thread 7f7258eb0ec0 thread_name:ceph-bluestore-
 ceph version 12.2.13 (584a20eb0237c657dc0567da126be145106aa47e) luminous (stable)
 1: (()+0x4025e1) [0x557c65a7e5e1]
 2: (()+0xf630) [0x7f724dcb0630]
 3: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::list*, char*)+0x452) [0x557c65851992]
 4: (BlueFS::_replay(bool)+0x48d) [0x557c6586544d]
 5: (BlueFS::mount()+0x1d4) [0x557c65869034]
 6: (BlueStore::_open_db(bool)+0x1857) [0x557c658c06e7]
 7: (BlueStore::_fsck(bool, bool)+0x3c7) [0x557c658f65c7]
 8: (main()+0xf24) [0x557c657a8c24]
 9: (__libc_start_main()+0xf5) [0x7f724c681555]
 10: (()+0x1c444f) [0x557c6584044f]
2021-02-02 17:17:19.152141 7f7258eb0ec0 -1 *** Caught signal (Segmentation fault) **

# gdb ceph-bluestore-tool
(gdb) run fsck --path /var/lib/ceph/osd/ceph-109
...
Program received signal SIGSEGV, Segmentation fault.
0x0000555555729992 in BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::list*, char*) ()
(gdb) bt
#0  0x0000555555729992 in BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::list*, char*) ()
#1  0x000055555573d44d in BlueFS::_replay(bool) ()
#2  0x0000555555741034 in BlueFS::mount() ()
#3  0x00005555557986e7 in BlueStore::_open_db(bool) ()
#4  0x00005555557ce5c7 in BlueStore::_fsck(bool, bool) ()
#5  0x0000555555680c24 in main ()
(gdb) info locals
No symbol table info available.
(gdb) 
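
(The "No symbol table info available" above can usually be resolved by installing the matching debuginfo package before running gdb; a sketch, assuming the ceph-debuginfo package mentioned later in this ticket matches the installed 12.2.13 binaries:)

# install debug symbols for the installed ceph build, then re-run the tool under gdb
yum install ceph-debuginfo
gdb --args ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-109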

# ceph osd tree
...
-37        10.91638     host dch2-ceph-data-012                         
102   sbs   1.81940         osd.102                 up  1.00000 1.00000 
103   sbs   1.81940         osd.103                 up  1.00000 1.00000 
104   sbs   1.81940         osd.104                 up  1.00000 1.00000 
105   sbs   1.81940         osd.105                 up  1.00000 1.00000 
106   sbs   1.81940         osd.106                 up  1.00000 1.00000 
107   sbs   1.81940         osd.107                 up  1.00000 1.00000 
-40         5.45819     host dch2-ceph-data-013                         
108 gpssd   0.90970         osd.108               down        0 1.00000 
109 gpssd   0.90970         osd.109               down        0 1.00000 
110 gpssd   0.90970         osd.110               down        0 1.00000 
111 gpssd   0.90970         osd.111               down        0 1.00000 
112 gpssd   0.90970         osd.112               down        0 1.00000 
...

I will disconnect the affected servers from my production cluster tomorrow;
I can then isolate the servers and, if necessary,
reconfigure the network towards a staging cluster.


Files

initial-osd.108.cut.txt (482 KB) initial-osd.108.cut.txt initial problem Andreas Buschmann, 02/02/2021 04:46 PM
no-recovery-osd.108.txt (14 KB) no-recovery-osd.108.txt a try to recover fails Andreas Buschmann, 02/02/2021 04:56 PM
ceph.conf (890 Bytes) ceph.conf Andreas Buschmann, 02/02/2021 05:28 PM
ceph-osd.108.log.00.xz (686 KB) ceph-osd.108.log.00.xz 01 of 11 Andreas Buschmann, 02/07/2021 09:20 PM
ceph-osd.108.log.01.xz (688 KB) ceph-osd.108.log.01.xz 02 of 11 Andreas Buschmann, 02/07/2021 09:20 PM
ceph-osd.108.log.02.xz (691 KB) ceph-osd.108.log.02.xz 03 of 11 Andreas Buschmann, 02/07/2021 09:20 PM
ceph-osd.108.log.03.xz (700 KB) ceph-osd.108.log.03.xz 04 of 11 Andreas Buschmann, 02/07/2021 09:20 PM
ceph-osd.108.log.04.xz (693 KB) ceph-osd.108.log.04.xz 05 of 11 Andreas Buschmann, 02/07/2021 09:21 PM
ceph-osd.108.log.05.xz (697 KB) ceph-osd.108.log.05.xz 06 of 11 Andreas Buschmann, 02/07/2021 09:21 PM
ceph-osd.108.log.06.xz (688 KB) ceph-osd.108.log.06.xz 07 of 11 Andreas Buschmann, 02/07/2021 09:21 PM
ceph-osd.108.log.07.xz (693 KB) ceph-osd.108.log.07.xz 08 of 11 Andreas Buschmann, 02/07/2021 09:22 PM
ceph-osd.108.log.08.xz (696 KB) ceph-osd.108.log.08.xz 09 of 11 Andreas Buschmann, 02/07/2021 09:22 PM
ceph-osd.108.log.09.xz (702 KB) ceph-osd.108.log.09.xz 10 of 11 Andreas Buschmann, 02/07/2021 09:23 PM
ceph-osd.108.log.10.xz (154 KB) ceph-osd.108.log.10.xz 11 of 11 Andreas Buschmann, 02/07/2021 09:26 PM
#1

Updated by Igor Fedotov about 3 years ago

Could you please set debug-bluefs to 20, start the OSD again, and share the relevant OSD log? Or at least the last 20,000 lines of the log if it's too large.
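
(For an OSD that crashes during startup, the simplest approach is to set the option before starting the daemon; a minimal sketch, assuming osd.108 and the usual ceph.conf conventions:)

# add the option to ceph.conf on the OSD host, under [osd] (or a per-daemon [osd.108] section):
#   debug bluefs = 20
# then restart the OSD so it logs at that level from mount time onward
systemctl restart ceph-osd@108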

#2

Updated by Andreas Buschmann about 3 years ago

My first try with

debug bluefs = 20

failed after 15 hours because logrotate killed my ceph-osd process.

Is there a way to speed this up? That is, is there a way to run ceph-osd at a lower logging level and increase the level after X?
I am currently logging 3 GByte a minute (through rotatelogs from Apache).
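
(For a daemon that is already up, the debug level can also be raised at runtime instead of at startup; a sketch, assuming osd.108 is running and either its admin socket or the monitors are reachable:)

# via the local admin socket on the OSD host ...
ceph daemon osd.108 config set debug_bluefs 20/20
# ... or remotely through the monitors
ceph tell osd.108 injectargs '--debug_bluefs 20/20'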

Anyway, I have deleted the pool that used the broken OSDs, and my Ceph cluster is healthy again.
I am still debugging the broken OSDs, but the time pressure is off now.

#3

Updated by Andreas Buschmann about 3 years ago

I now have a more complete dump.

Beforehand I ran

yum install ceph-debuginfo
yum install httpd-tools httpd

so I could pipe ceph-osd.108.log through rotatelogs.
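
(The pipeline was presumably something along these lines; a sketch only, with the rotation size, log path, and filename pattern as assumptions:)

# run the OSD in the foreground with bluefs debugging, log to stderr, and let
# Apache's rotatelogs split the output into ~1.5 GB chunks
ceph-osd -d -i 108 --debug_bluefs 20 2>&1 | rotatelogs /var/log/ceph/ceph-osd.108.log.%Y%m%d%H%M%S 1500M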

 wc -l ceph-osd.108.log.*
       4365 ceph-osd.108.log.00
       4362 ceph-osd.108.log.01
       4363 ceph-osd.108.log.02
       4360 ceph-osd.108.log.03
       4361 ceph-osd.108.log.04
       4363 ceph-osd.108.log.05
       4362 ceph-osd.108.log.06
       4367 ceph-osd.108.log.07
       4362 ceph-osd.108.log.08
       4360 ceph-osd.108.log.09
        818 ceph-osd.108.log.10
      44443 total

If required, I can upload the last 8 minutes of logs preceding these.

Before compression each of the files *.00 - *.09 was 1.5GB.

#4

Updated by Andreas Buschmann about 3 years ago

Here is the last file: ceph-osd.108.log.10.xz

#5

Updated by Igor Fedotov about 3 years ago

Given the Ceph version provided and the huge BlueFS log file size:
-3> 2021-02-06 09:39:29.927409 7ff72040dec0 10 bluefs _read h 0x560472780d80 0x4779b0d000~f4000 from file(ino 1 size 0x4779b0c000

I think you're facing bug(s) covered by
https://tracker.ceph.com/issues/45903
https://github.com/ceph/ceph/pull/35776

and

https://github.com/ceph/ceph/pull/34876

Unfortunately v12.2.13 doesn't include these fixes; they were backported to the Luminous branch after v12.2.13 had been released, and no newer Luminous version is available since Luminous is at EOL.

#6

Updated by Igor Fedotov about 3 years ago

  • Status changed from New to Triaged

Once a fresher Luminous build is obtained (most likely one you will have to build yourself), you might try the recovery procedure provided at https://github.com/ceph/ceph/pull/35776
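
(Building such a package yourself would look roughly like the sketch below; the build targets and -j value are illustrative, and the recovery steps themselves are described in the PR above:)

# build ceph-osd and ceph-bluestore-tool from the tip of the luminous branch,
# which contains the backported fixes
git clone https://github.com/ceph/ceph.git
cd ceph
git checkout luminous
git submodule update --init --recursive
./install-deps.sh
./do_cmake.sh
cd build
make -j8 ceph-osd ceph-bluestore-tool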

As a permanent fix I would suggest upgrading the cluster...

#7

Updated by Igor Fedotov about 3 years ago

  • Priority changed from Normal to Low
#8

Updated by Igor Fedotov about 1 year ago

  • Status changed from Triaged to Won't Fix