Bug #18375
bluestore: bluefs_preextend_wal_files=true is not crash consistent
Status: Closed
% Done: 0%
Description
We're seeing a case when running 11.1.0 bits in Rook where a crash happens (due to some mempool issue) and the OSD fails to start subsequently. The full log is here https://www.dropbox.com/s/9v9xp2x6m1fpzpl/20161230.castle6.log.gz?dl=0 (it's large since bluefs logging is set to 20 -- as requested by Sage). It's osd10 in the logs. Here's a summary of events.
First, the mempool crash at around Dec 30 23:32:43:
Dec 30 23:32:43 castle6 rkt[1859]: 2016-12-30 23:32:43.167024 I | osd10: 2016-12-30 23:32:43.166982 7f7131d43700 -1 *** Caught signal (Segmentation fault) **
Dec 30 23:32:43 castle6 rkt[1859]: 2016-12-30 23:32:43.167046 I | osd10: in thread 7f7131d43700 thread_name:rookd
Dec 30 23:32:43 castle6 rkt[1859]: 2016-12-30 23:32:43.167055 I | osd10:
Dec 30 23:32:43 castle6 rkt[1859]: 2016-12-30 23:32:43.167063 I | osd10: ceph version 11.0.2-2419-g9f96471 (9f964718530f45b7e89e8b5eec9d889f99053f8d)
Dec 30 23:32:43 castle6 rkt[1859]: 2016-12-30 23:32:43.167072 I | osd10: 1: [0x185c14e]
Dec 30 23:32:43 castle6 rkt[1859]: 2016-12-30 23:32:43.167080 I | osd10: 2: [0xc2cfb0]
Dec 30 23:32:43 castle6 rkt[1859]: 2016-12-30 23:32:43.167088 I | osd10: 3: [0xc147ef]
Dec 30 23:32:43 castle6 rkt[1859]: 2016-12-30 23:32:43.167096 I | osd10: 4: [0x23967e2]
Dec 30 23:32:43 castle6 rkt[1859]: 2016-12-30 23:32:43.167104 I | osd10: 5: [0xef4213]
Dec 30 23:32:43 castle6 rkt[1859]: 2016-12-30 23:32:43.167111 I | osd10: 6: [0xf00575]
Dec 30 23:32:43 castle6 rkt[1859]: 2016-12-30 23:32:43.167119 I | osd10: 7: [0x1f03a27]
Dec 30 23:32:43 castle6 rkt[1859]: 2016-12-30 23:32:43.167126 I | osd10: 8: [0x1f03fd0]
Dec 30 23:32:43 castle6 rkt[1859]: 2016-12-30 23:32:43.167133 I | osd10: 9: [0x1d090fb]
Dec 30 23:32:43 castle6 rkt[1859]: 2016-12-30 23:32:43.167140 I | osd10: 10: [0x1d12fe5]
Dec 30 23:32:43 castle6 rkt[1859]: 2016-12-30 23:32:43.167148 I | osd10: 11: [0x1f0a3b6]
Dec 30 23:32:43 castle6 rkt[1859]: 2016-12-30 23:32:43.167156 I | osd10: 12: [0x1f05571]
Dec 30 23:32:43 castle6 rkt[1859]: 2016-12-30 23:32:43.167164 I | osd10: 13: [0x22661ff]
Dec 30 23:32:43 castle6 rkt[1859]: 2016-12-30 23:32:43.167171 I | osd10: 14: [0xc234b5]
Dec 30 23:32:43 castle6 rkt[1859]: 2016-12-30 23:32:43.167178 I | osd10: 15: [0x234be3b]
Dec 30 23:32:43 castle6 rkt[1859]: 2016-12-30 23:32:43.167186 I | osd10: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Dec 30 23:32:43 castle6 rkt[1859]: 2016-12-30 23:32:43.167192 I | osd10:
Here's the related backtrace (we're having an issue with symbols):
(gdb) bt
#0  raise (sig=sig@entry=11) at ../sysdeps/unix/sysv/linux/raise.c:58
#1  0x000000000185c2ec in reraise_fatal (signum=11) at /home/rook/go/src/github.com/rook/rook/ceph/src/global/signal_handler.cc:72
#2  handle_fatal_signal (signum=11) at /home/rook/go/src/github.com/rook/rook/ceph/src/global/signal_handler.cc:134
#3  <signal handler called>
#4  0x0000000000c147ef in (anonymous namespace)::do_memalign(unsigned long, unsigned long) ()
#5  0x00000000023967e2 in tc_posix_memalign ()
#6  0x0000000000ef4213 in mempool::pool_allocator<(mempool::pool_index_t)3, char>::allocate_aligned (p=0x0, align=8, n=4096, this=0x32a8a20 <mempool::buffer_data::alloc_char>) at /home/rook/go/src/github.com/rook/rook/ceph/src/include/mempool.h:322
#7  ceph::buffer::raw_combined::create (align=8, len=4008) at /home/rook/go/src/github.com/rook/rook/ceph/src/common/buffer.cc:272
#8  ceph::buffer::list::append (this=this@entry=0x7f7131d3f2f0, data=0x1c6f39c0 "\345\372\223K&J\364>v\313\253\a@\n\300\225\002N\001\321\230\375Y\256u\313cQ\211\324\315\373", len=32) at /home/rook/go/src/github.com/rook/rook/ceph/src/common/buffer.cc:1810
#9  0x0000000000f00575 in CryptoAESKeyHandler::encrypt (this=<optimized out>, in=..., out=..., error=0x0) at /home/rook/go/src/github.com/rook/rook/ceph/src/auth/Crypto.cc:166
#10 0x0000000001f03a27 in CryptoKey::encrypt (error=0x0, cct=<optimized out>, out=..., in=..., this=0x1b3358f8) at /home/rook/go/src/github.com/rook/rook/ceph/src/auth/Crypto.h:110
#11 CephxSessionHandler::_calc_signature (this=this@entry=0x1b3358e0, m=m@entry=0x1b3e6000, psig=psig@entry=0x7f7131d3f4b8) at /home/rook/go/src/github.com/rook/rook/ceph/src/auth/cephx/CephxSessionHandler.cc:52
#12 0x0000000001f03fd0 in CephxSessionHandler::sign_message (this=0x1b3358e0, m=0x1b3e6000) at /home/rook/go/src/github.com/rook/rook/ceph/src/auth/cephx/CephxSessionHandler.cc:77
#13 0x0000000001d090fb in AsyncConnection::write_message (this=this@entry=0x1b24e000, m=m@entry=0x1b3e6000, bl=..., more=<optimized out>) at /home/rook/go/src/github.com/rook/rook/ceph/src/msg/async/AsyncConnection.cc:2174
#14 0x0000000001d12fe5 in AsyncConnection::handle_write (this=0x1b24e000) at /home/rook/go/src/github.com/rook/rook/ceph/src/msg/async/AsyncConnection.cc:2412
#15 0x0000000001f0a3b6 in EventCenter::process_events (this=this@entry=0xc3ace80, timeout_microseconds=<optimized out>, timeout_microseconds@entry=30000000) at /home/rook/go/src/github.com/rook/rook/ceph/src/msg/async/Event.cc:430
#16 0x0000000001f05571 in NetworkStack::<lambda()>::operator()(void) const (__closure=0xc5e26f8) at /home/rook/go/src/github.com/rook/rook/ceph/src/msg/async/Stack.cc:46
#17 0x00000000022661ff in execute_native_thread_routine ()
#18 0x0000000000c234b5 in start_thread (arg=0x7f7131d43700) at pthread_create.c:333
#19 0x000000000234be3b in clone ()
(gdb)
This leads to the OSD process crashing, after which it is restarted. When it then attempts to open the RocksDB database, it runs into "rocksdb: Corruption: missing start of fragmented record(2)" errors:
Dec 30 23:32:46 castle6 rkt[1859]: 2016-12-30 23:32:45.919280 I | osd10: 2016-12-30 23:32:45.919217 bfb4240 20 bluefs _read left 0x100000 len 0x8000
Dec 30 23:32:46 castle6 rkt[1859]: 2016-12-30 23:32:45.919297 I | osd10: 2016-12-30 23:32:45.919231 bfb4240 20 bluefs _read got 32768
Dec 30 23:32:46 castle6 rkt[1859]: 2016-12-30 23:32:45.919409 I | osd10: 2016-12-30 23:32:45.919381 bfb4240 10 bluefs _read h 0xcff7c00 0x708000~8000 from file(ino 20 size 0x800000 mtime 2016-12-30 23:32:34.858951 bdev 0 extents [0:0x500000+800000])
Dec 30 23:32:46 castle6 rkt[1859]: 2016-12-30 23:32:45.919419 I | osd10: 2016-12-30 23:32:45.919390 bfb4240 20 bluefs _read left 0xf8000 len 0x8000
Dec 30 23:32:46 castle6 rkt[1859]: 2016-12-30 23:32:45.919429 I | osd10: 2016-12-30 23:32:45.919398 bfb4240 20 bluefs _read got 32768
Dec 30 23:32:46 castle6 rkt[1859]: 2016-12-30 23:32:45.919548 I | osd10: 2016-12-30 23:32:45.919524 bfb4240 10 bluefs _read h 0xcff7c00 0x710000~8000 from file(ino 20 size 0x800000 mtime 2016-12-30 23:32:34.858951 bdev 0 extents [0:0x500000+800000])
Dec 30 23:32:46 castle6 rkt[1859]: 2016-12-30 23:32:45.919558 I | osd10: 2016-12-30 23:32:45.919530 bfb4240 20 bluefs _read left 0xf0000 len 0x8000
Dec 30 23:32:46 castle6 rkt[1859]: 2016-12-30 23:32:45.919565 I | osd10: 2016-12-30 23:32:45.919538 bfb4240 20 bluefs _read got 32768
Dec 30 23:32:46 castle6 rkt[1859]: 2016-12-30 23:32:45.919721 I | osd10: 2016-12-30 23:32:45.919693 bfb4240 10 bluefs _read h 0xcff7c00 0x718000~8000 from file(ino 20 size 0x800000 mtime 2016-12-30 23:32:34.858951 bdev 0 extents [0:0x500000+800000])
Dec 30 23:32:46 castle6 rkt[1859]: 2016-12-30 23:32:45.919732 I | osd10: 2016-12-30 23:32:45.919701 bfb4240 20 bluefs _read left 0xe8000 len 0x8000
Dec 30 23:32:46 castle6 rkt[1859]: 2016-12-30 23:32:45.919742 I | osd10: 2016-12-30 23:32:45.919709 bfb4240 20 bluefs _read got 32768
Dec 30 23:32:46 castle6 rkt[1859]: 2016-12-30 23:32:45.919862 I | osd10: 2016-12-30 23:32:45.919838 bfb4240 10 bluefs _read h 0xcff7c00 0x720000~8000 from file(ino 20 size 0x800000 mtime 2016-12-30 23:32:34.858951 bdev 0 extents [0:0x500000+800000])
Dec 30 23:32:46 castle6 rkt[1859]: 2016-12-30 23:32:45.919878 I | osd10: 2016-12-30 23:32:45.919845 bfb4240 20 bluefs _read left 0xe0000 len 0x8000
Dec 30 23:32:46 castle6 rkt[1859]: 2016-12-30 23:32:45.919886 I | osd10: 2016-12-30 23:32:45.919852 bfb4240 20 bluefs _read got 32768
Dec 30 23:32:46 castle6 rkt[1859]: 2016-12-30 23:32:45.920031 I | osd10: 2016-12-30 23:32:45.920008 bfb4240 10 bluefs _read h 0xcff7c00 0x728000~8000 from file(ino 20 size 0x800000 mtime 2016-12-30 23:32:34.858951 bdev 0 extents [0:0x500000+800000])
Dec 30 23:32:46 castle6 rkt[1859]: 2016-12-30 23:32:45.920041 I | osd10: 2016-12-30 23:32:45.920014 bfb4240 20 bluefs _read left 0xd8000 len 0x8000
Dec 30 23:32:46 castle6 rkt[1859]: 2016-12-30 23:32:45.920049 I | osd10: 2016-12-30 23:32:45.920022 bfb4240 20 bluefs _read got 32768
Dec 30 23:32:46 castle6 rkt[1859]: 2016-12-30 23:32:45.920157 I | osd10: 2016-12-30 23:32:45.920134 bfb4240 10 bluefs _read h 0xcff7c00 0x730000~8000 from file(ino 20 size 0x800000 mtime 2016-12-30 23:32:34.858951 bdev 0 extents [0:0x500000+800000])
Dec 30 23:32:46 castle6 rkt[1859]: 2016-12-30 23:32:45.920166 I | osd10: 2016-12-30 23:32:45.920140 bfb4240 20 bluefs _read left 0xd0000 len 0x8000
Dec 30 23:32:46 castle6 rkt[1859]: 2016-12-30 23:32:45.920174 I | osd10: 2016-12-30 23:32:45.920148 bfb4240 20 bluefs _read got 32768
Dec 30 23:32:46 castle6 rkt[1859]: 2016-12-30 23:32:45.920184 I | osd10: 2016-12-30 23:32:45.920154 bfb4240 3 rocksdb: db.wal/000012.log: dropping 549 bytes; Corruption: missing start of fragmented record(2)
Dec 30 23:32:46 castle6 rkt[1859]: 2016-12-30 23:32:45.920425 I | osd10: 2016-12-30 23:32:45.920398 bfb4240 10 bluefs unlock_file 0xc7c9850 on file(ino 2 size 0x0 mtime 2016-12-30 23:30:18.189715 bdev 0 extents [])
Dec 30 23:32:46 castle6 rkt[1859]: 2016-12-30 23:32:45.920474 I | osd10: 2016-12-30 23:32:45.920442 bfb4240 -1 rocksdb: Corruption: missing start of fragmented record(2)
Dec 30 23:32:46 castle6 rkt[1859]: 2016-12-30 23:32:45.920486 I | osd10: 2016-12-30 23:32:45.920450 bfb4240 -1 bluestore(/var/lib/rook/osd10) _open_db erroring opening db:
Dec 30 23:32:46 castle6 rkt[1859]: 2016-12-30 23:32:45.920495 I | osd10: 2016-12-30 23:32:45.920453 bfb4240 1 bluefs umount
Dec 30 23:32:46 castle6 rkt[1859]: 2016-12-30 23:32:45.920505 I | osd10: 2016-12-30 23:32:45.920455 bfb4240 10 bluefs sync_metadata - no pending log events
Dec 30 23:32:46 castle6 rkt[1859]: 2016-12-30 23:32:45.920512 I | osd10: 2016-12-30 23:32:45.920457 bfb4240 10 bluefs _close_writer 0xd0fcd00 type 0
Dec 30 23:32:46 castle6 rkt[1859]: 2016-12-30 23:32:45.920520 I | osd10: 2016-12-30 23:32:45.920461 bfb4240 20 bluefs _stop_alloc
Dec 30 23:32:46 castle6 rkt[1859]: 2016-12-30 23:32:45.921630 I | osd10: 2016-12-30 23:32:45.921572 bfb4240 1 bdev(/var/lib/rook/osd10/block.wal) close
And the OSD never recovers. This is quite reproducible, so let me know if you want us to try a patch.
Not sure what led to the mempool issue in the first place. I can open a different ticket on that if needed.
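For context on the error message: RocksDB's WAL splits records that don't fit in one 32 KB block into FIRST/MIDDLE/LAST fragments, and replay reassembles them; "missing start of fragmented record" is what the reader reports when it hits a continuation fragment with no preceding FIRST, which is exactly what stale bytes past the durably flushed end of a preextended file can look like. Below is a toy sketch of that reassembly logic (not Ceph or RocksDB code; the record-type names follow RocksDB's log format, everything else is illustrative):

```python
# Hypothetical simulation of RocksDB-style WAL fragment reassembly and of
# the failure mode in this ticket. Not actual Ceph/RocksDB code.

RECORD_FULL, RECORD_FIRST, RECORD_MIDDLE, RECORD_LAST = 1, 2, 3, 4

def replay(fragments):
    """Reassemble records from (type, payload) fragments."""
    records, partial, in_fragment = [], b"", False
    for ftype, payload in fragments:
        if ftype == RECORD_FULL:
            records.append(payload)
            in_fragment = False
        elif ftype == RECORD_FIRST:
            partial, in_fragment = payload, True
        elif ftype == RECORD_MIDDLE:
            if not in_fragment:
                raise ValueError("missing start of fragmented record")
            partial += payload
        elif ftype == RECORD_LAST:
            if not in_fragment:
                raise ValueError("missing start of fragmented record")
            records.append(partial + payload)
            in_fragment = False
    return records

# Normal case: a record split across two fragments replays fine.
assert replay([(RECORD_FIRST, b"abc"), (RECORD_LAST, b"def")]) == [b"abcdef"]

# Crash case: the FIRST fragment never made it to disk (the reader saw
# stale preextended bytes instead), so a surviving continuation fragment
# has no start -> replay fails, as in the osd10 log above.
try:
    replay([(RECORD_MIDDLE, b"ghi")])
except ValueError as e:
    print(e)  # missing start of fragmented record
```

The point of the sketch is that the reader cannot distinguish "log genuinely ends here" from "these bytes were never written", so any stale data in the preextended region can abort replay.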
Updated by Sage Weil over 7 years ago
- Assignee set to Sage Weil
- Priority changed from Normal to Immediate
Updated by Sage Weil over 7 years ago
Can you try to reproduce it with bluefs_preextend_wal_files = false?
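For anyone following along, disabling the option means adding the following to ceph.conf (standard Ceph config syntax; the option name is taken from this ticket):

```ini
[osd]
# Work around the WAL replay corruption described in this ticket by
# disabling preextension of bluefs WAL files.
bluefs_preextend_wal_files = false
```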
Updated by Bassam Tabbara over 7 years ago
We'll give it a try and get back to you here.
Updated by Bassam Tabbara over 7 years ago
We do not see the RocksDB corruption with bluefs_preextend_wal_files set to false. When the OSDs crash (the memory issue above), they seem to come back up.
Updated by Sage Weil over 7 years ago
- Subject changed from blustore not crash consistent in 11.1.0 to bluestore: bluefs_preextend_wal_files=true is not crash consistent
- Status changed from New to 12
- Priority changed from Immediate to Urgent
Updated by Sage Weil about 7 years ago
- Related to Bug #18338: bluestore_preextend_wal_files appears to corrupt logs added
Updated by Greg Farnum almost 7 years ago
- Project changed from Ceph to RADOS
- Category changed from 107 to Correctness/Safety
- Component(RADOS) BlueStore added
Updated by Greg Farnum over 6 years ago
- Project changed from RADOS to bluestore
- Category deleted (Correctness/Safety)
Updated by Igor Fedotov almost 5 years ago
- Status changed from 12 to Resolved
This is also duplicated by https://tracker.ceph.com/issues/38559
Marking this as resolved; backporting to be tracked in #38559.