Bug #18698
closedBlueFS FAILED assert(0 == "allocate failed... wtf")
0%
Description
We are seeing this failed assertion and crash using embedded ceph in the rook project: https://github.com/rook/rook.
Tracking issue in rook that has more details: https://github.com/rook/rook/issues/348
This is the ceph tree that we are reproducing with, it has Kraken 11.2.0 plus some fixes: https://github.com/rook/ceph/commits/b7551f136a37e895a4eb7c185a03c58aa260b2a9
Direct link to the failing assertion: https://github.com/rook/ceph/blob/b7551f136a37e895a4eb7c185a03c58aa260b2a9/src/os/bluestore/BlueFS.cc#L1779
Here's the journal output for the first instance of the failed assertion:
Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469075 I | osd3: 2017-01-26 02:23:23.469011 7ff1e4c64700 -1 bluefs _allocate allocate failed on 0x400000 min_alloc_size 0x100000 Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469107 I | osd3: 2017-01-26 02:23:23.469019 7ff1e4c64700 0 bitmapalloc:dump instance 0xdf7f130 Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469113 I | osd3: 2017-01-26 02:23:23.469022 7ff1e4c64700 0 bitalloc:dump_state zone 0 dump start Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469117 I | osd3: 2017-01-26 02:23:23.469024 7ff1e4c64700 0 bitalloc:0:: 0x87870c3861c3861c Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469121 I | osd3: 2017-01-26 02:23:23.469026 7ff1e4c64700 0 bitalloc:1:: 0x3870e1c3870e1870 Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469124 I | osd3: 2017-01-26 02:23:23.469027 7ff1e4c64700 0 bitalloc:2:: 0xe1870e1870c3870e Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469127 I | osd3: 2017-01-26 02:23:23.469029 7ff1e4c64700 0 bitalloc:3:: 0x10c2108610861086 Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469131 I | osd3: 2017-01-26 02:23:23.469030 7ff1e4c64700 0 bitalloc:4:: 0x1086108430843084 Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469134 I | osd3: 2017-01-26 02:23:23.469031 7ff1e4c64700 0 bitalloc:5:: 0x3084308610861086 Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469138 I | osd3: 2017-01-26 02:23:23.469032 7ff1e4c64700 0 bitalloc:6:: 0x1086108610e1c386 Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469147 I | osd3: 2017-01-26 02:23:23.469034 7ff1e4c64700 0 bitalloc:7:: 0x1c3870e1870e1c38 Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469151 I | osd3: 2017-01-26 02:23:23.469035 7ff1e4c64700 0 bitalloc:8:: 0x70c3870c3870e187 Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469155 I | osd3: 2017-01-26 02:23:23.469037 7ff1e4c64700 0 bitalloc:9:: 0xffffffffffffffff Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469158 I | osd3: 2017-01-26 02:23:23.469038 7ff1e4c64700 0 bitalloc:10:: 0xffffffffffffffff Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469161 I | osd3: 2017-01-26 02:23:23.469039 7ff1e4c64700 0 bitalloc:11:: 0xffffffffffffffff Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469164 I | osd3: 2017-01-26 02:23:23.469041 7ff1e4c64700 0 bitalloc:12:: 0xffffffffffffffff Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469168 I | osd3: 2017-01-26 02:23:23.469042 7ff1e4c64700 0 bitalloc:13:: 0xffffffffffffffff Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469171 I | osd3: 2017-01-26 02:23:23.469044 7ff1e4c64700 0 bitalloc:14:: 0xffffffffffffffff Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469174 I | osd3: 2017-01-26 02:23:23.469046 7ff1e4c64700 0 bitalloc:15:: 0xffffffffffffffff Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469177 I | osd3: 2017-01-26 02:23:23.469047 7ff1e4c64700 0 bitalloc:dump_state zone 0 dump end Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469188 I | osd3: 2017-01-26 02:23:23.469141 7ff1e4c64700 -1 /home/rook/go/src/github.com/rook/rook/ceph/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_allocate(uint8_t, uint64_t, mempool::bluefs::vector<bluefs_extent_t>*)' thread 7ff1e4c64700 time 2017-01-26 02:23:23.469053 Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469193 I | osd3: /home/rook/go/src/github.com/rook/rook/ceph/src/os/bluestore/BlueFS.cc: 1779: FAILED assert(0 == "allocate failed... wtf") Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469196 I | osd3: Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469200 I | osd3: ceph version 11.0.2-8651-gb7551f1 (b7551f136a37e895a4eb7c185a03c58aa260b2a9) Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469205 I | osd3: 1: [0x102c930] Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469209 I | osd3: 2: [0x1a7bcbe] Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469213 I | osd3: 3: [0x1a8280b] Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469218 I | osd3: 4: [0x1a8494e] Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469223 I | osd3: 5: [0x1a70e12] Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469226 I | osd3: 6: [0x21141e9] Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469229 I | osd3: 7: [0x2115570] Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469233 I | osd3: 8: [0x20473e3] Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469235 I | osd3: 9: [0x2047faa] Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469238 I | osd3: 10: [0x171cee0] Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469242 I | osd3: 11: [0x1a30b2c] Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469245 I | osd3: 12: [0x1a6aa8d] Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469248 I | osd3: 13: [0xc2a015] Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469250 I | osd3: 14: [0x235602b] Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469254 I | osd3: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Manually recreating the call stack from the above addresses gives us:
> addr2line -e ./rookd 0x102c930 0x1a7bcbe 0x1a8280b 0x1a8494e 0x1a70e12 0x21141e9 0x2115570 0x20473e3 0x2047faa 0x171cee0 0x1a30b2c 0x1a6aa8d 0xc2a015 0x235602b /usr/include/c++/6/sstream:625 (discriminator 3) /home/rook/go/src/github.com/rook/rook/ceph/src/os/bluestore/BlueFS.cc:1756 /home/rook/go/src/github.com/rook/rook/ceph/src/os/bluestore/BlueFS.cc:1336 /usr/x86_64-linux-gnu/include/boost/smart_ptr/intrusive_ptr.hpp:198 /usr/include/c++/6/bits/std_mutex.h:231 /build/external/build/x86_64-linux-gnu/src/rocksdb/./include/rocksdb/status.h:279 /build/external/build/x86_64-linux-gnu/src/rocksdb/./include/rocksdb/status.h:284 /build/external/build/x86_64-linux-gnu/src/rocksdb/./include/rocksdb/status.h:284 /build/external/build/x86_64-linux-gnu/src/rocksdb/db/db_impl.cc:4575 /home/rook/go/src/github.com/rook/rook/ceph/src/kv/RocksDBStore.cc:487 /home/rook/go/src/github.com/rook/rook/ceph/src/os/bluestore/BlueStore.cc:6748 /home/rook/go/src/github.com/rook/rook/ceph/src/os/bluestore/BlueStore.h:1496 /build/glibc-jxM2Ev/glibc-2.24/nptl/pthread_create.c:333 ??:?
After the initial crash, restarting the OSD immediately runs into the assertion again on the OSD::init path, with a callstack of:
#0 raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58 #1 0x0000000001866d6c in reraise_fatal (signum=6) at /home/rook/go/src/github.com/rook/rook/ceph/src/global/signal_handler.cc:72 #2 handle_fatal_signal (signum=6) at /home/rook/go/src/github.com/rook/rook/ceph/src/global/signal_handler.cc:134 #3 <signal handler called> #4 raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:58 #5 0x00000000022d5b8a in abort () #6 0x000000000102cb1b in ceph::__ceph_assert_fail (assertion=assertion@entry=0x27d25bb "0 == \"allocate failed... wtf\"", file=file@entry=0x27d1c10 "/home/rook/go/src/github.com/rook/rook/ceph/src/os/bluestore/BlueFS.cc", line=line@entry=1779, func=func@entry=0x27d30a0 <BlueFS::_allocate(unsigned char, unsigned long, std::vector<bluefs_extent_t, mempool::pool_allocator<(mempool::pool_index_t)8, bluefs_extent_t> >*)::__PRETTY_FUNCTION__> "int BlueFS::_allocate(uint8_t, uint64_t, mempool::bluefs::vector<bluefs_extent_t>*)") at /home/rook/go/src/github.com/rook/rook/ceph/src/common/assert.cc:78 #7 0x0000000001a7bcbe in BlueFS::_allocate (this=this@entry=0xcb76000, id=0 '\000', len=<optimized out>, ev=<optimized out>) at /home/rook/go/src/github.com/rook/rook/ceph/src/os/bluestore/BlueFS.cc:1779 #8 0x0000000001a8280b in BlueFS::_flush_and_sync_log (this=this@entry=0xcb76000, l=..., want_seq=want_seq@entry=249204, jump_to=jump_to@entry=0) at /home/rook/go/src/github.com/rook/rook/ceph/src/os/bluestore/BlueFS.cc:1335 #9 0x0000000001a8494e in BlueFS::_fsync (this=this@entry=0xcb76000, h=h@entry=0xc710b40, l=...) at /home/rook/go/src/github.com/rook/rook/ceph/src/os/bluestore/BlueFS.cc:1715 #10 0x0000000001a70e12 in BlueFS::fsync (h=0xc710b40, this=0xcb76000) at /home/rook/go/src/github.com/rook/rook/ceph/src/os/bluestore/BlueFS.h:403 #11 BlueRocksWritableFile::Sync (this=<optimized out>) at /home/rook/go/src/github.com/rook/rook/ceph/src/os/bluestore/BlueRocksEnv.cc:220 #12 0x00000000021141e9 in rocksdb::WritableFileWriter::SyncInternal (this=this@entry=0xc70b410, use_fsync=use_fsync@entry=false) at util/file_reader_writer.cc:242 #13 0x0000000002115570 in rocksdb::WritableFileWriter::Sync (this=0xc70b410, use_fsync=<optimized out>) at util/file_reader_writer.cc:213 #14 0x000000000214fcd3 in rocksdb::BuildTable (dbname=..., env=0xc9a2de0, ioptions=..., mutable_cf_options=..., env_options=..., table_cache=0xca5b580, iter=0x7fff3262d568, range_del_iter=..., meta=0x7fff3262d240, internal_comparator=..., int_tbl_prop_collector_factories=0xca645f0, column_family_id=0, column_family_name=..., snapshots=..., earliest_write_conflict_snapshot=72057594037927935, compression=<incomplete type>, compression_opts=..., paranoid_file_checks=false, internal_stats=0xc8d0f00, reason=rocksdb::TableFileCreationReason::kRecovery, event_logger=0xca60b18, job_id=1, io_priority=rocksdb::Env::IO_HIGH, table_properties=0x0, level=-1) at db/builder.cc:176 #15 0x000000000203fa81 in rocksdb::DBImpl::WriteLevel0TableForRecovery (this=this@entry=0xca60000, job_id=1, cfd=cfd@entry=0xca64580, mem=0xca62400, edit=edit@entry=0xcf722b0) at db/db_impl.cc:1779 #16 0x000000000204266f in rocksdb::DBImpl::RecoverLogFiles (this=this@entry=0xca60000, log_numbers=..., next_sequence=next_sequence@entry=0x7fff3262e348, read_only=read_only@entry=false) at db/db_impl.cc:1697 #17 0x0000000002043474 in rocksdb::DBImpl::Recover (this=this@entry=0xca60000, column_families=..., read_only=read_only@entry=false, error_if_log_file_exist=error_if_log_file_exist@entry=false, error_if_data_exists_in_logs=error_if_data_exists_in_logs@entry=false) at db/db_impl.cc:1342 #18 0x000000000204408a in rocksdb::DB::Open (db_options=..., dbname=..., column_families=..., handles=handles@entry=0x7fff3262e980, dbptr=dbptr@entry=0xc704dd8) at db/db_impl.cc:5938 #19 0x00000000020452f5 in rocksdb::DB::Open (options=..., dbname=..., dbptr=dbptr@entry=0xc704dd8) at db/db_impl.cc:5883 #20 0x0000000001720ef2 in RocksDBStore::do_open (this=0xc704d80, out=..., create_if_missing=<optimized out>) at /home/rook/go/src/github.com/rook/rook/ceph/src/kv/RocksDBStore.cc:297 #21 0x0000000001a26463 in BlueStore::_open_db (this=this@entry=0xc8ec000, create=create@entry=false) at /home/rook/go/src/github.com/rook/rook/ceph/src/os/bluestore/BlueStore.cc:3475 #22 0x0000000001a58a2c in BlueStore::mount (this=0xc8ec000) at /home/rook/go/src/github.com/rook/rook/ceph/src/os/bluestore/BlueStore.cc:4034 #23 0x00000000016dd127 in OSD::init (this=0xc8e0000) at /home/rook/go/src/github.com/rook/rook/ceph/src/osd/OSD.cc:2042 #24 0x00000000014c1ec7 in cephd_osd (argc=<optimized out>, argv=<optimized out>) at /home/rook/go/src/github.com/rook/rook/ceph/src/ceph_osd.cc:611 #25 0x0000000000dfce65 in cephd_run_osd (argc=<optimized out>, argv=<optimized out>) at /home/rook/go/src/github.com/rook/rook/ceph/src/libcephd/libcephd.cc:237 #26 0x0000000000c18a6a in _cgo_49d41b80151a_Cfunc_cephd_run_osd (v=0xc4201dbb28) at /home/rook/go/src/github.com/rook/rook/pkg/cephmgr/cephd/cephd.go:204 #27 0x00000000004f3fe0 in runtime.asmcgocall () at /usr/local/go/src/runtime/asm_amd64.s:590 #28 0x0000000000498d5d in runtime.cgocall (fn=0x7fff32632860, arg=0x3271580 <runtime.g0>, ~r2=845359184) at /usr/local/go/src/runtime/cgocall.go:115 #29 0x0000000003271500 in runtime.work () #30 0x00007fff32632860 in ?? () #31 0x0000000003271580 in ?? () #32 0x00007fff32632850 in ?? () #33 0x00000000004c6bd4 in runtime.mstart () at /usr/local/go/src/runtime/proc.go:1096 #34 0x00000000004f2135 in runtime.rt0_go () at /usr/local/go/src/runtime/asm_amd64.s:156 #35 0x000000000000000b in ?? () #36 0x00007fff326329b8 in ?? () #37 0x000000000000000b in ?? () #38 0x00007fff326329b8 in ?? () #39 0x0000000000400598 in __rela_iplt_start () #40 0x00000000022c4b06 in generic_start_main () #41 0x00000000022c4d8e in __libc_start_main () #42 0x000000000049351a in _start ()
On the repro machine, the OSDs are configured to use bluestore, with each Block partition on its own HDD and all WAL/DB partitions on a single NVMe device, like so:
$ lsblk -o name,type,size,partlabel,partuuid NAME TYPE SIZE PARTLABEL PARTUUID sde disk 3.7T `-sde1 part 3.7T ROOK-OSD7-BLOCK 916755c5-5c12-45ca-a3b0-7dee933411d2 sdf disk 3.7T `-sdf1 part 3.7T ROOK-OSD2-BLOCK 9228fab8-2165-4eaa-88a3-bc3802c6fdf5 sdg disk 3.7T `-sdg1 part 3.7T ROOK-OSD3-BLOCK b8f5f7aa-e9ec-41ac-81d8-2659e2f67512 sdh disk 3.7T `-sdh1 part 3.7T ROOK-OSD8-BLOCK 1a819a7c-8678-4fb9-b27a-dacd18e451a7 sdi disk 3.7T `-sdi1 part 3.7T ROOK-OSD9-BLOCK 849e9642-1f4d-4352-9f9c-da565a7acc6e sdj disk 3.7T `-sdj1 part 3.7T ROOK-OSD4-BLOCK c11246b2-9193-4314-bc60-5ba94c1f76cc sdk disk 3.7T `-sdk1 part 3.7T ROOK-OSD5-BLOCK 0362827a-2dc9-4a2f-94a8-e6aedf356e82 sdl disk 3.7T `-sdl1 part 3.7T ROOK-OSD6-BLOCK fede317b-6f42-4dc4-a50f-cfc8c4b7bcea sdm disk 3.7T `-sdm1 part 3.7T ROOK-OSD0-BLOCK cf6594a1-7e43-463c-8241-efa99e14c0e5 sdn disk 3.7T `-sdn1 part 3.7T ROOK-OSD1-BLOCK ed95ec5c-3ac2-43c1-b0b1-beb839eb376f sdo disk 3.7T `-sdo1 part 3.7T ROOK-OSD10-BLOCK 7e29beb2-71e4-4af5-86f0-0838431b9285 sdp disk 3.7T `-sdp1 part 3.7T ROOK-OSD11-BLOCK 1b151575-1a52-46f8-a424-333f8da61f0a loop0 loop 233.4M nvme0n1 disk 372.6G |-nvme0n1p1 part 576M ROOK-OSD0-WAL ac56ce5f-d880-4334-be35-61938d29a5d3 |-nvme0n1p2 part 30G ROOK-OSD0-DB f8425d9e-7e79-4272-9777-8e343053f694 |-nvme0n1p3 part 576M ROOK-OSD1-WAL 5ed92eac-fae0-42d2-9bf4-fab2a0592eb6 |-nvme0n1p4 part 30G ROOK-OSD1-DB 95c58666-bacd-4dd8-a15b-897a217018b9 |-nvme0n1p5 part 576M ROOK-OSD2-WAL 5a9a3c43-865f-44bf-9a7c-4dd30fbb85ac |-nvme0n1p6 part 30G ROOK-OSD2-DB d7d09b47-6051-4b6a-8dc4-20985e19b61c |-nvme0n1p7 part 576M ROOK-OSD3-WAL 9d81f282-4b5b-418f-9123-9503bfd0912c |-nvme0n1p8 part 30G ROOK-OSD3-DB 9e8eb447-883e-4dc8-b50d-d593b60325df |-nvme0n1p9 part 576M ROOK-OSD4-WAL 954896bb-4fe5-46b9-a361-54d165d57a1c |-nvme0n1p10 part 30G ROOK-OSD4-DB 7d7aff85-0145-414c-81a4-0aabebcad406 |-nvme0n1p11 part 576M ROOK-OSD5-WAL febd37af-6771-4bae-a68e-dddfd0d58edd |-nvme0n1p12 part 30G ROOK-OSD5-DB 49f749a5-590c-4b75-8ebd-796ed2818f82 |-nvme0n1p13 part 576M ROOK-OSD6-WAL cb40e1b0-efee-4b65-acc2-1141c217d3ee |-nvme0n1p14 part 30G ROOK-OSD6-DB f8226b6c-9ba7-4d39-8748-f61ca50fd2f3 |-nvme0n1p15 part 576M ROOK-OSD7-WAL cc365c84-5842-4438-a39e-03c96af294b5 |-nvme0n1p16 part 30G ROOK-OSD7-DB 8a2aa293-04d5-4516-838e-8200bc314aa5 |-nvme0n1p17 part 576M ROOK-OSD8-WAL 2ea2676b-58ed-4907-bc9e-3cdf66fdb26a |-nvme0n1p18 part 30G ROOK-OSD8-DB 39997dd7-4e41-4a1d-aea7-620d3ead6e38 |-nvme0n1p19 part 576M ROOK-OSD9-WAL aad24cf5-652f-4046-8159-66c018cc3865 |-nvme0n1p20 part 30G ROOK-OSD9-DB dc8f3373-d9f0-4395-bd35-9078e02b14f9 |-nvme0n1p21 part 576M ROOK-OSD10-WAL 50ebeb5a-83be-4795-9c5f-1ec53d96039b |-nvme0n1p22 part 30G ROOK-OSD10-DB 09a57ea0-5b26-4acf-a698-7ab9d1787a1d |-nvme0n1p23 part 576M ROOK-OSD11-WAL c112ba32-bda7-433b-bda6-668ca2bc7155 `-nvme0n1p24 part 30G ROOK-OSD11-DB c15304db-a39c-4adc-9b40-bdad6d144a9c
An example ceph config file for one of the OSDs:
# cat rookcluster.config [global] enable experimental unrecoverable data corrupting features = bluestore rocksdb fsid = be1427c9-a0eb-491e-9288-daaf4341e5eb run dir = /var/lib/rook/osd0 mon initial members = mon0 log file = /dev/stdout mon cluster log file = /dev/stdout mon keyvaluedb = rocksdb debug default = 0 debug rados = 0 debug mon = 0 debug osd = 0 debug bluestore = 5/5 debug filestore = 0 debug journal = 0 debug leveldb = 0 filestore_omap_backend = rocksdb osd pg bits = 11 osd pgp bits = 11 osd pool default size = 1 osd pool default min size = 1 osd pool default pg num = 100 osd pool default pgp num = 100 osd objectstore = bluestore rbd_default_features = 3 crushtool = debug bluefs = 5/5 [osd.0] keyring = /var/lib/rook/osd0/keyring bluestore block wal path = /dev/disk/by-partuuid/ac56ce5f-d880-4334-be35-61938d29a5d3 bluestore block db path = /dev/disk/by-partuuid/f8425d9e-7e79-4272-9777-8e343053f694 bluestore block path = /dev/disk/by-partuuid/cf6594a1-7e43-463c-8241-efa99e14c0e5 [mon.mon0] name = mon0 mon addr = 10.20.235.7:6790
Full ceph log¶
The full journal output (ceph log) can be found at: https://gist.github.com/jbw976/3bf47564f1d1ef71e8c41fa7121b793e
Files