Bug #18698
Status: Closed
BlueFS FAILED assert(0 == "allocate failed... wtf")
Description
We are seeing this failed assertion and crash using embedded ceph in the rook project: https://github.com/rook/rook.
Tracking issue in rook that has more details: https://github.com/rook/rook/issues/348
This is the ceph tree that we are reproducing with, it has Kraken 11.2.0 plus some fixes: https://github.com/rook/ceph/commits/b7551f136a37e895a4eb7c185a03c58aa260b2a9
Direct link to the failing assertion: https://github.com/rook/ceph/blob/b7551f136a37e895a4eb7c185a03c58aa260b2a9/src/os/bluestore/BlueFS.cc#L1779
Here's the journal output for the first instance of the failed assertion:
Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469075 I | osd3: 2017-01-26 02:23:23.469011 7ff1e4c64700 -1 bluefs _allocate allocate failed on 0x400000 min_alloc_size 0x100000
Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469107 I | osd3: 2017-01-26 02:23:23.469019 7ff1e4c64700 0 bitmapalloc:dump instance 0xdf7f130
Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469113 I | osd3: 2017-01-26 02:23:23.469022 7ff1e4c64700 0 bitalloc:dump_state zone 0 dump start
Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469117 I | osd3: 2017-01-26 02:23:23.469024 7ff1e4c64700 0 bitalloc:0:: 0x87870c3861c3861c
Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469121 I | osd3: 2017-01-26 02:23:23.469026 7ff1e4c64700 0 bitalloc:1:: 0x3870e1c3870e1870
Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469124 I | osd3: 2017-01-26 02:23:23.469027 7ff1e4c64700 0 bitalloc:2:: 0xe1870e1870c3870e
Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469127 I | osd3: 2017-01-26 02:23:23.469029 7ff1e4c64700 0 bitalloc:3:: 0x10c2108610861086
Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469131 I | osd3: 2017-01-26 02:23:23.469030 7ff1e4c64700 0 bitalloc:4:: 0x1086108430843084
Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469134 I | osd3: 2017-01-26 02:23:23.469031 7ff1e4c64700 0 bitalloc:5:: 0x3084308610861086
Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469138 I | osd3: 2017-01-26 02:23:23.469032 7ff1e4c64700 0 bitalloc:6:: 0x1086108610e1c386
Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469147 I | osd3: 2017-01-26 02:23:23.469034 7ff1e4c64700 0 bitalloc:7:: 0x1c3870e1870e1c38
Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469151 I | osd3: 2017-01-26 02:23:23.469035 7ff1e4c64700 0 bitalloc:8:: 0x70c3870c3870e187
Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469155 I | osd3: 2017-01-26 02:23:23.469037 7ff1e4c64700 0 bitalloc:9:: 0xffffffffffffffff
Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469158 I | osd3: 2017-01-26 02:23:23.469038 7ff1e4c64700 0 bitalloc:10:: 0xffffffffffffffff
Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469161 I | osd3: 2017-01-26 02:23:23.469039 7ff1e4c64700 0 bitalloc:11:: 0xffffffffffffffff
Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469164 I | osd3: 2017-01-26 02:23:23.469041 7ff1e4c64700 0 bitalloc:12:: 0xffffffffffffffff
Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469168 I | osd3: 2017-01-26 02:23:23.469042 7ff1e4c64700 0 bitalloc:13:: 0xffffffffffffffff
Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469171 I | osd3: 2017-01-26 02:23:23.469044 7ff1e4c64700 0 bitalloc:14:: 0xffffffffffffffff
Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469174 I | osd3: 2017-01-26 02:23:23.469046 7ff1e4c64700 0 bitalloc:15:: 0xffffffffffffffff
Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469177 I | osd3: 2017-01-26 02:23:23.469047 7ff1e4c64700 0 bitalloc:dump_state zone 0 dump end
Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469188 I | osd3: 2017-01-26 02:23:23.469141 7ff1e4c64700 -1 /home/rook/go/src/github.com/rook/rook/ceph/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_allocate(uint8_t, uint64_t, mempool::bluefs::vector<bluefs_extent_t>*)' thread 7ff1e4c64700 time 2017-01-26 02:23:23.469053
Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469193 I | osd3: /home/rook/go/src/github.com/rook/rook/ceph/src/os/bluestore/BlueFS.cc: 1779: FAILED assert(0 == "allocate failed... wtf")
Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469196 I | osd3:
Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469200 I | osd3: ceph version 11.0.2-8651-gb7551f1 (b7551f136a37e895a4eb7c185a03c58aa260b2a9)
Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469205 I | osd3: 1: [0x102c930]
Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469209 I | osd3: 2: [0x1a7bcbe]
Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469213 I | osd3: 3: [0x1a8280b]
Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469218 I | osd3: 4: [0x1a8494e]
Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469223 I | osd3: 5: [0x1a70e12]
Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469226 I | osd3: 6: [0x21141e9]
Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469229 I | osd3: 7: [0x2115570]
Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469233 I | osd3: 8: [0x20473e3]
Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469235 I | osd3: 9: [0x2047faa]
Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469238 I | osd3: 10: [0x171cee0]
Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469242 I | osd3: 11: [0x1a30b2c]
Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469245 I | osd3: 12: [0x1a6aa8d]
Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469248 I | osd3: 13: [0xc2a015]
Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469250 I | osd3: 14: [0x235602b]
Jan 25 18:23:23 swag rkt[1698]: 2017-01-26 02:23:23.469254 I | osd3: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
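The bitalloc words in the dump above can be read as 64-bit occupancy bitmaps, and decoding a few of them shows how fragmented zone 0 was at the time of the failure. The sketch below is a hypothetical decoder for illustration only: the assumption that a set bit means "allocated" and the size of the region each bit covers are guesses, not taken from Ceph's BitAllocator source.

```python
# Toy decoder for the bitalloc dump words above (a sketch, not Ceph code).
# Assumption: each 64-bit word is an occupancy bitmap where a set bit means
# an allocated block; the bit sense is a guess made for illustration.

def longest_free_run(words, bits=64):
    """Longest run of zero (free) bits across the concatenated words."""
    best = cur = 0
    for w in words:
        for i in range(bits):
            if (w >> i) & 1:   # bit set -> treat as allocated
                cur = 0
            else:
                cur += 1
                best = max(best, cur)
    return best

# First three words from the "zone 0" dump in the log:
words = [0x87870c3861c3861c, 0x3870e1c3870e1870, 0xe1870e1870c3870e]
print(longest_free_run(words))  # no free run longer than 4 bits
```

Under that reading, no free run in these words exceeds 4 consecutive bits, which fits the picture of heavy fragmentation behind the failed 0x400000 allocation.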
Manually recreating the call stack from the above addresses gives us:
> addr2line -e ./rookd 0x102c930 0x1a7bcbe 0x1a8280b 0x1a8494e 0x1a70e12 0x21141e9 0x2115570 0x20473e3 0x2047faa 0x171cee0 0x1a30b2c 0x1a6aa8d 0xc2a015 0x235602b
/usr/include/c++/6/sstream:625 (discriminator 3)
/home/rook/go/src/github.com/rook/rook/ceph/src/os/bluestore/BlueFS.cc:1756
/home/rook/go/src/github.com/rook/rook/ceph/src/os/bluestore/BlueFS.cc:1336
/usr/x86_64-linux-gnu/include/boost/smart_ptr/intrusive_ptr.hpp:198
/usr/include/c++/6/bits/std_mutex.h:231
/build/external/build/x86_64-linux-gnu/src/rocksdb/./include/rocksdb/status.h:279
/build/external/build/x86_64-linux-gnu/src/rocksdb/./include/rocksdb/status.h:284
/build/external/build/x86_64-linux-gnu/src/rocksdb/./include/rocksdb/status.h:284
/build/external/build/x86_64-linux-gnu/src/rocksdb/db/db_impl.cc:4575
/home/rook/go/src/github.com/rook/rook/ceph/src/kv/RocksDBStore.cc:487
/home/rook/go/src/github.com/rook/rook/ceph/src/os/bluestore/BlueStore.cc:6748
/home/rook/go/src/github.com/rook/rook/ceph/src/os/bluestore/BlueStore.h:1496
/build/glibc-jxM2Ev/glibc-2.24/nptl/pthread_create.c:333
??:?
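Symbolizing these dumps by hand is error-prone, so a small helper can pull the bracketed frame addresses out of the log text and build the same addr2line invocation. This is a convenience sketch; the `./rookd` binary name and the `N: [0x...]` frame format are taken from this report.

```python
import re

def addr2line_cmd(log_text, binary="./rookd"):
    # Frames in the ceph assert dump look like "5: [0x1a70e12]".
    addrs = re.findall(r"\d+:\s*\[(0x[0-9a-fA-F]+)\]", log_text)
    return ["addr2line", "-e", binary] + addrs

log = "1: [0x102c930]\n2: [0x1a7bcbe]\n3: [0x1a8280b]"
print(" ".join(addr2line_cmd(log)))
# addr2line -e ./rookd 0x102c930 0x1a7bcbe 0x1a8280b
```

The returned list can be passed straight to subprocess.run() against an unstripped binary.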
After the initial crash, restarting the OSD immediately runs into the assertion again on the OSD::init path, with a callstack of:
#0  raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
#1  0x0000000001866d6c in reraise_fatal (signum=6) at /home/rook/go/src/github.com/rook/rook/ceph/src/global/signal_handler.cc:72
#2  handle_fatal_signal (signum=6) at /home/rook/go/src/github.com/rook/rook/ceph/src/global/signal_handler.cc:134
#3  <signal handler called>
#4  raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:58
#5  0x00000000022d5b8a in abort ()
#6  0x000000000102cb1b in ceph::__ceph_assert_fail (assertion=assertion@entry=0x27d25bb "0 == \"allocate failed... wtf\"", file=file@entry=0x27d1c10 "/home/rook/go/src/github.com/rook/rook/ceph/src/os/bluestore/BlueFS.cc", line=line@entry=1779, func=func@entry=0x27d30a0 <BlueFS::_allocate(unsigned char, unsigned long, std::vector<bluefs_extent_t, mempool::pool_allocator<(mempool::pool_index_t)8, bluefs_extent_t> >*)::__PRETTY_FUNCTION__> "int BlueFS::_allocate(uint8_t, uint64_t, mempool::bluefs::vector<bluefs_extent_t>*)") at /home/rook/go/src/github.com/rook/rook/ceph/src/common/assert.cc:78
#7  0x0000000001a7bcbe in BlueFS::_allocate (this=this@entry=0xcb76000, id=0 '\000', len=<optimized out>, ev=<optimized out>) at /home/rook/go/src/github.com/rook/rook/ceph/src/os/bluestore/BlueFS.cc:1779
#8  0x0000000001a8280b in BlueFS::_flush_and_sync_log (this=this@entry=0xcb76000, l=..., want_seq=want_seq@entry=249204, jump_to=jump_to@entry=0) at /home/rook/go/src/github.com/rook/rook/ceph/src/os/bluestore/BlueFS.cc:1335
#9  0x0000000001a8494e in BlueFS::_fsync (this=this@entry=0xcb76000, h=h@entry=0xc710b40, l=...) at /home/rook/go/src/github.com/rook/rook/ceph/src/os/bluestore/BlueFS.cc:1715
#10 0x0000000001a70e12 in BlueFS::fsync (h=0xc710b40, this=0xcb76000) at /home/rook/go/src/github.com/rook/rook/ceph/src/os/bluestore/BlueFS.h:403
#11 BlueRocksWritableFile::Sync (this=<optimized out>) at /home/rook/go/src/github.com/rook/rook/ceph/src/os/bluestore/BlueRocksEnv.cc:220
#12 0x00000000021141e9 in rocksdb::WritableFileWriter::SyncInternal (this=this@entry=0xc70b410, use_fsync=use_fsync@entry=false) at util/file_reader_writer.cc:242
#13 0x0000000002115570 in rocksdb::WritableFileWriter::Sync (this=0xc70b410, use_fsync=<optimized out>) at util/file_reader_writer.cc:213
#14 0x000000000214fcd3 in rocksdb::BuildTable (dbname=..., env=0xc9a2de0, ioptions=..., mutable_cf_options=..., env_options=..., table_cache=0xca5b580, iter=0x7fff3262d568, range_del_iter=..., meta=0x7fff3262d240, internal_comparator=..., int_tbl_prop_collector_factories=0xca645f0, column_family_id=0, column_family_name=..., snapshots=..., earliest_write_conflict_snapshot=72057594037927935, compression=<incomplete type>, compression_opts=..., paranoid_file_checks=false, internal_stats=0xc8d0f00, reason=rocksdb::TableFileCreationReason::kRecovery, event_logger=0xca60b18, job_id=1, io_priority=rocksdb::Env::IO_HIGH, table_properties=0x0, level=-1) at db/builder.cc:176
#15 0x000000000203fa81 in rocksdb::DBImpl::WriteLevel0TableForRecovery (this=this@entry=0xca60000, job_id=1, cfd=cfd@entry=0xca64580, mem=0xca62400, edit=edit@entry=0xcf722b0) at db/db_impl.cc:1779
#16 0x000000000204266f in rocksdb::DBImpl::RecoverLogFiles (this=this@entry=0xca60000, log_numbers=..., next_sequence=next_sequence@entry=0x7fff3262e348, read_only=read_only@entry=false) at db/db_impl.cc:1697
#17 0x0000000002043474 in rocksdb::DBImpl::Recover (this=this@entry=0xca60000, column_families=..., read_only=read_only@entry=false, error_if_log_file_exist=error_if_log_file_exist@entry=false, error_if_data_exists_in_logs=error_if_data_exists_in_logs@entry=false) at db/db_impl.cc:1342
#18 0x000000000204408a in rocksdb::DB::Open (db_options=..., dbname=..., column_families=..., handles=handles@entry=0x7fff3262e980, dbptr=dbptr@entry=0xc704dd8) at db/db_impl.cc:5938
#19 0x00000000020452f5 in rocksdb::DB::Open (options=..., dbname=..., dbptr=dbptr@entry=0xc704dd8) at db/db_impl.cc:5883
#20 0x0000000001720ef2 in RocksDBStore::do_open (this=0xc704d80, out=..., create_if_missing=<optimized out>) at /home/rook/go/src/github.com/rook/rook/ceph/src/kv/RocksDBStore.cc:297
#21 0x0000000001a26463 in BlueStore::_open_db (this=this@entry=0xc8ec000, create=create@entry=false) at /home/rook/go/src/github.com/rook/rook/ceph/src/os/bluestore/BlueStore.cc:3475
#22 0x0000000001a58a2c in BlueStore::mount (this=0xc8ec000) at /home/rook/go/src/github.com/rook/rook/ceph/src/os/bluestore/BlueStore.cc:4034
#23 0x00000000016dd127 in OSD::init (this=0xc8e0000) at /home/rook/go/src/github.com/rook/rook/ceph/src/osd/OSD.cc:2042
#24 0x00000000014c1ec7 in cephd_osd (argc=<optimized out>, argv=<optimized out>) at /home/rook/go/src/github.com/rook/rook/ceph/src/ceph_osd.cc:611
#25 0x0000000000dfce65 in cephd_run_osd (argc=<optimized out>, argv=<optimized out>) at /home/rook/go/src/github.com/rook/rook/ceph/src/libcephd/libcephd.cc:237
#26 0x0000000000c18a6a in _cgo_49d41b80151a_Cfunc_cephd_run_osd (v=0xc4201dbb28) at /home/rook/go/src/github.com/rook/rook/pkg/cephmgr/cephd/cephd.go:204
#27 0x00000000004f3fe0 in runtime.asmcgocall () at /usr/local/go/src/runtime/asm_amd64.s:590
#28 0x0000000000498d5d in runtime.cgocall (fn=0x7fff32632860, arg=0x3271580 <runtime.g0>, ~r2=845359184) at /usr/local/go/src/runtime/cgocall.go:115
#29 0x0000000003271500 in runtime.work ()
#30 0x00007fff32632860 in ?? ()
#31 0x0000000003271580 in ?? ()
#32 0x00007fff32632850 in ?? ()
#33 0x00000000004c6bd4 in runtime.mstart () at /usr/local/go/src/runtime/proc.go:1096
#34 0x00000000004f2135 in runtime.rt0_go () at /usr/local/go/src/runtime/asm_amd64.s:156
#35 0x000000000000000b in ?? ()
#36 0x00007fff326329b8 in ?? ()
#37 0x000000000000000b in ?? ()
#38 0x00007fff326329b8 in ?? ()
#39 0x0000000000400598 in __rela_iplt_start ()
#40 0x00000000022c4b06 in generic_start_main ()
#41 0x00000000022c4d8e in __libc_start_main ()
#42 0x000000000049351a in _start ()
On the repro machine, the OSDs are configured to use BlueStore, with each block partition on its own HDD and all WAL/DB partitions on a single NVMe device, like so:
$ lsblk -o name,type,size,partlabel,partuuid
NAME         TYPE    SIZE PARTLABEL        PARTUUID
sde          disk   3.7T
`-sde1       part   3.7T ROOK-OSD7-BLOCK  916755c5-5c12-45ca-a3b0-7dee933411d2
sdf          disk   3.7T
`-sdf1       part   3.7T ROOK-OSD2-BLOCK  9228fab8-2165-4eaa-88a3-bc3802c6fdf5
sdg          disk   3.7T
`-sdg1       part   3.7T ROOK-OSD3-BLOCK  b8f5f7aa-e9ec-41ac-81d8-2659e2f67512
sdh          disk   3.7T
`-sdh1       part   3.7T ROOK-OSD8-BLOCK  1a819a7c-8678-4fb9-b27a-dacd18e451a7
sdi          disk   3.7T
`-sdi1       part   3.7T ROOK-OSD9-BLOCK  849e9642-1f4d-4352-9f9c-da565a7acc6e
sdj          disk   3.7T
`-sdj1       part   3.7T ROOK-OSD4-BLOCK  c11246b2-9193-4314-bc60-5ba94c1f76cc
sdk          disk   3.7T
`-sdk1       part   3.7T ROOK-OSD5-BLOCK  0362827a-2dc9-4a2f-94a8-e6aedf356e82
sdl          disk   3.7T
`-sdl1       part   3.7T ROOK-OSD6-BLOCK  fede317b-6f42-4dc4-a50f-cfc8c4b7bcea
sdm          disk   3.7T
`-sdm1       part   3.7T ROOK-OSD0-BLOCK  cf6594a1-7e43-463c-8241-efa99e14c0e5
sdn          disk   3.7T
`-sdn1       part   3.7T ROOK-OSD1-BLOCK  ed95ec5c-3ac2-43c1-b0b1-beb839eb376f
sdo          disk   3.7T
`-sdo1       part   3.7T ROOK-OSD10-BLOCK 7e29beb2-71e4-4af5-86f0-0838431b9285
sdp          disk   3.7T
`-sdp1       part   3.7T ROOK-OSD11-BLOCK 1b151575-1a52-46f8-a424-333f8da61f0a
loop0        loop 233.4M
nvme0n1      disk 372.6G
|-nvme0n1p1  part   576M ROOK-OSD0-WAL    ac56ce5f-d880-4334-be35-61938d29a5d3
|-nvme0n1p2  part    30G ROOK-OSD0-DB     f8425d9e-7e79-4272-9777-8e343053f694
|-nvme0n1p3  part   576M ROOK-OSD1-WAL    5ed92eac-fae0-42d2-9bf4-fab2a0592eb6
|-nvme0n1p4  part    30G ROOK-OSD1-DB     95c58666-bacd-4dd8-a15b-897a217018b9
|-nvme0n1p5  part   576M ROOK-OSD2-WAL    5a9a3c43-865f-44bf-9a7c-4dd30fbb85ac
|-nvme0n1p6  part    30G ROOK-OSD2-DB     d7d09b47-6051-4b6a-8dc4-20985e19b61c
|-nvme0n1p7  part   576M ROOK-OSD3-WAL    9d81f282-4b5b-418f-9123-9503bfd0912c
|-nvme0n1p8  part    30G ROOK-OSD3-DB     9e8eb447-883e-4dc8-b50d-d593b60325df
|-nvme0n1p9  part   576M ROOK-OSD4-WAL    954896bb-4fe5-46b9-a361-54d165d57a1c
|-nvme0n1p10 part    30G ROOK-OSD4-DB     7d7aff85-0145-414c-81a4-0aabebcad406
|-nvme0n1p11 part   576M ROOK-OSD5-WAL    febd37af-6771-4bae-a68e-dddfd0d58edd
|-nvme0n1p12 part    30G ROOK-OSD5-DB     49f749a5-590c-4b75-8ebd-796ed2818f82
|-nvme0n1p13 part   576M ROOK-OSD6-WAL    cb40e1b0-efee-4b65-acc2-1141c217d3ee
|-nvme0n1p14 part    30G ROOK-OSD6-DB     f8226b6c-9ba7-4d39-8748-f61ca50fd2f3
|-nvme0n1p15 part   576M ROOK-OSD7-WAL    cc365c84-5842-4438-a39e-03c96af294b5
|-nvme0n1p16 part    30G ROOK-OSD7-DB     8a2aa293-04d5-4516-838e-8200bc314aa5
|-nvme0n1p17 part   576M ROOK-OSD8-WAL    2ea2676b-58ed-4907-bc9e-3cdf66fdb26a
|-nvme0n1p18 part    30G ROOK-OSD8-DB     39997dd7-4e41-4a1d-aea7-620d3ead6e38
|-nvme0n1p19 part   576M ROOK-OSD9-WAL    aad24cf5-652f-4046-8159-66c018cc3865
|-nvme0n1p20 part    30G ROOK-OSD9-DB     dc8f3373-d9f0-4395-bd35-9078e02b14f9
|-nvme0n1p21 part   576M ROOK-OSD10-WAL   50ebeb5a-83be-4795-9c5f-1ec53d96039b
|-nvme0n1p22 part    30G ROOK-OSD10-DB    09a57ea0-5b26-4acf-a698-7ab9d1787a1d
|-nvme0n1p23 part   576M ROOK-OSD11-WAL   c112ba32-bda7-433b-bda6-668ca2bc7155
`-nvme0n1p24 part    30G ROOK-OSD11-DB    c15304db-a39c-4adc-9b40-bdad6d144a9c
An example ceph config file for one of the OSDs:
# cat rookcluster.config
[global]
enable experimental unrecoverable data corrupting features = bluestore rocksdb
fsid = be1427c9-a0eb-491e-9288-daaf4341e5eb
run dir = /var/lib/rook/osd0
mon initial members = mon0
log file = /dev/stdout
mon cluster log file = /dev/stdout
mon keyvaluedb = rocksdb
debug default = 0
debug rados = 0
debug mon = 0
debug osd = 0
debug bluestore = 5/5
debug filestore = 0
debug journal = 0
debug leveldb = 0
filestore_omap_backend = rocksdb
osd pg bits = 11
osd pgp bits = 11
osd pool default size = 1
osd pool default min size = 1
osd pool default pg num = 100
osd pool default pgp num = 100
osd objectstore = bluestore
rbd_default_features = 3
crushtool =
debug bluefs = 5/5
[osd.0]
keyring = /var/lib/rook/osd0/keyring
bluestore block wal path = /dev/disk/by-partuuid/ac56ce5f-d880-4334-be35-61938d29a5d3
bluestore block db path = /dev/disk/by-partuuid/f8425d9e-7e79-4272-9777-8e343053f694
bluestore block path = /dev/disk/by-partuuid/cf6594a1-7e43-463c-8241-efa99e14c0e5
[mon.mon0]
name = mon0
mon addr = 10.20.235.7:6790
Full ceph log
The full journal output (ceph log) can be found at: https://gist.github.com/jbw976/3bf47564f1d1ef71e8c41fa7121b793e
Updated by Jared Watts about 7 years ago
I also have core files and full symbols, but those are hundreds of MBs. I'd be happy to share them as needed.
Updated by Jared Watts about 7 years ago
Running the repro scenario with `bluefs_allocator = stupid` did not reproduce the issue after running all night. It does appear to be a bug in the BlueFS bitmap allocator.
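For anyone else hitting this, the workaround is a one-line config change. A minimal sketch of where it goes (section placement follows standard ceph.conf conventions; it can equally go under [global], and the affected OSDs must be restarted for it to take effect):

```ini
[osd]
# Workaround for this issue: fall back from the BlueFS bitmap
# allocator to the stupid (extent-list) allocator.
bluefs_allocator = stupid
```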
Updated by François Blondel about 7 years ago
Hi!
I also hit this issue. Adding `bluefs_allocator = stupid` to my /etc/ceph/ceph.conf worked around it.
The issue appeared when I started using a separate block.wal block device.
Question: is this the expected behaviour?
Please let me know if I can run any tests to help with debugging.
Updated by Sage Weil almost 7 years ago
- Status changed from New to Can't reproduce
I haven't seen this in any of our QA... is it still happening for you? Which versions?
Updated by François Blondel almost 7 years ago
Hi,
I tested with Kraken v11.2.0 again: I removed `bluefs_allocator = stupid` and restarted all my OSDs, and the issue was still there for some of them.
I then upgraded from Kraken to Luminous 12.0.2 (5a1b6b3269da99a18984c138c23935e5eb96f73e). The same issue happened during the upgrade (when restarting the OSDs after the apt upgrade); I had to put `bluefs_allocator = stupid` back into ceph.conf for some OSDs to make them start again. That worked.
I then removed `bluefs_allocator = stupid` from the config again and restarted the affected OSDs.
For now it is running fine with Luminous 12.0.2.
Updated by François Blondel almost 7 years ago
Hi,
In fact, we are still hitting this issue (after some days) since upgrading to Luminous 12.0.2.
Same errors in the log files; fixed the same way, using `bluefs_allocator = stupid`.
OSDs were created with Kraken 11.2.0 using:
ceph-disk prepare --zap-disk --bluestore --cluster ceph --cluster-uuid 4b7024fa-485d-4aaf-b3e9-4f955335ca98 --block.wal /dev/sda /dev/sdd
Let me know what I should do/test!
Updated by Greg Farnum almost 7 years ago
- Project changed from Ceph to RADOS
- Category deleted (107)
Updated by Aleksei Gutikov over 6 years ago
- File 18698.osd.log added
Reproduced with 12.2.2 during deep scrubbing after 7 days of workload.
The 36 SSDs (200G each) contain 400G of RBD images and 500G of small files of about 5 KB each.
Almost all of the SSDs are more than 50% used.
After 7 days of testing, deep scrubbing started.
As I understand it, that leads to BlueStore rebalancing, and then the crash occurs:
bluestore(/var/lib/ceph/osd/ceph-45) _balance_bluefs_freespace allocate failed on 0x76c00000 min_alloc_size 0x1000
... very very long stupidalloc dump ...
2: (()+0x11390) [0x7fb052f47390]
3: (gsignal()+0x38) [0x7fb051ee2428]
4: (abort()+0x16a) [0x7fb051ee402a]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x28e) [0x5641f18d9a2e]
6: (BlueStore::_balance_bluefs_freespace(std::vector<bluestore_pextent_t, mempool::pool_allocator<(mempool::pool_index_t)4, bluestore_pextent_t> >*)+0x1b21) [0x5641f176b5c1]
7: (BlueStore::_kv_sync_thread()+0x1ac0) [0x5641f176e040]
8: (BlueStore::KVSyncThread::entry()+0xd) [0x5641f17b1f8d]
9: (()+0x76ba) [0x7fb052f3d6ba]
10: (clone()+0x6d) [0x7fb051fb43dd]
As I understand it, _balance_bluefs_freespace tries to allocate space (~2G) for the compacted metadata, but due to fragmentation it can't, and it doesn't handle that failure gracefully.
I will check whether it can be avoided by setting bluestore_max_alloc_size...
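The failure mode described here (plenty of free space in aggregate, but no single contiguous extent large enough) can be illustrated with a toy first-fit allocator over a free-extent list. This is a hypothetical model for illustration; `alloc_contiguous` is not Ceph's allocator API.

```python
# Toy free-extent list illustrating the fragmentation failure mode above:
# lots of free space in total, but no single extent can satisfy one big
# contiguous allocation. An illustrative model, not Ceph's stupid/bitmap
# allocator.

def alloc_contiguous(free_extents, want):
    """First-fit: return (offset, length) or None if no extent fits."""
    for i, (off, length) in enumerate(free_extents):
        if length >= want:
            free_extents[i] = (off + want, length - want)
            return (off, want)
    return None  # total free space may still far exceed `want`

# 1000 scattered 4 MiB holes: ~3.9 GiB free in total...
free = [(i * 0x1000000, 0x400000) for i in range(1000)]
total_free = sum(length for _, length in free)
print(hex(total_free))                     # 0xfa000000
print(alloc_contiguous(free, 0x76c00000))  # None: the ~1.9 GiB request fails
```

This is why a large _balance_bluefs_freespace request can fail even when the device is far from full.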
Updated by Igor Fedotov over 6 years ago
There is a PR (#18494) addressing an issue with symptoms similar to those reported in comment #9 (an assert during _balance_bluefs_freespace); the issue analysis can be found there as well. Unfortunately it is still pending, since it's not an ideal solution.
I'm not sure the original issue has the same root cause, though.