Bug #56467
nautilus: osd crashes with _do_alloc_write failed with (28) No space left on device
% Done: 0%
Source: Community (dev)
Tags: v14.2.22; bluestore; space
Backport:
Regression: No
Severity: 1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
The OSD crashes and cannot be brought back up when BlueStore runs out of space on the Nautilus (N) release. Here is the stack trace from the log:
2022-07-04 03:17:22.525 7f38407c9700 -1 bluestore(/home/ceph/build/dev/osd0) _do_alloc_write failed to allocate 0x3200000 allocated 0x1fd0000 min_alloc_size 0x10000 available 0x0
2022-07-04 03:17:22.525 7f38407c9700 -1 bluestore(/home/ceph/build/dev/osd0) _do_write _do_alloc_write failed with (28) No space left on device
2022-07-04 03:17:22.525 7f38407c9700 -1 bluestore(/home/ceph/build/dev/osd0) _txc_add_transaction error (28) No space left on device not handled on operation 10 (op 3, counting from 0)
2022-07-04 03:17:22.525 7f38407c9700 -1 bluestore(/home/ceph/build/dev/osd0) ENOSPC from bluestore, misconfigured cluster
2022-07-04 03:17:22.525 7f38407c9700  0 _dump_transaction transaction dump:
{
    "ops": [
        {
            "op_num": 0,
            "op_name": "touch",
            "collection": "1.0_head",
            "oid": "#1:4b010060:::benchmark_data_HOST-01_57478_object19:head#"
        },
        {
            "op_num": 1,
            "op_name": "setattrs",
            "collection": "1.0_head",
            "oid": "#1:4b010060:::benchmark_data_HOST-01_57478_object19:head#",
            "attr_lens": {
                "_": 312,
                "snapset": 35
            }
        },
        {
            "op_num": 2,
            "op_name": "op_setallochint",
            "collection": "1.0_head",
            "oid": "#1:4b010060:::benchmark_data_HOST-01_57478_object19:head#",
            "expected_object_size": "52428800",
            "expected_write_size": "52428800"
        },
        {
            "op_num": 3,
            "op_name": "write",
            "collection": "1.0_head",
            "oid": "#1:4b010060:::benchmark_data_HOST-01_57478_object19:head#",
            "length": 52428800,
            "offset": 0,
            "bufferlist length": 52428800
        },
        {
            "op_num": 4,
            "op_name": "omap_setkeys",
            "collection": "1.0_head",
            "oid": "#1:00000000::::head#",
            "attr_lens": {
                "0000000008.00000000000000000020": 201,
                "_fastinfo": 186
            }
        }
    ]
}

2022-07-04 03:17:22.578 7f38407c9700 -1 /home/ceph/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)' thread 7f38407c9700 time 2022-07-04 03:17:22.526530
/home/ceph/src/os/bluestore/BlueStore.cc: 12391: ceph_abort_msg("unexpected error")

 ceph version 14.2.22 (ca74598065096e6fcbd8433c8779a2be0c889351) nautilus (stable)
 1: (ceph::__ceph_abort(char const*, int, char const*, std::string const&)+0xfe) [0x560aa3b91d4e]
 2: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x20ee) [0x560aa3963c6c]
 3: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x2da) [0x560aa3961124]
 4: (PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x91) [0x560aa34ae055]
 5: (ReplicatedBackend::submit_transaction(hobject_t const&, object_stat_sum_t const&, eversion_t const&, std::unique_ptr<PGTransaction, std::default_delete<PGTransaction> >&&, eversion_t const&, eversion_t const&, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, boost::optional<pg_hit_set_history_t>&, Context*, unsigned long, osd_reqid_t, boost::intrusive_ptr<OpRequest>)+0x615) [0x560aa373e397]
 6: (PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*, PrimaryLogPG::OpContext*)+0xcb0) [0x560aa3460bba]
 7: (PrimaryLogPG::execute_ctx(PrimaryLogPG::OpContext*)+0x1a0d) [0x560aa34204d9]
 8: (PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x4eb6) [0x560aa34113d4]
 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0xe57) [0x560aa340b771]
 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x40b) [0x560aa3147027]
 11: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x6e) [0x560aa36b2b9e]
 12: (OpQueueItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x4b) [0x560aa3177a17]
 13: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x36b5) [0x560aa3154fb7]
 14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5a6) [0x560aa3b80774]
 15: (ShardedThreadPool::WorkThreadSharded::entry()+0x25) [0x560aa3b82169]
 16: (Thread::entry_wrapper()+0x78) [0x560aa3b6cb5e]
 17: (Thread::_entry_func(void*)+0x18) [0x560aa3b6cadc]
 18: (()+0x7ea5) [0x7f386117cea5]
 19: (clone()+0x6d) [0x7f386003fb0d]
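The "ENOSPC from bluestore, misconfigured cluster" line above refers to the cluster-wide full thresholds that are supposed to block client writes before BlueStore exhausts its raw space. As a minimal sketch (values shown are the upstream defaults for these options, not taken from this report), the relevant knobs in ceph.conf are:

```ini
[global]
# Monitors flag the cluster full and block client writes
# once any OSD crosses this ratio.
mon_osd_full_ratio = 0.95
# Backfill into an OSD is refused above this ratio.
mon_osd_backfillfull_ratio = 0.90
# OSDs above this ratio raise a nearfull health warning.
mon_osd_nearfull_ratio = 0.85

[osd]
# Last-ditch OSD-side guard: ops are rejected above this ratio.
osd_failsafe_full_ratio = 0.97
```

With a 1 GiB device as in this reproducer, the gap between 95% and 100% is only about 50 MiB, so a single large benchmark write (50 MiB here) can still push the store past the failsafe before the monitors react.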
OSD start-up attempt with a single block device:
[root@HOST-01 build]# ./bin/ceph -v
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
ceph version 14.2.22 (ca74598065096e6fcbd8433c8779a2be0c889351) nautilus (stable)
[root@HOST-01 build]# ./bin/ceph df
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2022-07-04 03:25:17.250 7f45cf5c1700 -1 WARNING: all dangerous and experimental features are enabled.
2022-07-04 03:25:17.346 7f45cf5c1700 -1 WARNING: all dangerous and experimental features are enabled.
RAW STORAGE:
    CLASS     SIZE      AVAIL      USED       RAW USED     %RAW USED
    hdd       1 GiB     37 MiB     982 MiB    987 MiB          96.34
    TOTAL     1 GiB     37 MiB     982 MiB    987 MiB          96.34

POOLS:
    POOL     ID     PGS     STORED      OBJECTS     USED        %USED     MAX AVAIL
    test      1       1     980 MiB          18     980 MiB     97.30        27 MiB
[root@HOST-01 build]# ./bin/ceph-osd -i 0
warning: line 125: 'bluestore_block_db_path' in section 'osd' redefined
warning: line 126: 'bluestore_block_db_size' in section 'osd' redefined
warning: line 127: 'bluestore_block_wal_path' in section 'osd' redefined
warning: line 128: 'bluestore_block_wal_size' in section 'osd' redefined
2022-07-04 03:25:22.989 7fdbb8be6a80 -1 WARNING: all dangerous and experimental features are enabled.
2022-07-04 03:25:23.038 7fdbb8be6a80 -1 WARNING: all dangerous and experimental features are enabled.
2022-07-04 03:25:23.074 7fdbb8be6a80 -1 WARNING: all dangerous and experimental features are enabled.
2022-07-04 03:25:23.623 7fdbb8be6a80 -1 Falling back to public interface
2022-07-04 03:25:23.973 7fdbb8be6a80 -1 bluestore(/home/ceph/build/dev/osd0) fsck error: bluefs_extents inconsistency, downgrade to previous releases might be broken.
2022-07-04 03:25:24.444 7fdbb8be6a80 -1 bluestore(/home/ceph/build/dev/osd0) _mount fsck found 1 errors
2022-07-04 03:25:24.444 7fdbb8be6a80 -1 osd.0 0 OSD:init: unable to mount object store
2022-07-04 03:25:24.444 7fdbb8be6a80 -1 ** ERROR: osd init failed: (5) Input/output error
[root@HOST-01 build]# ll dev/osd0/
total 48
lrwxrwxrwx 1 root root 10 Jul 4 03:15 block -> /dev/sdbk3
-rw------- 1 root root  2 Jul 4 03:15 bluefs
-rw------- 1 root root 37 Jul 4 03:15 ceph_fsid
-rw-r--r-- 1 root root 37 Jul 4 03:15 fsid
-rw-r--r-- 1 root root 56 Jul 4 03:15 keyring
-rw------- 1 root root  8 Jul 4 03:15 kv_backend
-rw------- 1 root root 21 Jul 4 03:15 magic
-rw------- 1 root root  4 Jul 4 03:15 mkfs_done
-rw------- 1 root root 41 Jul 4 03:15 osd_key
-rw------- 1 root root  6 Jul 4 03:15 ready
-rw------- 1 root root  3 Jul 4 03:15 require_osd_release
-rw------- 1 root root 10 Jul 4 03:15 type
-rw------- 1 root root  2 Jul 4 03:15 whoami
OSD start-up attempt with separate block, db, and wal devices:
[root@HOST-01 build]# ./bin/ceph -v
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
ceph version 14.2.22 (ca74598065096e6fcbd8433c8779a2be0c889351) nautilus (stable)
[root@HOST-01 build]# ./bin/ceph df
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2022-07-04 03:02:46.893 7f6f61021700 -1 WARNING: all dangerous and experimental features are enabled.
2022-07-04 03:02:46.966 7f6f61021700 -1 WARNING: all dangerous and experimental features are enabled.
RAW STORAGE:
    CLASS     SIZE        AVAIL      USED        RAW USED     %RAW USED
    hdd       1.2 GiB     42 MiB     1.2 GiB     1.2 GiB          96.54
    TOTAL     1.2 GiB     42 MiB     1.2 GiB     1.2 GiB          96.54

POOLS:
    POOL     ID     PGS     STORED      OBJECTS     USED        %USED     MAX AVAIL
    test      1       1     980 MiB          18     980 MiB     97.02        30 MiB
[root@HOST-01 build]# ./bin/ceph-osd -i 0
warning: line 117: 'bluestore_block_db_path' in section 'osd' redefined
warning: line 118: 'bluestore_block_wal_path' in section 'osd' redefined
2022-07-04 03:02:50.867 7fe20d74fa80 -1 WARNING: all dangerous and experimental features are enabled.
2022-07-04 03:02:50.909 7fe20d74fa80 -1 WARNING: all dangerous and experimental features are enabled.
2022-07-04 03:02:50.944 7fe20d74fa80 -1 WARNING: all dangerous and experimental features are enabled.
2022-07-04 03:02:51.988 7fe20d74fa80 -1 Falling back to public interface
2022-07-04 03:02:52.393 7fe20d74fa80 -1 bluestore(/home/ceph/build/dev/osd0) fsck error: bluefs_extents inconsistency, downgrade to previous releases might be broken.
2022-07-04 03:02:53.062 7fe20d74fa80 -1 bluestore(/home/ceph/build/dev/osd0) _mount fsck found 1 errors
2022-07-04 03:02:53.062 7fe20d74fa80 -1 osd.0 0 OSD:init: unable to mount object store
2022-07-04 03:02:53.062 7fe20d74fa80 -1 ** ERROR: osd init failed: (5) Input/output error
[root@HOST-01 build]# ll dev/osd0/
total 48
lrwxrwxrwx 1 root root 10 Jul 4 02:43 block -> /dev/sdbk3
lrwxrwxrwx 1 root root 10 Jul 4 02:43 block.db -> /dev/sdbk1
lrwxrwxrwx 1 root root 10 Jul 4 02:43 block.wal -> /dev/sdbk2
-rw------- 1 root root  2 Jul 4 02:43 bluefs
-rw------- 1 root root 37 Jul 4 02:43 ceph_fsid
-rw-r--r-- 1 root root 37 Jul 4 02:43 fsid
-rw-r--r-- 1 root root 56 Jul 4 02:43 keyring
-rw------- 1 root root  8 Jul 4 02:43 kv_backend
-rw------- 1 root root 21 Jul 4 02:43 magic
-rw------- 1 root root  4 Jul 4 02:43 mkfs_done
-rw------- 1 root root 41 Jul 4 02:43 osd_key
-rw------- 1 root root  6 Jul 4 02:43 ready
-rw------- 1 root root  3 Jul 4 02:44 require_osd_release
-rw------- 1 root root 10 Jul 4 02:43 type
-rw------- 1 root root  2 Jul 4 02:43 whoami
[root@HOST-01 build]#
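Since fsck fails at mount time, any repair has to happen offline. One possible recovery sketch (not from this report: it assumes the underlying partition or LV can first be enlarged, and it is guarded so it is harmless on a machine without ceph installed) uses ceph-bluestore-tool to re-check the store and then let BlueFS grow into the new space:

```shell
#!/bin/sh
# Hypothetical offline recovery sketch for an ENOSPC'd BlueStore OSD.
# OSD_PATH matches the data directory from this report; adjust as needed.
OSD_PATH=dev/osd0

if command -v ceph-bluestore-tool >/dev/null 2>&1; then
    # Re-run fsck offline to confirm the bluefs_extents inconsistency.
    ceph-bluestore-tool fsck --path "$OSD_PATH"
    # After enlarging the underlying block device, let BlueFS claim
    # the newly added space so the OSD has room to mount again.
    ceph-bluestore-tool bluefs-bdev-expand --path "$OSD_PATH"
else
    echo "ceph-bluestore-tool not found; commands shown for reference only"
fi
```

Whether this rescues the OSD depends on the cause of the bluefs_extents inconsistency; the related ticket below tracks the same fsck failure.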
History
#1 Updated by xu wang over 1 year ago
Related to https://tracker.ceph.com/issues/42913