Bug #56467

nautilus: osd crashes with _do_alloc_write failed with (28) No space left on device

Added by xu wang over 1 year ago. Updated over 1 year ago.

Status: New
Priority: Normal
Assignee: -
Target version:
% Done: 0%
Source: Community (dev)
Tags: v14.2.22; bluestore; space
Backport:
Regression: No
Severity: 1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The OSD crashes when BlueStore runs out of space in the Nautilus (N) release and cannot be pulled up again afterwards. Here are the relevant log messages, the transaction dump, and the stack trace:

2022-07-04 03:17:22.525 7f38407c9700 -1 bluestore(/home/ceph/build/dev/osd0) _do_alloc_write failed to allocate 0x3200000 allocated 0x 1fd0000 min_alloc_size 0x10000 available 0x 0
2022-07-04 03:17:22.525 7f38407c9700 -1 bluestore(/home/ceph/build/dev/osd0) _do_write _do_alloc_write failed with (28) No space left on device
2022-07-04 03:17:22.525 7f38407c9700 -1 bluestore(/home/ceph/build/dev/osd0) _txc_add_transaction error (28) No space left on device not handled on operation 10 (op 3, counting from 0)
2022-07-04 03:17:22.525 7f38407c9700 -1 bluestore(/home/ceph/build/dev/osd0) ENOSPC from bluestore, misconfigured cluster
2022-07-04 03:17:22.525 7f38407c9700  0 _dump_transaction transaction dump:
{
    "ops": [
        {
            "op_num": 0,
            "op_name": "touch",
            "collection": "1.0_head",
            "oid": "#1:4b010060:::benchmark_data_HOST-01_57478_object19:head#" 
        },
        {
            "op_num": 1,
            "op_name": "setattrs",
            "collection": "1.0_head",
            "oid": "#1:4b010060:::benchmark_data_HOST-01_57478_object19:head#",
            "attr_lens": {
                "_": 312,
                "snapset": 35
            }
        },
        {
            "op_num": 2,
            "op_name": "op_setallochint",
            "collection": "1.0_head",
            "oid": "#1:4b010060:::benchmark_data_HOST-01_57478_object19:head#",
            "expected_object_size": "52428800",
            "expected_write_size": "52428800" 
        },
        {
            "op_num": 3,
            "op_name": "write",
            "collection": "1.0_head",
            "oid": "#1:4b010060:::benchmark_data_HOST-01_57478_object19:head#",
            "length": 52428800,
            "offset": 0,
            "bufferlist length": 52428800
        },
        {
            "op_num": 4,
            "op_name": "omap_setkeys",
            "collection": "1.0_head",
            "oid": "#1:00000000::::head#",
            "attr_lens": {
                "0000000008.00000000000000000020": 201,
                "_fastinfo": 186
            }
        }
    ]
}

2022-07-04 03:17:22.578 7f38407c9700 -1 /home/ceph/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)' thread 7f38407c9700 time 2022-07-04 03:17:22.526530
/home/ceph/src/os/bluestore/BlueStore.cc: 12391: ceph_abort_msg("unexpected error")

 ceph version 14.2.22 (ca74598065096e6fcbd8433c8779a2be0c889351) nautilus (stable)
 1: (ceph::__ceph_abort(char const*, int, char const*, std::string const&)+0xfe) [0x560aa3b91d4e]
 2: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x20ee) [0x560aa3963c6c]
 3: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x2da) [0x560aa3961124]
 4: (PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x91) [0x560aa34ae055]
 5: (ReplicatedBackend::submit_transaction(hobject_t const&, object_stat_sum_t const&, eversion_t const&, std::unique_ptr<PGTransaction, std::default_delete<PGTransaction> >&&, eversion_t const&, eversion_t const&, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, boost::optional<pg_hit_set_history_t>&, Context*, unsigned long, osd_reqid_t, boost::intrusive_ptr<OpRequest>)+0x615) [0x560aa373e397]
 6: (PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*, PrimaryLogPG::OpContext*)+0xcb0) [0x560aa3460bba]
 7: (PrimaryLogPG::execute_ctx(PrimaryLogPG::OpContext*)+0x1a0d) [0x560aa34204d9]
 8: (PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x4eb6) [0x560aa34113d4]
 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0xe57) [0x560aa340b771]
 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x40b) [0x560aa3147027]
 11: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x6e) [0x560aa36b2b9e]
 12: (OpQueueItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x4b) [0x560aa3177a17]
 13: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x36b5) [0x560aa3154fb7]
 14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5a6) [0x560aa3b80774]
 15: (ShardedThreadPool::WorkThreadSharded::entry()+0x25) [0x560aa3b82169]
 16: (Thread::entry_wrapper()+0x78) [0x560aa3b6cb5e]
 17: (Thread::_entry_func(void*)+0x18) [0x560aa3b6cadc]
 18: (()+0x7ea5) [0x7f386117cea5]
 19: (clone()+0x6d) [0x7f386003fb0d]
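
Decoding the hex fields in the _do_alloc_write line (not part of the original report, just arithmetic on the values logged above) shows the failure lines up exactly with op 3 of the dumped transaction: the 50 MiB bench write needed more space than the device had left. Because BlueStore treats ENOSPC from a transaction as fatal rather than risk a partial apply ("ENOSPC from bluestore, misconfigured cluster"), _txc_add_transaction dumps the transaction and aborts, which is the ceph_abort_msg("unexpected error") at BlueStore.cc:12391 above.

$ printf '%d\n' 0x3200000   # requested: 52428800 bytes = 50 MiB, matches op 3's "length": 52428800
52428800
$ printf '%d\n' 0x1fd0000   # allocated before running out: 33357824 bytes (~31.8 MiB)
33357824
$ printf '%d\n' 0x10000     # min_alloc_size: 65536 bytes (64 KiB)
65536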

OSD restart attempt (block only)

[root@HOST-01 build]# ./bin/ceph -v
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
ceph version 14.2.22 (ca74598065096e6fcbd8433c8779a2be0c889351) nautilus (stable)
[root@HOST-01 build]#
[root@HOST-01 build]# ./bin/ceph df
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2022-07-04 03:25:17.250 7f45cf5c1700 -1 WARNING: all dangerous and experimental features are enabled.
2022-07-04 03:25:17.346 7f45cf5c1700 -1 WARNING: all dangerous and experimental features are enabled.
RAW STORAGE:
    CLASS     SIZE      AVAIL      USED        RAW USED     %RAW USED
    hdd       1 GiB     37 MiB     982 MiB      987 MiB         96.34
    TOTAL     1 GiB     37 MiB     982 MiB      987 MiB         96.34

POOLS:
    POOL     ID     PGS     STORED      OBJECTS     USED        %USED     MAX AVAIL
    test      1       1     980 MiB          18     980 MiB     97.30        27 MiB
[root@HOST-01 build]#
[root@HOST-01 build]# ./bin/ceph-osd -i 0
warning: line 125: 'bluestore_block_db_path' in section 'osd' redefined
warning: line 126: 'bluestore_block_db_size' in section 'osd' redefined
warning: line 127: 'bluestore_block_wal_path' in section 'osd' redefined
warning: line 128: 'bluestore_block_wal_size' in section 'osd' redefined
2022-07-04 03:25:22.989 7fdbb8be6a80 -1 WARNING: all dangerous and experimental features are enabled.
2022-07-04 03:25:23.038 7fdbb8be6a80 -1 WARNING: all dangerous and experimental features are enabled.
2022-07-04 03:25:23.074 7fdbb8be6a80 -1 WARNING: all dangerous and experimental features are enabled.
2022-07-04 03:25:23.623 7fdbb8be6a80 -1 Falling back to public interface
2022-07-04 03:25:23.973 7fdbb8be6a80 -1 bluestore(/home/ceph/build/dev/osd0) fsck error: bluefs_extents inconsistency, downgrade to previous releases might be broken.
2022-07-04 03:25:24.444 7fdbb8be6a80 -1 bluestore(/home/ceph/build/dev/osd0) _mount fsck found 1 errors
2022-07-04 03:25:24.444 7fdbb8be6a80 -1 osd.0 0 OSD:init: unable to mount object store
2022-07-04 03:25:24.444 7fdbb8be6a80 -1  ** ERROR: osd init failed: (5) Input/output error
[root@HOST-01 build]#
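
The "redefined" warnings at startup look like unrelated config noise rather than part of the failure: the ceph.conf parser emits them when the same key appears more than once in a section. A minimal sketch of a fragment that would trigger them (hypothetical; the actual ceph.conf was not attached to this report):

[osd]
bluestore_block_db_path = /dev/sdbk1
bluestore_block_db_size = 104857600
# ...the same keys repeated later in [osd] produce
# "warning: line N: '<key>' in section 'osd' redefined"
bluestore_block_db_path = /dev/sdbk1
bluestore_block_db_size = 104857600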
[root@HOST-01 build]# ll dev/osd0/
total 48
lrwxrwxrwx 1 root root 10 Jul  4 03:15 block -> /dev/sdbk3
-rw------- 1 root root  2 Jul  4 03:15 bluefs
-rw------- 1 root root 37 Jul  4 03:15 ceph_fsid
-rw-r--r-- 1 root root 37 Jul  4 03:15 fsid
-rw-r--r-- 1 root root 56 Jul  4 03:15 keyring
-rw------- 1 root root  8 Jul  4 03:15 kv_backend
-rw------- 1 root root 21 Jul  4 03:15 magic
-rw------- 1 root root  4 Jul  4 03:15 mkfs_done
-rw------- 1 root root 41 Jul  4 03:15 osd_key
-rw------- 1 root root  6 Jul  4 03:15 ready
-rw------- 1 root root  3 Jul  4 03:15 require_osd_release
-rw------- 1 root root 10 Jul  4 03:15 type
-rw------- 1 root root  2 Jul  4 03:15 whoami
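
So the restart fails before the store even mounts: fsck-on-mount reports a bluefs_extents inconsistency and init bails out with EIO. A possible next step, not attempted in this report, is to inspect and check the store offline with ceph-bluestore-tool; the paths below assume the build-tree layout shown above:

./bin/ceph-bluestore-tool show-label --path dev/osd0   # confirm device roles and sizes from the bluestore label
./bin/ceph-bluestore-tool fsck --path dev/osd0         # reproduce the bluefs_extents error offline
./bin/ceph-bluestore-tool repair --path dev/osd0       # attempt to fix what fsck found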

OSD restart attempt (block, db, wal)

[root@HOST-01 build]# ./bin/ceph -v
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
ceph version 14.2.22 (ca74598065096e6fcbd8433c8779a2be0c889351) nautilus (stable)
[root@HOST-01 build]#
[root@HOST-01 build]# ./bin/ceph df
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2022-07-04 03:02:46.893 7f6f61021700 -1 WARNING: all dangerous and experimental features are enabled.
2022-07-04 03:02:46.966 7f6f61021700 -1 WARNING: all dangerous and experimental features are enabled.
RAW STORAGE:
    CLASS     SIZE        AVAIL      USED        RAW USED     %RAW USED
    hdd       1.2 GiB     42 MiB     1.2 GiB      1.2 GiB         96.54
    TOTAL     1.2 GiB     42 MiB     1.2 GiB      1.2 GiB         96.54

POOLS:
    POOL     ID     PGS     STORED      OBJECTS     USED        %USED     MAX AVAIL
    test      1       1     980 MiB          18     980 MiB     97.02        30 MiB
[root@HOST-01 build]#
[root@HOST-01 build]# ./bin/ceph-osd -i 0
warning: line 117: 'bluestore_block_db_path' in section 'osd' redefined
warning: line 118: 'bluestore_block_wal_path' in section 'osd' redefined
2022-07-04 03:02:50.867 7fe20d74fa80 -1 WARNING: all dangerous and experimental features are enabled.
2022-07-04 03:02:50.909 7fe20d74fa80 -1 WARNING: all dangerous and experimental features are enabled.
2022-07-04 03:02:50.944 7fe20d74fa80 -1 WARNING: all dangerous and experimental features are enabled.
2022-07-04 03:02:51.988 7fe20d74fa80 -1 Falling back to public interface
2022-07-04 03:02:52.393 7fe20d74fa80 -1 bluestore(/home/ceph/build/dev/osd0) fsck error: bluefs_extents inconsistency, downgrade to previous releases might be broken.
2022-07-04 03:02:53.062 7fe20d74fa80 -1 bluestore(/home/ceph/build/dev/osd0) _mount fsck found 1 errors
2022-07-04 03:02:53.062 7fe20d74fa80 -1 osd.0 0 OSD:init: unable to mount object store
2022-07-04 03:02:53.062 7fe20d74fa80 -1  ** ERROR: osd init failed: (5) Input/output error
[root@HOST-01 build]#
[root@HOST-01 build]# ll dev/osd0/
total 48
lrwxrwxrwx 1 root root 10 Jul  4 02:43 block -> /dev/sdbk3
lrwxrwxrwx 1 root root 10 Jul  4 02:43 block.db -> /dev/sdbk1
lrwxrwxrwx 1 root root 10 Jul  4 02:43 block.wal -> /dev/sdbk2
-rw------- 1 root root  2 Jul  4 02:43 bluefs
-rw------- 1 root root 37 Jul  4 02:43 ceph_fsid
-rw-r--r-- 1 root root 37 Jul  4 02:43 fsid
-rw-r--r-- 1 root root 56 Jul  4 02:43 keyring
-rw------- 1 root root  8 Jul  4 02:43 kv_backend
-rw------- 1 root root 21 Jul  4 02:43 magic
-rw------- 1 root root  4 Jul  4 02:43 mkfs_done
-rw------- 1 root root 41 Jul  4 02:43 osd_key
-rw------- 1 root root  6 Jul  4 02:43 ready
-rw------- 1 root root  3 Jul  4 02:44 require_osd_release
-rw------- 1 root root 10 Jul  4 02:43 type
-rw------- 1 root root  2 Jul  4 02:43 whoami
[root@HOST-01 build]#
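
The same ENOSPC crash and fsck failure occur with separate block.db and block.wal devices, so the extra BlueFS devices do not change the outcome once the main device is full. If recovery rather than redeploy is the goal, one avenue (an assumption, not verified here) is to grow the underlying partitions and let BlueFS pick up the new space:

./bin/ceph-bluestore-tool bluefs-bdev-sizes --path dev/osd0    # show each BlueFS device and how much of it is used
./bin/ceph-bluestore-tool bluefs-bdev-expand --path dev/osd0   # claim space added to the underlying partitions

After that, the osd full/nearfull ratios would need revisiting so the cluster stops writes before BlueStore itself hits ENOSPC, which is what the "misconfigured cluster" message hints at.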
