Bug #42913

closed

nautilus: cram test fails with _do_alloc_write failed with (28) No space left on device

Added by Neha Ojha over 4 years ago. Updated over 4 years ago.

Status: Resolved
Priority: Normal
Assignee:
Target version:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2019-11-19T21:19:18.602 INFO:teuthology.orchestra.run.smithi053.stderr:HEAD is now at c837913... Merge branch 'wip-42677-nautilus' of https://github.com/epuertat/ceph into wip-yuri3-testing-2019-11-19-1618
2019-11-19T21:19:18.602 INFO:teuthology.orchestra.run.smithi053:> cp -- /home/ubuntu/cephtest/clone.client.0/src/test/cli-integration/balancer/misplaced.t /home/ubuntu/cephtest/archive/cram.client.0
2019-11-19T21:19:18.696 INFO:tasks.cram:Running tests for client.0...
2019-11-19T21:19:18.697 INFO:teuthology.orchestra.run.smithi053:> CEPH_REF=master CEPH_ID="0" PATH=$PATH:/usr/sbin adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage /home/ubuntu/cephtest/virtualenv/bin/cram -v -- /home/ubuntu/cephtest/archive/cram.client.0/*.t
2019-11-19T21:19:34.121 INFO:tasks.ceph.osd.1.smithi053.stderr:2019-11-19 21:19:34.117 7fd752fa9700 -1 bluestore(/var/lib/ceph/osd/ceph-1) _do_write _do_alloc_write failed with (28) No space left on device
2019-11-19T21:19:34.121 INFO:tasks.ceph.osd.1.smithi053.stderr:2019-11-19 21:19:34.117 7fd752fa9700 -1 bluestore(/var/lib/ceph/osd/ceph-1) _txc_add_transaction error (28) No space left on device not handled on operation 10 (op 3, counting from 0)
2019-11-19T21:19:34.122 INFO:tasks.ceph.osd.1.smithi053.stderr:2019-11-19 21:19:34.117 7fd752fa9700 -1 bluestore(/var/lib/ceph/osd/ceph-1) ENOSPC from bluestore, misconfigured cluster
2019-11-19T21:19:34.122 INFO:tasks.ceph.osd.1.smithi053.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.4-895-gc837913/rpm/el7/BUILD/ceph-14.2.4-895-gc837913/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)' thread 7fd74efa1700 time 2019-11-19 21:19:34.112169
2019-11-19T21:19:34.122 INFO:tasks.ceph.osd.1.smithi053.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.4-895-gc837913/rpm/el7/BUILD/ceph-14.2.4-895-gc837913/src/os/bluestore/BlueStore.cc: 11847: ceph_abort_msg("unexpected error")
2019-11-19T21:19:34.123 INFO:tasks.ceph.osd.1.smithi053.stderr: ceph version 14.2.4-895-gc837913 (c837913497e4170ffd01ec9a0dcfed03c01b0ab7) nautilus (stable)
2019-11-19T21:19:34.124 INFO:tasks.ceph.osd.1.smithi053.stderr: 1: (ceph::__ceph_abort(char const*, int, char const*, std::string const&)+0xdd) [0x563fd17bd31a]
2019-11-19T21:19:34.124 INFO:tasks.ceph.osd.1.smithi053.stderr: 2: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0xc8d) [0x563fd1d0cd3d]
2019-11-19T21:19:34.124 INFO:tasks.ceph.osd.1.smithi053.stderr: 3: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x370) [0x563fd1d262e0]
2019-11-19T21:19:34.124 INFO:tasks.ceph.osd.1.smithi053.stderr: 4: (non-virtual thunk to PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x54) [0x563fd1a91494]
2019-11-19T21:19:34.125 INFO:tasks.ceph.osd.1.smithi053.stderr: 5: (ReplicatedBackend::submit_transaction(hobject_t const&, object_stat_sum_t const&, eversion_t const&, std::unique_ptr<PGTransaction, std::default_delete<PGTransaction> >&&, eversion_t const&, eversion_t const&, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, boost::optional<pg_hit_set_history_t>&, Context*, unsigned long, osd_reqid_t, boost::intrusive_ptr<OpRequest>)+0x61e) [0x563fd1b8452e]
2019-11-19T21:19:34.125 INFO:tasks.ceph.osd.1.smithi053.stderr: 6: (PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*, PrimaryLogPG::OpContext*)+0x10c3) [0x563fd19f60c3]
2019-11-19T21:19:34.126 INFO:tasks.ceph.osd.1.smithi053.stderr: 7: (PrimaryLogPG::execute_ctx(PrimaryLogPG::OpContext*)+0x10a2) [0x563fd1a48512]
2019-11-19T21:19:34.126 INFO:tasks.ceph.osd.1.smithi053.stderr: 8: (PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x3716) [0x563fd1a4c436]
2019-11-19T21:19:34.127 INFO:tasks.ceph.osd.1.smithi053.stderr: 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0xcae) [0x563fd1a4de1e]
2019-11-19T21:19:34.127 INFO:tasks.ceph.osd.1.smithi053.stderr: 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x362) [0x563fd1899e62]
2019-11-19T21:19:34.127 INFO:tasks.ceph.osd.1.smithi053.stderr: 11: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x62) [0x563fd1b27622]
2019-11-19T21:19:34.127 INFO:tasks.ceph.osd.1.smithi053.stderr: 12: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x90f) [0x563fd18b4caf]
2019-11-19T21:19:34.128 INFO:tasks.ceph.osd.1.smithi053.stderr: 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b6) [0x563fd1e54776]
2019-11-19T21:19:34.128 INFO:tasks.ceph.osd.1.smithi053.stderr: 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x563fd1e57290]
2019-11-19T21:19:34.128 INFO:tasks.ceph.osd.1.smithi053.stderr: 15: (()+0x7dd5) [0x7fd773f8edd5]
2019-11-19T21:19:34.128 INFO:tasks.ceph.osd.1.smithi053.stderr: 16: (clone()+0x6d) [0x7fd772e5502d]
2019-11-19T21:19:34.128 INFO:tasks.ceph.osd.1.smithi053.stderr:*** Caught signal (Aborted) **

This started appearing after we merged https://github.com/ceph/ceph/commit/15f360dd17502b947c0237fd6b01a8b14e9be6c7#diff-bce7896e1ecd14da7871a0509c3fcb9d

/a/yuriw-2019-11-19_20:27:29-rados-wip-yuri3-testing-2019-11-19-1618-nautilus-distro-basic-smithi/4523837/

Actions #1

Updated by Neha Ojha over 4 years ago

2019-12-16 18:25:24.383 7f1c9f173700 -1 bluestore(/var/lib/ceph/osd/ceph-1) _do_alloc_write failed to allocate 0x400000 allocated 0x 384000 min_alloc_size 0x4000 available 0x 0
2019-12-16 18:25:24.383 7f1c9f173700 -1 bluestore(/var/lib/ceph/osd/ceph-1) _do_write _do_alloc_write failed with (28) No space left on device
2019-12-16 18:25:24.383 7f1c9f173700 -1 bluestore(/var/lib/ceph/osd/ceph-1) _txc_add_transaction error (28) No space left on device not handled on operation 10 (op 3, counting from 0)
2019-12-16 18:25:24.383 7f1c9f173700 -1 bluestore(/var/lib/ceph/osd/ceph-1) ENOSPC from bluestore, misconfigured cluster
2019-12-16 18:25:24.383 7f1c9f173700  0 _dump_transaction transaction dump:
{
    "ops": [
        {
            "op_num": 0,
            "op_name": "touch",
            "collection": "2.17_head",
            "oid": "#2:e9373a8f:::benchmark_data_smithi076_34907_object3091:head#" 
        },
        {
            "op_num": 1,
            "op_name": "setattrs",
            "collection": "2.17_head",
            "oid": "#2:e9373a8f:::benchmark_data_smithi076_34907_object3091:head#",
            "attr_lens": {
                "_": 296,
                "snapset": 35
            }
        },
        {
            "op_num": 2,
            "op_name": "op_setallochint",
            "collection": "2.17_head",
            "oid": "#2:e9373a8f:::benchmark_data_smithi076_34907_object3091:head#",
            "expected_object_size": "4194304",
            "expected_write_size": "4194304" 
        },
        {
            "op_num": 3,
            "op_name": "write",
            "collection": "2.17_head",
            "oid": "#2:e9373a8f:::benchmark_data_smithi076_34907_object3091:head#",
            "length": 4194304,
            "offset": 0,
            "bufferlist length": 4194304
        },
        {
            "op_num": 4,
            "op_name": "omap_setkeys",
            "collection": "2.17_head",
            "oid": "#2:e8000000::::head#",
            "attr_lens": {
                "0000000014.00000000000000000019": 185,
                "_fastinfo": 186
            }
        }
    ]
}
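The numbers in the _do_alloc_write line above are easier to read in decimal. The following is a minimal sketch, not part of Ceph or teuthology: the function name parse_alloc_failure and the field labels are hypothetical, and reading "allocated" as the amount the allocator managed to hand back is an interpretation of the message rather than something the report states.

```python
import re

# Log line copied from the report above (the padded hex fields are verbatim).
LINE = ("_do_alloc_write failed to allocate 0x400000 allocated 0x 384000 "
        "min_alloc_size 0x4000 available 0x 0")


def parse_alloc_failure(line):
    """Extract the four hex fields from a _do_alloc_write failure line.

    Hypothetical helper for illustration; tolerates the space that the
    log puts between "0x" and some of the values.
    """
    pattern = (r"failed to allocate 0x\s*([0-9a-f]+) "
               r"allocated 0x\s*([0-9a-f]+) "
               r"min_alloc_size 0x\s*([0-9a-f]+) "
               r"available 0x\s*([0-9a-f]+)")
    match = re.search(pattern, line)
    if match is None:
        raise ValueError("not a _do_alloc_write failure line")
    names = ("wanted", "allocated", "min_alloc_size", "available")
    return {name: int(value, 16) for name, value in zip(names, match.groups())}


fields = parse_alloc_failure(LINE)
shortfall = fields["wanted"] - fields["allocated"]
units_short = shortfall // fields["min_alloc_size"]

print(fields)
print("shortfall in bytes:", shortfall)
print("shortfall in min_alloc_size units:", units_short)
```

Under that reading, the request for 0x400000 bytes (4194304, the same length as the write in op 3 of the dump) came up 507904 bytes short, which is 31 units of min_alloc_size, with nothing left available on the device.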

Fails in nautilus: http://pulpito.ceph.com/nojha-2019-12-16_18:02:56-rados:singleton-nomsgr-nautilus-distro-basic-smithi/

Log copied to senta04.front.sepia.ceph.com - /home/nojha/42913

Passes on master: http://pulpito.ceph.com/nojha-2019-12-16_18:01:36-rados:singleton-nomsgr-master-distro-basic-smithi/

Actions #2

Updated by Neha Ojha over 4 years ago

  • Assignee set to Neha Ojha

Actions #3

Updated by Neha Ojha over 4 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 32283

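We need to increase the block size: each test OSD is only 10 GiB, and in the ceph osd df output below osd.1 is already 95.86% full with 424 MiB available.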

[ubuntu@smithi076 ceph]$ ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE   RAW USE DATA    OMAP META  AVAIL   %USE  VAR  PGS STATUS
 0   ssd 0.00980        0    0 B     0 B     0 B  0 B   0 B     0 B     0    0  63   down
 1   ssd 0.00980  1.00000 10 GiB 9.6 GiB 8.6 GiB  0 B 1 GiB 424 MiB 95.86 1.11  63   down
 2   ssd 0.00980  1.00000 10 GiB 7.7 GiB 6.7 GiB  0 B 1 GiB 2.3 GiB 76.84 0.89  73     up
                    TOTAL 20 GiB  17 GiB  15 GiB  0 B 2 GiB 2.7 GiB 86.35
MIN/MAX VAR: 0.89/1.11  STDDEV: 9.51
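To make the fullness check explicit, here is a small sketch that re-derives the usage figures from the table above and flags any OSD at or past a fullness threshold. The rows are transcribed by hand from the output; FULL_RATIO = 0.95 is an assumption chosen for illustration (this report does not say which ratio the test cluster used), and over_threshold is a hypothetical helper, not a Ceph command or API.

```python
# (osd id, size GiB, raw use GiB, reported %USE, status) from the table above.
# osd.0 is omitted because it reports a size of 0 B.
ROWS = [
    (1, 10.0, 9.6, 95.86, "down"),
    (2, 10.0, 7.7, 76.84, "up"),
]

# Assumption: fraction of capacity at which an OSD is treated as full.
FULL_RATIO = 0.95


def over_threshold(rows, ratio):
    """Return the ids of OSDs whose reported %USE is at or above ratio."""
    return [osd for osd, _size, _raw, pct_used, _status in rows
            if pct_used / 100.0 >= ratio]


for osd, size, raw, pct_used, status in ROWS:
    approx = 100.0 * raw / size  # coarse cross-check from the rounded GiB columns
    print(f"osd.{osd}: reported {pct_used:.2f}% used, "
          f"RAW USE/SIZE gives ~{approx:.0f}% ({status})")

flagged = over_threshold(ROWS, FULL_RATIO)
print("at or above threshold:", flagged)
```

With these inputs only osd.1 is flagged, which is the OSD that hit ENOSPC in the logs above.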

Actions #5

Updated by Neha Ojha over 4 years ago

  • Status changed from Fix Under Review to Resolved
  • Target version set to v14.2.6