Bug #62501
pacific(?): qa: mgr-osd-full causes OSD aborts due to ENOSPC (fs/full/subvolume_snapshot_rm.sh)
Status:
Closed
% Done:
0%
Source:
Tags:
Backport:
reef,quincy,pacific
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
qa, qa-failure
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
This is probably a misconfiguration that allowed the OSDs to actually run out of space during the test, instead of the OSDs refusing further writes due to the pool full flag.
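For context, the protection that should have kicked in here is governed by the mon/OSD full ratios: the cluster is supposed to be flagged full (blocking client writes) well before BlueStore itself runs out of allocatable space. A minimal sketch of the relevant options follows; the values shown are the upstream defaults, not taken from this job's configuration, so the actual ratios used by the mgr-osd-full qa task may differ.

```ini
; ceph.conf fragment -- upstream defaults, for illustration only.
; The pool full flag should reject writes at mon_osd_full_ratio,
; long before bluestore hits a hard ENOSPC and aborts.
[mon]
mon_osd_nearfull_ratio = 0.85      ; HEALTH_WARN: OSD nearing full
mon_osd_backfillfull_ratio = 0.90  ; backfill to this OSD is refused
mon_osd_full_ratio = 0.95          ; cluster flagged full, writes blocked

[osd]
osd_failsafe_full_ratio = 0.97     ; last-ditch OSD-side write guard
```

If the test (or the job's overrides) pushes these ratios close to 1.0 to exercise full-cluster handling, a small sizing error can let BlueStore exhaust its allocator before the full flag is raised, which matches the abort seen below.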
2023-08-17T15:30:28.064 INFO:tasks.ceph.osd.0.smithi175.stderr:2023-08-17T15:30:28.054+0000 7f5d0503d700 -1 bluestore(/var/lib/ceph/osd/ceph-0) _do_alloc_write failed to allocate 0x400000 allocated 0x 2a000 min_alloc_size 0x1000 available 0x 0
2023-08-17T15:30:28.065 INFO:tasks.ceph.osd.0.smithi175.stderr:2023-08-17T15:30:28.054+0000 7f5d0503d700 -1 bluestore(/var/lib/ceph/osd/ceph-0) _do_write _do_alloc_write failed with (28) No space left on device
2023-08-17T15:30:28.065 INFO:tasks.ceph.osd.0.smithi175.stderr:2023-08-17T15:30:28.054+0000 7f5d0503d700 -1 bluestore(/var/lib/ceph/osd/ceph-0) _txc_add_transaction error (28) No space left on device not handled on operation 10 (op 2, counting from 0)
2023-08-17T15:30:28.065 INFO:tasks.ceph.osd.0.smithi175.stderr:2023-08-17T15:30:28.054+0000 7f5d0503d700 -1 bluestore(/var/lib/ceph/osd/ceph-0) ENOSPC from bluestore, misconfigured cluster
2023-08-17T15:30:28.076 INFO:tasks.ceph.osd.0.smithi175.stderr:/build/ceph-16.2.13-531-ge78f2dc9/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)' thread 7f5d0503d700 time 2023-08-17T15:30:28.061403+0000
2023-08-17T15:30:28.077 INFO:tasks.ceph.osd.0.smithi175.stderr:/build/ceph-16.2.13-531-ge78f2dc9/src/os/bluestore/BlueStore.cc: 13506: ceph_abort_msg("unexpected error")
2023-08-17T15:30:28.077 INFO:tasks.ceph.osd.0.smithi175.stderr: ceph version 16.2.13-531-ge78f2dc9 (e78f2dc97637da188e6292122efedae3d18948ca) pacific (stable)
2023-08-17T15:30:28.078 INFO:tasks.ceph.osd.0.smithi175.stderr: 1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xe0) [0x55e13f5500cb]
2023-08-17T15:30:28.078 INFO:tasks.ceph.osd.0.smithi175.stderr: 2: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x95a) [0x55e13fc270aa]
2023-08-17T15:30:28.078 INFO:tasks.ceph.osd.0.smithi175.stderr: 3: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x2fd) [0x55e13fc297dd]
2023-08-17T15:30:28.078 INFO:tasks.ceph.osd.0.smithi175.stderr: 4: (non-virtual thunk to PrimaryLogPG::queue_transactions(std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x58) [0x55e13f805ec8]
2023-08-17T15:30:28.079 INFO:tasks.ceph.osd.0.smithi175.stderr: 5: (ReplicatedBackend::submit_transaction(hobject_t const&, object_stat_sum_t const&, eversion_t const&, std::unique_ptr<PGTransaction, std::default_delete<PGTransaction> >&&, eversion_t const&, eversion_t const&, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> >&&, std::optional<pg_hit_set_history_t>&, Context*, unsigned long, osd_reqid_t, boost::intrusive_ptr<OpRequest>)+0x7e9) [0x55e13fa28f39]
2023-08-17T15:30:28.079 INFO:tasks.ceph.osd.0.smithi175.stderr: 6: (PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*, PrimaryLogPG::OpContext*)+0xecf) [0x55e13f75e2bf]
2023-08-17T15:30:28.079 INFO:tasks.ceph.osd.0.smithi175.stderr: 7: (PrimaryLogPG::execute_ctx(PrimaryLogPG::OpContext*)+0xf95) [0x55e13f7c58b5]
2023-08-17T15:30:28.079 INFO:tasks.ceph.osd.0.smithi175.stderr: 8: (PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x36b9) [0x55e13f7c9d39]
2023-08-17T15:30:28.080 INFO:tasks.ceph.osd.0.smithi175.stderr: 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0xe61) [0x55e13f7d4d81]
2023-08-17T15:30:28.080 INFO:tasks.ceph.osd.0.smithi175.stderr: 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x1c0) [0x55e13f624b50]
2023-08-17T15:30:28.080 INFO:tasks.ceph.osd.0.smithi175.stderr: 11: (ceph::osd::scheduler::PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x66) [0x55e13f8dca06]
2023-08-17T15:30:28.080 INFO:tasks.ceph.osd.0.smithi175.stderr: 12: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x8aa) [0x55e13f651e7a]
2023-08-17T15:30:28.081 INFO:tasks.ceph.osd.0.smithi175.stderr: 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x403) [0x55e13fd8c5d3]
2023-08-17T15:30:28.081 INFO:tasks.ceph.osd.0.smithi175.stderr: 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x55e13fd8f614]
2023-08-17T15:30:28.081 INFO:tasks.ceph.osd.0.smithi175.stderr: 15: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7f5d26a07609]
2023-08-17T15:30:28.081 INFO:tasks.ceph.osd.0.smithi175.stderr: 16: clone()
...
2023-08-17T17:15:36.189 DEBUG:teuthology.orchestra.run:got remote process result: 1
2023-08-17T17:15:36.190 INFO:tasks.workunit:Stopping ['fs/full/subvolume_snapshot_rm.sh'] on client.0...
2023-08-17T17:15:36.190 DEBUG:teuthology.orchestra.run.smithi175:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0
2023-08-17T17:15:36.428 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_teuthology_449a1bc2027504e7b3c3d7b30fa4178906581da7/teuthology/run_tasks.py", line 105, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/git.ceph.com_teuthology_449a1bc2027504e7b3c3d7b30fa4178906581da7/teuthology/run_tasks.py", line 83, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_e78f2dc97637da188e6292122efedae3d18948ca/qa/tasks/workunit.py", line 129, in task
    p.spawn(_run_tests, ctx, refspec, role, tests,
  File "/home/teuthworker/src/git.ceph.com_teuthology_449a1bc2027504e7b3c3d7b30fa4178906581da7/teuthology/parallel.py", line 84, in __exit__
    for result in self:
  File "/home/teuthworker/src/git.ceph.com_teuthology_449a1bc2027504e7b3c3d7b30fa4178906581da7/teuthology/parallel.py", line 98, in __next__
    resurrect_traceback(result)
  File "/home/teuthworker/src/git.ceph.com_teuthology_449a1bc2027504e7b3c3d7b30fa4178906581da7/teuthology/parallel.py", line 30, in resurrect_traceback
    raise exc.exc_info[1]
  File "/home/teuthworker/src/git.ceph.com_teuthology_449a1bc2027504e7b3c3d7b30fa4178906581da7/teuthology/parallel.py", line 23, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_e78f2dc97637da188e6292122efedae3d18948ca/qa/tasks/workunit.py", line 423, in _run_tests
    remote.run(
  File "/home/teuthworker/src/git.ceph.com_teuthology_449a1bc2027504e7b3c3d7b30fa4178906581da7/teuthology/orchestra/remote.py", line 522, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_teuthology_449a1bc2027504e7b3c3d7b30fa4178906581da7/teuthology/orchestra/run.py", line 455, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_teuthology_449a1bc2027504e7b3c3d7b30fa4178906581da7/teuthology/orchestra/run.py", line 161, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_teuthology_449a1bc2027504e7b3c3d7b30fa4178906581da7/teuthology/orchestra/run.py", line 181, in _raise_for_status
    raise CommandFailedError(
teuthology.exceptions.CommandFailedError: Command failed (workunit test fs/full/subvolume_snapshot_rm.sh) on smithi175 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=e78f2dc97637da188e6292122efedae3d18948ca TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/fs/full/subvolume_snapshot_rm.sh'
/teuthology/yuriw-2023-08-17_14:36:27-fs-wip-yuri7-testing-2023-08-16-1309-pacific-distro-default-smithi/7371810/teuthology.log