Bug #62501

Updated by Patrick Donnelly 9 months ago

Probably a misconfiguration that allowed the OSDs to actually run out of space during the test, instead of the OSD refusing further writes because the pool full flag was set.
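
For context, these are the thresholds that are normally expected to stop client writes well before BlueStore physically exhausts its device. The commands and ratio values below are a generic sketch of the standard Ceph full-ratio settings (upstream defaults), not the values this particular job used:

<pre>
# Cluster-wide full thresholds (upstream defaults shown). Once an OSD crosses
# the full ratio, the cluster is supposed to reject further writes at the
# OSD/client level rather than letting BlueStore run the device to zero.
ceph osd dump | grep full_ratio        # nearfull 0.85, backfillfull 0.90, full 0.95 by default
ceph osd set-nearfull-ratio 0.85
ceph osd set-backfillfull-ratio 0.90
ceph osd set-full-ratio 0.95
</pre>

If the test shrinks the OSD devices or raises these ratios without leaving BlueStore enough headroom, the write path can hit the physical device limit before the full flag ever trips, which would match the abort below.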


 <pre> 
 2023-08-17T15:30:28.064 INFO:tasks.ceph.osd.0.smithi175.stderr:2023-08-17T15:30:28.054+0000 7f5d0503d700 -1 bluestore(/var/lib/ceph/osd/ceph-0) _do_alloc_write failed to allocate 0x400000 allocated 0x 2a000 min_alloc_size 0x1000 available 0x 0 
 2023-08-17T15:30:28.065 INFO:tasks.ceph.osd.0.smithi175.stderr:2023-08-17T15:30:28.054+0000 7f5d0503d700 -1 bluestore(/var/lib/ceph/osd/ceph-0) _do_write _do_alloc_write failed with (28) No space left on device 
 2023-08-17T15:30:28.065 INFO:tasks.ceph.osd.0.smithi175.stderr:2023-08-17T15:30:28.054+0000 7f5d0503d700 -1 bluestore(/var/lib/ceph/osd/ceph-0) _txc_add_transaction error (28) No space left on device not handled on operation 10 (op 2, counting from 0) 
 2023-08-17T15:30:28.065 INFO:tasks.ceph.osd.0.smithi175.stderr:2023-08-17T15:30:28.054+0000 7f5d0503d700 -1 bluestore(/var/lib/ceph/osd/ceph-0) ENOSPC from bluestore, misconfigured cluster 
 2023-08-17T15:30:28.076 INFO:tasks.ceph.osd.0.smithi175.stderr:/build/ceph-16.2.13-531-ge78f2dc9/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)' thread 7f5d0503d700 time 2023-08-17T15:30:28.061403+0000 
 2023-08-17T15:30:28.077 INFO:tasks.ceph.osd.0.smithi175.stderr:/build/ceph-16.2.13-531-ge78f2dc9/src/os/bluestore/BlueStore.cc: 13506: ceph_abort_msg("unexpected error") 
 2023-08-17T15:30:28.077 INFO:tasks.ceph.osd.0.smithi175.stderr: ceph version 16.2.13-531-ge78f2dc9 (e78f2dc97637da188e6292122efedae3d18948ca) pacific (stable) 
 2023-08-17T15:30:28.078 INFO:tasks.ceph.osd.0.smithi175.stderr: 1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xe0) [0x55e13f5500cb] 
 2023-08-17T15:30:28.078 INFO:tasks.ceph.osd.0.smithi175.stderr: 2: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x95a) [0x55e13fc270aa] 
 2023-08-17T15:30:28.078 INFO:tasks.ceph.osd.0.smithi175.stderr: 3: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x2fd) [0x55e13fc297dd] 
 2023-08-17T15:30:28.078 INFO:tasks.ceph.osd.0.smithi175.stderr: 4: (non-virtual thunk to PrimaryLogPG::queue_transactions(std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x58) [0x55e13f805ec8] 
 2023-08-17T15:30:28.079 INFO:tasks.ceph.osd.0.smithi175.stderr: 5: (ReplicatedBackend::submit_transaction(hobject_t const&, object_stat_sum_t const&, eversion_t const&, std::unique_ptr<PGTransaction, std::default_delete<PGTransaction> >&&, eversion_t const&, eversion_t const&, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> >&&, std::optional<pg_hit_set_history_t>&, Context*, unsigned long, osd_reqid_t, boost::intrusive_ptr<OpRequest>)+0x7e9) [0x55e13fa28f39] 
 2023-08-17T15:30:28.079 INFO:tasks.ceph.osd.0.smithi175.stderr: 6: (PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*, PrimaryLogPG::OpContext*)+0xecf) [0x55e13f75e2bf] 
 2023-08-17T15:30:28.079 INFO:tasks.ceph.osd.0.smithi175.stderr: 7: (PrimaryLogPG::execute_ctx(PrimaryLogPG::OpContext*)+0xf95) [0x55e13f7c58b5] 
 2023-08-17T15:30:28.079 INFO:tasks.ceph.osd.0.smithi175.stderr: 8: (PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x36b9) [0x55e13f7c9d39] 
 2023-08-17T15:30:28.080 INFO:tasks.ceph.osd.0.smithi175.stderr: 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0xe61) [0x55e13f7d4d81] 
 2023-08-17T15:30:28.080 INFO:tasks.ceph.osd.0.smithi175.stderr: 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x1c0) [0x55e13f624b50] 
 2023-08-17T15:30:28.080 INFO:tasks.ceph.osd.0.smithi175.stderr: 11: (ceph::osd::scheduler::PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x66) [0x55e13f8dca06] 
 2023-08-17T15:30:28.080 INFO:tasks.ceph.osd.0.smithi175.stderr: 12: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x8aa) [0x55e13f651e7a] 
 2023-08-17T15:30:28.081 INFO:tasks.ceph.osd.0.smithi175.stderr: 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x403) [0x55e13fd8c5d3] 
 2023-08-17T15:30:28.081 INFO:tasks.ceph.osd.0.smithi175.stderr: 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x55e13fd8f614] 
 2023-08-17T15:30:28.081 INFO:tasks.ceph.osd.0.smithi175.stderr: 15: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7f5d26a07609] 
 2023-08-17T15:30:28.081 INFO:tasks.ceph.osd.0.smithi175.stderr: 16: clone() 
 ... 
 2023-08-17T17:15:36.189 DEBUG:teuthology.orchestra.run:got remote process result: 1 
 2023-08-17T17:15:36.190 INFO:tasks.workunit:Stopping ['fs/full/subvolume_snapshot_rm.sh'] on client.0... 
 2023-08-17T17:15:36.190 DEBUG:teuthology.orchestra.run.smithi175:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0 
 2023-08-17T17:15:36.428 ERROR:teuthology.run_tasks:Saw exception from tasks. 
 Traceback (most recent call last): 
   File "/home/teuthworker/src/git.ceph.com_teuthology_449a1bc2027504e7b3c3d7b30fa4178906581da7/teuthology/run_tasks.py", line 105, in run_tasks 
     manager = run_one_task(taskname, ctx=ctx, config=config) 
   File "/home/teuthworker/src/git.ceph.com_teuthology_449a1bc2027504e7b3c3d7b30fa4178906581da7/teuthology/run_tasks.py", line 83, in run_one_task 
     return task(**kwargs) 
   File "/home/teuthworker/src/github.com_ceph_ceph-c_e78f2dc97637da188e6292122efedae3d18948ca/qa/tasks/workunit.py", line 129, in task 
     p.spawn(_run_tests, ctx, refspec, role, tests, 
   File "/home/teuthworker/src/git.ceph.com_teuthology_449a1bc2027504e7b3c3d7b30fa4178906581da7/teuthology/parallel.py", line 84, in __exit__ 
     for result in self: 
   File "/home/teuthworker/src/git.ceph.com_teuthology_449a1bc2027504e7b3c3d7b30fa4178906581da7/teuthology/parallel.py", line 98, in __next__ 
     resurrect_traceback(result) 
   File "/home/teuthworker/src/git.ceph.com_teuthology_449a1bc2027504e7b3c3d7b30fa4178906581da7/teuthology/parallel.py", line 30, in resurrect_traceback 
     raise exc.exc_info[1] 
   File "/home/teuthworker/src/git.ceph.com_teuthology_449a1bc2027504e7b3c3d7b30fa4178906581da7/teuthology/parallel.py", line 23, in capture_traceback 
     return func(*args, **kwargs) 
   File "/home/teuthworker/src/github.com_ceph_ceph-c_e78f2dc97637da188e6292122efedae3d18948ca/qa/tasks/workunit.py", line 423, in _run_tests 
     remote.run( 
   File "/home/teuthworker/src/git.ceph.com_teuthology_449a1bc2027504e7b3c3d7b30fa4178906581da7/teuthology/orchestra/remote.py", line 522, in run 
     r = self._runner(client=self.ssh, name=self.shortname, **kwargs) 
   File "/home/teuthworker/src/git.ceph.com_teuthology_449a1bc2027504e7b3c3d7b30fa4178906581da7/teuthology/orchestra/run.py", line 455, in run 
     r.wait() 
   File "/home/teuthworker/src/git.ceph.com_teuthology_449a1bc2027504e7b3c3d7b30fa4178906581da7/teuthology/orchestra/run.py", line 161, in wait 
     self._raise_for_status() 
   File "/home/teuthworker/src/git.ceph.com_teuthology_449a1bc2027504e7b3c3d7b30fa4178906581da7/teuthology/orchestra/run.py", line 181, in _raise_for_status 
     raise CommandFailedError( 
 teuthology.exceptions.CommandFailedError: Command failed (workunit test fs/full/subvolume_snapshot_rm.sh) on smithi175 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=e78f2dc97637da188e6292122efedae3d18948ca TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/fs/full/subvolume_snapshot_rm.sh' 
 </pre> 
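
To check whether the full protection ever engaged before the abort, the cluster state can be inspected with standard CLI commands; this is a minimal sketch of generic Ceph usage, not output captured from this run:

<pre>
# Did the cluster or pool actually get flagged full before osd.0 hit ENOSPC?
ceph health detail         # look for OSD_FULL / OSD_NEARFULL / POOL_FULL warnings
ceph osd df                # per-OSD utilization versus the configured ratios
ceph osd pool ls detail    # per-pool flags (e.g. "full") and quota settings
</pre>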

 /teuthology/yuriw-2023-08-17_14:36:27-fs-wip-yuri7-testing-2023-08-16-1309-pacific-distro-default-smithi/7371810/teuthology.log
