Bug #46360

Updated by Ramana Raja 8 months ago

During `fs subvolume clone`, libcephfs hit the "Disk quota exceeded" (EDQUOT) error, and this caused the subvolume clone to be stuck in progress instead of entering the failed state. I could see the following traceback in the mgr log:

  File "/home/rraja/git/ceph/src/pybind/mgr/volumes/fs/", line 117, in copy_file
    written += fs.write(dst_fd, data[written:], -1)
  File "cephfs.pyx", line 1463, in cephfs.LibCephFS.write
cephfs.Error: error in write: Disk quota exceeded [Errno 122]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/rraja/git/ceph/src/pybind/mgr/volumes/fs/", line 44, in run
    self.async_job.execute_job(vol_job[0], vol_job[1], should_cancel=lambda: thread_id.should_cancel())
  File "/home/rraja/git/ceph/src/pybind/mgr/volumes/fs/", line 309, in execute_job
    clone(, volname, job[0].decode('utf-8'), job[1].decode('utf-8'), self.state_table, should_cancel)
  File "/home/rraja/git/ceph/src/pybind/mgr/volumes/fs/", line 222, in clone
    start_clone_sm(volume_client, volname, index, groupname, subvolname, state_table, should_cancel)
  File "/home/rraja/git/ceph/src/pybind/mgr/volumes/fs/", line 202, in start_clone_sm
    (next_state, finished) = handler(volume_client, volname, index, groupname, subvolname, should_cancel)
  File "/home/rraja/git/ceph/src/pybind/mgr/volumes/fs/", line 159, in handle_clone_in_progress
    do_clone(volume_client, volname, groupname, subvolname, should_cancel)
  File "/home/rraja/git/ceph/src/pybind/mgr/volumes/fs/", line 155, in do_clone
    bulk_copy(fs_handle, src_path, dst_path, should_cancel)
  File "/home/rraja/git/ceph/src/pybind/mgr/volumes/fs/", line 144, in bulk_copy
    cptree(source_path, dst_path)
  File "/home/rraja/git/ceph/src/pybind/mgr/volumes/fs/", line 129, in cptree
    copy_file(fs_handle, d_full_src, d_full_dst, mo, cancel_check=should_cancel)
  File "/home/rraja/git/ceph/src/pybind/mgr/volumes/fs/", line 120, in copy_file
    raise VolumeException(-e.args[0], e.args[1])
TypeError: bad operand type for unary -: 'str'
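
The last frame fails because the unary minus in `-e.args[0]` is applied to a string. A minimal standalone illustration of that failure is sketched below. `FakeCephfsError` and `negate_first_arg` are hypothetical stand-ins for this note, not the real `cephfs.Error` or any mgr/volumes function:

```python
class FakeCephfsError(Exception):
    """Stand-in for cephfs.Error; hypothetical, for illustration only."""


def negate_first_arg(exc):
    # Mirrors the expression in the failing frame: -e.args[0]
    return -exc.args[0]


# Two-argument form: args[0] is an integer errno, so negation works.
two_arg = FakeCephfsError(122, "error in write")
negated = negate_first_arg(two_arg)

# One-argument form: args[0] is the message string, so negation fails.
one_arg = FakeCephfsError("error in write: Disk quota exceeded [Errno 122]")
try:
    negate_first_arg(one_arg)
    failure = None
except TypeError as err:
    failure = str(err)

print(negated)
print(failure)
```

The second case raises from inside the `except` block of the caller in the real code, which is why the original EDQUOT error is reported only as the context of the TypeError.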

Digging further, I found that if a libcephfs return code has no entry in `errno_to_exception` in cephfs.pyx, then cephfs.pyx raises a generic exception with different arguments than it normally does. See in cephfs.pyx:

cdef make_ex(ret, msg):
    """
    Translate a librados return code into an exception.
    """
    ret = abs(ret)
    if ret in errno_to_exception:
        return errno_to_exception[ret](ret, msg)
    return Error(msg + ': {} [Errno {:d}]'.format(os.strerror(ret), ret))

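So it sometimes raises cephfs.Error(ret, msg) and sometimes cephfs.Error(msg). mgr/volumes only handles the cephfs.Error(ret, msg) form correctly. With the cephfs.Error(msg) form, `e.args[0]` is the message string rather than the errno, which is what triggers the TypeError in copy_file.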
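
One way a caller could tolerate both argument shapes is sketched below. This is only an illustration under stated assumptions, not the fix adopted in Ceph: `FakeCephfsError`, `split_error`, and the `EIO` fallback are hypothetical names and choices made for this example. The alternative is to change `make_ex` so that it always passes `(ret, msg)`, which removes the need for any caller-side parsing.

```python
import errno
import re


class FakeCephfsError(Exception):
    """Stand-in for cephfs.Error; hypothetical, for illustration only."""


# Matches the "[Errno N]" suffix that make_ex appends in the one-argument form.
_ERRNO_SUFFIX = re.compile(r"\[Errno (\d+)\]\s*$")


def split_error(exc, default_errno=errno.EIO):
    """Return (positive errno, message) for either argument shape."""
    args = exc.args
    # Two-argument form: (ret, msg)
    if len(args) >= 2 and isinstance(args[0], int):
        return abs(args[0]), str(args[1])
    # One-argument form: recover the errno from the message suffix if present.
    msg = str(args[0]) if args else ""
    match = _ERRNO_SUFFIX.search(msg)
    if match:
        return int(match.group(1)), msg
    # Assumption: fall back to EIO when no errno can be recovered.
    return default_errno, msg


code_a, msg_a = split_error(FakeCephfsError(122, "error in write"))
code_b, msg_b = split_error(
    FakeCephfsError("error in write: Disk quota exceeded [Errno 122]"))
code_c, msg_c = split_error(FakeCephfsError("unexpected failure"))
```

With a helper of this kind, the caller could raise its own exception with `-code` and the message without assuming that `args[0]` is numeric.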