Bug #5615
lock ops are not re-sent when cluster gets marked un-full
Status: Duplicate
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source: Community (user)
Severity: 3 - minor
Description
I'm not certain what the correct behavior should be in this case, so maybe it is not a bug, but here is what is happening: when an OSD becomes full, a process fails, and we unmount the rbd and attempt to remove the lock associated with the rbd for that process. The unmount works fine, but removing the lock currently fails because the list_lockers() call never returns.

Here is a code snippet I tried with a fake rbd lock on a test cluster:

```python
import rados
import rbd

# Each context manager closes its resource on exit: the image,
# then the I/O context for the 'rbd' pool, then the cluster connection.
with rados.Rados(conffile='/etc/ceph/ceph.conf') as cluster:
    with cluster.open_ioctx('rbd') as ioctx:
        with rbd.Image(ioctx, 'msd1') as image:
            # This call never returns once the cluster has been marked FULL.
            image.list_lockers()
```

The process never returns, even after the Ceph cluster is back to healthy. The only indication of the error is this line in /var/log/messages:

```
Jul 11 23:25:05 node-172-16-0-13 python: 2013-07-11 23:25:05.826793 7ffc66d72700 0 client.6911.objecter FULL, paused modify 0x7ffc687c6050 tid 2
```

Any help would be greatly appreciated.

Ceph version: ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404)
This may turn out to be a librados issue, but it showed up via rbd locking.
Updated by Sage Weil almost 11 years ago
I bet this affects all class calls/execs, because the objecter doesn't know whether they are reads or writes. We may need to resend all of them on un-full (and/or pessimistically flag them all as writes so that they get resent).
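A minimal Python sketch of the "pessimistically flag them all as writes" option, under a simplified op model. The `Op` class, the sub-op names `"read"`, `"write"`, `"call"`, and `must_pause_when_full()` are all hypothetical illustrations, not the Objecter or librados API. The idea: since the client cannot tell from a class call alone whether the class method mutates data, any op containing one is treated as a write, so it is held while FULL and resent later.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical sub-op kinds; the real Objecter works with OSD op codes and flags.
READ, WRITE, CALL = "read", "write", "call"


@dataclass
class Op:
    """Simplified client op: a list of sub-ops plus an explicit write flag."""
    sub_ops: List[str]
    write_flag: bool = False


def must_pause_when_full(op: Op, pessimistic_calls: bool = True) -> bool:
    """Return True if the op should be held client-side while the cluster is FULL.

    Ops that are flagged as writes, or contain a write sub-op, are paused.
    With pessimistic_calls enabled, any op containing a class call is also
    treated as a write, because the client cannot tell whether it mutates.
    """
    if op.write_flag or WRITE in op.sub_ops:
        return True
    return pessimistic_calls and CALL in op.sub_ops


lock_query = Op(sub_ops=[CALL])  # hypothetical stand-in for a lock-info class call
print(must_pause_when_full(lock_query, pessimistic_calls=False))  # False
print(must_pause_when_full(lock_query))  # True
```

The cost of this choice is that genuinely read-only class calls are also delayed while the cluster is FULL, in exchange for never losing one that does write.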
Updated by Greg Farnum almost 11 years ago
If it's stopping the op because the cluster is marked FULL then it ought to be able to know the op needs to be sent out again later. ;)
Updated by Sage Weil almost 11 years ago
Oh, right. In that case the op was probably sent and then dropped by the OSD. The objecter only pauses ops marked as writes, and IIRC class ops aren't marked that way.
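The hypothesis in the comment above can be illustrated with a toy model. This is a sketch under stated assumptions, not the real Objecter or OSD: `ToyCluster`, `Op`, and its `write_flag`/`osd_write` fields are hypothetical. The client pauses only ops flagged as writes while FULL; anything else is sent, and the toy OSD silently drops it if it would modify data; clearing FULL resends only the paused ops.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Op:
    name: str
    write_flag: bool  # what the client-side objecter sees
    osd_write: bool   # whether the OSD treats the op as a modification
    done: bool = False


class ToyCluster:
    """Toy model of the pause/drop/resend behaviour; not the real Objecter or OSD."""

    def __init__(self) -> None:
        self.full = False
        self.paused: List[Op] = []

    def submit(self, op: Op) -> None:
        if self.full and op.write_flag:
            self.paused.append(op)  # held client-side for a later resend
        else:
            self._osd_handle(op)

    def _osd_handle(self, op: Op) -> None:
        if self.full and op.osd_write:
            return  # dropped without a reply, so the caller waits forever
        op.done = True

    def set_full(self, full: bool) -> None:
        self.full = full
        if not full:
            # Only ops the client itself paused are resent on un-full.
            resend, self.paused = self.paused, []
            for op in resend:
                self._osd_handle(op)


cluster = ToyCluster()
cluster.set_full(True)
plain_write = Op("write", write_flag=True, osd_write=True)
class_call = Op("class call", write_flag=False, osd_write=True)
cluster.submit(plain_write)
cluster.submit(class_call)
cluster.set_full(False)
print(plain_write.done, class_call.done)  # True False
```

In this model, submitting `class_call` with `write_flag=True` instead puts it on the paused list, so it completes once FULL clears.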
Updated by Sage Weil over 10 years ago
- Status changed from New to Duplicate
This is the linger-resend-on-unfull bug, #6070.