Bug #5615: lock ops are not re-sent when cluster gets marked un-full - rbd - Ceph

Bug #5615

closed

lock ops are not re-sent when cluster gets marked un-full

Added by Greg Farnum almost 11 years ago. Updated over 10 years ago.

Status: Duplicate

Priority: Normal

Source: Community (user)

Severity: 3 - minor

Description

I'm not certain what the correct behavior should be in this case, so
maybe it is not a bug, but here is what is happening:

When an OSD becomes full, a process fails, and we unmount the rbd and
attempt to remove the lock associated with the rbd for that process.
The unmount works fine, but removing the lock currently fails
because the list_lockers() function call never returns.

Here is a code snippet I tried with a fake rbd lock on a test cluster:

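import rados
import rbd

# Connect using the local cluster configuration.
with rados.Rados(conffile='/etc/ceph/ceph.conf') as cluster:
  # 'rbd' is the pool that holds the test image.
  with cluster.open_ioctx('rbd') as ioctx:
    with rbd.Image(ioctx, 'msd1') as image:
      # This is the call that blocks and never returns.
      image.list_lockers()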

The process never returns, even after the ceph cluster is back to
healthy.  The only indication of the problem is a message in the
/var/log/messages file:

Jul 11 23:25:05 node-172-16-0-13 python: 2013-07-11 23:25:05.826793 7ffc66d72700  0 client.6911.objecter  FULL, paused modify 0x7ffc687c6050 tid 2
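As a stopgap, a caller could avoid blocking forever by running the call in a daemon thread and giving up after a timeout. The following is only a sketch, not a fix: call_with_timeout and list_lockers_with_timeout are hypothetical helpers (not part of rados or rbd), the 10 second default is arbitrary, and it assumes the blocked library call releases the Python GIL so that the waiting thread can time out. The stuck thread is abandoned rather than cancelled.

```python
import threading


def call_with_timeout(fn, timeout, *args, **kwargs):
    """Run fn in a daemon thread (hypothetical helper).

    Returns (True, result) if fn finished within `timeout` seconds,
    or (False, None) if it is still running. Exceptions raised by fn
    are re-raised in the caller.
    """
    box = {}

    def worker():
        try:
            box["result"] = fn(*args, **kwargs)
        except Exception as exc:  # hand the error back to the caller
            box["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        # Still blocked; the daemon thread is left behind, not cancelled.
        return False, None
    if "error" in box:
        raise box["error"]
    return True, box["result"]


def list_lockers_with_timeout(image_name, pool="rbd", timeout=10.0):
    """Hypothetical wrapper around the snippet from this report."""
    import rados  # imported lazily so the helper above has no Ceph dependency
    import rbd

    def work():
        with rados.Rados(conffile="/etc/ceph/ceph.conf") as cluster:
            with cluster.open_ioctx(pool) as ioctx:
                with rbd.Image(ioctx, image_name) as image:
                    return image.list_lockers()

    return call_with_timeout(work, timeout)
```

With this, list_lockers_with_timeout('msd1') would return (False, None) instead of hanging, which at least lets the caller log the failure and move on.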

Any help would be greatly appreciated.

ceph version:

ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404)

This may turn out to be a librados issue, but it showed up via rbd locking.

Updated by Sage Weil almost 11 years ago

I bet this affects all class calls/execs, because the objecter doesn't know whether an exec is a read or a write. We may need to resend all of them (and/or pessimistically flag them all as writes) so that they get resent.
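The guess in this comment can be illustrated with a toy model. This is not Ceph's Objecter code: Op, ToyClient and all field names are hypothetical, and the rule "only ops flagged as writes are resent once the full flag clears" is an assumption taken from the guess above, not verified behavior.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    name: str
    kind: str                    # "read", "write", or "exec" (class call)
    flagged_write: bool = False  # what the toy client knows about the op


@dataclass
class ToyClient:
    pessimistic_exec: bool = False  # flag every exec as a write?
    full: bool = False
    held: list = field(default_factory=list)  # ops waiting for un-full
    done: list = field(default_factory=list)  # names of completed ops

    def submit(self, op):
        # Plain writes are always flagged; execs only under the
        # pessimistic policy, since their read/write nature is unknown.
        if op.kind == "write" or (op.kind == "exec" and self.pessimistic_exec):
            op.flagged_write = True
        # Assumption: while full, anything that may modify data is held.
        if self.full and op.kind in ("write", "exec"):
            self.held.append(op)
        else:
            self.done.append(op.name)

    def mark_unfull(self):
        # Assumption: only ops flagged as writes are resent.
        self.full = False
        still_held = []
        for op in self.held:
            if op.flagged_write:
                self.done.append(op.name)
            else:
                still_held.append(op)
        self.held = still_held


client = ToyClient(full=True)
client.submit(Op("write-1", "write"))
client.submit(Op("list_lockers", "exec"))
client.mark_unfull()
print(client.done, [op.name for op in client.held])
```

In this model the exec stays held after the cluster is marked un-full, while constructing the client with pessimistic_exec=True lets it complete, which is the second option suggested above.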
