cache-flush-evict-all causes OSD stuck ops on unevictable objects
Cluster is running Ceph 0.94.4.
Start with a pair of empty pools, one erasure, one replicated:
pool 3 'rbd-backup' erasure size 9 min_size 7 crush_ruleset 1 object_hash rjenkins pg_num 64 pgp_num 64 last_change 203 lfor 202 flags hashpspool stripe_width 4256
pool 5 'rbd-backup-cache' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 205 flags hashpspool stripe_width 0

Set up the tier:
$ ceph osd tier add rbd-backup rbd-backup-cache
$ ceph osd tier cache-mode rbd-backup-cache writeback
$ ceph osd tier set-overlay rbd-backup rbd-backup-cache

Create an empty object and give it an omap entry, so it can't be evicted to the erasure pool:
$ rados -p rbd-backup put test /dev/null
$ rados -p rbd-backup setomapval test test test

Try to flush the pool:
$ rados -p rbd-backup-cache cache-flush-evict-all
test
failed to flush /test: (16) Device or resource busy
error from cache-flush-evict-all: (1) Operation not permitted

Accessing the object now hangs:
$ rados -p rbd-backup getomapval test test
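Before flushing, the unevictable objects could have been identified up front. The sketch below is a hypothetical helper (`find_omap_objects` is not part of Ceph, just an illustration); it assumes the stock `rados` CLI, whose `listomapkeys` prints one key per line, and that object names contain no whitespace:

```shell
# Hypothetical helper (not part of Ceph): list the objects in a cache pool
# that still carry omap data and therefore cannot be flushed to an EC base
# tier. Assumes object names contain no whitespace.
find_omap_objects() {
    pool="$1"
    for obj in $(rados -p "$pool" ls); do
        # listomapkeys prints one key per line; any output means omap data
        if [ -n "$(rados -p "$pool" listomapkeys "$obj")" ]; then
            echo "$obj"
        fi
    done
}
```

For the pool above, `find_omap_objects rbd-backup-cache` should print `test`.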
The monitor reports blocked requests, and running cache-flush-evict-all again hangs as well. Restarting the OSD that holds the object clears the stuck requests.
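To find the OSD that needs restarting without guessing, `ceph osd map` can be parsed. This is a sketch, not an official tool: `acting_osds` is a hypothetical name, and the `sed` expression assumes the output format of this release:

```shell
# Hypothetical helper (not an official tool): print the acting OSD set for
# an object, so the right OSD can be restarted. The sed expression assumes
# `ceph osd map` output of roughly this shape:
#   osdmap e205 pool 'rbd-backup-cache' (5) object 'test' -> pg 5.40e8aab5 (5.35) -> up ([2,0,1], p2) acting ([2,0,1], p2)
acting_osds() {
    ceph osd map "$1" "$2" | sed -n 's/.*acting (\[\([0-9,]*\)\].*/\1/p'
}
```

The first OSD id printed by `acting_osds rbd-backup-cache test` is the primary.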
I would expect cache-flush-evict-all to evict all evictable objects and complain about those that aren't, but not leave them in limbo. Presumably something is left locked waiting for the objects to be evicted, even though that is impossible.
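To make the expectation concrete, here is a rough shell sketch of the semantics I would expect (hypothetical, not Ceph's implementation): attempt each object, report per-object failures, return an error code, and never leave requests blocked:

```shell
# Hypothetical sketch (not Ceph's implementation) of the semantics I would
# expect from cache-flush-evict-all: attempt each object, report failures,
# return an error, never block. Assumes object names contain no whitespace.
evict_all() {
    pool="$1"
    rc=0
    for obj in $(rados -p "$pool" ls); do
        # cache-evict fails (e.g. EBUSY) for unevictable objects; just
        # report that and move on instead of leaving requests blocked.
        if ! rados -p "$pool" cache-evict "$obj" 2>/dev/null; then
            echo "failed to evict $obj" >&2
            rc=1
        fi
    done
    return $rc
}
```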