Bug #8935
operations not idempotent when enabling cache
Status: Resolved
Priority: High
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
consider:
- no cache
- send delete to base tier
- base does delete, replies
- mon set overlay
- client resends delete to cache
- (client discards original delete reply)
- cache promotes whiteout (object doesn't exist any more)
- cache replies with -ENOENT
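The sequence above can be sketched as a toy model. This is not Ceph code: `Pool`, `CachePool`, `promote`, and the `completed` dict (standing in for the pg log's duplicate detection) are hypothetical names made up for illustration, and flushing from cache to base is left out entirely. It only shows why a resend that lands on a different tier is not recognized as a duplicate.

```python
import errno


class Pool:
    """Toy pool (hypothetical): objects plus a per-pool record of completed requests."""

    def __init__(self):
        self.objects = {}
        self.completed = {}  # reqid -> result; stands in for pg log dup detection

    def delete(self, reqid, oid):
        if reqid in self.completed:
            # Known duplicate: replay the original result.
            return self.completed[reqid]
        result = 0 if oid in self.objects else -errno.ENOENT
        self.objects.pop(oid, None)
        self.completed[reqid] = result
        return result


class CachePool(Pool):
    """Toy cache tier (hypothetical) layered over a base pool."""

    def __init__(self, base):
        super().__init__()
        self.base = base

    def promote(self, oid):
        # Copy the object up from the base tier. If the base has no such
        # object, there is nothing to copy (the "whiteout" case).
        if oid in self.base.objects:
            self.objects[oid] = self.base.objects[oid]

    def delete(self, reqid, oid):
        if reqid not in self.completed and oid not in self.objects:
            self.promote(oid)
        return super().delete(reqid, oid)


def run_scenario():
    base = Pool()
    base.objects["foo"] = b"data"
    cache = CachePool(base)
    reqid = ("client.0", 1)

    # No cache yet: the delete goes to the base tier and succeeds.
    first = base.delete(reqid, "foo")
    # Overlay is set; the client resends the same request to the cache,
    # which has never seen this reqid.
    resend_to_cache = cache.delete(reqid, "foo")
    # For comparison: a resend to the base tier is caught as a duplicate.
    resend_to_base = base.delete(reqid, "foo")
    return first, resend_to_cache, resend_to_base
```

In this model `run_scenario()` returns `0` for the original delete, `-errno.ENOENT` for the resend to the cache, and `0` for a resend to the base, because the duplicate record lives only in the tier that executed the op.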
I'm not quite sure how this should work. Perhaps this is part of the more general problem of using the pg log to enforce idempotency. Perhaps, instead, a shorter per-object history should be promoted along with the object. Even that, though, doesn't handle the deletion case...
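One possible reading of the per-object history idea, as a toy sketch. Everything here is an assumption made for illustration: `ObjectState`, `Tier`, and the `history` dict are hypothetical, and nothing below reflects how Ceph actually stores or promotes request history. The sketch only illustrates why history that travels with the object would cover a resent append but not a resent delete.

```python
import errno


class ObjectState:
    """Hypothetical object carrying its own short request history."""

    def __init__(self, data=b""):
        self.data = data
        self.history = {}  # reqid -> result


class Tier:
    """Hypothetical tier; a cache tier is a Tier with a base."""

    def __init__(self, base=None):
        self.objects = {}
        self.base = base

    def _promote(self, oid):
        # Promote the object together with its per-object history.
        if self.base is not None and oid not in self.objects \
                and oid in self.base.objects:
            src = self.base.objects[oid]
            dst = ObjectState(src.data)
            dst.history = dict(src.history)
            self.objects[oid] = dst

    def append(self, reqid, oid, data):
        self._promote(oid)
        obj = self.objects.setdefault(oid, ObjectState())
        if reqid in obj.history:
            return obj.history[reqid]  # duplicate: do not apply again
        obj.data += data
        obj.history[reqid] = 0
        return 0

    def delete(self, reqid, oid):
        self._promote(oid)
        obj = self.objects.get(oid)
        if obj is None:
            return -errno.ENOENT
        if reqid in obj.history:
            return obj.history[reqid]
        # The history is removed together with the object.
        del self.objects[oid]
        return 0


def demo():
    base = Tier()
    cache = Tier(base=base)

    # Append executed on base, then resent to the cache after the overlay
    # is set: the promoted history catches the duplicate.
    base.append(("client.0", 1), "foo", b"x")
    cache.append(("client.0", 1), "foo", b"x")
    appended_once = cache.objects["foo"].data == b"x"

    # Delete executed on base, then resent to the cache: the object and
    # its history are gone, so there is nothing to promote.
    base.append(("client.0", 2), "bar", b"y")
    base.delete(("client.0", 3), "bar")
    resent_delete = cache.delete(("client.0", 3), "bar")
    return appended_once, resent_delete
```

In this sketch the resent append is applied only once, while the resent delete still comes back as `-errno.ENOENT`, since the only record of the original delete was removed along with the object.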
Here is the job that reproduces this (it just hit on 1 out of 10 attempts):
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - slow request
- exec:
    client.0:
    - ceph osd pool create base 4
    - ceph osd pool create cache 4
    - ceph osd tier add base cache
    - ceph osd tier cache-mode cache writeback
    - ceph osd tier set-overlay base cache
    - ceph osd pool set cache hit_set_type bloom
    - ceph osd pool set cache hit_set_count 8
    - ceph osd pool set cache hit_set_period 60
    - ceph osd pool set cache target_max_objects 500
- background_exec:
    mon.a:
    - while true
    - do sleep 30
    - echo forward
    - ceph osd tier cache-mode cache forward
    - sleep 10
    - ceph osd pool set cache cache_target_full_ratio .001
    - echo cache-try-flush-evict-all
    - rados -p cache cache-try-flush-evict-all
    - sleep 5
    - echo cache-flush-evict-all
    - rados -p cache cache-flush-evict-all
    - sleep 5
    - echo remove overlay
    - ceph osd tier remove-overlay base
    - sleep 20
    - echo add writeback overlay
    - ceph osd tier cache-mode cache writeback
    - ceph osd pool set cache cache_target_full_ratio .8
    - ceph osd tier set-overlay base cache
    - done
- rados:
    clients:
    - client.0
    max_seconds: 600
    objects: 10000
    op_weights:
      copy_from: 50
      delete: 50
      read: 100
      write: 100
    ops: 400000
    pools:
    - base
    size: 1024
overrides:
  ceph:
    conf:
      global:
        ms inject socket failures: 500
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
        osd op thread timeout: 60
        osd sloppy crc: true
    fs: btrfs
    log-whitelist:
    - slow request
  install:
    ceph:
      branch: wip-8931
roles:
- - mon.a
  - osd.0
  - osd.1
  - osd.2
- - osd.3
  - osd.4
  - osd.5
  - client.0