Bug #2002
osd: racy push/pull for clones
Status: Closed
Description
There is currently a race where:
- an adjacent clone is missing
- we (possibly calculate some clone overlap and) start pulling
- the adjacent clone arrives
- we receive a push, calculate a different overlap, and get confused
Also, we don't work efficiently when pulling clones in parallel. We should probably serialize recovery on each object_t so that we don't waste disk space; recovery will probably still be faster. (See the sketch of this idea below.)
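Not Ceph code: a minimal sketch of what "serialize on each object_t" could look like. A per-object gate records the overlap computed when recovery of an object starts, and a later push for the same object is deferred and reuses that overlap instead of recalculating against newly arrived clones. RecoveryGate, ObjectT, CloneOverlap, and the byte-range overlap representation are all hypothetical stand-ins for the OSD's internals.

    // Sketch only: per-object serialization of clone recovery.
    #include <cstdint>
    #include <iostream>
    #include <map>
    #include <mutex>
    #include <optional>
    #include <string>
    #include <utility>
    #include <vector>

    using ObjectT = std::string;  // stand-in for object_t
    using CloneOverlap = std::vector<std::pair<uint64_t, uint64_t>>;  // byte ranges

    class RecoveryGate {
    public:
      // Try to start recovering `oid`. Returns false if a pull/push for the
      // same object is already in flight; the caller should defer and reuse
      // the recorded overlap instead of computing a second, possibly
      // different, one.
      bool try_start(const ObjectT& oid, CloneOverlap overlap) {
        std::lock_guard<std::mutex> l(lock_);
        return in_flight_.emplace(oid, std::move(overlap)).second;
      }

      // The overlap recorded when recovery of `oid` began; later pushes use
      // this rather than recalculating against clones that arrived since.
      std::optional<CloneOverlap> overlap_for(const ObjectT& oid) {
        std::lock_guard<std::mutex> l(lock_);
        auto it = in_flight_.find(oid);
        if (it == in_flight_.end()) return std::nullopt;
        return it->second;
      }

      void finish(const ObjectT& oid) {
        std::lock_guard<std::mutex> l(lock_);
        in_flight_.erase(oid);
      }

    private:
      std::mutex lock_;
      std::map<ObjectT, CloneOverlap> in_flight_;
    };

    int main() {
      RecoveryGate gate;

      // Pull starts: record the overlap computed from the clones we had then.
      gate.try_start("obj.1", {{0, 4096}});

      // A push for the same object arrives after an adjacent clone showed up
      // and would compute a different overlap. It is serialized instead:
      if (!gate.try_start("obj.1", {{0, 8192}})) {
        std::cout << "push deferred: pull in flight, reusing overlap of "
                  << gate.overlap_for("obj.1")->at(0).second << " bytes\n";
      }

      gate.finish("obj.1");
      return 0;
    }

Serializing per object also means two pulls for the same object never run concurrently, which addresses the wasted disk space mentioned above.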
Updated by Sage Weil about 12 years ago
sage@metropolis.ceph.dreamhost.com:osd.log.badpushpull
shows this (or similar) badness. The workload was:
kernel:
  branch: master
interactive-on-error: true
roles:
- - mon.a
  - mds.a
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mon.c
  - client.0
  - osd.3
  - osd.4
  - osd.5
tasks:
- ceph:
    btrfs: 1
    log-whitelist:
    - wrongly marked me down or wrong addr
    conf:
      osd:
        debug ms: 1
        debug osd: 20
      mon:
        debug ms: 10
        debug mon: 20
- thrashosds:
- rados:
    clients:
    - client.0
    objects: 500
    op_weights:
      delete: 50
      read: 100
      write: 100
      snap_create: 50
      snap_remove: 50
      snap_rollback: 50
    ops: 4000
on commit 2116f012eddfe3278fcdfeb5a2ddc877491d210d.
Updated by Sage Weil about 12 years ago
- Status changed from New to 7
- Source set to Development
Re-enabling this in my thrashing tests. If all goes well, I'll re-enable it in master under the assumption that Sam's cleanups addressed the problem.
Updated by Sage Weil about 12 years ago
- Target version changed from v0.43 to v0.44
Updated by Sage Weil about 12 years ago
- Status changed from 7 to Resolved
Haven't seen this in forever; looks fixed.
Updated by Sage Weil about 12 years ago
- Status changed from Resolved to 7
- Target version changed from v0.44 to v0.45
I take that back; this wasn't enabled in QA. Adding it to the teuthology ceph.conf file.