Bug #2002
closed
osd: racy push/pull for clones
Added by Sage Weil over 12 years ago.
Updated about 12 years ago.
Description
There is currently a race where:
- an adjacent clone is missing
- we (calculate some clone overlap? and) start pulling
- we get the adjacent clone
- we get the push, calculate a different overlap, and then get confused.
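The sequence above can be pictured with a toy model. This is only a sketch: `usable_overlap`, `clone_overlap`, `local_clones` and the clone name are hypothetical and are not taken from the OSD code. It assumes the overlap with a neighboring clone is only usable when that clone is already present locally, so the answer changes if the clone arrives between the pull and the push.

```python
# Toy model of the race; all names here are hypothetical, not Ceph identifiers.

def usable_overlap(clone_overlap, local_clones, neighbor):
    """Return the (offset, length) ranges that could be copied locally.

    clone_overlap maps a neighboring clone to the ranges it shares with the
    object being recovered. The ranges only count if that neighbor is
    already present on this OSD.
    """
    if neighbor in local_clones:
        return list(clone_overlap.get(neighbor, []))
    return []


clone_overlap = {"clone_4": [(0, 4096)]}
local_clones = set()  # the adjacent clone is missing

# Overlap as computed when the pull is started.
at_pull = usable_overlap(clone_overlap, local_clones, "clone_4")

# The adjacent clone arrives while the pull is still outstanding.
local_clones.add("clone_4")

# Overlap as recomputed when the push is received.
at_push = usable_overlap(clone_overlap, local_clones, "clone_4")

# The two sides of the exchange now disagree about what was requested.
race_detected = at_pull != at_push
```

In this model the disagreement goes away if the overlap computed at pull time is remembered and reused when the push arrives, or if the two recoveries are never in flight at the same time.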
Also, we don't work efficiently when pulling clones in parallel. We should probably serialize on each object_t so that we don't waste disk space. Recovery will probably still be faster.
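A minimal sketch of what serializing on each object could look like, assuming a simple queue keyed by object name. `PerObjectPullQueue` and its methods are hypothetical names for illustration, not the OSD's recovery code.

```python
from collections import defaultdict, deque


class PerObjectPullQueue:
    """Allow at most one in-flight pull per object name (hypothetical helper)."""

    def __init__(self):
        self.waiting = defaultdict(deque)  # object name -> queued clone ids
        self.in_flight = {}                # object name -> clone being pulled

    def request(self, obj, clone):
        """Queue a pull; return the clone to start now, or None if obj is busy."""
        if obj in self.in_flight:
            self.waiting[obj].append(clone)
            return None
        self.in_flight[obj] = clone
        return clone

    def complete(self, obj):
        """Finish the current pull; return the next clone to start, if any."""
        del self.in_flight[obj]
        queue = self.waiting.get(obj)
        if queue:
            nxt = queue.popleft()
            self.in_flight[obj] = nxt
            return nxt
        return None


q = PerObjectPullQueue()
# Two clones of "foo" and one clone of "bar" are requested back to back.
started = [q.request("foo", 1), q.request("foo", 2), q.request("bar", 1)]
# Only when the first "foo" pull completes does the second one start.
next_foo = q.complete("foo")
```

Pulls for different objects still proceed in parallel; only clones of the same object wait for each other, so a later clone can reuse data from an earlier one instead of fetching it again.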
sage@metropolis.ceph.dreamhost.com:osd.log.badpushpull
shows the (or similar) badness. workload was
kernel:
  branch: master
interactive-on-error: true
roles:
- - mon.a
  - mds.a
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mon.c
  - client.0
  - osd.3
  - osd.4
  - osd.5
tasks:
- ceph:
    btrfs: 1
    log-whitelist:
    - wrongly marked me down or wrong addr
    conf:
      osd:
        debug ms: 1
        debug osd: 20
      mon:
        debug ms: 10
        debug mon: 20
- thrashosds:
- rados:
    clients:
    - client.0
    objects: 500
    op_weights:
      delete: 50
      read: 100
      write: 100
      snap_create: 50
      snap_remove: 50
      snap_rollback: 50
    ops: 4000
on 2116f012eddfe3278fcdfeb5a2ddc877491d210d
- Target version set to v0.43
- Status changed from New to 7
- Source set to Development
reenabling this in my thrashing tests. if all goes well, i'll reenable in master under the assumption that sam's cleanups addressed the problem.
- Target version changed from v0.43 to v0.44
- Status changed from 7 to Resolved
haven't seen this in forever; looks fixed.
- Status changed from Resolved to 7
- Target version changed from v0.44 to v0.45
i take that back; this wasn't enabled in qa. adding to the teuthology ceph.conf file.
- Status changed from 7 to Resolved