Bug #8595
closed
osd: client op blocks until backfill starts (dumpling)
Added by Sage Weil almost 10 years ago.
Updated over 9 years ago.
Description
Observed on congress. Logs are at cephstore7098:~sage
We are hitting this:
  if (head == backfill_pos) {
    wait_for_backfill_pos(op);
    return;
  }
in do_op()
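To illustrate why a client op can sit blocked here, a toy model of the check may help. This is only a sketch under assumptions: ToyPG, ToyOp, advance_backfill and the completed list are hypothetical names invented for the example, not the real OSD code, and it assumes that parked ops are only retried once the backfill position moves.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Toy model only: ToyPG, ToyOp and their members are hypothetical names,
// not the actual OSD interfaces.
struct ToyOp {
  std::string oid;  // object the client op targets
  int id;           // identifier used to track the op in this sketch
};

struct ToyPG {
  std::string backfill_pos;    // object backfill is (or will be) working on
  std::deque<ToyOp> waiting;   // ops parked on backfill_pos
  std::vector<int> completed;  // ids of ops that were allowed to run

  // Mirrors the quoted check: an op on the object at backfill_pos is parked.
  void do_op(const ToyOp& op) {
    if (op.oid == backfill_pos) {
      waiting.push_back(op);
      return;
    }
    completed.push_back(op.id);
  }

  // Assumption: parked ops are retried only when backfill moves on.
  void advance_backfill(const std::string& next_pos) {
    backfill_pos = next_pos;
    std::deque<ToyOp> retry;
    retry.swap(waiting);
    for (const ToyOp& op : retry)
      do_op(op);
  }
};
```

In this model, an op on the object at backfill_pos stays in waiting for as long as advance_backfill() is not called, which corresponds to backfill not having started yet.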
Both 6f975e35a1e29a01347e4a6709b54a0422e063dd and 3d0d69fed09675fc466f6c5b2736ba674923823d look like they need to be backported together.
I created wip-8595-dz on dumpling branch for further testing.
- Priority changed from Normal to Urgent
- Status changed from New to In Progress
- Assignee set to Sage Weil
- Status changed from In Progress to 7
- Status changed from 7 to In Progress
With this patch, I see filestore tripping over ENOENT on clone:
ubuntu@teuthology:/a/teuthology-2014-08-11_19:00:01-rados-dumpling-testing-basic-multi/417411
ubuntu@teuthology:/a/teuthology-2014-08-11_19:00:01-rados-dumpling-testing-basic-multi/417465
which were
description: rados/thrash/{clusters/fixed-2.yaml fs/ext4.yaml msgr-failures/few.yaml thrashers/morepggrow.yaml workloads/snaps-many-objects.yaml}
description: rados/thrash/{clusters/fixed-2.yaml fs/xfs.yaml msgr-failures/few.yaml thrashers/pggrow.yaml workloads/snaps-many-objects.yaml}
- Assignee deleted (Sage Weil)
- Status changed from In Progress to 12
The simple fixes here seem insufficient (they fail in QA). I haven't seen anybody else hitting this, which surprises me a bit.
- Status changed from 12 to 7
- Assignee set to Samuel Just
I think the least distasteful solution is to actually backport the last_backfill_started modifications. I'll start testing a branch.
It seems that we also need to backport the update_range/scan_range changes (intended to avoid backfill-related flushes) from just after dumpling.
0cb797b3e5064ec6b092dc28f23a85fcdbf96526
1f9e137b51d580bce6505773217bcca008225a47
wip-sam-dumpling-testing
- Status changed from 7 to Resolved