Bug #8595
osd: client op blocks until backfill starts (dumpling)
Description
observed on congress. logs at cephstore7098:~sage
History
#1 Updated by Sage Weil almost 10 years ago
we are hitting this check in do_op():
    if (head == backfill_pos) {
      wait_for_backfill_pos(op);
      return;
    }
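As a rough illustration of why that check can stall a client op, here is a toy model, not the actual OSD code. Only do_op, backfill_pos and wait_for_backfill_pos come from the snippet quoted above; ToyPG, ToyOp, advance_backfill() and the rule that parked ops are retried only when backfill_pos moves are assumptions made for the sketch.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Toy model only: ToyPG, ToyOp and advance_backfill() are hypothetical names,
// not the real Ceph OSD classes.
struct ToyOp {
  int id;
  std::string head;  // object the client op targets
};

struct ToyPG {
  std::string backfill_pos;    // object that backfill will handle next
  std::deque<ToyOp> waiting;   // ops parked on backfill_pos
  std::vector<int> completed;  // ids of ops that were allowed to run

  void wait_for_backfill_pos(const ToyOp& op) { waiting.push_back(op); }

  void do_op(const ToyOp& op) {
    // Same shape as the quoted check: an op on the object at
    // backfill_pos is parked instead of executed.
    if (op.head == backfill_pos) {
      wait_for_backfill_pos(op);
      return;
    }
    completed.push_back(op.id);
  }

  // Assumption: parked ops are retried only when backfill moves forward.
  // If backfill never starts, this is never called and the ops stay parked.
  void advance_backfill(const std::string& new_pos) {
    assert(new_pos >= backfill_pos);  // the toy position only moves forward
    backfill_pos = new_pos;
    std::deque<ToyOp> retry;
    retry.swap(waiting);
    for (const ToyOp& op : retry) do_op(op);
  }
};
```

In this model an op whose head equals backfill_pos sits in waiting until advance_backfill() is called, which mirrors the symptom in the title: the client op blocks until backfill starts.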
#2 Updated by Sage Weil almost 10 years ago
6f975e35a1e29a01347e4a6709b54a0422e063dd fixed it in emperor, but not intentionally.
#3 Updated by David Zafman almost 10 years ago
Both 6f975e35a1e29a01347e4a6709b54a0422e063dd and 3d0d69fed09675fc466f6c5b2736ba674923823d look like they need to be backported together.
I created wip-8595-dz on dumpling branch for further testing.
#4 Updated by Sage Weil over 9 years ago
- Priority changed from Normal to Urgent
#5 Updated by Sage Weil over 9 years ago
- Status changed from New to In Progress
- Assignee set to Sage Weil
#6 Updated by Sage Weil over 9 years ago
- Status changed from In Progress to 7
#7 Updated by Sage Weil over 9 years ago
- Status changed from 7 to In Progress
with this patch, i see filestore tripping over ENOENT on clone:
ubuntu@teuthology:/a/teuthology-2014-08-11_19:00:01-rados-dumpling-testing-basic-multi/417411
ubuntu@teuthology:/a/teuthology-2014-08-11_19:00:01-rados-dumpling-testing-basic-multi/417465
which were
description: rados/thrash/{clusters/fixed-2.yaml fs/ext4.yaml msgr-failures/few.yaml thrashers/morepggrow.yaml workloads/snaps-many-objects.yaml}
description: rados/thrash/{clusters/fixed-2.yaml fs/xfs.yaml msgr-failures/few.yaml thrashers/pggrow.yaml workloads/snaps-many-objects.yaml}
#8 Updated by Sage Weil over 9 years ago
- Assignee deleted (Sage Weil)
#9 Updated by Sage Weil over 9 years ago
- Status changed from In Progress to 12
The simple fixes here seem insufficient (they fail in QA). I haven't seen anybody else hitting this, which surprises me a bit.
#10 Updated by Samuel Just over 9 years ago
- Status changed from 12 to 7
- Assignee set to Samuel Just
I think the least distasteful solution is to actually backport the last_backfill_started modifications. I'll start testing a branch.
#11 Updated by Samuel Just over 9 years ago
It seems that we also need to backport the update_range/scan_range changes (intended to avoid backfill-related flushes) from just after dumpling.
0cb797b3e5064ec6b092dc28f23a85fcdbf96526
1f9e137b51d580bce6505773217bcca008225a47
wip-sam-dumpling-testing
#12 Updated by Sage Weil over 9 years ago
- Status changed from 7 to Resolved