Bug #14962

PG::publish_stats_to_osd() does not get called when trimming snap objects (TestStrays.test_snapshot_remove failure)

Added by Zheng Yan about 8 years ago. Updated about 8 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

http://pulpito.ceph.com/zyan-2016-03-02_06:21:11-fs-wip-zyan-testing-testing-basic-mira/37805/

'ceph df' shows that the data pool still contains 2 objects. This is an OSD issue: it seems that PG::publish_stats_to_osd() is not called when trimming snap objects.

Associated revisions

Revision d9af48ab (diff)
Added by Greg Farnum about 8 years ago

ReplicatedPG: be more careful about calling publish_stats_to_osd() correctly

We had moved the call out of eval_repop into a lambda, but that left out
a few other code paths and is fragile. So just call it unconditionally in
eval_repop() when we're done with the repop instead.

Fixes: #14962

Signed-off-by: Greg Farnum <>

History

#1 Updated by Zheng Yan about 8 years ago

The last known-good commit is 82896f2c6e2655db8c9a05a7cc31a9e65c9aa350

#2 Updated by Zheng Yan about 8 years ago

  • Project changed from CephFS to Ceph
  • Subject changed from Failure in TestStrays.test_snapshot_remove to PG::publish_stats_to_osd() does not get called when trimming snap objects

The test expects num_objects in the data pool to become zero. Because of this bug, it fails.

#3 Updated by Greg Farnum about 8 years ago

  • Assignee set to Zheng Yan

Is there something tricky about invoking that call when trimming? It'll probably go in faster if you do it and we test through the FS suite first. :)

#4 Updated by Samuel Just about 8 years ago

  • Priority changed from Normal to Urgent

#5 Updated by Zheng Yan about 8 years ago

If we just trim old snaps, num_object in 'ceph df' does not change (even if we wait for a long time). If we create and delete lots of objects (making sure all PGs get modified), num_object in 'ceph df' changes immediately.

#6 Updated by Greg Farnum about 8 years ago

  • Status changed from New to In Progress
  • Assignee changed from Zheng Yan to Greg Farnum

Okay, the problem actually got started in cc1b2c6f342b17d6e304560c23f4ce310d6690d9 ("ReplicatedPG: move client reply handling out of eval_repop"), in which the call to publish_stats_to_osd() got moved out of ReplicatedPG::eval_repop and into ReplicatedPG::execute_ctx()'s ctx->register_on_commit() lambda.

But that path is not used by the local ops which trim snapshots, nor, I think, by the path followed in finish_promote(). That leads me to think we just want to unconditionally call publish_stats_to_osd() in ReplicatedPG::eval_repop(), rather than trying to bunch it up with the log_op_stats() call in the execute_ctx() lambda.
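
To illustrate the reasoning above, here is a minimal toy model in C++. It is not the Ceph code: ToyPG, ToyRepOp, client_write(), snap_trim() and the publish_in_eval_repop flag are hypothetical names invented for this sketch, and only the function names publish_stats_to_osd() and eval_repop() are borrowed from the discussion. It assumes that "publishing" simply copies an in-memory counter to a reported counter.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Toy model only: these types are hypothetical and are not the Ceph classes.
struct ToyRepOp {
  // Callbacks that run when the op commits.
  std::vector<std::function<void()>> on_commit;
};

struct ToyPG {
  int num_objects = 0;            // in-memory stat, always up to date
  int published_num_objects = 0;  // what a 'ceph df'-style report would show
  bool publish_in_eval_repop;     // true models the fix, false the regression

  explicit ToyPG(bool fixed) : publish_in_eval_repop(fixed) {}

  void publish_stats_to_osd() { published_num_objects = num_objects; }

  // Client op path: in the regressed variant, publishing happens only in a
  // commit callback registered here.
  ToyRepOp client_write(int added) {
    ToyRepOp op;
    num_objects += added;
    if (!publish_in_eval_repop) {
      op.on_commit.push_back([this] { publish_stats_to_osd(); });
    }
    return op;
  }

  // Local op path (for example snap trimming): no callback is registered,
  // so the regressed variant never publishes for it.
  ToyRepOp snap_trim(int removed) {
    ToyRepOp op;
    num_objects -= removed;
    return op;
  }

  // Runs when the op is finished, for every kind of op.
  void eval_repop(ToyRepOp& op) {
    for (auto& cb : op.on_commit) cb();
    if (publish_in_eval_repop) publish_stats_to_osd();
  }
};
```

In this model, a ToyPG built with `false` keeps reporting the old object count after snap_trim(), because only client_write() registers the publishing callback. A ToyPG built with `true` publishes from eval_repop() for every op, which is the shape of the change proposed above.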

#7 Updated by John Spray about 8 years ago

  • Subject changed from PG::publish_stats_to_osd() does not get called when trimming snap objects to PG::publish_stats_to_osd() does not get called when trimming snap objects (TestStrays.test_snapshot_remove failure)

Updating description with test name so that I can find it on a search

#8 Updated by Greg Farnum about 8 years ago

  • Assignee changed from Greg Farnum to Samuel Just

This passed fs suite testing (although of course that doesn't exercise RADOS bits at all). http://pulpito.ceph.com/gregf-2016-03-10_23:53:12-fs-greg-fs-testing-310-1---basic-mira/

#9 Updated by Sage Weil about 8 years ago

  • Status changed from In Progress to Resolved
