Bug #12717

closed

Pool's statistic data not updated after a cache evict operation

Added by huang jun over 8 years ago. Updated over 8 years ago.

Status:
Duplicate
Priority:
Normal
Assignee:
Category:
OSD
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We set up a cache tier and set target_max_bytes to 10240.
After putting 2 objects (each 40MB) into the cache pool with 'rados put',
which triggers the cache pool's agent_evict operation,
'ceph df' and 'rados df' show the cache pool still has 2 objects,
but 'rados ls' shows there are no objects.

This is because agent_maybe_evict() creates a repop without an op;
in eval_repop(), repop->all_committed is 0, so publish_stats_to_osd() is not executed:

// ondisk?
  if (repop->all_committed) {
    if (repop->ctx->op && !repop->log_op_stat) {
      log_op_stats(repop->ctx);
      repop->log_op_stat = true;
    }
    publish_stats_to_osd();
#1

Updated by Kefu Chai over 8 years ago

  • Description updated (diff)
  • Status changed from New to Fix Under Review
  • Source changed from other to Community (user)
#2

Updated by Kefu Chai over 8 years ago

huangjun,

I am trying to repeat your test:


$ ceph osd pool create slow  1 1
$ ceph osd pool set fast target_max_objects  1
$ ceph osd pool set fast target_max_bytes 20
$ ceph osd pool set fast hit_set_count 1
$ ls -lh /tmp/doc.tgz
-rw-r--r-- 1 kefu kefu 2.8M Aug 19 16:41 /tmp/doc.tgz
$ rados -p slow put obj3 /tmp/doc.tgz
$ rados -p slow put obj4 /tmp/doc.tgz
$ ./rados df
pool name                 KB      objects       clones     degraded      unfound           rd        rd KB           wr        wr KB
cephfs_data                0            0            0            0            0            0            0            0            0
cephfs_metadata            5           54            0            0            0            0            0          117           17
fast                       1            1            0            0            0            0            0            4         5578
rbd                        0            0            0            0            0            0            0            0            0
slow                    5577            2            0            0            0            0            0            2         5578
  total used       137587004           57
  total avail      409409908
  total space      576342804
$ ./ceph df
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
GLOBAL:
    SIZE     AVAIL     RAW USED     %RAW USED
    549G      390G         131G         23.87
POOLS:
    NAME                ID     USED      %USED     MAX AVAIL     OBJECTS
    rbd                 0          0         0          130G           0
    cephfs_data         1          0         0          130G           0
    cephfs_metadata     2       4196         0          130G          54
    slow                3      5576k         0          130G           2
    fast                4         84         0          130G           1

$ ./rados -p fast ls
$ ./rados -p fast ls -N .ceph-internal
hit_set_4.0_archive_2015-08-19 11:09:38.600516Z_2015-08-19 11:09:48.705156Z
$ ./rados -p slow ls
obj4
obj3

so the only object left in the cache tier is the hitset archive object.

"It's because that in agent_maybe_evict(), we create a repop without op"

I doubt it. _delete_oid(ctx, true) is called in this function, and it does create a stash/remove op. Am I missing anything?

#3

Updated by huang jun over 8 years ago

Hi, Kefu.
My Ceph version is 0.80.7, so maybe a bit old.
I will try it in ceph newest master branch.

#4

Updated by huang jun over 8 years ago

I think you should put the objects into pool 'fast', not 'slow', because 'fast' is the cache pool.

#5

Updated by Kefu Chai over 8 years ago

huangjun,

I think if we are using the "fast" pool as the cache tier of the "slow" pool, it is supposed to be a transparent layer over the "slow" one; hence we should put objects via the "slow" pool.

#6

Updated by huang jun over 8 years ago

Hmm, but agent_cache_evict and agent_cache_flush are used only in the cache pool; the base pool will not do this.

#7

Updated by huang jun over 8 years ago

Hi, Kefu.
I think https://github.com/ceph/ceph/commit/78b1fb5e9440dcd4ef99301c3ac857385e870cf3 has resolved this.
In version 0.80.7, if there is no op in the repop, eval_repop() does nothing, so there is no chance to call publish_stats_to_osd() to update the PG stats.
In the current master branch, if the repops are all_committed, it calls publish_stats_to_osd(), so the PG stats are updated in OSD::tick().

So this may be a duplicate bug report.
What's your opinion?

#8

Updated by Kefu Chai over 8 years ago

  • Assignee set to Kefu Chai
#9

Updated by Kefu Chai over 8 years ago

huangjun, if https://github.com/ceph/ceph/commit/78b1fb5e9440dcd4ef99301c3ac857385e870cf3 addresses your issue, maybe we can consider this issue a duplicate.

"I will try it in ceph newest master branch."

May I ask what your test result was?

#10

Updated by huang jun over 8 years ago

In the newest master branch it works as expected:
the pool stats are updated after the evict operation.
I think you can close this issue.
Sorry for the duplicate issue report.

#11

Updated by Loïc Dachary over 8 years ago

  • Status changed from Fix Under Review to Duplicate