Bug #12892

Too many image snapshots cause the "Deep Scrub" function to run too long, eventually leading to the OSD going down

Added by huanwen ren over 8 years ago. Updated over 8 years ago.

Status:
Rejected
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

In our environment, an image has had a very large number of snapshots created. When a deep scrub runs on it, the PG lock is held for too long; the OSD's internal timer eventually reports a timeout and the OSD is marked down by the MON.
gdb info

(gdb) s objects.back()
Cannot evaluate function -- may be inlined
(gdb) p objects.back()
Cannot evaluate function -- may be inlined
(gdb) s objects.back().get_filestore_key()
Cannot evaluate function -- may be inlined
(gdb) p objects
$7 = std::vector of length 25, capacity 50 = {{oid = {name = "rb.0.1cdc6d.238e1f29.", '0' <repeats 12 times>}, snap = {
      val = 99376}, hash = 4141438085, max = false, static POOL_IS_TEMP = -1, pool = 34, nspace = "", key = ""}, {oid = {
      name = "rb.0.1cdc6d.238e1f29.", '0' <repeats 12 times>}, snap = {val = 99379}, hash = 4141438085, max = false, 
    static POOL_IS_TEMP = -1, pool = 34, nspace = "", key = ""}, {oid = {
      name = "rb.0.1cdc6d.238e1f29.", '0' <repeats 12 times>}, snap = {val = 99381}, hash = 4141438085, max = false, 
    static POOL_IS_TEMP = -1, pool = 34, nspace = "", key = ""}, {oid = {
      name = "rb.0.1cdc6d.238e1f29.", '0' <repeats 12 times>}, snap = {val = 99382}, hash = 4141438085, max = false, 
    static POOL_IS_TEMP = -1, pool = 34, nspace = "", key = ""}, {oid = {
      name = "rb.0.1cdc6d.238e1f29.", '0' <repeats 12 times>}, snap = {val = 99385}, hash = 4141438085, max = false, 
    static POOL_IS_TEMP = -1, pool = 34, nspace = "", key = ""}, {oid = {
      name = "rb.0.1cdc6d.238e1f29.", '0' <repeats 12 times>}, snap = {val = 99386}, hash = 4141438085, max = false, 
    static POOL_IS_TEMP = -1, pool = 34, nspace = "", key = ""}, {oid = {
      name = "rb.0.1cdc6d.238e1f29.", '0' <repeats 12 times>}, snap = {val = 99388}, hash = 4141438085, max = false, 
    static POOL_IS_TEMP = -1, pool = 34, nspace = "", key = ""}, {oid = {
      name = "rb.0.1cdc6d.238e1f29.", '0' <repeats 12 times>}, snap = {val = 99390}, hash = 4141438085, max = false, 
    static POOL_IS_TEMP = -1, pool = 34, nspace = "", key = ""}, {oid = {
      name = "rb.0.1cdc6d.238e1f29.", '0' <repeats 12 times>}, snap = {val = 99392}, hash = 4141438085, max = false, 
    static POOL_IS_TEMP = -1, pool = 34, nspace = "", key = ""}, {oid = {
      name = "rb.0.1cdc6d.238e1f29.", '0' <repeats 12 times>}, snap = {val = 99395}, hash = 4141438085, max = false, 
    static POOL_IS_TEMP = -1, pool = 34, nspace = "", key = ""}, {oid = {
      name = "rb.0.1cdc6d.238e1f29.", '0' <repeats 12 times>}, snap = {val = 99397}, hash = 4141438085, max = false, 
    static POOL_IS_TEMP = -1, pool = 34, nspace = "", key = ""}, {oid = {
      name = "rb.0.1cdc6d.238e1f29.", '0' <repeats 12 times>}, snap = {val = 99398}, hash = 4141438085, max = false, 
    static POOL_IS_TEMP = -1, pool = 34, nspace = "", key = ""}, {oid = {
      name = "rb.0.1cdc6d.238e1f29.", '0' <repeats 12 times>}, snap = {val = 99401}, hash = 4141438085, max = false, 
    static POOL_IS_TEMP = -1, pool = 34, nspace = "", key = ""}, {oid = {
      name = "rb.0.1cdc6d.238e1f29.", '0' <repeats 12 times>}, snap = {val = 99403}, hash = 4141438085, max = false, 
    static POOL_IS_TEMP = -1, pool = 34, nspace = "", key = ""}, {oid = {
      name = "rb.0.1cdc6d.238e1f29.", '0' <repeats 12 times>}, snap = {val = 99405}, hash = 4141438085, max = false, 
    static POOL_IS_TEMP = -1, pool = 34, nspace = "", key = ""}, {oid = {
      name = "rb.0.1cdc6d.238e1f29.", '0' <repeats 12 times>}, snap = {val = 99407}, hash = 4141438085, max = false, 
    static POOL_IS_TEMP = -1, pool = 34, nspace = "", key = ""}, {oid = {
      name = "rb.0.1cdc6d.238e1f29.", '0' <repeats 12 times>}, snap = {val = 99409}, hash = 4141438085, max = false, 
    static POOL_IS_TEMP = -1, pool = 34, nspace = "", key = ""}, {oid = {
      name = "rb.0.1cdc6d.238e1f29.", '0' <repeats 12 times>}, snap = {val = 99411}, hash = 4141438085, max = false, 
    static POOL_IS_TEMP = -1, pool = 34, nspace = "", key = ""}, {oid = {
      name = "rb.0.1cdc6d.238e1f29.", '0' <repeats 12 times>}, snap = {val = 99413}, hash = 4141438085, max = false, 
    static POOL_IS_TEMP = -1, pool = 34, nspace = "", key = ""}, {oid = {
      name = "rb.0.1cdc6d.238e1f29.", '0' <repeats 12 times>}, snap = {val = 99415}, hash = 4141438085, max = false, 
    static POOL_IS_TEMP = -1, pool = 34, nspace = "", key = ""}, {oid = {
      name = "rb.0.1cdc6d.238e1f29.", '0' <repeats 12 times>}, snap = {val = 99417}, hash = 4141438085, max = false, 
    static POOL_IS_TEMP = -1, pool = 34, nspace = "", key = ""}, {oid = {
      name = "rb.0.1cdc6d.238e1f29.", '0' <repeats 12 times>}, snap = {val = 99419}, hash = 4141438085, max = false, 
    static POOL_IS_TEMP = -1, pool = 34, nspace = "", key = ""}, {oid = {
      name = "rb.0.1cdc6d.238e1f29.", '0' <repeats 12 times>}, snap = {val = 99421}, hash = 4141438085, max = false, 
    static POOL_IS_TEMP = -1, pool = 34, nspace = "", key = ""}, {oid = {
      name = "rb.0.1cdc6d.238e1f29.", '0' <repeats 12 times>}, snap = {val = 99423}, hash = 4141438085, max = false, 
    static POOL_IS_TEMP = -1, pool = 34, nspace = "", key = ""}, {oid = {
      name = "rb.0.1cdc6d.238e1f29.", '0' <repeats 12 times>}, snap = {val = 99425}, hash = 4141438085, max = false, 
    static POOL_IS_TEMP = -1, pool = 34, nspace = "", key = ""}}
(gdb) n
172       filestore_hobject_key_t get_filestore_key() const {


Files

????1.txt (12.2 KB) huanwen ren, 09/01/2015 08:06 AM
????2.txt (10.4 KB) huanwen ren, 09/01/2015 08:06 AM
chunky_scrub?? (302 KB) huanwen ren, 09/10/2015 03:36 AM
Actions #1

Updated by Loïc Dachary over 8 years ago

  • Status changed from New to Need More Info

Could you please explain how to reproduce the problem? Here is a template:

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Actions #2

Updated by huanwen ren over 8 years ago

Steps to Reproduce:
1. Create a pool
rados mkpool pool_1
2. Create an image in the pool
rbd create pool_1/im1 --size 10240 --image-format 2
3. Create 100000 snapshots of the image
rbd snap create --image pool_1/im1 --snap snap_pool1_im1
rbd snap create --image pool_1/im1 --snap snap_pool1_im2
......
rbd snap create --image pool_1/im1 --snap snap_pool1_im100000
4. Trigger a deep scrub on the OSD/PG that holds the image's objects (example commands below)
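One way to trigger step 4 (the object name, PG id, and OSD id below are placeholders, not values taken from this report):
ceph osd map pool_1 <object-name>    # find the PG that stores one of the image's objects
ceph pg deep-scrub <pgid>            # deep-scrub that PG
ceph osd deep-scrub <osd-id>         # or deep-scrub everything on one OSD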

Actions #3

Updated by huanwen ren over 8 years ago

Analysis of the causes:

1. The first case is the NEW_CHUNK stage of PG::chunky_scrub: it gets stuck in the while (!boundary_found) loop and consumes far more than its scheduling time slice.
2. The second case is the BUILD_MAP stage of PG::chunky_scrub: it gets stuck in the while loop inside the be_deep_scrub function, again consuming far more than its scheduling time slice.

In both cases the scrub is trapped in a while loop because, in our test environment, a single object has a huge number of snapshots (the gdb information above shows snapshot ids of a single object on the order of 100000), and deep scrub was not designed with this situation in mind. A scrub pass that was only meant to examine 25 objects per chunk therefore turns into a pass over 25 * 100000 objects.
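To make the NEW_CHUNK case concrete, here is a standalone sketch (not the actual Ceph code; the data layout and boundary test are simplified, while the 25-object chunk and 100000 snapshots are the numbers from this report) of why the boundary search cannot stop in the middle of a clone family: clones share the head object's hash, a chunk boundary must not split objects with the same hash, so the chunk keeps growing until it contains the whole family.

// Standalone sketch, not the actual Ceph code: models why the NEW_CHUNK
// boundary search cannot stop in the middle of a clone family.
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

struct FakeObject {
  uint32_t hash;   // clones of one head share the head's hash
  uint64_t snap;   // snapshot id of the clone
};

int main() {
  const std::size_t clones_per_head = 100000;  // as in this report
  const std::size_t max_chunk = 25;            // intended number of objects per chunk

  // A listing of one head object's clones, in the order a partial object listing would return them.
  std::vector<FakeObject> listing;
  for (std::size_t i = 0; i < clones_per_head; ++i)
    listing.push_back({4141438085u, 99376u + i});

  // Boundary search: keep extending the candidate chunk end while the
  // boundary would split a run of objects sharing the same hash.
  std::size_t end = std::min(max_chunk, listing.size());
  while (end < listing.size() && listing[end].hash == listing[end - 1].hash)
    ++end;  // "boundary_found" stays false, so the chunk keeps growing

  std::cout << "intended chunk: " << max_chunk
            << " objects, actual chunk: " << end << " objects\n";
}

Run as-is, it reports an intended chunk of 25 objects and an actual chunk of 100000 objects, which is the 25 * 100000 blow-up described above.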

Actions #4

Updated by huanwen ren over 8 years ago

The consequence of the above:
PG::scrub has to take the PG lock, and because a single scrub pass takes far too long, the PG lock is held for far too long, which eventually causes the OSD to go down.
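To illustrate the chain of events, here is a standalone sketch (not Ceph's actual locking or heartbeat code; the timeout and durations are placeholders): scrubbing an oversized chunk runs entirely under the PG lock and only signals the internal heartbeat afterwards, so a single huge chunk looks like a hung thread and the OSD ends up being reported down.

// Standalone sketch, not Ceph's actual locking or heartbeat code.
#include <chrono>
#include <iostream>
#include <mutex>
#include <thread>

int main() {
  using namespace std::chrono;
  const auto heartbeat_timeout = milliseconds(150);  // placeholder, not a real Ceph default
  std::mutex pg_lock;                                // stands in for the PG lock

  const auto start = steady_clock::now();
  {
    std::lock_guard<std::mutex> guard(pg_lock);      // held for the whole chunk
    std::this_thread::sleep_for(milliseconds(200));  // stands in for scrubbing 25 x 100000 clones
  }                                                  // the heartbeat is only signalled here
  const auto held_for = steady_clock::now() - start;

  if (held_for > heartbeat_timeout)
    std::cout << "PG lock held longer than the heartbeat timeout -> OSD reported down\n";
}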

Actions #5

Updated by huanwen ren over 8 years ago

Limiting the number of snapshots can solve this problem:

https://github.com/ceph/ceph/pull/5854
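The pull request itself is not reproduced here; purely as an illustration of the idea, a configurable limit could be enforced at snapshot-creation time along the lines of the hypothetical sketch below (the option name, default value, and error code are made up and are not the contents of the PR).

// Purely hypothetical illustration of a configurable per-image snapshot limit.
#include <cerrno>
#include <cstdint>
#include <iostream>

struct Config {
  uint64_t rbd_max_snaps_per_image = 512;  // hypothetical option name and default
};

// Refuse to create another snapshot once the (hypothetical) limit is reached.
int snap_create(const Config& cfg, uint64_t current_snap_count) {
  if (current_snap_count >= cfg.rbd_max_snaps_per_image)
    return -EDQUOT;
  // ... the normal snapshot-creation path would follow here ...
  return 0;
}

int main() {
  Config cfg;
  std::cout << "100000 existing snaps -> " << snap_create(cfg, 100000) << "\n";  // rejected
  std::cout << "10 existing snaps     -> " << snap_create(cfg, 10) << "\n";      // allowed
}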

Actions #6

Updated by Josh Durgin over 8 years ago

It seems like making chunky scrub understand smaller chunks would be a better fix, or simply not taking so many snapshots.

Do you have higher-level tools on top of rbd that could implement the limit on snapshots? OpenStack cinder, for example, has a quota for number of snapshots.
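For reference, the per-project snapshot quota mentioned here can be set with the cinder CLI along these lines (the limit and project id are placeholders, and the exact syntax depends on the client version):
cinder quota-update --snapshots 100 <project-id>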

Actions #8

Updated by huanwen ren over 8 years ago

Josh Durgin wrote:

It seems like making chunky scrub understand smaller chunks would be a better fix, or simply not taking so many snapshots.

Do you have higher-level tools on top of rbd that could implement the limit on snapshots? OpenStack cinder, for example, has a quota for number of snapshots.

Yes, modifying the "chunky scrub" process is the fundamental solution; the code I modified only adds a configurable limit on the number of snapshots.

We use Ceph in many different application scenarios, not only OpenStack, so putting the limit in RBD itself lets it cover all of them.

Actions #9

Updated by Samuel Just over 8 years ago

  • Status changed from Need More Info to Rejected

We need a way to limit the number of snapshots both at the OSD level and at the client level (rbd). The trick is that snap trimming is asynchronous, so there may be more snapshots actually present on the OSDs than the client believes exist. Part of the solution therefore needs to involve bounding how far behind on trimming the OSDs can get. I'm closing this bug since we don't actually want scrub to not see all of the clones in one chunk -- it would limit the amount of metadata scrubbing we could do.
