Bug #38135
Ceph is in HEALTH_ERR status with inconsistent PGs after rbd snapshot create/remove tasks.
Description
We observe that Ceph enters HEALTH_ERR status with inconsistent PGs after running rbd snapshot create/remove tasks. Here are the environment and steps:
1, The Ceph cluster has 108 OSDs.
2, Create a pool with 2048 PGs.
3, Generate 500K RBD images in the pool; each image is 20G.
4, After Ceph performs some deep-scrubs, the cluster is in HEALTH_OK status.
5, Create snapshots for those RBDs; total snapshots are around 1.2M.
6, Make sure the Ceph cluster is in HEALTH_OK.
7, Randomly create and remove snapshots in parallel. We have about 6 clients doing the creating/removing.
8, We observe some PGs in the snaptrim_wait state; after about 12 hours, we get about 3 inconsistent PGs.
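The steps above can be sketched as follows. This is a dry run: pool and image names are placeholders, image and iteration counts are reduced, and the ceph/rbd commands are echoed rather than executed, since they require a live cluster.

```shell
#!/bin/bash
# Dry-run sketch of the reproduction steps; pool/image names are placeholders.
# Remove the "echo" prefixes to run against a real cluster.
POOL=testpool

# Step 2: create a pool with 2048 placement groups
echo ceph osd pool create "$POOL" 2048 2048
echo ceph osd pool application enable "$POOL" rbd

# Step 3: create the RBD images (500K in the report; 3 shown here)
for i in $(seq 1 3); do
  echo rbd create --size 20G "$POOL/img_$i"
done

# Step 5: snapshot every image once
for i in $(seq 1 3); do
  echo rbd snap create "$POOL/img_$i@base"
done

# Step 7: one random create-or-remove action, as a single client would do it
snap_action() {
  img="$POOL/img_$((RANDOM % 3 + 1))"
  if [ $((RANDOM % 2)) -eq 0 ]; then
    echo rbd snap create "$img@snap_$RANDOM"
  else
    echo rbd snap rm "$img@snap_$RANDOM"
  fi
}
snap_action
```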
For comparison, on Ceph 12 with 100K RBDs and about 2M snapshots, we got only 1 inconsistent PG, along with some crashed OSDs.
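For reference, the inconsistent PGs can be located and inspected with commands like these (a sketch: the pool name and PG id are placeholders, and the commands are echoed rather than executed):

```shell
#!/bin/bash
# Dry-run sketch: locate, inspect, and repair inconsistent PGs.
# Pool name and PG id below are placeholders.
inspect_pg() {
  pool=$1; pgid=$2
  # List PGs flagged inconsistent in the pool
  echo rados list-inconsistent-pg "$pool"
  # Inspect the inconsistent objects in one PG
  echo rados list-inconsistent-obj "$pgid" --format=json-pretty
  # Ask the primary OSD to repair the PG
  echo ceph pg repair "$pgid"
}
inspect_pg testpool 2.1a
```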
If you need more details, please let me know; I am happy to provide the test scripts and further detail.
Files
Updated by Bengen Tan about 5 years ago
- File create_rbd.sh added
- File create_snapshot.sh added
- File delete_random_snapshot.sh added
- File snapshot_action.sh added
1, create_rbd.sh, this creates the RBD images.
2, create_snapshot.sh, this creates the snapshots.
3, delete_random_snapshot.sh, this deletes random snapshots.
4, snapshot_action.sh, this performs snapshot creation and deletion in parallel.
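The attached snapshot_action.sh is not reproduced here, but a parallel driver of the kind described in step 7 might look roughly like this (a sketch: client count, image count, and iteration count are placeholders, and the rbd commands are echoed rather than executed):

```shell
#!/bin/bash
# Sketch of ~6 parallel clients randomly creating/removing snapshots.
# NOT the attached snapshot_action.sh; rbd commands are echoed (dry run).
POOL=testpool
CLIENTS=6

client_loop() {
  id=$1
  for _ in 1 2 3; do              # the reported run lasted ~12 hours
    img="$POOL/img_$((RANDOM % 3 + 1))"
    if [ $((RANDOM % 2)) -eq 0 ]; then
      echo "client $id: rbd snap create $img@snap_$RANDOM"
    else
      echo "client $id: rbd snap rm $img@snap_$RANDOM"
    fi
  done
}

# Launch the clients in the background and wait for all of them
for c in $(seq 1 "$CLIENTS"); do
  client_loop "$c" &
done
wait
```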
Updated by Greg Farnum about 5 years ago
- Project changed from Ceph to RADOS
- Category changed from common to Snapshots