Bug #37568 (closed): CephFS remove snapshot result in slow ops

Added by Francois Legrand over 5 years ago. Updated about 3 years ago.

Status: Resolved
Priority: High
Assignee: Zheng Yan
Category: -
Target version: v14.0.0
% Done: 0%
Source: Community (dev)
Tags:
Backport: mimic, luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: fs
Pull request ID: 25481
Crash signature (v1):
Crash signature (v2):

Description

Hello,
I have a Ceph Mimic cluster with CephFS.
I created a few snapshots (mkdir .snap/test, etc.) in different directories. So far so good.
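For context, CephFS snapshots are managed entirely through the hidden .snap directory inside the filesystem; a minimal sketch (the mount point and directory names below are hypothetical examples, not from this cluster):

    # create a snapshot of a directory (paths are hypothetical)
    mkdir /mnt/cephfs/mydir/.snap/test

    # list the snapshots that exist for that directory
    ls /mnt/cephfs/mydir/.snap

    # remove the snapshot again -- the step that triggers the slow ops below
    rmdir /mnt/cephfs/mydir/.snap/test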
But when I delete the snapshots (rmdir .snap/test, etc.), the cluster gets into a warning state:

    $ ceph -w
      cluster:
        id: 2fbbf089-a846-4c09-90bc-1dd9bd7af30f
        health: HEALTH_WARN
                3 slow ops, oldest one blocked for 11415 sec, mon.lpnceph01 has slow ops
      ...
    2018-12-06 16:54:56.356518 mon.lpnceph-mon01 [WRN] Health check update: 3 slow ops, oldest one blocked for 11410 sec, mon.lpnceph01 has slow ops (SLOW_OPS)
    2018-12-06 16:55:05.856294 mon.lpnceph-mon01 [WRN] Health check update: 3 slow ops, oldest one blocked for 11415 sec, mon.lpnceph01 has slow ops (SLOW_OPS)
    2018-12-06 16:55:10.856657 mon.lpnceph-mon01 [WRN] Health check update: 3 slow ops, oldest one blocked for 11425 sec, mon.lpnceph01 has slow ops (SLOW_OPS)
It's obviously related to the removal of snapshots, as the monitor's op queue shows:
    $ ceph daemon mon.lpnceph01 ops
    {
        "ops": [
            {
                "description": "remove_snaps({28=[3,4]} v0)",
                "initiated_at": "2018-12-06 13:44:41.396039",
                "age": 14549.148016,
                "duration": 14549.148028,
                "type_data": {
                    "events": [
                        { "time": "2018-12-06 13:44:41.396039", "event": "initiated" },
                        { "time": "2018-12-06 13:44:41.396039", "event": "header_read" },
                        { "time": "2018-12-06 13:44:41.396042", "event": "throttled" },
                        { "time": "2018-12-06 13:44:41.396089", "event": "all_read" },
                        { "time": "2018-12-06 13:44:41.396186", "event": "dispatched" },
                        { "time": "2018-12-06 13:44:41.396190", "event": "mon:_ms_dispatch" },
                        { "time": "2018-12-06 13:44:41.396191", "event": "mon:dispatch_op" },
                        { "time": "2018-12-06 13:44:41.396192", "event": "psvc:dispatch" },
                        { "time": "2018-12-06 13:44:41.396205", "event": "osdmap:preprocess_query" },
                        { "time": "2018-12-06 13:44:41.396214", "event": "osdmap:preprocess_remove_snaps" },
                        { "time": "2018-12-06 13:44:41.396220", "event": "forward_request_leader" },
                        { "time": "2018-12-06 13:44:41.396258", "event": "forwarded" }
                    ],
                    "info": {
                        "seq": 250448,
                        "src_is_mon": false,
                        "source": "mds.0 xxx.xxx.xxx.xxx:6800/2790459226",
                        "forwarded_to_leader": true
                    }
                }
            },
    ...
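Incidentally, the stuck entries can be pulled out of that dump directly; a small sketch, assuming jq is installed on the mon host (the 60-second threshold is an arbitrary example):

    # print only ops older than 60 seconds, with their description and age
    ceph daemon mon.lpnceph01 ops | jq '.ops[] | select(.age > 60) | {description, age}'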

I tried adding the following lines to ceph.conf:

    [osd]
    osd snap trim sleep = 0.6

as suggested in http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-November/031227.html,
but it doesn't solve the problem.
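For reference, the same setting can also be pushed to running OSDs without touching ceph.conf; a minimal sketch using the standard injectargs mechanism (the value simply mirrors the config line above):

    # apply to all OSDs at runtime; no daemon restart needed
    ceph tell 'osd.*' injectargs '--osd_snap_trim_sleep 0.6'

    # spot-check the value on one daemon (osd.0 is an arbitrary example)
    ceph daemon osd.0 config get osd_snap_trim_sleep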

I had to restart the service:

    systemctl restart

to get the cluster back to a healthy status.
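Presumably the restarted unit was the monitor daemon; a hypothetical reconstruction (the unit name is a guess based on the mon id in the logs above, not stated in the report):

    # hypothetical: mon id taken from the log excerpts, unit name is a guess
    systemctl restart ceph-mon@lpnceph01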


Related issues (4 total: 0 open, 4 closed)

Has duplicate: Ceph - Bug #37782: Snapshot removal hangs (Duplicate, 01/03/2019)
Has duplicate: CephFS - Bug #24088: mon: slow remove_snaps op reported in cluster health log (Duplicate, Zheng Yan, 05/10/2018)
Copied to: Ceph - Backport #37693: mimic: CephFS remove snapshot result in slow ops (Resolved, Prashant D)
Copied to: Ceph - Backport #37694: luminous: CephFS remove snapshot result in slow ops (Resolved, Prashant D)
#1

Updated by Patrick Donnelly over 5 years ago

  • Subject changed from Cephfs remove snapshot result in slow ops to CephFS remove snapshot result in slow ops
  • Assignee set to Zheng Yan
  • Priority changed from Normal to High
  • Target version set to v14.0.0
#2

Updated by Zheng Yan over 5 years ago

  • Project changed from CephFS to Ceph
  • Category deleted (89)
  • Status changed from New to Fix Under Review
  • Backport set to mimic,luminous
  • Pull request ID set to 37568
#3

Updated by Zheng Yan over 5 years ago

  • Pull request ID changed from 37568 to 25481
#4

Updated by Patrick Donnelly over 5 years ago

  • Status changed from Fix Under Review to Pending Backport
#5

Updated by Nathan Cutler over 5 years ago

  • Copied to Backport #37693: mimic: CephFS remove snapshot result in slow ops added
#6

Updated by Nathan Cutler over 5 years ago

  • Copied to Backport #37694: luminous: CephFS remove snapshot result in slow ops added
#7

Updated by Patrick Donnelly over 5 years ago

  • Has duplicate Bug #37782: Snapshot removal hangs added
#8

Updated by Nathan Cutler about 5 years ago

  • Status changed from Pending Backport to Resolved
#9

Updated by Patrick Donnelly over 4 years ago

  • Has duplicate Bug #24088: mon: slow remove_snaps op reported in cluster health log added
#10

Updated by Janek Bevendorff about 3 years ago

I can reproduce this on 15.2.8. I have 30 PGs in the active+clean+snaptrim state and about 1500-2500 slow ops. This happens regularly.
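As a quick way to check for the same symptom, the PGs currently trimming and the related health detail can be listed (a sketch; the state filter for ceph pg ls is available on these releases):

    # list PGs currently in the snaptrim state
    ceph pg ls snaptrim

    # full health detail, including any SLOW_OPS entries
    ceph health detail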

#11

Updated by Patrick Donnelly about 3 years ago

Janek Bevendorff wrote:

    I can reproduce this on 15.2.8. I have 30 PGs in the active+clean+snaptrim state and about 1500-2500 slow ops. This happens regularly.

This may be unrelated. Can you create a new tracker ticket with logs, etc.?

#12

Updated by Janek Bevendorff about 3 years ago

I'm actually no longer sure we're really having an issue here. If I find anything, I'll open a new issue.

