Bug #37568 (closed)
CephFS: removing snapshots results in slow ops
Description
Hello,
I have a Ceph Mimic cluster with CephFS.
I created a few snapshots (mkdir .snap/test etc...) in different directories. So far so good.
But when I delete the snapshots (rmdir .snap/test etc...), the cluster goes into a WARN state with:
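For concreteness, the snapshot operations were of this form (the mount point and directory names here are illustrative):
mkdir /mnt/cephfs/somedir/.snap/test    # create a snapshot of somedir
rmdir /mnt/cephfs/somedir/.snap/test    # remove that snapshot again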
- ceph -w
cluster:
id: 2fbbf089-a846-4c09-90bc-1dd9bd7af30f
health: HEALTH_WARN
3 slow ops, oldest one blocked for 11415 sec, mon.lpnceph01 has slow ops
...
2018-12-06 16:54:56.356518 mon.lpnceph-mon01 [WRN] Health check update: 3 slow ops, oldest one blocked for 11410 sec, mon.lpnceph01 has slow ops (SLOW_OPS)
2018-12-06 16:55:05.856294 mon.lpnceph-mon01 [WRN] Health check update: 3 slow ops, oldest one blocked for 11415 sec, mon.lpnceph01 has slow ops (SLOW_OPS)
2018-12-06 16:55:10.856657 mon.lpnceph-mon01 [WRN] Health check update: 3 slow ops, oldest one blocked for 11425 sec, mon.lpnceph01 has slow ops (SLOW_OPS)
- ceph daemon mon.lpnceph01 ops
{
"ops": [ {
"description": "remove_snaps({28=[3,4]} v0)",
"initiated_at": "2018-12-06 13:44:41.396039",
"age": 14549.148016,
"duration": 14549.148028,
"type_data": {
"events": [ {
"time": "2018-12-06 13:44:41.396039",
"event": "initiated"
}, {
"time": "2018-12-06 13:44:41.396039",
"event": "header_read"
}, {
"time": "2018-12-06 13:44:41.396042",
"event": "throttled"
}, {
"time": "2018-12-06 13:44:41.396089",
"event": "all_read"
}, {
"time": "2018-12-06 13:44:41.396186",
"event": "dispatched"
}, {
"time": "2018-12-06 13:44:41.396190",
"event": "mon:_ms_dispatch"
}, {
"time": "2018-12-06 13:44:41.396191",
"event": "mon:dispatch_op"
}, {
"time": "2018-12-06 13:44:41.396192",
"event": "psvc:dispatch"
}, {
"time": "2018-12-06 13:44:41.396205",
"event": "osdmap:preprocess_query"
}, {
"time": "2018-12-06 13:44:41.396214",
"event": "osdmap:preprocess_remove_snaps"
}, {
"time": "2018-12-06 13:44:41.396220",
"event": "forward_request_leader"
}, {
"time": "2018-12-06 13:44:41.396258",
"event": "forwarded"
}
],
"info": {
"seq": 250448,
"src_is_mon": false,
"source": "mds.0 xxx.xxx.xxx.xxx:6800/2790459226",
"forwarded_to_leader": true
}
}
},
...
I tried adding the following lines to ceph.conf:
[osd]
osd snap trim sleep = 0.6
as suggested in http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-November/031227.html
but it didn't solve the problem.
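For reference, I believe the same setting can also be applied at runtime, so the OSDs don't need a restart for the ceph.conf change to take effect, along these lines:
ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.6'    # apply to all running OSDs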
I had to restart the monitor service:
systemctl restart ceph-mon@lpnceph01.service
to get the cluster back to a healthy status.
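After the restart, the status can be checked again with:
ceph -s                            # should report HEALTH_OK again
ceph daemon mon.lpnceph01 ops      # the blocked remove_snaps op should no longer show up here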