Bug #37568
Status: Closed
CephFS remove snapshot result in slow ops
Description
I have a Ceph Mimic cluster with CephFS.
I created a few snapshots (mkdir .snap/test etc.) in different directories. So far so good.
But when I delete the snapshots (rmdir .snap/test etc.), the cluster gets into a WARN state with:
- ceph -w
cluster:
id: 2fbbf089-a846-4c09-90bc-1dd9bd7af30f
health: HEALTH_WARN
3 slow ops, oldest one blocked for 11415 sec, mon.lpnceph01 has slow ops
...
2018-12-06 16:54:56.356518 mon.lpnceph-mon01 [WRN] Health check update: 3 slow ops, oldest one blocked for 11410 sec, mon.lpnceph01 has slow ops (SLOW_OPS)
2018-12-06 16:55:05.856294 mon.lpnceph-mon01 [WRN] Health check update: 3 slow ops, oldest one blocked for 11415 sec, mon.lpnceph01 has slow ops (SLOW_OPS)
2018-12-06 16:55:10.856657 mon.lpnceph-mon01 [WRN] Health check update: 3 slow ops, oldest one blocked for 11425 sec, mon.lpnceph01 has slow ops (SLOW_OPS)
- ceph daemon mon.lpnceph01 ops
{
    "ops": [
        {
            "description": "remove_snaps({28=[3,4]} v0)",
            "initiated_at": "2018-12-06 13:44:41.396039",
            "age": 14549.148016,
            "duration": 14549.148028,
            "type_data": {
                "events": [
                    { "time": "2018-12-06 13:44:41.396039", "event": "initiated" },
                    { "time": "2018-12-06 13:44:41.396039", "event": "header_read" },
                    { "time": "2018-12-06 13:44:41.396042", "event": "throttled" },
                    { "time": "2018-12-06 13:44:41.396089", "event": "all_read" },
                    { "time": "2018-12-06 13:44:41.396186", "event": "dispatched" },
                    { "time": "2018-12-06 13:44:41.396190", "event": "mon:_ms_dispatch" },
                    { "time": "2018-12-06 13:44:41.396191", "event": "mon:dispatch_op" },
                    { "time": "2018-12-06 13:44:41.396192", "event": "psvc:dispatch" },
                    { "time": "2018-12-06 13:44:41.396205", "event": "osdmap:preprocess_query" },
                    { "time": "2018-12-06 13:44:41.396214", "event": "osdmap:preprocess_remove_snaps" },
                    { "time": "2018-12-06 13:44:41.396220", "event": "forward_request_leader" },
                    { "time": "2018-12-06 13:44:41.396258", "event": "forwarded" }
                ],
                "info": {
                    "seq": 250448,
                    "src_is_mon": false,
                    "source": "mds.0 xxx.xxx.xxx.xxx:6800/2790459226",
                    "forwarded_to_leader": true
                }
            }
        },
...
I tried adding the following lines to ceph.conf:
[osd]
osd snap trim sleep = 0.6
as suggested in http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-November/031227.html
but it doesn't solve the problem.
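For what it's worth, the same setting can also be applied at runtime without editing ceph.conf and restarting the OSDs (a sketch; whether it helps here is a separate question, since the stuck op sits on the mon, not the OSDs):

```shell
# Inject the snap trim sleep into all running OSDs; takes effect immediately
ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.6'

# On Mimic and later, the centralized config database can be used instead
ceph config set osd osd_snap_trim_sleep 0.6
```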
I had to restart the service:
systemctl restart ceph-mon@lpnceph01.service
to get the cluster back to a healthy status.
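For reference, the reproduction and workaround described above condense to the following sequence (a sketch; the mount point /mnt/cephfs and the directory name are placeholders):

```shell
# Create a snapshot in a directory on a mounted CephFS (paths are placeholders)
mkdir /mnt/cephfs/somedir/.snap/test

# Removing it triggers the remove_snaps op that then blocks on the monitor
rmdir /mnt/cephfs/somedir/.snap/test

# Watch cluster health and inspect the blocked op on the affected monitor
ceph -w
ceph daemon mon.lpnceph01 ops

# Workaround used here: restart the affected monitor to clear the slow op
systemctl restart ceph-mon@lpnceph01.service
```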
Updated by Patrick Donnelly over 5 years ago
- Subject changed from Cephfs remove snapshot result in slow ops to CephFS remove snapshot result in slow ops
- Assignee set to Zheng Yan
- Priority changed from Normal to High
- Target version set to v14.0.0
Updated by Zheng Yan over 5 years ago
- Project changed from CephFS to Ceph
- Category deleted (89)
- Status changed from New to Fix Under Review
- Backport set to mimic,luminous
- Pull request ID set to 37568
Updated by Zheng Yan over 5 years ago
- Pull request ID changed from 37568 to 25481
Updated by Patrick Donnelly over 5 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Nathan Cutler over 5 years ago
- Copied to Backport #37693: mimic: CephFS remove snapshot result in slow ops added
Updated by Nathan Cutler over 5 years ago
- Copied to Backport #37694: luminous: CephFS remove snapshot result in slow ops added
Updated by Patrick Donnelly over 5 years ago
- Has duplicate Bug #37782: Snapshot removal hangs added
Updated by Nathan Cutler about 5 years ago
- Status changed from Pending Backport to Resolved
Updated by Patrick Donnelly over 4 years ago
- Has duplicate Bug #24088: mon: slow remove_snaps op reported in cluster health log added
Updated by Janek Bevendorff about 3 years ago
I can reproduce this on 15.2.8. I have 30 PGs in the active+clean+snaptrim state and about 1500-2500 slow ops. This happens regularly.
Updated by Patrick Donnelly about 3 years ago
Janek Bevendorff wrote:
I can reproduce this on 15.2.8. I have 30 PGs in the active+clean+snaptrim state and about 1500-2500 slow ops. This happens regularly.
This may be unrelated. Can you create a new tracker ticket with logs/etc.
Updated by Janek Bevendorff about 3 years ago
I'm actually no longer sure we really have an issue here. If I find anything, I'll open a new issue.