Project

General

Profile

Actions

Bug #48778

open

Setting quota triggered ops storms on meta pool

Added by Gerry D over 3 years ago. Updated over 3 years ago.

Status:
Need More Info
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

A few days ago, I toyed with setting a quota on a handful (less than 60) directories on my CephFS filesystem. At the time, I was running version 15.2.1. After a while, I noticed the number of operations on my meta pool was extremely high. Normally, this would be near-zero or sometimes a few hundreds. Now this was at about 10K. I removed all the quotas I had set in the hopes that this would resolve the issue, but to no avail.

As I was due for an upgrade, I decided to upgrade to 15.2.8 yesterday. At first, things seemed to be back to normal. However, after a few hours, a strange pattern started to emerge, where the operations on the meta pool would spike to 20K for approximately 1h40m and then go back to normal for approximately 3h15m, making this a recurring event every 5 hours or so.

I have attached a screenshot to illustrate what this looks like. The first "wave" of 10K IOPS is until I did the upgrade, the next 20K spikes are after the upgrade.

During these spikes, dump_ops_in_flight shows a bunch of pending operations, many of them pertaining to ceph.quota (despite there being no more quotas set), all with at least one event of "failed to xlock, waiting". Excerpt below.

        {
            "description": "client_request(mds.0:61937455 setxattr #0x10006a6465b ceph.quota caller_uid=0, caller_gid=0{})",
            "initiated_at": "2021-01-06T09:45:41.610493+0000",
            "age": 0.0010133469999999999,
            "duration": 0.001879094,
            "type_data": {
                "flag_point": "failed to xlock, waiting",
                "reqid": "mds.0:61937455",
                "op_type": "client_request",
                "client_info": {
                    "client": "mds.0",
                    "tid": 61937455
                },
                "events": [
                    {
                        "time": "2021-01-06T09:45:41.610493+0000",
                        "event": "initiated" 
                    },
                    {
                        "time": "2021-01-06T09:45:41.610493+0000",
                        "event": "throttled" 
                    },
                    {
                        "time": "2021-01-06T09:45:41.610493+0000",
                        "event": "header_read" 
                    },
                    {
                        "time": "2021-01-06T09:45:41.610493+0000",
                        "event": "all_read" 
                    },
                    {
                        "time": "2021-01-06T09:45:41.610598+0000",
                        "event": "dispatched" 
                    },
                    {
                        "time": "2021-01-06T09:45:41.610607+0000",
                        "event": "failed to xlock, waiting" 
                    },
                    {
                        "time": "2021-01-06T09:45:41.612276+0000",
                        "event": "failed to xlock, waiting" 
                    }
                ]
            }
        },
        {
            "description": "client_request(mds.0:61937456 setxattr #0x100000014c1 ceph.quota caller_uid=0, caller_gid=0{})",
            "initiated_at": "2021-01-06T09:45:41.610515+0000",
            "age": 0.00099128499999999995,
            "duration": 0.00189281,
            "type_data": {
                "flag_point": "failed to xlock, waiting",
                "reqid": "mds.0:61937456",
                "op_type": "client_request",
                "client_info": {
                    "client": "mds.0",
                    "tid": 61937456
                },
                "events": [
                    {
                        "time": "2021-01-06T09:45:41.610515+0000",
                        "event": "initiated" 
                    },
                    {
                        "time": "2021-01-06T09:45:41.610515+0000",
                        "event": "throttled" 
                    },
                    {
                        "time": "2021-01-06T09:45:41.610515+0000",
                        "event": "header_read" 
                    },
                    {
                        "time": "2021-01-06T09:45:41.610515+0000",
                        "event": "all_read" 
                    },
                    {
                        "time": "2021-01-06T09:45:41.610609+0000",
                        "event": "dispatched" 
                    },
                    {
                        "time": "2021-01-06T09:45:41.610615+0000",
                        "event": "failed to xlock, waiting" 
                    },
                    {
                        "time": "2021-01-06T09:45:41.612294+0000",
                        "event": "failed to xlock, waiting" 
                    }
                ]
            }
        },
        {
            "description": "client_request(client.75073:35290007 rmdir #0x10006acee5b/fonts 2021-01-06T09:45:41.609114+0000 caller_uid=3166, caller_gid=1000{})",
            "initiated_at": "2021-01-06T09:45:41.610660+0000",
            "age": 0.00084625900000000003,
            "duration": 0.001782741,
            "type_data": {
                "flag_point": "acquired locks",
                "reqid": "client.75073:35290007",
                "op_type": "client_request",
                "client_info": {
                    "client": "client.75073",
                    "tid": 35290007
                },
                "events": [
                    {
                        "time": "2021-01-06T09:45:41.610660+0000",
                        "event": "initiated" 
                    },
                    {
                        "time": "2021-01-06T09:45:41.610661+0000",
                        "event": "throttled" 
                    },
                    {
                        "time": "2021-01-06T09:45:41.610660+0000",
                        "event": "header_read" 
                    },
                    {
                        "time": "2021-01-06T09:45:41.610663+0000",
                        "event": "all_read" 
                    },
                    {
                        "time": "2021-01-06T09:45:41.610673+0000",
                        "event": "dispatched" 
                    },
                    {
                        "time": "2021-01-06T09:45:41.610690+0000",
                        "event": "acquired locks" 
                    },
                    {
                        "time": "2021-01-06T09:45:41.610697+0000",
                        "event": "failed to xlock, waiting" 
                    },
                    {
                        "time": "2021-01-06T09:45:41.612426+0000",
                        "event": "acquired locks" 
                    }
                ]
            }
        },
        {
            "description": "client_request(client.75073:35287363 setfilelock rule 2, type 2, owner 13411631674495090448, pid 9437, start 0, length 0, wait 1 #0x10006accd98 2021-01-06T09:44:08.248412+0000 caller_uid=2730, caller_gid=1000{})",
            "initiated_at": "2021-01-06T09:44:08.251306+0000",
            "age": 93.360200323000001,
            "duration": 93.36117634,
            "type_data": {
                "flag_point": "failed to add lock, waiting",
                "reqid": "client.75073:35287363",
                "op_type": "client_request",
                "client_info": {
                    "client": "client.75073",
                    "tid": 35287363
                },
                "events": [
                    {
                        "time": "2021-01-06T09:44:08.251306+0000",
                        "event": "initiated" 
                    },
                    {
                        "time": "2021-01-06T09:44:08.251308+0000",
                        "event": "throttled" 
                    },
                    {
                        "time": "2021-01-06T09:44:08.251306+0000",
                        "event": "header_read" 
                    },
                    {
                        "time": "2021-01-06T09:44:08.251314+0000",
                        "event": "all_read" 
                    },
                    {
                        "time": "2021-01-06T09:44:08.251328+0000",
                        "event": "dispatched" 
                    },
                    {
                        "time": "2021-01-06T09:44:08.251353+0000",
                        "event": "acquired locks" 
                    },
                    {
                        "time": "2021-01-06T09:44:08.251360+0000",
                        "event": "failed to add lock, waiting" 
                    },
                    {
                        "time": "2021-01-06T09:44:10.683395+0000",
                        "event": "acquired locks" 
                    },
                    {
                        "time": "2021-01-06T09:44:10.683397+0000",
                        "event": "failed to add lock, waiting" 
                    },
                    {
                        "time": "2021-01-06T09:44:17.626619+0000",
                        "event": "acquired locks" 
                    },
                    {
                        "time": "2021-01-06T09:44:17.626621+0000",
                        "event": "failed to add lock, waiting" 
                    },
                    {
                        "time": "2021-01-06T09:44:46.166481+0000",
                        "event": "acquired locks" 
                    },
                    {
                        "time": "2021-01-06T09:44:46.166483+0000",
                        "event": "failed to add lock, waiting" 
                    },
                    {
                        "time": "2021-01-06T09:44:46.371158+0000",
                        "event": "acquired locks" 
                    },
                    {
                        "time": "2021-01-06T09:44:46.371159+0000",
                        "event": "failed to add lock, waiting" 
                    },
                    {
                        "time": "2021-01-06T09:44:47.581885+0000",
                        "event": "acquired locks" 
                    },
                    {
                        "time": "2021-01-06T09:44:47.581886+0000",
                        "event": "failed to add lock, waiting" 
                    },
                    {
                        "time": "2021-01-06T09:44:52.718687+0000",
                        "event": "acquired locks" 
                    },
                    {
                        "time": "2021-01-06T09:44:52.718689+0000",
                        "event": "failed to add lock, waiting" 
                    },
                    {
                        "time": "2021-01-06T09:45:31.331148+0000",
                        "event": "acquired locks" 
                    },
                    {
                        "time": "2021-01-06T09:45:31.331150+0000",
                        "event": "failed to add lock, waiting" 
                    }
                ]
            }
        },
        {
            "description": "client_request(mds.0:61937457 setxattr #0x10006a64656 ceph.quota caller_uid=0, caller_gid=0{})",
            "initiated_at": "2021-01-06T09:45:41.610708+0000",
            "age": 0.00079801799999999995,
            "duration": 0.0018484860000000001,
            "type_data": {
                "flag_point": "submit entry: journal_and_reply",
                "reqid": "mds.0:61937457",
                "op_type": "client_request",
                "client_info": {
                    "client": "mds.0",
                    "tid": 61937457
                },
                "events": [
                    {
                        "time": "2021-01-06T09:45:41.610708+0000",
                        "event": "initiated" 
                    },
                    {
                        "time": "2021-01-06T09:45:41.610708+0000",
                        "event": "throttled" 
                    },
                    {
                        "time": "2021-01-06T09:45:41.610708+0000",
                        "event": "header_read" 
                    },
                    {
                        "time": "2021-01-06T09:45:41.610708+0000",
                        "event": "all_read" 
                    },
                    {
                        "time": "2021-01-06T09:45:41.610728+0000",
                        "event": "dispatched" 
                    },
                    {
                        "time": "2021-01-06T09:45:41.610750+0000",
                        "event": "failed to xlock, waiting" 
                    },
                    {
                        "time": "2021-01-06T09:45:41.612514+0000",
                        "event": "acquired locks" 
                    },
                    {
                        "time": "2021-01-06T09:45:41.612523+0000",
                        "event": "submit entry: journal_and_reply" 
                    }
                ]
            }
        }
    ],

I'm at a loss as for what to do next, but I do think this looks like I hit a bug somehow, as I don't see why this happened in the first place, and escpecially why this is still happening now after removing all quotas and upgrading (and thus restarting everything).


Files

meta-iops.png (78.1 KB) meta-iops.png Gerry D, 01/07/2021 05:28 AM
traffic_ceph_vs_clients.png (102 KB) traffic_ceph_vs_clients.png Gerry D, 01/12/2021 05:52 AM
Actions

Also available in: Atom PDF