Support #23839

open

RGW GC Stuck

Added by sean redmond about 6 years ago. Updated almost 6 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Target version:
% Done: 0%

Tags:
Reviewed:
Affected Versions:
Pull request ID:

Description

We are currently using Jewel 10.2.7 and have recently been experiencing issues with objects being deleted via the GC. After a bucket deletion with --purge-objects failed (the first point at which the problem appeared), all of the RGWs occasionally become unresponsive and require a process restart before they will accept requests again. On investigating the garbage collection, it has an enormous list whose length we are struggling to count, but it appears stuck on a particular object which is not updating, as shown in the logs below:

2018-04-23 15:16:04.101660 7f1fdcc29a00 0 gc::process: removing .rgw.buckets:default.290071.4_XXXXXXX/XXXX/XX/XX/XXXXXXX.ZIP

2018-04-23 15:16:04.104231 7f1fdcc29a00 0 gc::process: removing .rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_1

2018-04-23 15:16:04.105541 7f1fdcc29a00 0 gc::process: removing .rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_2

2018-04-23 15:16:04.176235 7f1fdcc29a00 0 gc::process: removing .rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_3

2018-04-23 15:16:04.178435 7f1fdcc29a00 0 gc::process: removing .rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_4

2018-04-23 15:16:04.250883 7f1fdcc29a00 0 gc::process: removing .rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_5

2018-04-23 15:16:04.297912 7f1fdcc29a00 0 gc::process: removing .rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_6

2018-04-23 15:16:04.298803 7f1fdcc29a00 0 gc::process: removing .rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_7

2018-04-23 15:16:04.320202 7f1fdcc29a00 0 gc::process: removing .rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_8

2018-04-23 15:16:04.340124 7f1fdcc29a00 0 gc::process: removing .rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_9

2018-04-23 15:16:04.383924 7f1fdcc29a00 0 gc::process: removing .rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_10

2018-04-23 15:16:04.386865 7f1fdcc29a00 0 gc::process: removing .rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_11

2018-04-23 15:16:04.389067 7f1fdcc29a00 0 gc::process: removing .rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_12

2018-04-23 15:16:04.413938 7f1fdcc29a00 0 gc::process: removing .rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_13

2018-04-23 15:16:04.487977 7f1fdcc29a00 0 gc::process: removing .rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_14

2018-04-23 15:16:04.544235 7f1fdcc29a00 0 gc::process: removing .rgw.buckets:default.175209462.16__shadow_.06ry24pXQW8yH8EJpoqjEtZF6M6tiUv_1

2018-04-23 15:16:04.546978 7f1fdcc29a00 0 gc::process: removing .rgw.buckets:default.175209462.16__shadow_.06ry24pXQW8yH8EJpoqjEtZF6M6tiUv_2

2018-04-23 15:16:04.598644 7f1fdcc29a00 0 gc::process: removing .rgw.buckets:default.175209462.16__shadow_.06ry24pXQW8yH8EJpoqjEtZF6M6tiUv_3

2018-04-23 15:16:04.629519 7f1fdcc29a00 0 gc::process: removing .rgw.buckets:default.175209462.16__shadow_.06ry24pXQW8yH8EJpoqjEtZF6M6tiUv_4

2018-04-23 15:16:04.700492 7f1fdcc29a00 0 gc::process: removing .rgw.buckets:default.175209462.16__shadow_.06ry24pXQW8yH8EJpoqjEtZF6M6tiUv_5

2018-04-23 15:16:04.765798 7f1fdcc29a00 0 gc::process: removing .rgw.buckets:default.175209462.16__shadow_.06ry24pXQW8yH8EJpoqjEtZF6M6tiUv_6

2018-04-23 15:16:04.772774 7f1fdcc29a00 0 gc::process: removing .rgw.buckets:default.175209462.16__shadow_.06ry24pXQW8yH8EJpoqjEtZF6M6tiUv_7

2018-04-23 15:16:04.846379 7f1fdcc29a00 0 gc::process: removing .rgw.buckets:default.175209462.16__shadow_.06ry24pXQW8yH8EJpoqjEtZF6M6tiUv_8

2018-04-23 15:16:04.935023 7f1fdcc29a00 0 gc::process: removing .rgw.buckets:default.175209462.16__shadow_.06ry24pXQW8yH8EJpoqjEtZF6M6tiUv_9

2018-04-23 15:16:04.937229 7f1fdcc29a00 0 gc::process: removing .rgw.buckets:default.175209462.16__shadow_.06ry24pXQW8yH8EJpoqjEtZF6M6tiUv_10

2018-04-23 15:16:04.968289 7f1fdcc29a00 0 gc::process: removing .rgw.buckets:default.175209462.16__shadow_.06ry24pXQW8yH8EJpoqjEtZF6M6tiUv_11

2018-04-23 15:16:05.005194 7f1fdcc29a00 0 gc::process: removing .rgw.buckets:default.175209462.16__shadow_.06ry24pXQW8yH8EJpoqjEtZF6M6tiUv_12

We seem completely unable to get this deleted, and nothing else of immediate concern stands out as a potential cause of all RGWs becoming unresponsive at the same time. On the bucket containing this object (the one we originally tried to purge), I attempted a further purge with the --bypass-gc parameter, but this also resulted in all RGWs becoming unresponsive within 30 minutes, so I terminated the operation and restarted the RGWs again.
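For reference, a minimal sketch of how the GC queue can be inspected and a collection pass triggered by hand (these are standard radosgw-admin subcommands; the grep-based count is only an approximation, since the exact JSON field names in the gc list output may differ between releases):

radosgw-admin --id rgw.ceph-rgw-1 gc list --include-all > /tmp/gc-list.json   # dump all queued GC entries, including not-yet-expired ones
grep -c '"oid"' /tmp/gc-list.json                                             # rough count of objects awaiting collection
radosgw-admin --id rgw.ceph-rgw-1 gc process                                  # run a GC pass in the foreground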

The bucket we attempted to remove has no shards (it is an old bucket; sharding is now active for new buckets), and its details are attached below. To our knowledge, roughly 90% of the bucket's contents have already been successfully removed.

root@ceph-rgw-1:~# radosgw-admin --id rgw.ceph-rgw-1 bucket stats --bucket=xxxxxxxxxxxx

{
    "bucket": "xxxxxxxxxxxx",
    "pool": ".rgw.buckets",
    "index_pool": ".rgw.buckets.index",
    "id": "default.290071.4",
    "marker": "default.290071.4",
    "owner": "yyyyyyyyyy",
    "ver": "0#107938549",
    "master_ver": "0#0",
    "mtime": "2014-10-24 14:58:48.955805",
    "max_marker": "0#",
    "usage": {
        "rgw.none": {
            "size_kb": 0,
            "size_kb_actual": 0,
            "num_objects": 0
        },
        "rgw.main": {
            "size_kb": 186685939,
            "size_kb_actual": 189914068,
            "num_objects": 1419528
        },
        "rgw.multimeta": {
            "size_kb": 0,
            "size_kb_actual": 0,
            "num_objects": 24
        }
    },
    "bucket_quota": {
        "enabled": false,
        "max_size_kb": -1,
        "max_objects": -1
    }
}

If anyone has any thoughts, they’d be greatly appreciated!

Kind Regards,

Actions #1

Updated by Nathan Cutler about 6 years ago

  • Project changed from Ceph to rgw
Actions #2

Updated by sean redmond almost 6 years ago

To update this case: the cluster was upgraded to 10.2.10 and an inconsistent PG was found in .rgw.buckets.index; once it was repaired, the GC process seems to be progressing. It appears it may have been broken for some time. In future releases, does ceph health report anything if the GC backlog is very high?
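For anyone landing here with the same symptom, a sketch of how an inconsistent PG is typically located and repaired on 10.2.x (<pgid> is a placeholder for the PG id the cluster reports):

ceph health detail | grep inconsistent                     # shows which PG(s) are flagged inconsistent
rados list-inconsistent-obj <pgid> --format=json-pretty    # inspect the scrub errors recorded for that PG
ceph pg repair <pgid>                                      # ask the OSDs to repair the PG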

Actions #3

Updated by David Turner almost 6 years ago

In Luminous 12.2.2 we had a GC backlog of over 200M objects, and there was no notification from the cluster that this was the case. Our GC was using 40% of our available cluster space. I think this would be a very useful thing to add to the cluster status output, or to make discoverable some other way, without resorting to ad-hoc greps and wc counts over the output of listing the GC, which can take longer than a day if the backlog gets large enough.
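For what it's worth, a quicker back-of-the-envelope count can sometimes be taken straight from the GC shard omaps rather than a full gc list. The sketch below assumes a default Luminous zone, where the shards are objects gc.0 through gc.31 in the default.rgw.log pool under the gc namespace (pool name, namespace, and shard count all depend on the zone layout and rgw_gc_max_objs):

# approximate the GC backlog by counting omap keys on the gc shard objects
# (pool/namespace/shard-count values here are assumptions for a default Luminous zone)
for i in $(seq 0 31); do
    rados -p default.rgw.log --namespace gc listomapkeys gc.$i
done | wc -l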
