Bug #22010

open

Failures removing RGW buckets with --bypass-gc

Added by Bryan Stillwell over 6 years ago. Updated about 5 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

While trying to remove 6000 buckets that were used during a POC, I ran into the following two problems when using --bypass-gc:

  # radosgw-admin bucket rm --bucket=sg2pl598 --purge-objects --bypass-gc
    2017-10-31 09:21:04.111599 7f45f5d108c0 0 RGWObjManifest::operator++(): result: ofs=4194304 stripe_ofs=4194304 part_ofs=0 rule->part_size=15728640
    2017-10-31 09:21:04.121664 7f45f5d108c0 0 RGWObjManifest::operator++(): result: ofs=8388608 stripe_ofs=8388608 part_ofs=0 rule->part_size=15728640
    2017-10-31 09:21:04.126356 7f45f5d108c0 0 RGWObjManifest::operator++(): result: ofs=12582912 stripe_ofs=12582912 part_ofs=0 rule->part_size=15728640
    2017-10-31 09:21:04.130582 7f45f5d108c0 0 RGWObjManifest::operator++(): result: ofs=15728640 stripe_ofs=15728640 part_ofs=15728640 rule->part_size=15728640
    2017-10-31 09:21:04.135791 7f45f5d108c0 0 RGWObjManifest::operator++(): result: ofs=19922944 stripe_ofs=19922944 part_ofs=15728640 rule->part_size=15728640
    2017-10-31 09:21:04.140240 7f45f5d108c0 0 RGWObjManifest::operator++(): result: ofs=24117248 stripe_ofs=24117248 part_ofs=15728640 rule->part_size=15728640
    2017-10-31 09:21:04.145792 7f45f5d108c0 0 RGWObjManifest::operator++(): result: ofs=28311552 stripe_ofs=28311552 part_ofs=15728640 rule->part_size=15728640
    2017-10-31 09:21:04.149964 7f45f5d108c0 0 RGWObjManifest::operator++(): result: ofs=31457280 stripe_ofs=31457280 part_ofs=31457280 rule->part_size=15728640
    2017-10-31 09:21:04.165820 7f45f5d108c0 0 RGWObjManifest::operator++(): result: ofs=35651584 stripe_ofs=35651584 part_ofs=31457280 rule->part_size=15728640
    2017-10-31 09:21:04.171099 7f45f5d108c0 0 RGWObjManifest::operator++(): result: ofs=39845888 stripe_ofs=39845888 part_ofs=31457280 rule->part_size=15728640
    2017-10-31 09:21:04.176765 7f45f5d108c0 0 RGWObjManifest::operator++(): result: ofs=44040192 stripe_ofs=44040192 part_ofs=31457280 rule->part_size=15728640
    2017-10-31 09:21:04.183664 7f45f5d108c0 0 RGWObjManifest::operator++(): result: ofs=47185920 stripe_ofs=47185920 part_ofs=47185920 rule->part_size=83674
    2017-10-31 09:21:04.188140 7f45f5d108c0 0 RGWObjManifest::operator++(): result: ofs=47269594 stripe_ofs=47269594 part_ofs=47269594 rule->part_size=83674
    2017-10-31 09:21:05.034837 7f45f5d108c0 -1 ERROR: failed to get obj ref with ret=-22
    2017-10-31 09:21:05.034846 7f45f5d108c0 -1 ERROR: delete obj aio failed with -22
  # radosgw-admin bucket rm --bucket=sg2pl593 --purge-objects --bypass-gc
    2017-10-31 09:24:09.082063 7fe7f4be68c0 0 RGWObjManifest::operator++(): result: ofs=4194304 stripe_ofs=4194304 part_ofs=0 rule->part_size=15728640
    2017-10-31 09:24:09.090394 7fe7f4be68c0 0 RGWObjManifest::operator++(): result: ofs=8388608 stripe_ofs=8388608 part_ofs=0 rule->part_size=15728640
    2017-10-31 09:24:09.095172 7fe7f4be68c0 0 RGWObjManifest::operator++(): result: ofs=12582912 stripe_ofs=12582912 part_ofs=0 rule->part_size=15728640
    2017-10-31 09:24:09.099116 7fe7f4be68c0 0 RGWObjManifest::operator++(): result: ofs=15728640 stripe_ofs=15728640 part_ofs=15728640 rule->part_size=15728640
    [...snip...]
    2017-10-31 09:24:09.245171 7fe7f4be68c0 0 RGWObjManifest::operator++(): result: ofs=110100480 stripe_ofs=110100480 part_ofs=110100480 rule->part_size=15728640
    2017-10-31 09:24:09.251659 7fe7f4be68c0 0 RGWObjManifest::operator++(): result: ofs=114294784 stripe_ofs=114294784 part_ofs=110100480 rule->part_size=15728640
    2017-10-31 09:24:09.269739 7fe7f4be68c0 0 RGWObjManifest::operator++(): result: ofs=118489088 stripe_ofs=118489088 part_ofs=110100480 rule->part_size=15728640
    2017-10-31 09:24:09.273871 7fe7f4be68c0 0 RGWObjManifest::operator++(): result: ofs=122683392 stripe_ofs=122683392 part_ofs=110100480 rule->part_size=15728640
    2017-10-31 09:24:09.274968 7fe7f4be68c0 -1 ERROR: could not drain handles as aio completion returned with -2
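
For anyone reading the manifest lines above: the offsets in the first run look like a walk over a multipart object in fixed-size stripes, capped at each part boundary. The sketch below reproduces that sequence. It is only an illustration, not the RGW iterator: the 4194304-byte stripe size is inferred from the gaps between the logged offsets, the part sizes (three 15728640-byte parts plus a final 83674-byte part) are read off the log, and the function name is made up.

```python
def manifest_offsets(part_sizes, stripe_size):
    """Return the offset reached after each iterator step.

    Hypothetical helper: advances by stripe_size but never past the end
    of the current part, mirroring the pattern seen in the log.
    """
    offsets = []
    part_ofs = 0
    for size in part_sizes:
        part_end = part_ofs + size
        ofs = part_ofs
        while ofs < part_end:
            # Step one stripe forward, clamped to the part boundary.
            ofs = min(ofs + stripe_size, part_end)
            offsets.append(ofs)
        part_ofs = part_end
    return offsets


# Values taken from the sg2pl598 log; the stripe size is an inference.
STRIPE = 4194304
PARTS = [15728640, 15728640, 15728640, 83674]

for ofs in manifest_offsets(PARTS, STRIPE):
    print(f"ofs={ofs}")
```

With these inputs the printed offsets match the 13 ofs= values logged for sg2pl598, ending at 47269594 just before the -22 errors.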

Successive runs then keep failing at the same spot, which prevents further progress. As a workaround, I can run the command without --bypass-gc for a few seconds and then rerun it with --bypass-gc, but it usually fails again after a few minutes.
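
The return values in the errors above (-22 and -2) appear to be negated errno codes, which is the usual convention in Ceph. Assuming that holds here, a small sketch for decoding them follows. The helper name is made up, and the human-readable message text depends on the platform.

```python
import errno
import os


def describe_ret(ret):
    """Map a negative return value to (errno name, message).

    Assumes ret is a negated errno code, e.g. -22 or -2.
    """
    code = -ret
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


for ret in (-22, -2):
    name, message = describe_ret(ret)
    print(f"{ret}: {name} ({message})")
```

Under that assumption, -22 is EINVAL (invalid argument) and -2 is ENOENT (no such file or directory), so the two failures are probably not the same problem.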

For example, here's another run on sg2pl593 after running it without --bypass-gc for a few seconds:

  # radosgw-admin bucket rm --bucket=sg2pl593 --purge-objects --bypass-gc
    2017-10-31 09:28:03.704490 7efdb31d08c0 0 RGWObjManifest::operator++(): result: ofs=565628 stripe_ofs=565628 part_ofs=0 rule->part_size=0
    2017-10-31 09:28:03.890675 7efdb31d08c0 0 RGWObjManifest::operator++(): result: ofs=1757663 stripe_ofs=1757663 part_ofs=0 rule->part_size=0
    2017-10-31 09:28:04.144966 7efdb31d08c0 0 RGWObjManifest::operator++(): result: ofs=2723340 stripe_ofs=2723340 part_ofs=0 rule->part_size=0
    2017-10-31 09:28:04.380761 7efdb31d08c0 -1 ERROR: could not drain handles as aio completion returned with -2
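
To compare failing runs across many buckets, it may help to reduce each log to the last manifest offset seen and the errors that followed. The sketch below does that with two regular expressions written against the line formats shown in this report. The function and pattern names are made up, and other Ceph versions may format these lines differently.

```python
import re

# Patterns match the log lines as they appear in this report.
MANIFEST_RE = re.compile(
    r"RGWObjManifest::operator\+\+\(\): result: ofs=(\d+) stripe_ofs=(\d+) "
    r"part_ofs=(\d+) rule->part_size=(\d+)"
)
ERROR_RE = re.compile(r"-1 ERROR: (.*?)\s*(?:ret=)?(-\d+)\s*$")


def summarize_run(lines):
    """Return (last manifest ofs seen, [(error message, return code), ...])."""
    last_ofs = None
    errors = []
    for line in lines:
        manifest = MANIFEST_RE.search(line)
        if manifest:
            last_ofs = int(manifest.group(1))
            continue
        error = ERROR_RE.search(line)
        if error:
            errors.append((error.group(1), int(error.group(2))))
    return last_ofs, errors


# The retry on sg2pl593, copied from the log above.
RUN = [
    "2017-10-31 09:28:03.704490 7efdb31d08c0 0 RGWObjManifest::operator++(): result: ofs=565628 stripe_ofs=565628 part_ofs=0 rule->part_size=0",
    "2017-10-31 09:28:03.890675 7efdb31d08c0 0 RGWObjManifest::operator++(): result: ofs=1757663 stripe_ofs=1757663 part_ofs=0 rule->part_size=0",
    "2017-10-31 09:28:04.144966 7efdb31d08c0 0 RGWObjManifest::operator++(): result: ofs=2723340 stripe_ofs=2723340 part_ofs=0 rule->part_size=0",
    "2017-10-31 09:28:04.380761 7efdb31d08c0 -1 ERROR: could not drain handles as aio completion returned with -2",
]

print(summarize_run(RUN))
```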

This was found on a hammer (0.94.10) cluster, but there was at least one report on the ceph-users list of this happening in jewel (10.2.7) too:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021972.html

Actions #1

Updated by Patrick Donnelly about 5 years ago

  • Project changed from Ceph to rgw