
Bug #41745

radosgw-admin orphans find stuck forever

Added by Manuel Rios about 2 years ago. Updated almost 2 years ago.

Status:
New
Priority:
Normal
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
radosgw-admin orphan
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

ceph version 14.2.3 (0f776cf838a1ae3130b2b73dc26be9c95c6ccc39) nautilus (stable)
Host with 32 cores / 64 GB RAM Centos 7.6

Command : radosgw-admin orphans find --pool=default.rgw.buckets.data --job-id=ophans_clean3 --yes-i-really-mean-it

After spending nearly 36 hours looping over objects, radosgw-admin orphans find is stuck at this step:

2019-09-10 12:43:10.996 7fc6ca719700  1 iterated through 69647000 objects
2019-09-10 12:43:11.391 7fc6ca719700  0 run(): building index of all bucket indexes
2019-09-10 12:43:11.540 7fc6ca719700  0 run(): building index of all linked objects
2019-09-10 12:43:11.540 7fc6ca719700  0 building linked oids index: 0/64
2019-09-10 12:43:11.863 7fc6ca719700  0 building linked oids index: 1/64

Large omap objects appear in the log pool due to orphans find; I think that's expected.

LARGE_OMAP_OBJECTS 59 large omap objects
    59 large objects found in pool 'default.rgw.log'
    Search the cluster log for 'Large omap object found' for more details.

^Cstrace: Process 3156151 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 96.70    0.963434          64     15064      2532 futex
  3.01    0.029946           4      6733           write
  0.29    0.002897          66        44           madvise
  0.00    0.000038           1        48           brk
------ ----------- ----------- --------- --------- ----------------
100.00    0.996315                 21889      2532 total
[00007fc6c051758a] futex(0x7fc694079180, FUTEX_WAKE_PRIVATE, 1) = 1
[00007fc6bd4ccd1c] futex(0x7fc6bd787760, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[00007fc6bd4ccd49] futex(0x7fc6bd787760, FUTEX_WAKE_PRIVATE, 1) = 0
[00007fc6c051758a] futex(0x558070fa34e8, FUTEX_WAKE_PRIVATE, 1) = 0
[00007fc6c051769d] write(12, "c", 1)    = 1
[00007fc6c051758a] futex(0x7fc6940705b0, FUTEX_WAKE_PRIVATE, 1) = 1
[00007fc6c051769d] write(6, "c", 1)     = 1
[00007fc6c051758a] futex(0x7fc69403b7c0, FUTEX_WAKE_PRIVATE, 1) = 1
[00007fc6bd4ccd1c] futex(0x7fc6bd787760, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[00007fc6bd4ccd49] futex(0x7fc6bd787760, FUTEX_WAKE_PRIVATE, 1) = 0
[00007fc6c051769d] write(9, "c", 1)     = 1
[00007fc6bd4ccd1c] futex(0x7fc6a0000020, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[00007fc6bd4ccd49] futex(0x7fc6a0000020, FUTEX_WAKE_PRIVATE, 1) = 0
[00007fc6c051769d] write(6, "c", 1)     = 1
[00007fc6c051758a] futex(0x7fc69408c3a0, FUTEX_WAKE_PRIVATE, 1) = 1
[00007fc6bd4ccd1c] futex(0x7fc6bd787760, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)

Also, orphans find needs a little more documentation about usage, expected output, and expected logs.
The current documentation is not clear about what to expect from it.

History

#1 Updated by Abhishek Lekshmanan about 2 years ago

  • Assignee set to Abhishek Lekshmanan

#2 Updated by Abhishek Lekshmanan almost 2 years ago

Did the orphans find ever complete? What is the current status?

#3 Updated by Manuel Rios almost 2 years ago

It supposedly finished, but the output generated a 9 GB log file and we're unable to identify what is orphaned and can be cleaned.

The output of the tool is not clear, and we're still waiting for the new tool for orphans clean.

We calculated around 60-70 TB of orphaned files... several million objects.

Still waiting for help to clean up our storage.

To get the log, we checked the 64 shard entries, but... what is this? All orphaned? All valid?

Waiting for someone to clarify

Regards

#4 Updated by Manuel Rios almost 2 years ago

We can run it again if you prefer, wait several days, and send you the compressed log file.

#5 Updated by Manuel Rios almost 2 years ago

radosgw-admin orphans find --pool=default.rgw.buckets.data --num-shards=64 --max-concurrent-ios=128 --job-id=blablabla

Current state

2019-11-10 11:24:26.208 7fca50d80700  1 iterated through 63161000 objects
2019-11-10 11:24:26.412 7fca50d80700  1 iterated through 63162000 objects
2019-11-10 11:24:26.570 7fca50d80700  1 iterated through 63163000 objects
2019-11-10 11:24:26.725 7fca50d80700  1 iterated through 63164000 objects
2019-11-10 11:24:26.988 7fca50d80700  0 run(): building index of all bucket indexes
2019-11-10 11:24:27.078 7fca50d80700  0 run(): building index of all linked objects
2019-11-10 11:24:27.078 7fca50d80700  0 building linked oids index: 0/64
2019-11-10 11:24:27.296 7fca50d80700  0 building linked oids index: 1/64

top after 3 days running:

PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND

506720 root      20   0   60.7g  47.5g   6148 R  85.8 75.8   3156:02 radosgw-admin orphans find --pool=default.rgw.buckets.data --num-shards=64 --max-concurrent-ios=128 --job-id=+

Ceph health detail

[root@ceph-rgw03 ~]# ceph health detail
HEALTH_WARN 70 large omap objects
LARGE_OMAP_OBJECTS 70 large omap objects
    70 large objects found in pool 'default.rgw.log'

#6 Updated by Matthew Oliver almost 2 years ago

What does `radosgw-admin orphans list-jobs --extra-info` say? I assume it gets to a point where it starts writing some state.

Side note, as I'm learning as I go here :) I'd need to look at the code, but I wonder if the large omap objects are related to these searches, i.e. the search results. There is a `radosgw-admin orphans finish` command, so I wonder if that does the tidy-up.
I guess if `list-jobs` doesn't see them then they can't be finished.

#7 Updated by Manuel Rios almost 2 years ago

Last night the 64 GB node hit OOM, which killed the orphans find process that had been running since 2019-11-10.

[root@ceph-rgw03 ~]# radosgw-admin orphans list-jobs --extra-info
[
    {
        "orphan_search_state": {
            "info": {
                "orphan_search_info": {
                    "job_name": "2019orphans",
                    "pool": "default.rgw.buckets.data",
                    "num_shards": 1,
                    "start_time": "2019-11-08 16:47:29.285364Z" 
                }
            },
            "stage": {
                "orphan_search_stage": {
                    "search_stage": "iterate_bucket_index",
                    "shard": 0,
                    "marker": "" 
                }
            }
        }
    },
    {
        "orphan_search_state": {
            "info": {
                "orphan_search_info": {
                    "job_name": "2019v2orphans",
                    "pool": "default.rgw.buckets.data",
                    "num_shards": 32,
                    "start_time": "2019-11-09 16:10:48.411939Z" 
                }
            },
            "stage": {
                "orphan_search_stage": {
                    "search_stage": "iterate_bucket_index",
                    "shard": 0,
                    "marker": "" 
                }
            }
        }
    },
    {
        "orphan_search_state": {
            "info": {
                "orphan_search_info": {
                    "job_name": "2020orphans",
                    "pool": "default.rgw.buckets.data",
                    "num_shards": 64,
                    "start_time": "2019-11-10 06:05:46.415683Z" 
                }
            },
            "stage": {
                "orphan_search_stage": {
                    "search_stage": "iterate_bucket_index",
                    "shard": 0,
                    "marker": "" 
                }
            }
        }
    }
]

Yes, the omap objects are related to the shards generated by the command, and as far as I know orphans finish cleans up the shards in the rgw.log pool.

The command always gets stuck in iterate_bucket_index and consumes another 10-12 GB of RAM every day until OOM.
I have the full output of the last command in a 1.1 GB log file (4.4 GB raw).

Thx!

#8 Updated by Matthew Oliver almost 2 years ago

So according to the RGW code, the stages are roughly:

ORPHAN_SEARCH_STAGE_INIT (initializing state) > ORPHAN_SEARCH_STAGE_LSPOOL (building index of all objects in pool) > ORPHAN_SEARCH_STAGE_LSBUCKETS (building index of all bucket indexes) > ORPHAN_SEARCH_STAGE_ITERATE_BI (building index of all linked objects) > ORPHAN_SEARCH_STAGE_COMPARE.

The one with "iterate_bucket_index" maps to ORPHAN_SEARCH_STAGE_ITERATE_BI (https://github.com/ceph/ceph/blob/master/src/rgw/rgw_json_enc.cc#L1641). So you're getting to stage 4 out of 5 before OOM kills it, which we can see in the states you mentioned. Now to follow the code and hope it brings some insight.
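As a rough illustration, the stage progression can be checked mechanically from the `orphans list-jobs --extra-info` JSON shown in comment #7. This is only a sketch: the serialized stage-name strings other than "iterate_bucket_index" are assumptions about how the enum values above are encoded.

```python
import json

# Assumed serialized names for the five stages listed above, in pipeline order.
# Only "iterate_bucket_index" is confirmed by the output in this ticket.
STAGES = ["init", "lspool", "lsbuckets", "iterate_bucket_index", "comparing"]

def job_progress(list_jobs_json):
    """Yield (job_name, stage, stage_index) for each job reported by
    `radosgw-admin orphans list-jobs --extra-info`."""
    for entry in json.loads(list_jobs_json):
        state = entry["orphan_search_state"]
        name = state["info"]["orphan_search_info"]["job_name"]
        stage = state["stage"]["orphan_search_stage"]["search_stage"]
        idx = STAGES.index(stage) if stage in STAGES else -1
        yield name, stage, idx

# Trimmed-down version of the list-jobs output from comment #7.
sample = '''[{"orphan_search_state": {
  "info": {"orphan_search_info": {"job_name": "2020orphans",
           "pool": "default.rgw.buckets.data", "num_shards": 64,
           "start_time": "2019-11-10 06:05:46.415683Z"}},
  "stage": {"orphan_search_stage": {"search_stage": "iterate_bucket_index",
            "shard": 0, "marker": ""}}}}]'''

for name, stage, idx in job_progress(sample):
    print(f"{name}: stage {idx + 1}/{len(STAGES)} ({stage})")
```

Run against the full output in comment #7, this would report all three jobs parked at stage 4 of 5.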

#9 Updated by Matthew Oliver almost 2 years ago

Manuel Rios wrote:

For get the log, we checked the 64 shards entries but... what is this?? All orphaned?? All valid??

Looking at the code, it builds up an index of objects and an index of objects in buckets, then compares the two to see which objects are orphans. So no, not all orphaned, and probably not all valid (unless there turn out to be 0 orphans).

Seeing as the process seems to be killed before it gets to the compare stage, I'd say who knows.

The compare seems pretty straightforward. Any orphans it finds should result in a message to the screen of the form 'leaked: <object>'. If you turn up your debugging you can get this list in the logs (if I'm reading correctly).
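If a run ever does reach the compare stage, that list could be pulled out of a large log file mechanically. A minimal sketch, assuming the 'leaked: <object>' message format described above; the prefix fields in the sample lines (timestamp, thread id, level) are hypothetical, patterned on the other log excerpts in this ticket:

```python
import re

# Matches the oid in a "leaked: <object>" message.
LEAKED_RE = re.compile(r"leaked:\s+(\S+)")

def leaked_objects(lines):
    """Collect oids reported as leaked in the given log lines."""
    return [m.group(1) for line in lines if (m := LEAKED_RE.search(line))]

log = [
    "2019-11-22 09:40:01.100 7ff53c33f700  0 leaked: obj1__shadow_.abc_1",
    "2019-11-22 09:40:01.101 7ff53c33f700 20 obj entry: some/linked/object",
    "2019-11-22 09:40:01.102 7ff53c33f700  0 leaked: obj2__multipart_.def_2",
]
print(leaked_objects(log))  # ['obj1__shadow_.abc_1', 'obj2__multipart_.def_2']
```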

It's interesting that it seems to have stopped while going through the entries on the second orphan shard (1/64). Maybe if we increase the logging of rgw, or even of just that node, we could get more information on where it's hanging. Maybe that will help track down the problem.

I did run a valgrind on the orphan search but that didn't really show anything, though maybe I just don't have enough objects in my test environment.

#10 Updated by Manuel Rios almost 2 years ago

Ok,

Running: radosgw-admin orphans find --pool=default.rgw.buckets.data --num-shards=64 --max-concurrent-ios=128 --job-id=matthew --debug-rgw=20

In 4 days I will add the log for you to download.

Regards

#11 Updated by Manuel Rios almost 2 years ago

Hi Matthew,

It has been running for 2 days and is now in the iterate_bucket_index phase. The console with debug-rgw=20 is spammed with messages like:

2019-11-22 09:32:51.057 7ff53c33f700 20 RGWObjManifest::operator++(): result: ofs=1204465696768 stripe_ofs=1204465696768 part_ofs=1204415365120 rule->part_size=230686720
2019-11-22 09:32:51.057 7ff53c33f700 20 RGWObjManifest::operator++(): rule->part_size=230686720 rules.size()=1445
2019-11-22 09:32:51.057 7ff53c33f700 20 RGWObjManifest::operator++(): stripe_ofs=1204469891072 part_ofs=1204415365120 rule->part_size=230686720
2019-11-22 09:32:51.057 7ff53c33f700 20 RGWObjManifest::operator++(): result: ofs=1204469891072 stripe_ofs=1204469891072 part_ofs=1204415365120 rule->part_size=230686720
2019-11-22 09:32:51.057 7ff53c33f700 20 RGWObjManifest::operator++(): rule->part_size=230686720 rules.size()=1445
2019-11-22 09:32:51.057 7ff53c33f700 20 RGWObjManifest::operator++(): stripe_ofs=1204474085376 part_ofs=1204415365120 rule->part_size=230686720
2019-11-22 09:32:51.057 7ff53c33f700 20 RGWObjManifest::operator++(): result: ofs=1204474085376 stripe_ofs=1204474085376 part_ofs=1204415365120 rule->part_size=230686720
2019-11-22 09:32:51.057 7ff53c33f700 20 RGWObjManifest::operator++(): rule->part_size=230686720 rules.size()=1445
2019-11-22 09:32:51.057 7ff53c33f700 20 RGWObjManifest::operator++(): stripe_ofs=1204478279680 part_ofs=1204415365120 rule->part_size=230686720
2019-11-22 09:32:51.057 7ff53c33f700 20 RGWObjManifest::operator++(): result: ofs=1204478279680 stripe_ofs=1204478279680 part_ofs=1204415365120 rule->part_size=230686720
2019-11-22 09:32:51.058 7ff53c33f700 20 RGWObjManifest::operator++(): rule->part_size=230686720 rules.size()=1445
2019-11-22 09:32:51.058 7ff53c33f700 20 RGWObjManifest::operator++(): stripe_ofs=1204482473984 part_ofs=1204415365120 rule->part_size=230686720
2019-11-22 09:32:51.058 7ff53c33f700 20 RGWObjManifest::operator++(): result: ofs=1204482473984 stripe_ofs=1204482473984 part_ofs=1204415365120 rule->part_size=230686720
2019-11-22 09:32:51.058 7ff53c33f700 20 RGWObjManifest::operator++(): rule->part_size=230686720 rules.size()=1445
2019-11-22 09:32:51.058 7ff53c33f700 20 RGWObjManifest::operator++(): stripe_ofs=1204486668288 part_ofs=1204415365120 rule->part_size=230686720
2019-11-22 09:32:51.058 7ff53c33f700 20 RGWObjManifest::operator++(): result: ofs=1204486668288 stripe_ofs=1204486668288 part_ofs=1204415365120 rule->part_size=230686720
2019-11-22 09:32:51.058 7ff53c33f700 20 RGWObjManifest::operator++(): rule->part_size=230686720 rules.size()=1445
2019-11-22 09:32:51.058 7ff53c33f700 20 RGWObjManifest::operator++(): stripe_ofs=1204490862592 part_ofs=1204415365120 rule->part_size=230686720
2019-11-22 09:32:51.058 7ff53c33f700 20 RGWObjManifest::operator++(): result: ofs=1204490862592 stripe_ofs=1204490862592 part_ofs=1204415365120 rule->part_size=230686720
2019-11-22 09:32:51.058 7ff53c33f700 20 RGWObjManifest::operator++(): rule->part_size=230686720 rules.size()=1445
2019-11-22 09:32:51.058 7ff53c33f700 20 RGWObjManifest::operator++(): stripe_ofs=1204495056896 part_ofs=1204415365120 rule->part_size=230686720
2019-11-22 09:32:51.058 7ff53c33f700 20 RGWObjManifest::operator++(): result: ofs=1204495056896 stripe_ofs=1204495056896 part_ofs=1204415365120 rule->part_size=230686720
2019-11-22 09:32:51.058 7ff53c33f700 20 RGWObjManifest::operator++(): rule->part_size=230686720 rules.size()=1445
2019-11-22 09:32:51.058 7ff53c33f700 20 RGWObjManifest::operator++(): stripe_ofs=1204499251200 part_ofs=1204415365120 rule->part_size=230686720
2019-11-22 09:32:51.058 7ff53c33f700 20 RGWObjManifest::operator++(): result: ofs=1204499251200 stripe_ofs=1204499251200 part_ofs=1204415365120 rule->part_size=230686720
2019-11-22 09:32:51.058 7ff53c33f700 20 RGWObjManifest::operator++(): rule->part_size=230686720 rules.size()=1445
2019-11-22 09:32:51.058 7ff53c33f700 20 RGWObjManifest::operator++(): stripe_ofs=1204503445504 part_ofs=1204415365120 rule->part_size=230686720
2019-11-22 09:32:51.058 7ff53c33f700 20 RGWObjManifest::operator++(): result: ofs=1204503445504 stripe_ofs=1204503445504 part_ofs=1204415365120 rule->part_size=230686720
2019-11-22 09:32:51.058 7ff53c33f700 20 RGWObjManifest::operator++(): rule->part_size=230686720 rules.size()=1445

It also randomly spams some of these:

2019-11-22 09:33:33.245 7ff53c33f700  5 build_linked_oids_for_bucketskipping stat as the object MBS-810e080b-6a30-4bd7-9ddd-5dbe1f347a2c/CBB_SERVER9/C:/M/DATOS/Public/BUZON/Dolors/Autocad 2007 Español Spanish + Serial + Crack/Autocad 2007/Application Data/Autodesk/Textures/Woods - Plastics.Finish Carpentry.Wood.Red Oak.jpg:/20030129181538/Woods - Plastics.Finish Carpentry.Wood.Red Oak.jpgfits in a head
2019-11-22 09:33:33.245 7ff53c33f700 20 obj entry: MBS-810e080b-6a30-4bd7-9ddd-5dbe1f347a2c/CBB_SERVER9/C:/M/DATOS/Public/BUZON/Dolors/Autocad 2007 Español Spanish + Serial + Crack/Autocad 2007/Application Data/Autodesk/Textures/Woods - Plastics.Finish Carpentry.Wood.Redwood.jpg:/20030129182000/Woods - Plastics.Finish Carpentry.Wood.Redwood.jpg
2019-11-22 09:33:33.245 7ff53c33f700 20 build_linked_oids_for_bucket: entry.key.name=MBS-810e080b-6a30-4bd7-9ddd-5dbe1f347a2c/CBB_SERVER9/C:/M/DATOS/Public/BUZON/Dolors/Autocad 2007 Español Spanish + Serial + Crack/Autocad 2007/Application Data/Autodesk/Textures/Woods - Plastics.Finish Carpentry.Wood.Redwood.jpg:/20030129182000/Woods - Plastics.Finish Carpentry.Wood.Redwood.jpg entry.key.instance=
2019-11-22 09:33:33.245 7ff53c33f700  5 build_linked_oids_for_bucketskipping stat as the object MBS-810e080b-6a30-4bd7-9ddd-5dbe1f347a2c/CBB_SERVER9/C:/M/DATOS/Public/BUZON/Dolors/Autocad 2007 Español Spanish + Serial + Crack/Autocad 2007/Application Data/Autodesk/Textures/Woods - Plastics.Finish Carpentry.Wood.Redwood.jpg:/20030129182000/Woods - Plastics.Finish Carpentry.Wood.Redwood.jpgfits in a head
2019-11-22 09:33:33.245 7ff53c33f700 20 obj entry: MBS-810e080b-6a30-4bd7-9ddd-5dbe1f347a2c/CBB_SERVER9/C:/M/DATOS/Public/BUZON/Dolors/Autocad 2007 Español Spanish + Serial + Crack/Autocad 2007/Application Data/Autodesk/Textures/Woods - Plastics.Finish Carpentry.Wood.Teak.jpg:/20030129191126/Woods - Plastics.Finish Carpentry.Wood.Teak.jpg
2019-11-22 09:33:33.245 7ff53c33f700 20 build_linked_oids_for_bucket: entry.key.name=MBS-810e080b-6a30-4bd7-9ddd-5dbe1f347a2c/CBB_SERVER9/C:/M/DATOS/Public/BUZON/Dolors/Autocad 2007 Español Spanish + Serial + Crack/Autocad 2007/Application Data/Autodesk/Textures/Woods - Plastics.Finish Carpentry.Wood.Teak.jpg:/20030129191126/Woods - Plastics.Finish Carpentry.Wood.Teak.jpg entry.key.instance=
2019-11-22 09:33:33.245 7ff53c33f700  5 build_linked_oids_for_bucketskipping stat as the object MBS-810e080b-6a30-4bd7-9ddd-5dbe1f347a2c/CBB_SERVER9/C:/M/DATOS/Public/BUZON/Dolors/Autocad 2007 Español Spanish + Serial + Crack/Autocad 2007/Application Data/Autodesk/Textures/Woods - Plastics.Finish Carpentry.Wood.Teak.jpg:/20030129191126/Woods - Plastics.Finish Carpentry.Wood.Teak.jpgfits in a head
2019-11-22 09:33:33.245 7ff53c33f700 20 obj entry: MBS-810e080b-6a30-4bd7-9ddd-5dbe1f347a2c/CBB_SERVER9/C:/M/DATOS/Public/BUZON/Dolors/Autocad 2007 Español Spanish + Serial + Crack/Autocad 2007/Application Data/Autodesk/Textures/Woods - Plastics.Finish Carpentry.Wood.Walnut.jpg:/20030130211332/Woods - Plastics.Finish Carpentry.Wood.Walnut.jpg
2019-11-22 09:33:33.245 7ff53c33f700 20 build_linked_oids_for_bucket: entry.key.name=MBS-810e080b-6a30-4bd7-9ddd-5dbe1f347a2c/CBB_SERVER9/C:/M/DATOS/Public/BUZON/Dolors/Autocad 2007 Español Spanish + Serial + Crack/Autocad 2007/Application Data/Autodesk/Textures/Woods - Plastics.Finish Carpentry.Wood.Walnut.jpg:/20030130211332/Woods - Plastics.Finish Carpentry.Wood.Walnut.jpg entry.key.instance=
2019-11-22 09:33:33.245 7ff53c33f700  5 build_linked_oids_for_bucketskipping stat as the object MBS-810e080b-6a30-4bd7-9ddd-5dbe1f347a2c/CBB_SERVER9/C:/M/DATOS/Public/BUZON/Dolors/Autocad 2007 Español Spanish + Serial + Crack/Autocad 2007/Application Data/Autodesk/Textures/Woods - Plastics.Finish Carpentry.Wood.Walnut.jpg:/20030130211332/Woods - Plastics.Finish Carpentry.Wood.Walnut.jpgfits in a head
2019-11-22 09:33:33.245 7ff53c33f700 20 obj entry: MBS-810e080b-6a30-4bd7-9ddd-5dbe1f347a2c/CBB_SERVER9/C:/M/DATOS/Public/BUZON/Dolors/Autocad 2007 Español Spanish + Serial + Crack/Autocad 2007/Application Data/Autodesk/Textures/Woods - Plastics.Finish Carpentry.Wood.White Ash.jpg:/20030129194634/Woods - Plastics.Finish Carpentry.Wood.White Ash.jpg
2019-11-22 09:33:33.245 7ff53c33f700 20 build_linked_oids_for_bucket: entry.key.name=MBS-810e080b-6a30-4bd7-9ddd-5dbe1f347a2c/CBB_SERVER9/C:/M/DATOS/Public/BUZON/Dolors/Autocad 2007 Español Spanish + Serial + Crack/Autocad 2007/Application Data/Autodesk/Textures/Woods - Plastics.Finish Carpentry.Wood.White Ash.jpg:/20030129194634/Woods - Plastics.Finish Carpentry.Wood.White Ash.jpg entry.key.instance=
2019-11-22 09:33:33.245 7ff53c33f700  5 build_linked_oids_for_bucketskipping stat as the object MBS-810e080b-6a30-4bd7-9ddd-5dbe1f347a2c/CBB_SERVER9/C:/M/DATOS/Public/BUZON/Dolors/Autocad 2007 Español Spanish + Serial + Crack/Autocad 2007/Application Data/Autodesk/Textures/Woods - Plastics.Finish Carpentry.Wood.White Ash.jpg:/20030129194634/Woods - Plastics.Finish Carpentry.Wood.White Ash.jpgfits in a head
2019-11-22 09:33:33.245 7ff53c33f700 20 obj entry: MBS-810e080b-6a30-4bd7-9ddd-5dbe1f347a2c/CBB_SERVER9/C:/M/DATOS/Public/BUZON/Dolors/Autocad 2007 Español Spanish + Serial + Crack/Autocad 2007/Application Data/Autodesk/Textures/Woods - Plastics.Finish Carpentry.Wood.White Oak.jpg:/20030129194028/Woods - Plastics.Finish Carpentry.Wood.White Oak.jpg

#12 Updated by Matthew Oliver almost 2 years ago

Thanks, just looking into these now.

The second section, "object x fits in a head", means those objects are small enough not to bother with (unless you run with --detail, though let's not do that). Looking into the RGWObjManifest logs: I assume it's reading larger objects (those that don't fit into a head).

#13 Updated by Manuel Rios almost 2 years ago

Hi Matthew,

For your information, after 8 days:

    {
        "orphan_search_state": {
            "info": {
                "orphan_search_info": {
                    "job_name": "matthew",
                    "pool": "default.rgw.buckets.data",
                    "num_shards": 64,
                    "start_time": "2019-11-20 06:29:27.506393Z"
                }
            },
            "stage": {
                "orphan_search_stage": {
                    "search_stage": "iterate_bucket_index",
                    "shard": 0,
                    "marker": ""
                }
            }
        }
    }

top of the process; it's using 50% of the node's memory:
1596316 root 20 0 35.9g 31.9g 13008 R 70.5 50.9 8225:00

As you can see, it's still on shard 0.

#14 Updated by Matthew Oliver almost 2 years ago

Thanks Manuel, great bit of info. So we know it's definitely failing on shard 0. Are there still a lot of `RGWObjManifest::operator++()` lines? These should be from reading through larger objects.
I'm loading up my dev env with larger multipart objects in the hope of recreating your problem, or maybe pinpointing a memory leak or bottleneck that may explain it. We know there are millions of objects, but we should be paging through the data so as not to use that much memory.

Hopefully I'll get to the bottom of this soon.

#15 Updated by Matthew Oliver almost 2 years ago

So I've been playing around. What I can see is that if you have some really large multipart objects, that's what's in the top part of the log you pasted; unless you don't actually have very large objects and what you're seeing is a loop (which could explain things), that output is totally normal.

So question 1 is: do you have instances of very large multipart objects? If so, the snippet you gave may be a red herring and could be totally normal, and in that case is there anything else in the log you can see?

If not, then the orphans code is certainly following some large manifest file... which I hope isn't a bug looping indefinitely.
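The looping hypothesis is checkable from the debug log itself: if the `ofs=` values in the `RGWObjManifest::operator++()` lines ever repeat or go backwards, the iterator is revisiting stripes instead of advancing. A hypothetical helper for that check, based on the log format in comment #11:

```python
import re

# Pulls the offset out of "RGWObjManifest::operator++(): result: ofs=<n> ..." lines.
OFS_RE = re.compile(r"result: ofs=(\d+)")

def offsets_advance(lines):
    """True if every manifest-iteration offset is strictly increasing,
    i.e. no sign of the iterator looping."""
    last = -1
    for line in lines:
        m = OFS_RE.search(line)
        if not m:
            continue
        ofs = int(m.group(1))
        if ofs <= last:
            return False
        last = ofs
    return True

log = [
    "... operator++(): result: ofs=1204465696768 stripe_ofs=1204465696768 ...",
    "... operator++(): result: ofs=1204469891072 stripe_ofs=1204469891072 ...",
    "... operator++(): result: ofs=1204474085376 stripe_ofs=1204474085376 ...",
]
print(offsets_advance(log))  # True: the snippet pasted above is advancing normally
```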

#16 Updated by Manuel Rios almost 2 years ago

I'm sure that we have multi-TB objects, since the main purpose of this Ceph cluster is to store backup data. The largest files uploaded in multiple parts will be around 12-15 TB, as server image backups.

That could, as you say, be totally normal, but in 7 days the process was unable to finish shard 0 and the server got OOM-killed.

I really don't know what to do to clean the cluster successfully; it's a lot of garbage and wasted space.

#17 Updated by Abhishek Lekshmanan almost 2 years ago

Unfortunately, with debug-rgw=20 you also blow up the memory requirements of logging quite a bit. Has the stage advanced through more shards?

What happened when you ran orphans find on an already finished job? That should reveal the leaked objects as LEAKED along with the oid.

Since you also mention large omap objects, can you search the cluster log for large omap objects? This would reveal the names of the omap objects. (It is also possible that there are older jobs which were never cleaned up causing it.)

Adding Eric as well on the topic of orphan search.
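That cluster-log search can be scripted as well. A sketch that extracts the object names from 'Large omap object found' warnings; the exact message layout (and the sample object name) is an assumption, so adjust the pattern to whatever your cluster log actually prints:

```python
import re

# Assumed warning layout: "Large omap object found. Object: <name> ..."
OMAP_RE = re.compile(r"Large omap object found\. Object: (\S+)")

def large_omap_objects(lines):
    """Collect object names from large-omap warnings in cluster log lines."""
    return [m.group(1) for line in lines if (m := OMAP_RE.search(line))]

log = [
    "cluster [WRN] Large omap object found. Object: 5:ab12cd34:::meta.log.54:head Key count: 3000000",
    "cluster [WRN] overall HEALTH_WARN 70 large omap objects",
]
print(large_omap_objects(log))  # ['5:ab12cd34:::meta.log.54:head']
```

Matching the extracted names against the shard objects of the jobs in `list-jobs` would show whether the warnings come from old, never-finished jobs.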

#18 Updated by Manuel Rios almost 2 years ago

Hi,

The main problem is that no job has ever finished or gotten beyond shard 0. Even after 7 days of the job running at shard 0/64 there is no progress. Now I can't run the process for more than 6-7 days because I get an automatic OOM kill.

The full-scan stage finishes in about 8-10 hours, and then the shard processing starts but never gets past 0/64.

I could add another 64 GB of RAM and dedicate a host with 128 GB just for this process, but that makes no sense, does it?

You have more experience; I can run whatever CLI parameters you want for orphans find and send you the output. I can stream the process or whatever you want, even give you SSH access.

Does this issue only happen to us? Or do people just not care about orphans?

I checked 14.2.5 but didn't see any improvement for orphans.

#19 Updated by Pavan Rallabhandi almost 2 years ago

Not directly related to orphans find, but I am curious whether this cluster was a greenfield Nautilus installation or was upgraded from a previous version. Also, does this cluster serve Swift or S3 traffic, or both?

The new tool being developed under this tracker might also be of interest to you: https://tracker.ceph.com/issues/41828

#20 Updated by Manuel Rios almost 2 years ago

The cluster currently provides S3 access and RBD. No Swift.
This cluster predates Nautilus, I think it started on Luminous. It was upgraded version by version up to the latest update yesterday, 14.2.5.
I personally think that orphan find-and-delete should be done during deep-scrub time, but that is not even on the roadmap.
I checked that issue, but it looks on hold, with no progress.
I can try to move the rgw.log pool, where the shards are stored, to NVMe drives instead of SSD drives and run orphans find again to check if it improves anything.
