Bug #45333

LARGE_OMAP_OBJECTS in pool metadata

Added by Kenneth Waegeman almost 4 years ago. Updated over 3 years ago.

Status:
Need More Info
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

LARGE_OMAP_OBJECTS 1 large omap objects
1 large objects found in pool 'metadata'
Search the cluster log for 'Large omap object found' for more details.
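
For reference, one way to do that search (a sketch, assuming the default log location on a monitor host) is:

$ zgrep 'Large omap object found' /var/log/ceph/ceph.log*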

I did a deep-scrub on the PG, but got the same result:
2020-04-29 16:18:14.028 7f94efaeb700 0 log_channel(cluster) log [DBG] : 6.3b3 deep-scrub starts
2020-04-29 16:18:21.928 7f94f3af3700 0 log_channel(cluster) log [WRN] : Large omap object found. Object: 6:cdef95e9:::1001f2c4388.00000000:head PG: 6.97a9f7b3 (6.3b3) Key count: 551078 Size (bytes): 262667412
2020-04-29 16:18:23.888 7f94f3af3700 0 log_channel(cluster) log [DBG] : 6.3b3 deep-scrub ok
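
(For reference, the deep-scrub above was presumably triggered with something like the following, where 6.3b3 is the PG named in the warning:)

$ ceph pg deep-scrub 6.3b3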

The 'metadata' pool here is the CephFS metadata pool.

Thanks!
Kenneth

#1

Updated by Brad Hubbard almost 4 years ago

  • Project changed from Ceph to CephFS
#2

Updated by Patrick Donnelly almost 4 years ago

  • Status changed from New to Need More Info

Just to confirm, this is with a Nautilus cluster?

#3

Updated by Kenneth Waegeman almost 4 years ago

yes, we are running 14.2.6

[root@mds01 ~]# ceph -v
ceph version 14.2.6 (f0aa067ac7a02ee46ea48aa26c6e298b5ea272e9) nautilus (stable)

#4

Updated by Patrick Donnelly almost 4 years ago

Kenneth Waegeman wrote:

yes, we are running 14.2.6

[root@mds01 ~]# ceph -v
ceph version 14.2.6 (f0aa067ac7a02ee46ea48aa26c6e298b5ea272e9) nautilus (stable)

What's the oldest version of Ceph this cluster was running?

#5

Updated by Kenneth Waegeman almost 4 years ago

Hmm, that's a very good question; is there a simple way to check this?

At the very least Luminous, but quite probably Jewel or even Infernalis :)

[root@mds01 ~]# ceph mon feature ls
all features
supported: [kraken,luminous,mimic,osdmap-prune,nautilus]
persistent: [kraken,luminous,mimic,osdmap-prune,nautilus]
on current monmap (epoch 6)
persistent: [kraken,luminous,mimic,osdmap-prune,nautilus]
required: [kraken,luminous,mimic,osdmap-prune,nautilus]

#6

Updated by Patrick Donnelly almost 4 years ago

Kenneth Waegeman wrote:

Hmm, that's a very good question; is there a simple way to check this?

At the very least Luminous, but quite probably Jewel or even Infernalis :)

[root@mds01 ~]# ceph mon feature ls
all features
supported: [kraken,luminous,mimic,osdmap-prune,nautilus]
persistent: [kraken,luminous,mimic,osdmap-prune,nautilus]
on current monmap (epoch 6)
persistent: [kraken,luminous,mimic,osdmap-prune,nautilus]
required: [kraken,luminous,mimic,osdmap-prune,nautilus]

It's probable that this directory was created when the cluster was Jewel and never fragmented because the MDS hasn't loaded the directory since it was created. Try:

$ rados getxattr --pool=<metadata pool> 1001f2c4388.00000000 parent | ceph-dencoder type inode_backtrace_t import - decode dump_json

to get the directory's path. Then `ls` the directory to see if it gets fragmented. You can confirm via:

$ ceph daemon mds.foo dirfrag ls <path>
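
As an additional cross-check (a sketch, not part of the instructions above), the omap key count of the flagged object can also be read directly from the metadata pool; it should roughly match the key count reported by the deep-scrub:

$ rados -p <metadata pool> listomapkeys 1001f2c4388.00000000 | wc -l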

#7

Updated by Kenneth Waegeman almost 4 years ago

Thanks, I tried this:

[root@mds01 ~]# ceph daemon mds.mds01 dirfrag ls /backups/xxx/yyy/zzz/exp5_0/results_filtered
[
    {
        "value": 0,
        "bits": 0,
        "str": "0/0"
    }
]
Still there:
2020-05-04 17:10:22.523 7f94f3af3700 0 log_channel(cluster) log [DBG] : 6.3b3 deep-scrub starts
2020-05-04 17:10:25.183 7f94f3af3700 0 log_channel(cluster) log [WRN] : Large omap object found. Object: 6:cdef95e9:::1001f2c4388.00000000:head PG: 6.97a9f7b3 (6.3b3) Key count: 549423 Size (bytes): 261902802
2020-05-04 17:10:25.613 7f94efaeb700 0 log_channel(cluster) log [DBG] : 6.3b3 deep-scrub ok

This directory is also empty now, but it is still present in snapshots with 230k files in it; can that have an impact on this?

#8

Updated by Patrick Donnelly almost 4 years ago

Kenneth Waegeman wrote:

Thanks, I tried this:

[root@mds01 ~]# ceph daemon mds.mds01 dirfrag ls /backups/xxx/yyy/zzz/exp5_0/results_filtered
[
    {
        "value": 0,
        "bits": 0,
        "str": "0/0"
    }
]
Still there:
2020-05-04 17:10:22.523 7f94f3af3700 0 log_channel(cluster) log [DBG] : 6.3b3 deep-scrub starts
2020-05-04 17:10:25.183 7f94f3af3700 0 log_channel(cluster) log [WRN] : Large omap object found. Object: 6:cdef95e9:::1001f2c4388.00000000:head PG: 6.97a9f7b3 (6.3b3) Key count: 549423 Size (bytes): 261902802
2020-05-04 17:10:25.613 7f94efaeb700 0 log_channel(cluster) log [DBG] : 6.3b3 deep-scrub ok

This directory is also empty now, but it is still present in snapshots with 230k files in it; can that have an impact on this?

Yes. The MDS does not fragment directory snapshots. Either you'll need to delete the snapshot or raise the omap warning threshold.
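
For completeness, a sketch of the two options (paths and values below are placeholders, not taken from this thread): CephFS snapshots are removed by rmdir'ing the entry under the hidden .snap directory of the directory where the snapshot was created, and the warning threshold is the osd_deep_scrub_large_omap_object_key_threshold option (200000 keys by default on Nautilus).

$ rmdir /path/where/snapshot/was/created/.snap/<snapshot name>
$ ceph config set osd osd_deep_scrub_large_omap_object_key_threshold 600000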

#9

Updated by Kenneth Waegeman over 3 years ago

Lately I have been getting more of these warnings. If needed I can raise the warning threshold, but it seems this is not exactly the same issue:


/var/log/ceph/ceph.log.2.gz:2020-08-05 14:53:43.994254 osd.449 (osd.449) 1575 : cluster [WRN] Large omap object found. Object: 6:507fa0cc:::20006c03082.00000000:head PG: 6.3305fe0a (6.20a) Key count: 251908 Size (bytes): 116384536
/var/log/ceph/ceph.log.2.gz:2020-08-06 01:09:16.072854 osd.9 (osd.9) 2709 : cluster [WRN] Large omap object found. Object: 6:bd84e1a1:::10020210901.01800000:head PG: 6.858721bd (6.1bd) Key count: 753690 Size (bytes): 349674772
/var/log/ceph/ceph.log.3.gz:2020-08-04 20:37:31.635099 osd.105 (osd.105) 2906 : cluster [WRN] Large omap object found. Object: 6:e5da4c9a:::200083b1f19.00000000:head PG: 6.59325ba7 (6.3a7) Key count: 805611 Size (bytes): 388308115
/var/log/ceph/ceph.log.6.gz:2020-08-01 22:31:57.042942 osd.39 (osd.39) 2534 : cluster [WRN] Large omap object found. Object: 6:cdef95e9:::1001f2c4388.00000000:head PG: 6.97a9f7b3 (6.3b3) Key count: 549423 Size (bytes): 261902802

[root@mds04 ~]#  rados getxattr --pool=metadata 20006c03082.00000000 parent | ceph-dencoder type inode_backtrace_t import - decode dump_json
{
    "ino": 2199136514178,
    "ancestors": [
        {
            "dirino": 1542,
            "dname": "20006c03082",
            "version": 245759416
        },
        {
            "dirino": 256,
            "dname": "stray6",
            "version": 718375018
        }
    ],
    "pool": 6,
    "old_pools": []
}
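
(Side note: the object name prefix is the directory's inode number in hex, so 20006c03082.00000000 corresponds to the "ino" above:)

$ printf '%d\n' 0x20006c03082
2199136514178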

[root@mds04 ~]#  rados getxattr --pool=metadata 200083b1f19.00000000 parent | ceph-dencoder type inode_backtrace_t import - decode dump_json
{
    "ino": 2199161347865,
    "ancestors": [
        {
            "dirino": 1536,
            "dname": "200083b1f19",
            "version": 246369863
        },
        {
            "dirino": 256,
            "dname": "stray0",
            "version": 721895994
        }
    ],
    "pool": 6,
    "old_pools": []
}
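
The stray0/stray6 ancestors above suggest these objects are dirfrags of directories that were unlinked but are still held in the MDS stray directories, for example because snapshots still reference them. A sketch for checking how many stray entries the active MDS currently holds (the daemon name is an assumption; use the local MDS name as in the earlier commands):

$ ceph daemon mds.<name> perf dump mds_cache | grep -i stray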

[root@mds04 ~]#  rados getxattr --pool=metadata 10020210901.02400000 parent | ceph-dencoder type inode_backtrace_t import - decode dump_json
error getting xattr metadata/10020210901.02400000/parent: (61) No data available
error: buffer::end_of_buffer
[root@mds04 ~]#  rados getxattr --pool=metadata 1001f2c4388.00000000 parent | ceph-dencoder type inode_backtrace_t import - decode dump_json
{
    "ino": 1100034622344,
    "ancestors": [
        {
            "dirino": 1100034622341,
            "dname": "results_filtered",
            "version": 4
        },
        {
            "dirino": 2199045203283,
            "dname": "exp5_0",
            "version": 169
        },
        {
            "dirino": 1099590462457,
            "dname": "vscxxxxx",
            "version": 37606
        },
        {
            "dirino": 1099589861337,
            "dname": "gvoxxxxx",
            "version": 53187943
        },
        {
            "dirino": 1099968287658,
            "dname": "000",
            "version": 33659356
        },
        {
            "dirino": 1099589921855,
            "dname": "vo",
            "version": 33271999
        },
        {
            "dirino": 1099588863364,
            "dname": "gent",
            "version": 241215031
        },
        {
            "dirino": 1099511627777,
            "dname": "kyukondata",
            "version": 36832015
        },
        {
            "dirino": 1,
            "dname": "backups",
            "version": 49188213
        }
    ],
    "pool": 6,
    "old_pools": []
}

The last one is the same as before, but now this directory is empty in all snapshots. I deep-scrubbed all those OSDs, but the issue persists.

Thanks again!!

Kenneth
