Bug #46047


[rgw] resizing pg_num/pgp_num causes excessive osd traffic on get_omap_iterator

Added by Mariusz Gronczewski almost 4 years ago. Updated almost 3 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The autoscaler tuned the RGW .index pool to 8 PGs (we have 40 OSDs), which causes a bit of a hot-spot issue, so I increased it to 32 (by setting the autoscaler bias).
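
For reference, the relevant knobs are ordinary pool settings, roughly as below (the pool name is a placeholder; the actual index pool name isn't shown in this report):

# bias the autoscaler towards more PGs for the index pool
ceph osd pool set default.rgw.buckets.index pg_autoscale_bias 4
# or set the PG count directly
ceph osd pool set default.rgw.buckets.index pg_num 32
ceph osd pool set default.rgw.buckets.index pgp_num 32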

That caused every OSD involved to go to essentially 100% IO load (mostly stuck in IO wait). After enabling bluestore debug logging I got:

2020-06-17T12:35:05.700+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) omap_get_header 10.51_head oid #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head# = 0
2020-06-17T12:35:05.700+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head#
2020-06-17T12:35:05.700+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head#
2020-06-17T12:35:05.700+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head#
2020-06-17T12:35:05.700+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head#
2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head#
2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head#
2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head#
2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head#
2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) omap_get_header 10.51_head oid #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head# = 0
2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head#
2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head#
2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head#
2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head#
2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head#
2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head#
2020-06-17T12:35:05.708+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head#
2020-06-17T12:35:05.708+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head#
2020-06-17T12:35:05.716+0200 7f806d4cc700 10 bluestore(/var/lib/ceph/osd/ceph-20) omap_get_header 10.51_head oid #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head# = 0
2020-06-17T12:35:05.716+0200 7f806d4cc700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head#
2020-06-17T12:35:05.716+0200 7f806d4cc700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head#
2020-06-17T12:35:05.720+0200 7f806d4cc700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head#
2020-06-17T12:35:05.720+0200 7f806d4cc700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head#
2020-06-17T12:35:05.720+0200 7f806d4cc700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head#
2020-06-17T12:35:05.720+0200 7f806d4cc700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head#
2020-06-17T12:35:05.720+0200 7f806d4cc700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head#
2020-06-17T12:35:05.720+0200 7f806d4cc700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head#

and similar output for the rest of the PGs.
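
(For reference, debug output of this kind can be gathered at runtime with something like the commands below; the exact method used here isn't stated, and osd.20 is simply the OSD from the excerpt above.)

# raise the bluestore debug level on one OSD
ceph tell osd.20 injectargs '--debug_bluestore 10/10'
# or persistently, via the monitor config database
ceph config set osd.20 debug_bluestore 10/10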

Resizing back to the original 8 PGs resolved the performance issue.
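
Reverting is just the same pool command again (pool name again a placeholder):

ceph osd pool set default.rgw.buckets.index pg_num 8
ceph osd pool set default.rgw.buckets.index pgp_num 8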

#1

Updated by Mariusz Gronczewski almost 4 years ago

Never mind, same behaviour; I just thought the load had dropped because I hadn't looked at all the nodes.

The cluster itself is receiving much lower traffic (a few thousand requests per second for files < 1 MB), yet the load on the OSDs involved in the RGW index is very high.
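
To see which OSDs are carrying the index PGs (and so where the hot spots are), something like the following can be used (pool name again a placeholder):

# list the index pool's PGs with their acting OSD sets
ceph pg ls-by-pool default.rgw.buckets.index
# per-pool client I/O rates
ceph osd pool stats default.rgw.buckets.index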

#2

Updated by Sage Weil almost 3 years ago

  • Project changed from Ceph to rgw