Bug #58106

When a large number of error ops appear in the OSDs, pglog does not trim.

Added by 王子敬 wang 2 months ago. Updated about 1 month ago.

Status:
Need More Info
Priority:
Normal
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
rados
Component(RADOS):
OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When we use the S3 object gateway's append and copy operations under high pressure and concurrency, a large number of error ops appear in the OSDs. We have an S3 cluster whose OSDs run out of memory because of the large amount of RAM needed to hold pglog entries, so the OSDs are killed by OOM. A large number of error pglog entries are not written to the hard disk, which defeats the pglog trim mechanism: pglog does not trim error ops. How can this problem be solved?
(This is on an OSD with osd_memory_target = 2 GB, and the OSD has 223 PGs. The relevant settings are listed below; a way to read them back at runtime is sketched after the list.)
osd_max_pg_log_entries = 10000
osd_min_pg_log_entries = 250
osd_pg_log_trim_max = 10000
osd_pg_log_trim_min = 100
osd_target_pg_log_entries_per_osd = 300000
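
A minimal sketch of how these settings, and the memory actually held by the in-memory pg log, can be read back on a running cluster (osd.0 is just a placeholder OSD id, and the jq path assumes the usual dump_mempools JSON layout):

# Read the effective pg-log settings for one OSD (osd.0 is a placeholder)
for opt in osd_max_pg_log_entries osd_min_pg_log_entries \
           osd_pg_log_trim_max osd_pg_log_trim_min \
           osd_target_pg_log_entries_per_osd osd_memory_target; do
    echo -n "$opt = "; ceph config get osd.0 "$opt"
done

# Check the in-memory pg-log mempool on the same OSD
ceph daemon osd.0 dump_mempools | jq '.mempool.by_pool.osd_pglog'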

pg_dump.txt - ceph pg dump partial output (5.63 KB) 王子敬 wang, 11/30/2022 01:44 AM

2.1a0s2_pglog (416 KB) 王子敬 wang, 11/30/2022 08:31 AM

1668130751396.jpg - top osd memory (106 KB) 王子敬 wang, 12/02/2022 01:03 AM

1666334986727.jpg - the pglog in the memory (3.33 KB) 王子敬 wang, 12/02/2022 01:10 AM

image-2022-10-13-15-02-19-462.png - dump_mempools (54.9 KB) 王子敬 wang, 12/06/2022 01:17 AM

image-2022-10-13-15-02-36-902.png (75.2 KB) 王子敬 wang, 12/06/2022 01:26 AM

image-2022-10-13-15-01-58-184.png (145 KB) 王子敬 wang, 12/06/2022 01:26 AM

History

#1 Updated by Nitzan Mordechai 2 months ago

  • Assignee set to Nitzan Mordechai

#2 Updated by Nitzan Mordechai 2 months ago

@王子敬 wang can you please provide the output of 'ceph pg dump'?

#3 Updated by 王子敬 wang 2 months ago

Nitzan Mordechai wrote:

@王子敬 wang can you please provide the output of 'ceph pg dump'?

OK, the output is in pg_dump.txt. The whole output file is too large, so I have included only part of it.

There are some other phenomena.
I saw a lot of error ops in the pg_log:
{ "op": "error",
  "object": "",
  ..........
}

ceph daemon osd.0 dump_mempools
"osd_pglog": {
    "items": 645472,
    "bytes": 2083163019
},
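
For reference, those two figures work out to a rough per-entry size; a small sketch that computes it directly from the same output (osd.0 as above, assuming the usual mempool.by_pool.osd_pglog layout):

# Average bytes per pg-log item; with the figures above,
# 2083163019 / 645472 is roughly 3.2 KB per item
ceph daemon osd.0 dump_mempools \
  | jq -r '.mempool.by_pool.osd_pglog | "\(.bytes / .items | floor) bytes/item"'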

#4 Updated by Nitzan Mordechai 2 months ago

@王子敬 wang, can you please send us the output for one of the pgs from ceph-objectstore-tool?

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-9 --op log --pgid 3.0 --no-mon-config

We need the counts to find out whether the issue here is really error entries or dups.
You can check that against the output JSON with

| jq '(.pg_log_t.log|length),(.pg_log_t.dups|length)'

I ran a quick check, and it looks like we are trimming error entries on a regular basis.
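
For convenience, the two commands above can be chained into one step (the data path and pgid are the ones from the example; note that ceph-objectstore-tool needs the OSD to be stopped):

# Count log entries vs. dup entries for one PG in a single step
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-9 --op log \
    --pgid 3.0 --no-mon-config \
  | jq '(.pg_log_t.log|length),(.pg_log_t.dups|length)'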

#5 Updated by 王子敬 wang 2 months ago

Nitzan Mordechai wrote:

@王子敬 wang, can you please send us the output for one of the pgs from ceph-objectstore-tool?
[...]

We need the counts to find out whether the issue here is really error entries or dups.
You can check that against the output JSON with
[...]

I ran a quick check, and it looks like we are trimming error entries on a regular basis.

This is the pglog, the part for pg 2.1a0s2.

#6 Updated by 王子敬 wang 2 months ago

王子敬 wang wrote:

Nitzan Mordechai wrote:

@王子敬 wang, can you please send us the output for one of the pgs from ceph-objectstore-tool?
[...]

We need the counts to find out whether the issue here is really error entries or dups.
You can check that against the output JSON with
[...]

I ran a quick check, and it looks like we are trimming error entries on a regular basis.

This is the pglog, the part for pg 2.1a0s2.

This is what I saved before.

#7 Updated by Nitzan Mordechai 2 months ago

Since you attached only part of the pglog, I can't see how many entries you have for the log and how many for dups.
Can you please run ceph-objectstore-tool with

| jq '(.pg_log_t.log|length),(.pg_log_t.dups|length)'

so we can see how many entries we have in each section?

#8 Updated by Radoslaw Zarzynski 2 months ago

  • Status changed from New to Need More Info

#9 Updated by 王子敬 wang 2 months ago

Nitzan Mordechai wrote:

Since you attached only part of the pglog, I can't see how many entries you have for the log and how many for dups.
Can you please run ceph-objectstore-tool with [...] so we can see how many entries we have in each section?

[root@node1 ~]# cat 2.1a0s2 | jq '(.pg_log_t.log|length),(.pg_log_t.dups|length)'
1632
1369

The file 2.1a0s2 is the pg_log of pg 2.1a0s2. Is this OK? If necessary, I will reproduce the problem and collect the relevant log information.

#10 Updated by Nitzan Mordechai 2 months ago

王子敬 wang wrote:

Nitzan Mordechai wrote:

Since you attached only part of the pglog, I can't see how many entries you have for the log and how many for dups.
Can you please run ceph-objectstore-tool with [...] so we can see how many entries we have in each section?

[root@node1 ~]# cat 2.1a0s2 | jq '(.pg_log_t.log|length),(.pg_log_t.dups|length)'
1632
1369

The file 2.1a0s2 is the pg_log of pg 2.1a0s2. Is this OK? If necessary, I will reproduce the problem and collect the relevant log information.

2.1a0s2 was trimmed; it doesn't look like it contains all your pg log entries. Can you please run ceph-objectstore-tool again, but this time send us the full pglog?

#11 Updated by 王子敬 wang 2 months ago

Nitzan Mordechai wrote:

王子敬 wang wrote:

Nitzan Mordechai wrote:

Since you attached only part of the pglog, I can't see how many entries you have for the log and how many for dups.
Can you please run ceph-objectstore-tool with [...] so we can see how many entries we have in each section?

[root@node1 ~]# cat 2.1a0s2 | jq '(.pg_log_t.log|length),(.pg_log_t.dups|length)'
1632
1369

The file 2.1a0s2 is the pg_log of pg 2.1a0s2. Is this OK? If necessary, I will reproduce the problem and collect the relevant log information.

2.1a0s2 was trimmed; it doesn't look like it contains all your pg log entries. Can you please run ceph-objectstore-tool again, but this time send us the full pglog?

2.1a0s2 was obtained by running ceph-objectstore-tool, but ceph-objectstore-tool only gets the on-disk pglog. Most of the pglog entries are in memory, and I can't get those. Is there any way to get them? Please tell me.

When I stop appending and copying objects, the memory state is as shown in the attached picture (top osd memory); the memory usage stays high.

After that, when I try to put some objects, the memory starts to be reclaimed slowly; the pglog is trimmed.
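
A simple way to watch that reclamation from the outside is to poll the osd_pglog mempool while the new puts run; a minimal sketch (osd.0 and the 10-second interval are arbitrary):

# Poll the in-memory pg-log mempool to watch it shrink as trimming proceeds
while true; do
    date
    ceph daemon osd.0 dump_mempools | jq '.mempool.by_pool.osd_pglog'
    sleep 10
done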

#12 Updated by Radoslaw Zarzynski about 2 months ago

Hello!

What is on disk is actually serialized from the in-memory representation. We don't see huge numbers of entries here.

The in-memory pg log can be diagnosed by looking at the output of dump_mempools. On the screenshot it's under 1 GB.

Why do you think these values are too high? Are you worried about what top says in the VIRT column?

#13 Updated by 王子敬 wang about 2 months ago

Radoslaw Zarzynski wrote:

Hello!

What is on disk is actually serialized from the in-memory representation. We don't see huge numbers of entries here.

The in-memory pg log can be diagnosed by looking at the output of dump_mempools. On the screenshot it's under 1 GB.

Why do you think these values are too high? Are you worried about what top says in the VIRT column?

Sorry, this is the screenshot of dump_mempools from the environment.
Most OSDs have hundreds of thousands or even millions of pglog entries. In addition, the number of bytes per entry (bytes/items) is also large. This is also what I am wondering about.

#14 Updated by Radoslaw Zarzynski about 1 month ago

Well, values around 600-900 kitems don't look very large to me. They are definitely much, much smaller than anything we saw in the dups issue, where dozens of millions of entries were typical.
