Bug #59099

PG move causes data duplication

Added by Adam Kupczyk about 1 year ago. Updated about 1 year ago.

Status: New
Priority: High
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Let's imagine we have a pool TEST.
In one of its PGs we have an object OBJ of size 1M.

We create snap SNAP-1 and write some 4K to OBJ.
As a result we get OBJ.1, which takes 1M, and OBJ.head, which reuses all but 4K.
The total data usage is 1M + 4K.

Now we move the PG to another OSD.
In some cases OBJ.head + OBJ.1 will then take 2M.
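
The arithmetic above can be sketched as a toy model. This is not Ceph code: the function names and the simplified sharing model are assumptions made for illustration, and only the 1M object size and the 4K write come from the description.

```python
KIB = 1024
MIB = 1024 * KIB


def usage_with_sharing(object_size: int, overwritten: int) -> int:
    """Clone holds the full object; head stores only the overwritten bytes."""
    return object_size + overwritten


def usage_without_sharing(object_size: int) -> int:
    """Head and clone are each stored in full, as if written anew."""
    return 2 * object_size


# Values from the description: a 1M object and one 4K write after the snap.
before = usage_with_sharing(1 * MIB, 4 * KIB)   # expected: 1M + 4K
after = usage_without_sharing(1 * MIB)          # reported after the move: 2M
print(before, after)
```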

An example of this happening is in the attachment snap-pg-move-history.sh.
When the data is in the original PG on OSD.0:

ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP  META    AVAIL    %USE  VAR   PGS  STATUS
0   ssd    0.09859  1.00000   101 GiB  1.1 GiB  101 MiB  0 B   21 MiB  100 GiB  1.09  1.05  2    up
1   ssd    0.09859  1.00000   101 GiB  1.0 GiB  740 KiB  0 B   20 MiB  100 GiB  0.99  0.95  1    up
                    TOTAL     202 GiB  2.1 GiB  101 MiB  0 B   41 MiB  200 GiB  1.04
MIN/MAX VAR: 0.95/1.05  STDDEV: 0.05

And after forcibly moving the PG to the other OSD:

ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP  META    AVAIL    %USE  VAR   PGS  STATUS
0   ssd    0.09859  1.00000   101 GiB  1.0 GiB  756 KiB  0 B   21 MiB  100 GiB  0.99  0.91  1    up
1   ssd    0.09859  1.00000   101 GiB  1.2 GiB  201 MiB  0 B   21 MiB  100 GiB  1.18  1.09  2    up
                    TOTAL     202 GiB  2.2 GiB  201 MiB  0 B   42 MiB  200 GiB  1.09
MIN/MAX VAR: 0.91/1.09  STDDEV: 0.10
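
To put a number on the inflation, the DATA values from the two TOTAL rows above can be compared directly. The snippet below is only a sketch: the helper `to_bytes` is a hypothetical name, and since the displayed values are rounded for readability, the result is approximate.

```python
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}


def to_bytes(text: str) -> float:
    """Convert a human-readable size such as '101 MiB' to bytes."""
    value, unit = text.split()
    return float(value) * UNITS[unit]


# DATA column of the TOTAL row, before and after the PG move.
data_before = to_bytes("101 MiB")
data_after = to_bytes("201 MiB")

growth_mib = (data_after - data_before) / UNITS["MiB"]
ratio = data_after / data_before
print(f"growth: {growth_mib:.0f} MiB, ratio: {ratio:.2f}x")
```

The total DATA grows by about 100 MiB, that is, it roughly doubles.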

The script was tested on Reef, but I do not believe the problem is limited to that release.

snap-pg-move-history.sh (743 Bytes) Adam Kupczyk, 03/17/2023 01:50 PM

transaction-plika.txt (2.87 KB) Adam Kupczyk, 03/17/2023 02:54 PM

History

#1 Updated by Radoslaw Zarzynski about 1 year ago

  • Priority changed from Normal to High

"In some cases OBJ.head + OBJ.1 will take 2M."

The first thing would be to clarify when exactly.

#2 Updated by Adam Kupczyk about 1 year ago

I ran an additional test.
I changed the size in the script from 50M to 1M and looked at what operation is requested on the BS side.

BS is simply asked to write the data of object "plika" anew.

Attached: transaction-plika.txt

#3 Updated by Adam Kupczyk about 1 year ago

Additional observations made during testing.

a) Expansion never exceeded 2x.
b) Expansion always takes the form of snap.1 being recreated from scratch;
subsequent snap.2, snap.3... were diffs against snap.1.
c) The issue is non-random. If specific conditions cause duplication to occur, it will occur every time under those conditions.
d) Duplication either happens to all objects, or to none.
e) Results by object count and size:
10 objects 50M -> duplication
20 objects 1M -> duplication
50 objects 1M -> OK
20 objects 50M -> OK
1 object 50M + 10 objects 1M -> duplication
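
As a quick sanity check on observation e), the five cases can be tabulated to see whether a simple rule separates them. This is only a sketch over the data points listed above; the `separable` helper is a hypothetical name, and the check says nothing about conditions that were not tested.

```python
# (object sizes in M, duplication observed?) transcribed from observation e).
cases = [
    ([50] * 10, True),
    ([1] * 20, True),
    ([1] * 50, False),
    ([50] * 20, False),
    ([50] + [1] * 10, True),
]


def separable(values_dup, values_ok):
    """True if a single threshold splits duplicated cases from OK cases."""
    return max(values_dup) < min(values_ok) or max(values_ok) < min(values_dup)


counts_dup = [len(sizes) for sizes, dup in cases if dup]
counts_ok = [len(sizes) for sizes, dup in cases if not dup]
totals_dup = [sum(sizes) for sizes, dup in cases if dup]
totals_ok = [sum(sizes) for sizes, dup in cases if not dup]

count_separable = separable(counts_dup, counts_ok)
total_separable = separable(totals_dup, totals_ok)
print(count_separable, total_separable)
```

Both checks come out false for these five cases: 20 objects appears in both a duplicating and a non-duplicating run, and the 50M-total OK case sits between the 20M and 60M duplicating cases. So neither object count nor total data size alone explains the results listed here.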

#4 Updated by Radoslaw Zarzynski about 1 year ago

  • Priority changed from Normal to High

#5 Updated by Radoslaw Zarzynski about 1 year ago

  • Assignee set to Adam Kupczyk

Notes from the scrub:

1. There are important bounds on the inflation, which make the priority high but not urgent.
2. It is worth focusing on this after finishing the current BS work for quincy.
