Bug #37876

[object map] discard operations might flag clean objects as dirty

Added by Jason Dillaman about 5 years ago. Updated 9 months ago.

Status:
New
Priority:
Normal
Assignee:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

See http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-January/032218.html

After a discard operation (e.g. fstrim initiated against the image), a snapshot object map might mark an object as EXISTS instead of the correct EXISTS_CLEAN.

History

#1 Updated by Ilya Dryomov 9 months ago

I think I reproduced this. It's not even a race condition -- very simple:

$ rbd create -s 1G img
$ sudo rbd device map img
/dev/rbd0
$ sudo xfs_io -d -c 'pwrite -b 64K 0 64K' /dev/rbd0
wrote 65536/65536 bytes at offset 0
64 KiB, 1 ops; 0.0163 sec (3.823 MiB/sec and 61.1733 ops/sec)
$ rbd snap create img@snap
Creating snap: 100% complete...done.
$ sudo blkdiscard -o 1M -l 64K /dev/rbd0
$ rbd object-map check img
Object Map Check: 3% complete...
2023-06-20T15:17:55.509-0400 7f34f6fe1700 -1 librbd::ObjectMapIterateRequest: object map error: object rbd_data.15eaef4b10a2.0000000000000000 marked as 1, but should be 3
Object Map Check: 99% complete...
2023-06-20T15:17:55.545-0400 7f34f77e2700 -1 librbd::object_map::InvalidateRequest: 0x7f34f0010440 invalidating object map in-memory
2023-06-20T15:17:55.545-0400 7f34f77e2700 -1 librbd::object_map::InvalidateRequest: 0x7f34f0010440 invalidating object map on-disk
2023-06-20T15:17:55.553-0400 7f34f6fe1700 -1 librbd::object_map::InvalidateRequest: 0x7f34f0010440 should_complete: r=0
Object Map Check: 100% complete...done.

This is because after the 64K write at offset 0, the underlying object is 64K in size. The discard at offset 1M is beyond that ("beyond EOF"), so the OSD just ignores the corresponding ZERO op. librbd, however, still transitions the object map state from EXISTS_CLEAN to EXISTS, since it doesn't track object sizes and can't tell that the ZERO would be a no-op.
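To make the mismatch concrete, here is a toy model of the two sides. It is only a sketch, not librbd or OSD code: the function names `osd_zero` and `object_map_after_discard` are hypothetical, and the "ZERO beyond EOF is ignored" rule is taken directly from the explanation above. The state values 1 and 3 are the ones that appear in the "marked as 1, but should be 3" message.

```python
# Toy model only; not librbd or OSD code. The numeric values match the
# "marked as 1, but should be 3" message from rbd object-map check.
OBJECT_EXISTS = 1
OBJECT_EXISTS_CLEAN = 3

KIB = 1024


def osd_zero(object_size, offset, length):
    """Return (new_size, modified) for a ZERO op in this simplified model.

    Assumption from the comment above: a ZERO that starts at or beyond the
    current end of the object is ignored by the OSD.
    """
    if offset >= object_size:
        return object_size, False
    return object_size, True


def object_map_after_discard(state):
    """Client-side update: object sizes are unknown, so always mark dirty."""
    if state == OBJECT_EXISTS_CLEAN:
        return OBJECT_EXISTS
    return state


# Reproduction from the comment: 64K object, discard of 64K at offset 1M.
object_size = 64 * KIB
recorded = object_map_after_discard(OBJECT_EXISTS_CLEAN)
_, modified = osd_zero(object_size, 1024 * KIB, 64 * KIB)
expected = OBJECT_EXISTS if modified else OBJECT_EXISTS_CLEAN

print(f"marked as {recorded}, but should be {expected}")
```

In this model the object map ends up at 1 (EXISTS) while the object itself was never modified after the snapshot, so a check based on the actual object would expect 3 (EXISTS_CLEAN).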

An interesting wrinkle is that this doesn't happen when the discard request ends at an object size boundary (4M by default):

$ rbd create -s 1G img
$ sudo rbd device map img
/dev/rbd0
$ sudo xfs_io -d -c 'pwrite -b 64K 0 64K' /dev/rbd0
wrote 65536/65536 bytes at offset 0
64 KiB, 1 ops; 0.0172 sec (3.622 MiB/sec and 57.9542 ops/sec)
$ rbd snap create img@snap
Creating snap: 100% complete...done.
$ sudo blkdiscard -o 4032K -l 64K /dev/rbd0
$ rbd object-map check img
Object Map Check: 100% complete...done.

Here the object is still 64K in size after the write and the discard is still "beyond EOF". However, because the discard ends at offset 4M, librbd uses the TRUNCATE op, which doesn't no-op that easily: the OSD ends up extending the object to 4032K in size (still sparse, of course). Because that is an actual modification that gets recorded in RADOS, the EXISTS_CLEAN -> EXISTS transition, which happens exactly as in the previous case, is "justified" (has at least some meaning) and hence "rbd object-map check" lets it pass.
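The difference between the two reproductions comes down to offset arithmetic: 1024K + 64K = 1088K falls short of the 4M object boundary, while 4032K + 64K = 4096K lands exactly on it. A minimal sketch of that check follows, assuming the default 4M object size and a discard that fits inside a single object. The helper `pick_op` is hypothetical and only distinguishes the two cases described in this comment; whole-object discards and anything else librbd may handle differently are not modeled.

```python
KIB = 1024
OBJECT_SIZE = 4096 * KIB  # 4M default object size mentioned above


def pick_op(image_offset, length, object_size=OBJECT_SIZE):
    """Hypothetical helper: the op this comment describes for a discard
    that fits inside a single object."""
    start = image_offset % object_size
    end = start + length
    if end > object_size:
        raise ValueError("discard spans more than one object")
    # A discard ending exactly at the object boundary becomes TRUNCATE;
    # any other discard within the object becomes ZERO.
    return "TRUNCATE" if end == object_size else "ZERO"


for offset_k, length_k in [(1024, 64), (4032, 64)]:
    op = pick_op(offset_k * KIB, length_k * KIB)
    print(f"blkdiscard -o {offset_k}K -l {length_k}K -> {op}")
```

The first discard maps to ZERO, which the OSD ignores beyond EOF, and the second maps to TRUNCATE, which extends the object and is therefore recorded as a real modification.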
