Bug #64800

unable to remove RBD image when OSD is full and trash object is not already created in the image's pool

Added by Ramana Raja about 2 months ago. Updated about 1 month ago.

Status:
Fix Under Review
Priority:
Normal
Assignee:
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

In a vstart cluster, created an RBD image, wrote data to it, and configured the full ratios (set-full-ratio) to make the single OSD full. Then failed to remove the image using the CLI:

$ rbd create --size 2048 data/img1
$ sudo ./bin/rbd device map data/img1
$ sudo fio --name=fiotest --filename=/dev/rbd0 --rw=randrw --bs=4M --direct=1 --ioengine=libaio --size=2G
$ sudo ./bin/rbd device unmap /dev/rbd0
$ ceph -s
  cluster:
    id:     1b6d031f-c462-4236-b88b-7486b1f40603
    health: HEALTH_WARN
            3 pool(s) have no replicas configured

  services:
    mon: 1 daemons, quorum a (age 64m)
    mgr: x(active, since 64m)
    osd: 1 osds: 1 up (since 63m), 1 in (since 64m)

  data:
    pools:   3 pools, 65 pgs
    objects: 277 objects, 1.1 GiB
    usage:   2.1 GiB used, 99 GiB / 101 GiB avail
    pgs:     65 active+clean

  io:
    client:   112 KiB/s rd, 90 MiB/s wr, 134 op/s rd, 67 op/s wr

$ # configure OSD full ratio settings to make the OSD full
$ ceph osd set-nearfull-ratio 0.017
$ ceph osd set-backfillfull-ratio 0.018
$ ceph osd set-full-ratio 0.020
$ ceph -s
  cluster:
    id:     1b6d031f-c462-4236-b88b-7486b1f40603
    health: HEALTH_ERR
            1 full osd(s)
            3 pool(s) full
            3 pool(s) have no replicas configured

  services:
    mon: 1 daemons, quorum a (age 73m)
    mgr: x(active, since 73m)
    osd: 1 osds: 1 up (since 72m), 1 in (since 72m)

  data:
    pools:   3 pools, 65 pgs
    objects: 277 objects, 1.1 GiB
    usage:   2.1 GiB used, 99 GiB / 101 GiB avail
    pgs:     65 active+clean
$ rbd rm data/img1 --debug_rbd=10
...
2024-03-07T16:34:26.927-0500 7f19c6ffd6c0 10 librbd::ImageCtx: 0x55b8470abab0 ~ImageCtx
2024-03-07T16:34:26.927-0500 7f19ff594580 10 librbd::trash::MoveRequest: 0x7f19c80029f0 trash_add: 
2024-03-07T16:34:26.928-0500 7f19e3fff6c0 10 librbd::trash::MoveRequest: 0x7f19c80029f0 handle_trash_add: r=-28
2024-03-07T16:34:26.928-0500 7f19e3fff6c0 -1 librbd::trash::MoveRequest: 0x7f19c80029f0 handle_trash_add: failed to add image to trash: (28) No space left on device
2024-03-07T16:34:26.928-0500 7f19e3fff6c0 10 librbd::trash::MoveRequest: 0x7f19c80029f0 finish: r=-28
2024-03-07T16:34:26.929-0500 7f19ff594580 -1 librbd::api::Trash: move: error setting trash image state: (2) No such file or directory
Removing image: 0% complete...failed.
rbd: delete error: (2) No such file or directory

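For reference, a minimal sketch (not part of the original report; the pool and image names are taken from the transcript above, everything else is generic librados/librbd boilerplate) of hitting the same failure through the librbd C API. The first removal in a pool has to create the rbd_trash object, and with the OSD full that write is the one rejected with ENOSPC, which then surfaces as the confusing ENOENT when the trash image state cannot be set:

#include <stdio.h>
#include <string.h>
#include <rados/librados.h>
#include <rbd/librbd.h>

int main(void)
{
  rados_t cluster;
  rados_ioctx_t ioctx;
  int r;

  /* connect as client.admin using the default ceph.conf search path */
  rados_create(&cluster, NULL);
  rados_conf_read_file(cluster, NULL);
  r = rados_connect(cluster);
  if (r < 0) {
    fprintf(stderr, "connect: %s\n", strerror(-r));
    return 1;
  }

  /* pool and image names taken from the transcript above */
  r = rados_ioctx_create(cluster, "data", &ioctx);
  if (r < 0) {
    fprintf(stderr, "ioctx: %s\n", strerror(-r));
    rados_shutdown(cluster);
    return 1;
  }

  /* img1 already exists and the OSD is already full at this point; the
   * first remove in the pool needs to create rbd_trash, which is the
   * write that fails with ENOSPC */
  r = rbd_remove(ioctx, "img1");
  fprintf(stderr, "rbd_remove: %d (%s)\n", r, strerror(-r));

  rados_ioctx_destroy(ioctx);
  rados_shutdown(cluster);
  return r < 0 ? 1 : 0;
}
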
This issue is also documented as a FIXME in test/librbd/test_librbd.cc:

TEST_F(TestLibRBD, RemoveFullTry)
{
  ...

  // FIXME: this is a workaround for rbd_trash object being created
  // on the first remove -- pre-create it to avoid bumping into quota
  ASSERT_EQ(0, create_image(ioctx, image_name.c_str(), 0, &order));
  ASSERT_EQ(0, rbd_remove(ioctx, image_name.c_str()));
  remove_full_try(ioctx, image_name, pool_name);

  rados_ioctx_destroy(ioctx);
}

https://github.com/ceph/ceph/blob/v19.0.0/src/test/librbd/test_librbd.cc#L2193
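
The same librbd::trash::MoveRequest path that fails in the log above is also reachable directly through the C API's rbd_trash_move(). A small follow-on fragment to the sketch above (same ioctx and image name; the outcome on a full OSD is an assumption, not something verified in this report):

  /* Follow-on to the sketch above (same ioctx): an explicit trash move is
   * expected to hit the same trash_add failure while rbd_trash is missing
   * and the OSD is full; 0 means no deferment delay before deletion. */
  r = rbd_trash_move(ioctx, "img1", 0);
  fprintf(stderr, "rbd_trash_move: %d (%s)\n", r, strerror(-r));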

#1 Updated by Ramana Raja about 2 months ago

  • Description updated (diff)

#2 Updated by Ramana Raja about 1 month ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 56310
