Bug #46927

Nautilus: RBD map hangs or fails with kernel: [98051.490691] rbd: rbd_dev_v2_snap_context: rbd_obj_method_sync returned -512

Added by Kishore Kasi about 1 month ago. Updated about 1 month ago.

Status: New
Priority: Normal
Assignee: -
Category: librbd
Target version:
% Done: 0%
Source:
Tags: rbd
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite: krbd
Pull request ID:
Crash signature:

Description

Ceph Version: ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable)

Aug 7 22:26:27 ceph-rbd-client-1 kernel: [97749.115664] rbd: got int token 0 val 2048
Aug 7 22:26:28 ceph-rbd-client-1 kernel: [97749.375646] rbd: rbd_dev_create rbd_dev 0000000032b21531 dev_id 1
Aug 7 22:26:28 ceph-rbd-client-1 kernel: [97749.381823] rbd: rbd id object name is rbd_id.stashstaging
Aug 7 22:26:28 ceph-rbd-client-1 kernel: [97750.101527] rbd: rbd_dev_image_id: rbd_obj_method_sync returned 16
Aug 7 22:26:28 ceph-rbd-client-1 kernel: [97750.101528] rbd: image_id is 38a4eb81354f
Aug 7 22:26:28 ceph-rbd-client-1 kernel: [97750.101529] rbd: __rbd_register_watch rbd_dev 0000000032b21531
Aug 7 22:26:28 ceph-rbd-client-1 kernel: [97750.108576] rbd: _rbd_dev_v2_snap_size: rbd_obj_method_sync returned 9
Aug 7 22:26:28 ceph-rbd-client-1 kernel: [97750.108578] rbd: order 22
Aug 7 22:26:28 ceph-rbd-client-1 kernel: [97750.108580] rbd: snap_id 0xfffffffffffffffe snap_size = 17592186044416
Aug 7 22:26:28 ceph-rbd-client-1 kernel: [97750.109487] rbd: rbd_dev_v2_object_prefix: rbd_obj_method_sync returned 25
Aug 7 22:26:28 ceph-rbd-client-1 kernel: [97750.109488] rbd: object_prefix = rbd_data.38a4eb81354f
Aug 7 22:26:28 ceph-rbd-client-1 kernel: [97750.110373] rbd: _rbd_dev_v2_snap_features: rbd_obj_method_sync returned 16
Aug 7 22:26:28 ceph-rbd-client-1 kernel: [97750.110374] rbd: snap_id 0xfffffffffffffffe features = 0x0000000000000005 incompat = 0x0000000000000005
Aug 7 22:31:30 ceph-rbd-client-1 kernel: [98051.490691] rbd: rbd_dev_v2_snap_context: rbd_obj_method_sync returned -512
Aug 7 22:31:30 ceph-rbd-client-1 kernel: [98051.502103] rbd: cancel_tasks_sync rbd_dev 0000000032b21531
Aug 7 22:31:30 ceph-rbd-client-1 kernel: [98051.502105] rbd: __rbd_unregister_watch rbd_dev 0000000032b21531
Aug 7 22:31:30 ceph-rbd-client-1 kernel: [98052.119343] rbd: image stashstaging: failed to unwatch: -512

History

#1 Updated by Kishore Kasi about 1 month ago

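This is likely due to a mismatch between the kernel RBD client (krbd) and the cls_rbd object class during the rbd_dev_create flow.

The kernel client caps the number of snapshot ids it is prepared to receive when it retrieves the snapshot context:

rbd.c (rbd_dev_v2_snap_context):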

    /*
     * We'll need room for the seq value (maximum snapshot id),
     * snapshot count, and array of that many snapshot ids.
     * For now we have a fixed upper limit on the number we're
     * prepared to receive.
     */
    size = sizeof (__le64) + sizeof (__le32) +
            RBD_MAX_SNAP_COUNT * sizeof (__le64);

    reply_buf = kzalloc(size, GFP_KERNEL);

cls_rbd.cc (get_snapcontext):

    int max_read = RBD_MAX_KEYS_READ;
    vector<snapid_t> snap_ids;
    string last_read = RBD_SNAP_KEY_PREFIX;
    bool more;
    // Page through the omap keys RBD_MAX_KEYS_READ at a time, collecting
    // every snapshot id; nothing here caps how many ids end up in the reply.
    do {
        set<string> keys;
        r = cls_cxx_map_get_keys(hctx, last_read, max_read, &keys, &more);
        if (r < 0)
            return r;
        for (auto it = keys.begin(); it != keys.end(); ++it) {
            if ((*it).find(RBD_SNAP_KEY_PREFIX) != 0)
                break;
            snapid_t snap_id = snap_id_from_key(*it);
            snap_ids.push_back(snap_id);
        }
        if (!keys.empty())
            last_read = *(keys.rbegin());
    } while (more);

#2 Updated by Kishore Kasi about 1 month ago

Noticed this issue when the snapshot count exceeded 510; 'rbd map' succeeded once a batch of snapshots was removed.
