Bug #6984

RBD volume not mountable after creating 8 or more snapshots

Added by Michel Nederlof over 10 years ago. Updated about 10 years ago.

Status: Resolved
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags: rbd, libceph
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

Currently using Ceph RBD volumes for Samba shares. Each volume is snapshotted regularly, and the snapshots are mounted in order to use the shadow_copy module of Samba.

For example, when using two volumes:
$ rbd showmapped
id pool image snap device
1 rbd samba-share - /dev/rbd1
3 rbd samba-share 2013.12.09-12.00.01 /dev/rbd3
4 rbd samba-share 2013.12.10-06.00.01 /dev/rbd4
5 rbd samba-share 2013.12.10-12.00.01 /dev/rbd5
6 rbd samba-share 2013.12.11-06.00.01 /dev/rbd6
7 rbd samba-share 2013.12.11-12.00.01 /dev/rbd7
8 rbd samba-share 2013.12.12-06.00.01 /dev/rbd8
9 rbd samba-ext4 - /dev/rbd9
11 rbd samba-ext4 2013.12.09-12.00.01 /dev/rbd11
12 rbd samba-ext4 2013.12.10-06.00.01 /dev/rbd12
13 rbd samba-ext4 2013.12.10-12.00.01 /dev/rbd13
14 rbd samba-ext4 2013.12.11-06.00.01 /dev/rbd14
15 rbd samba-ext4 2013.12.11-12.00.01 /dev/rbd15
16 rbd samba-ext4 2013.12.12-06.00.01 /dev/rbd16
17 rbd samba-ext4 2013.12.12-12.00.01 /dev/rbd17
18 rbd samba-share 2013.12.12-12.00.01 /dev/rbd18

$ mount
/dev/rbd1p1 on /storage/rbd type xfs (rw,noatime,logbsize=256k)
/dev/rbd3p1 on type xfs (ro,nouuid,norecovery)
/dev/rbd4p1 on type xfs (ro,nouuid,norecovery)
/dev/rbd5p1 on type xfs (ro,nouuid,norecovery)
/dev/rbd6p1 on type xfs (ro,nouuid,norecovery)
/dev/rbd7p1 on type xfs (ro,nouuid,norecovery)
/dev/rbd8p1 on type xfs (ro,nouuid,norecovery)
/dev/rbd9p1 on /storage/samba-ext4 type ext4 (rw,noatime)
/dev/rbd11p1 on type ext4 (ro,norecovery)
/dev/rbd12p1 on type ext4 (ro,norecovery)
/dev/rbd13p1 on type ext4 (ro,norecovery)
/dev/rbd14p1 on type ext4 (ro,norecovery)
/dev/rbd15p1 on type ext4 (ro,norecovery)
/dev/rbd16p1 on type ext4 (ro,norecovery)
/dev/rbd17p1 on type ext4 (ro,norecovery)
/dev/rbd18p1 on type xfs (ro,nouuid,norecovery)

A cron job creates a new snapshot every 6 hours and, as a workaround, also removes one existing snapshot so the total stays below the count at which mounting fails.
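For reference, a crontab entry along these lines would produce that schedule (the script name and path are hypothetical; the actual cron job is not shown in this report):

# hypothetical /etc/cron.d entry: rotate RBD snapshots every 6 hours
0 */6 * * * root /usr/local/bin/rbd-snap-rotate.sh >> /var/log/rbd-snap-rotate.log 2>&1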

Client rbd version: 0.72.1 (installed from ceph repository)
Client kernel version: 3.11.0-14-generic #21-Ubuntu (installed from ubuntu saucy repository)
Cluster ceph version: 0.67.4 (installed from ubuntu saucy repository)
Cluster kernel version: 3.11.0-13-generic #20-Ubuntu (installed from ubuntu saucy repository)

The crash occurs when the volume is mounted.

The script executes these commands when creating a snapshot of the volume:
$ sudo sync
$ sudo xfs_freeze -f /storage/rbd/
$ sudo rbd snap create
$ sudo xfs_freeze -u /storage/rbd/
$ sudo mkdir
$ sudo rbd map
$ sudo mount / -o ro,nouuid,norecovery
(at this point the system crashes... the console makes some mention of fixing a recursive fault...)
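A minimal sketch of that whole sequence, with hypothetical pool, image, and mount-point names filled in (the report omits the actual arguments), could look like this:

#!/bin/bash
# Hypothetical sketch of the snapshot script described above; pool, image and
# mount-point names are placeholders, not taken from the report.
set -e

POOL=rbd
IMAGE=samba-share
SNAP=$(date +%Y.%m.%d-%H.%M.%S)      # matches the snapshot naming seen in 'rbd showmapped'
MNT=/storage/snapshots/$SNAP

sudo sync
sudo xfs_freeze -f /storage/rbd/                        # quiesce the live filesystem
sudo rbd snap create "$POOL/$IMAGE@$SNAP"               # take the snapshot
sudo xfs_freeze -u /storage/rbd/                        # resume writes
sudo mkdir -p "$MNT"
DEV=$(sudo rbd map "$POOL/$IMAGE@$SNAP")                # newer rbd prints the device; on older versions look it up with 'rbd showmapped'
sudo mount "${DEV}p1" "$MNT" -o ro,nouuid,norecovery    # mount the snapshot's first partition read-only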

Also tried on an ext4 volume; it made no difference.

After a reboot, I get this message while trying to mount the RBD volume (the main volume, not a snapshot).
Trace from dmesg:
[ 85.104240] ------------[ cut here ]------------
[ 85.104314] Kernel BUG at ffffffffa011e5e2 [verbose debug info unavailable]
[ 85.104398] invalid opcode: 0000 [#1] SMP
[ 85.104500] Modules linked in: xfs(F) rbd(F) vmxnet(OF) vmhgfs(OF) nfsd(F) auth_rpcgss(F) nfs_acl(F) nfs(F) lockd(F) sunrpc(F) fscache(F) ceph(F) libceph(F) libcrc32c(F) coretemp vmw_balloon(F) microcode(F) psmouse(F) serio_raw(F) vmwgfx ttm i2c_piix4 shpchp vmw_vmci drm mac_hid lp(F) parport(F) vmxnet3(F) vmw_pvscsi(F)
[ 85.105575] CPU: 0 PID: 2362 Comm: mount Tainted: GF O 3.11.0-14-generic #21-Ubuntu
[ 85.105681] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 01/07/2011
[ 85.105808] task: ffff88013709ddc0 ti: ffff880137122000 task.ti: ffff880137122000
[ 85.105904] RIP: 0010:[<ffffffffa011e5e2>] [<ffffffffa011e5e2>] ceph_osdc_build_request+0x4c2/0x510 [libceph]
[ 85.106059] RSP: 0018:ffff880137123828 EFLAGS: 00010216
[ 85.106134] RAX: ffff8801369a4500 RBX: 0000000000000001 RCX: ffff8801369a45f2
[ 85.106251] RDX: ffff8801369a45ef RSI: ffff8801370f1308 RDI: ffff880135169360
[ 85.106344] RBP: ffff880137123880 R08: 00000000000179f0 R09: 0000000000000000
[ 85.106437] R10: ffff880137233450 R11: 3664303330303030 R12: 0000000000080000
[ 85.106529] R13: 0000000000080000 R14: ffff880137100360 R15: ffff8801369a4592
[ 85.106689] FS: 00007fc1996b2840(0000) GS:ffff88013b200000(0000) knlGS:0000000000000000
[ 85.106919] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 85.107063] CR2: 00007fbb37ee8010 CR3: 000000013708e000 CR4: 00000000000007f0
[ 85.107275] Stack:
[ 85.107384] 00000000002f8000 00000024357fe750 ffff8801370f1308 ffff880135169360
[ 85.107709] ffff880100002201 0000000000080000 ffff8801369980b0 ffff880137100360
[ 85.108034] ffff880137231b40 ffff88013783c780 0000000000000000 ffff8801371238b8
[ 85.108359] Call Trace:
[ 85.108476] [<ffffffffa02ba06f>] rbd_osd_req_format_write+0x4f/0x90 [rbd]
[ 85.108636] [<ffffffffa02bbced>] rbd_img_request_fill+0xdd/0x900 [rbd]
[ 85.108793] [<ffffffff816e9e32>] ? down_read+0x12/0x30
[ 85.108935] [<ffffffffa02bdc5e>] rbd_request_fn+0x1ee/0x320 [rbd]
[ 85.109092] [<ffffffff81335e73>] __blk_run_queue+0x33/0x40
[ 85.109235] [<ffffffff8133601a>] queue_unplugged+0x2a/0xa0
[ 85.109379] [<ffffffff81338b68>] blk_flush_plug_list+0x1d8/0x210
[ 85.109528] [<ffffffff81338f74>] blk_finish_plug+0x14/0x40
[ 85.109686] [<ffffffffa02d9c7d>] _xfs_buf_ioapply+0x2cd/0x380 [xfs]
[ 85.109845] [<ffffffffa02db5d5>] ? xfs_bdstrat_cb+0x55/0xb0 [xfs]
[ 85.110001] [<ffffffffa02db536>] xfs_buf_iorequest+0x46/0x90 [xfs]
[ 85.110159] [<ffffffffa02db5d5>] xfs_bdstrat_cb+0x55/0xb0 [xfs]
[ 85.110313] [<ffffffffa02dbc04>] xfs_bwrite+0x24/0x60 [xfs]
[ 85.110471] [<ffffffffa0327dbe>] xlog_bwrite+0x7e/0x100 [xfs]
[ 85.110631] [<ffffffffa0328c23>] xlog_write_log_records+0x1b3/0x250 [xfs]
[ 85.110800] [<ffffffffa0328dd8>] xlog_clear_stale_blocks+0x118/0x1c0 [xfs]
[ 85.110970] [<ffffffffa03286cf>] ? xlog_bread+0x3f/0x50 [xfs]
[ 85.111130] [<ffffffffa032b569>] xlog_find_tail+0x339/0x3d0 [xfs]
[ 85.111293] [<ffffffffa032d30e>] xlog_recover+0x1e/0xc0 [xfs]
[ 85.111454] [<ffffffffa0335fec>] xfs_log_mount+0x9c/0x170 [xfs]
[ 85.111616] [<ffffffffa0330222>] xfs_mountfs+0x352/0x6b0 [xfs]
[ 85.111772] [<ffffffffa02ed6c2>] xfs_fs_fill_super+0x2b2/0x340 [xfs]
[ 85.111926] [<ffffffff811aa4a8>] mount_bdev+0x1b8/0x200
[ 85.112076] [<ffffffffa02ed410>] ? xfs_parseargs+0xc10/0xc10 [xfs]
[ 85.112234] [<ffffffffa02eba25>] xfs_fs_mount+0x15/0x20 [xfs]
[ 85.112380] [<ffffffff811aae19>] mount_fs+0x39/0x1b0
[ 85.112529] [<ffffffff811c4c57>] vfs_kern_mount+0x67/0x100
[ 85.116694] [<ffffffff811c716e>] do_mount+0x23e/0xa20
[ 85.116836] [<ffffffff8114322e>] ? __get_free_pages+0xe/0x50
[ 85.116982] [<ffffffff811c6db6>] ? copy_mount_options+0x36/0x170
[ 85.117130] [<ffffffff811c79d3>] SyS_mount+0x83/0xc0
[ 85.117270] [<ffffffff816f571d>] system_call_fastpath+0x1a/0x1f
[ 85.117416] Code: be 6d 02 00 00 48 c7 c7 20 40 13 a0 e8 b8 38 f4 e0 41 0f b7 34 24 48 c7 c7 58 40 13 a0 31 c0 e8 9e 19 5c e1 31 c0 e9 de fc ff ff <0f> 0b 48 c7 c6 a8 40 13 a0 48 c7 c7 18 ab 13 a0 31 c0 e8 27 ce
[ 85.119498] RIP [<ffffffffa011e5e2>] ceph_osdc_build_request+0x4c2/0x510 [libceph]
[ 85.119756] RSP <ffff880137123828>
[ 85.119896] ---[ end trace 128336a72c6a19ff ]---

The main volume can be mounted again once the newest snapshot is removed...
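For completeness, removing the newest snapshot (rbd18, samba-share@2013.12.12-12.00.01 in the listing above) looks roughly like this; the exact commands used are not shown in the report:

$ sudo umount /dev/rbd18p1      # if the snapshot is still mounted
$ sudo rbd unmap /dev/rbd18
$ sudo rbd snap rm rbd/samba-share@2013.12.12-12.00.01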

History

#1 Updated by Greg Farnum over 10 years ago

  • Project changed from Ceph to Linux kernel client
  • Category deleted (librbd)
  • Target version deleted (v0.74)

Apparently this is because the kernel client can't tolerate arbitrary-length replies from the OSD to different kinds of calls (for memory management reasons, it needs to pre-allocate space for the response to every request it sends).
In particular, it uses a librados "exec" call to list the available snapshots, but it only has a limited-size buffer in which to store the response. Sounds like 8 is too many. :/
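As a rough check of how many snapshots an image currently has (assuming the standard rbd CLI; the 'tail' just skips the header line of the listing):

$ rbd snap ls rbd/samba-share | tail -n +2 | wc -l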

#2 Updated by Ilya Dryomov about 10 years ago

  • Status changed from New to Resolved

This has been fixed in 3.12-rc1, commit
03507db631c94a48e316c7f638ffb2991544d617.
