Bug #2346
xfs filesystem on top of rbd volume corrupts
Description
Ceph version: 0.44.1-1~bpo70+1
Kernel version: 3.2.12-1
Ceph config:
[global]
auth supported = cephx
keyring = /srv/ceph/keyring.admin
[mon]
mon data = /srv/ceph/mon
[mon.n3c1]
host = n3c1
mon addr = 1.1.1.1:6789
[mon.n8c1]
host = n8c1
mon addr = 2.2.2.2:6789
[mon.n4c1]
host = n4c1
mon addr = 3.3.3.3:6789
[mds]
debug mds = 1
keyring = /srv/ceph/ceph-stage2/keyring.$name
[mds.n3c1]
host = n3c1
mds standby replay = true
mds standby for name = n4c1
[mds.n4c1]
host = n4c1
[osd]
osd data = /srv/ceph/$name
osd journal = /srv/ceph/$name.journal
osd journal size = 1000
filestore btrfs snap = 0
keyring = /srv/ceph/ceph-stage2/keyring.$name
debug osd = 1
[osd.1]
host = n3c1
[osd.0]
host = n4c1
On some of my rbd volumes, the xfs filesystem becomes corrupted at random. Not all rbd volumes in my ceph cluster have been affected (so far).
Apr 17 01:38:56 i-10-0-8-28 kernel: [780277.487802] libceph: mon0 1.1.1.1:6789 socket closed
Apr 17 01:38:56 i-10-0-8-28 kernel: [780277.487815] libceph: mon0 1.1.1.1:6789 session lost, hunting for new mon
Apr 17 01:39:16 i-10-0-8-28 kernel: [780296.869187] libceph: mon0 1.1.1.1:6789 session established
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927290] ffff88001ceb3000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927302] XFS (rbd0): Internal error xfs_btree_check_sblock at line 119 of file /build/buildd-linux-2.6_3.2.12-1-amd64-FiPNYf/linux-2.6-3.2.12/debian/build/source_amd64_none/fs/xfs/xfs_btree.c. Caller 0xffffffffa03a3864
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927304]
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927312] Pid: 9429, comm: postgres Not tainted 3.2.0-2-amd64 #1
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927316] Call Trace:
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927335] [<ffffffffa038171d>] ? xfs_corruption_error+0x54/0x6f [xfs]
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927347] [<ffffffffa03a3864>] ? xfs_btree_read_buf_block.constprop.21+0x75/0x9a [xfs]
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927360] [<ffffffffa03a30ef>] ? xfs_btree_check_sblock+0xe4/0xfd [xfs]
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927371] [<ffffffffa03a3864>] ? xfs_btree_read_buf_block.constprop.21+0x75/0x9a [xfs]
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927382] [<ffffffffa03a3864>] ? xfs_btree_read_buf_block.constprop.21+0x75/0x9a [xfs]
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927389] [<ffffffff81006670>] ? xen_force_evtchn_callback+0x9/0xa
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927394] [<ffffffff81006c52>] ? check_events+0x12/0x20
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927404] [<ffffffffa03a3904>] ? xfs_btree_lookup_get_block+0x7b/0xa3 [xfs]
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927416] [<ffffffffa03a668b>] ? xfs_btree_lookup+0x123/0x3cc [xfs]
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927429] [<ffffffffa03b2dfd>] ? xfs_dialloc+0x290/0x77e [xfs]
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927433] [<ffffffff81006670>] ? xen_force_evtchn_callback+0x9/0xa
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927445] [<ffffffffa03b4406>] ? xfs_ialloc+0x58/0x4ff [xfs]
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927460] [<ffffffffa0390cb9>] ? kmem_zone_alloc+0x27/0x71 [xfs]
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927475] [<ffffffffa038ddbc>] ? xfs_dir_ialloc+0x74/0x24d [xfs]
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927488] [<ffffffffa03c2595>] ? xfs_log_reserve+0xe6/0xfe [xfs]
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927500] [<ffffffffa03be9b2>] ? xfs_trans_reserve+0xc8/0x193 [xfs]
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927504] [<ffffffff8103642f>] ? should_resched+0x5/0x23
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927510] [<ffffffff81347d3f>] ? _cond_resched+0x7/0x1c
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927520] [<ffffffffa038f68e>] ? xfs_create+0x313/0x540 [xfs]
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927530] [<ffffffffa0387f94>] ? xfs_vn_mknod+0xd1/0x15d [xfs]
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927536] [<ffffffff811031c4>] ? vfs_create+0x66/0x88
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927539] [<ffffffff81101109>] ? d_alloc_and_lookup+0x3a/0x60
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927544] [<ffffffff81103c09>] ? do_last+0x25b/0x58d
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927548] [<ffffffff81104533>] ? path_openat+0xce/0x32a
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927553] [<ffffffff810e95df>] ? arch_local_irq_disable+0x7/0x8
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927557] [<ffffffff81104851>] ? do_filp_open+0x2a/0x6e
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927561] [<ffffffff81347d3f>] ? _cond_resched+0x7/0x1c
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927565] [<ffffffff811b1c39>] ? __strncpy_from_user+0x18/0x48
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927569] [<ffffffff8110d587>] ? alloc_fd+0x64/0x109
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927573] [<ffffffff810f8880>] ? do_sys_open+0x5e/0xe5
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927577] [<ffffffff8134e012>] ? system_call_fastpath+0x16/0x1b
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927581] XFS (rbd0): Corruption detected. Unmount and run xfs_repair
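Since only some rbd volumes are affected, it may help to scan the kernel logs of each client for the same XFS messages and note which device they name. The following is a minimal sketch, not part of the original report: the function name `find_xfs_corruption` and the two regular expressions are assumptions based only on the wording of the log lines above, and other kernel versions may phrase these messages differently.

```python
import re

# Patterns modelled on the two XFS messages in the log above (assumed wording).
INTERNAL_ERROR = re.compile(
    r"XFS \((?P<dev>[^)]+)\): Internal error (?P<func>\S+) at line (?P<line>\d+)"
)
CORRUPTION = re.compile(r"XFS \((?P<dev>[^)]+)\): Corruption detected")


def find_xfs_corruption(lines):
    """Return (device, kind, detail) tuples for XFS error lines, in input order."""
    events = []
    for text in lines:
        match = INTERNAL_ERROR.search(text)
        if match:
            detail = f"{match.group('func')}:{match.group('line')}"
            events.append((match.group("dev"), "internal_error", detail))
            continue
        match = CORRUPTION.search(text)
        if match:
            events.append((match.group("dev"), "corruption", ""))
    return events


# Abbreviated copies of three lines from the log above.
sample = [
    "kernel: [780277.487802] libceph: mon0 1.1.1.1:6789 socket closed",
    "kernel: [786706.927302] XFS (rbd0): Internal error xfs_btree_check_sblock at line 119",
    "kernel: [786706.927581] XFS (rbd0): Corruption detected. Unmount and run xfs_repair",
]

for device, kind, detail in find_xfs_corruption(sample):
    print(device, kind, detail)
```

Running the same function over a whole log file (for example by passing an open file object as `lines`) would give a per-device list of events that can be compared across clients.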
History
#1 Updated by Sage Weil over 11 years ago
- Category set to librbd
- Priority changed from Normal to High
- Source changed from Development to Community (user)
#2 Updated by Sage Weil over 11 years ago
- Status changed from New to Need More Info
Has this issue been diagnosed?
#3 Updated by Maciej Galkiewicz over 11 years ago
I am not 100% sure, but it looks like kernel 3.2.17-1 fixed the problem. Let's wait 4 weeks to be sure.
#4 Updated by Sage Weil over 11 years ago
- Status changed from Need More Info to Resolved
No news is good news!