Bug #2346

closed

XFS filesystem on top of RBD volume gets corrupted

Added by Maciej Galkiewicz almost 12 years ago. Updated almost 12 years ago.

Status: Resolved
Priority: High
Assignee: -
Category: librbd
Target version: -
% Done: 0%
Source: Community (user)

Description

Ceph version: 0.44.1-1~bpo70+1
Kernel version: 3.2.12-1

Ceph config:
[global]
auth supported = cephx
keyring = /srv/ceph/keyring.admin

[mon]
mon data = /srv/ceph/mon

[mon.n3c1]
host = n3c1
mon addr = 1.1.1.1:6789
[mon.n8c1]
host = n8c1
mon addr = 2.2.2.2:6789
[mon.n4c1]
host = n4c1
mon addr = 3.3.3.3:6789

[mds]
debug mds = 1
keyring = /srv/ceph/ceph-stage2/keyring.$name

[mds.n3c1]
host = n3c1
mds standby replay = true
mds standby for name = n4c1
[mds.n4c1]
host = n4c1

[osd]
osd data = /srv/ceph/$name
osd journal = /srv/ceph/$name.journal
osd journal size = 1000
filestore btrfs snap = 0
keyring = /srv/ceph/ceph-stage2/keyring.$name
debug osd = 1

[osd.1]
host = n3c1
[osd.0]
host = n4c1
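In ceph.conf, settings in [global] apply to every daemon, settings in a type section such as [osd] apply to all daemons of that type, and a daemon-specific section such as [osd.0] overrides both; `$name` expands to the daemon's `type.id`. The minimal sketch below resolves the effective OSD settings from the config above under those rules. It is a simplified model, not Ceph's own parser: the function `effective_config` is hypothetical, only `$name` is expanded, and the monitor and MDS sections are left out.

```python
import configparser

# Subset of the ceph.conf from this report (mon and mds sections omitted).
CEPH_CONF = """
[global]
auth supported = cephx
keyring = /srv/ceph/keyring.admin

[osd]
osd data = /srv/ceph/$name
osd journal = /srv/ceph/$name.journal
osd journal size = 1000
filestore btrfs snap = 0
keyring = /srv/ceph/ceph-stage2/keyring.$name
debug osd = 1

[osd.1]
host = n3c1
[osd.0]
host = n4c1
"""


def effective_config(conf_text: str, daemon: str) -> dict[str, str]:
    """Hypothetical helper: merge [global], [<type>], [<type>.<id>] and expand $name.

    Assumption: more specific sections override less specific ones,
    and $name is the only metavariable handled.
    """
    # interpolation=None so configparser leaves "$name" untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    daemon_type = daemon.split(".", 1)[0]
    merged: dict[str, str] = {}
    for section in ("global", daemon_type, daemon):
        if parser.has_section(section):
            merged.update(parser.items(section))
    # Expand $name to the full daemon name, e.g. "osd.0".
    return {key: value.replace("$name", daemon) for key, value in merged.items()}


cfg = effective_config(CEPH_CONF, "osd.0")
print(cfg["osd data"])  # /srv/ceph/osd.0
print(cfg["keyring"])   # /srv/ceph/ceph-stage2/keyring.osd.0 ([osd] overrides [global])
```

So osd.0 on host n4c1 keeps its data in /srv/ceph/osd.0 and its journal in /srv/ceph/osd.0.journal.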

On some of my RBD volumes, the XFS filesystem becomes corrupted at random. So far, not all RBD volumes in my Ceph cluster have been affected. Kernel log from an affected client:

Apr 17 01:38:56 i-10-0-8-28 kernel: [780277.487802] libceph: mon0 1.1.1.1:6789 socket closed
Apr 17 01:38:56 i-10-0-8-28 kernel: [780277.487815] libceph: mon0 1.1.1.1:6789 session lost, hunting for new mon
Apr 17 01:39:16 i-10-0-8-28 kernel: [780296.869187] libceph: mon0 1.1.1.1:6789 session established
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927290] ffff88001ceb3000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927302] XFS (rbd0): Internal error xfs_btree_check_sblock at line 119 of file /build/buildd-linux-2.6_3.2.12-1-amd64-FiPNYf/linux-2.6-3.2.12/debian/build/source_amd64_none/fs/xfs/xfs_btree.c.  Caller 0xffffffffa03a3864
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927304] 
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927312] Pid: 9429, comm: postgres Not tainted 3.2.0-2-amd64 #1
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927316] Call Trace:
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927335]  [<ffffffffa038171d>] ? xfs_corruption_error+0x54/0x6f [xfs]
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927347]  [<ffffffffa03a3864>] ? xfs_btree_read_buf_block.constprop.21+0x75/0x9a [xfs]
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927360]  [<ffffffffa03a30ef>] ? xfs_btree_check_sblock+0xe4/0xfd [xfs]
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927371]  [<ffffffffa03a3864>] ? xfs_btree_read_buf_block.constprop.21+0x75/0x9a [xfs]
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927382]  [<ffffffffa03a3864>] ? xfs_btree_read_buf_block.constprop.21+0x75/0x9a [xfs]
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927389]  [<ffffffff81006670>] ? xen_force_evtchn_callback+0x9/0xa
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927394]  [<ffffffff81006c52>] ? check_events+0x12/0x20
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927404]  [<ffffffffa03a3904>] ? xfs_btree_lookup_get_block+0x7b/0xa3 [xfs]
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927416]  [<ffffffffa03a668b>] ? xfs_btree_lookup+0x123/0x3cc [xfs]
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927429]  [<ffffffffa03b2dfd>] ? xfs_dialloc+0x290/0x77e [xfs]
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927433]  [<ffffffff81006670>] ? xen_force_evtchn_callback+0x9/0xa
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927445]  [<ffffffffa03b4406>] ? xfs_ialloc+0x58/0x4ff [xfs]
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927460]  [<ffffffffa0390cb9>] ? kmem_zone_alloc+0x27/0x71 [xfs]
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927475]  [<ffffffffa038ddbc>] ? xfs_dir_ialloc+0x74/0x24d [xfs]
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927488]  [<ffffffffa03c2595>] ? xfs_log_reserve+0xe6/0xfe [xfs]
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927500]  [<ffffffffa03be9b2>] ? xfs_trans_reserve+0xc8/0x193 [xfs]
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927504]  [<ffffffff8103642f>] ? should_resched+0x5/0x23
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927510]  [<ffffffff81347d3f>] ? _cond_resched+0x7/0x1c
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927520]  [<ffffffffa038f68e>] ? xfs_create+0x313/0x540 [xfs]
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927530]  [<ffffffffa0387f94>] ? xfs_vn_mknod+0xd1/0x15d [xfs]
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927536]  [<ffffffff811031c4>] ? vfs_create+0x66/0x88
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927539]  [<ffffffff81101109>] ? d_alloc_and_lookup+0x3a/0x60
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927544]  [<ffffffff81103c09>] ? do_last+0x25b/0x58d
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927548]  [<ffffffff81104533>] ? path_openat+0xce/0x32a
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927553]  [<ffffffff810e95df>] ? arch_local_irq_disable+0x7/0x8
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927557]  [<ffffffff81104851>] ? do_filp_open+0x2a/0x6e
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927561]  [<ffffffff81347d3f>] ? _cond_resched+0x7/0x1c
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927565]  [<ffffffff811b1c39>] ? __strncpy_from_user+0x18/0x48
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927569]  [<ffffffff8110d587>] ? alloc_fd+0x64/0x109
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927573]  [<ffffffff810f8880>] ? do_sys_open+0x5e/0xe5
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927577]  [<ffffffff8134e012>] ? system_call_fastpath+0x16/0x1b
Apr 17 03:26:06 i-10-0-8-28 kernel: [786706.927581] XFS (rbd0): Corruption detected. Unmount and run xfs_repair
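The hex dump line in the log shows that the first 16 bytes of the buffer XFS read were all zeros. An XFS short-form btree block begins with a 4-byte big-endian magic number, and the call chain (xfs_dialloc, then xfs_btree_lookup) points at the inode allocation btree, whose magic is "IABT" (0x49414254). The sketch below shows only the magic-number part of such a check; the helper `sblock_magic_ok` is hypothetical, and the real xfs_btree_check_sblock also validates the level, record count and sibling pointers.

```python
import struct

# Magic number of an inode allocation btree block: ASCII "IABT".
XFS_IBT_MAGIC = 0x49414254


def sblock_magic_ok(block: bytes, expected_magic: int = XFS_IBT_MAGIC) -> bool:
    """Hypothetical, simplified check: compare the big-endian magic at offset 0."""
    if len(block) < 4:
        return False
    (magic,) = struct.unpack_from(">I", block, 0)
    return magic == expected_magic


print(sblock_magic_ok(bytes(16)))              # False: the all-zero bytes from the log
print(sblock_magic_ok(b"IABT" + bytes(12)))    # True: a header with the expected magic
```

An all-zero block fails even this first check, which is consistent with the "Internal error xfs_btree_check_sblock" message.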

#1

Updated by Sage Weil almost 12 years ago

  • Category set to librbd
  • Priority changed from Normal to High
  • Source changed from Development to Community (user)

#2

Updated by Sage Weil almost 12 years ago

  • Status changed from New to Need More Info

Has this issue been diagnosed?

#3

Updated by Maciej Galkiewicz almost 12 years ago

I am not 100% sure, but it looks like kernel 3.2.17-1 fixed the problem. Let's wait four weeks to be sure.

#4

Updated by Sage Weil almost 12 years ago

  • Status changed from Need More Info to Resolved

No news is good news!
