Bug #8335

closed

Crash while recovering from XFS corruption

Added by Pavel Veretennikov almost 10 years ago. Updated almost 10 years ago.

Status: Rejected
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (user)

Description

There was a filesystem corruption:

[158483.317151] ffff880e00d4e000: 90 55 4c a6 21 e3 0c 33 6d 22 7e 8a a7 71 7d 27  .UL.!..3m"~..q}'
[158483.395094] ffff880e00d4e010: a9 7a b7 45 f6 aa 68 a7 0f da 14 87 33 bb 22 6d  .z.E..h.....3."m
[158483.472947] ffff880e00d4e020: 50 1d 97 ac 36 fc bf ed aa ca ae 0e 75 1a 97 75  P...6.......u..u
[158483.550580] ffff880e00d4e030: 6e f9 96 9a 67 bd 6f bb a7 16 14 e6 cf 08 cc 45  n...g.o........E
[158483.628243] XFS (sdb1): Internal error xfs_attr3_leaf_read_verify at line 246 of file /build/buildd/linux-3.13.0/fs/xfs/xfs_attr_leaf.c.  Caller 0xffffffffa03156c5
[158483.745645] CPU: 5 PID: 1727 Comm: kworker/5:1H Not tainted 3.13.0-23-generic #45-Ubuntu
[158483.745647] Hardware name: Supermicro X9SRL-F/X9SRL-F, BIOS 1.0b 09/19/2012
[158483.745667] Workqueue: xfslogd xfs_buf_iodone_work [xfs]
[158483.745670]  0000000000000001 ffff8810075afd68 ffffffff81715384 ffff880d0a478800
[158483.745674]  ffff8810075afd80 ffffffffa031853b ffffffffa03156c5 ffff8810075afdb8
[158483.745678]  ffffffffa0318595 000000f6f6aafdb7 ffff8806f67af480 ffff880d0a478800
[158483.745682] Call Trace:
[158483.745689]  [<ffffffff81715384>] dump_stack+0x45/0x56
[158483.745711]  [<ffffffffa031853b>] xfs_error_report+0x3b/0x40 [xfs]
[158483.745725]  [<ffffffffa03156c5>] ? xfs_buf_iodone_work+0x85/0xf0 [xfs]
[158483.745737]  [<ffffffffa0318595>] xfs_corruption_error+0x55/0x80 [xfs]
[158483.745755]  [<ffffffffa033594d>] xfs_attr3_leaf_read_verify+0x6d/0xf0 [xfs]
[158483.745767]  [<ffffffffa03156c5>] ? xfs_buf_iodone_work+0x85/0xf0 [xfs]
[158483.745772]  [<ffffffff81097488>] ? finish_task_switch+0x128/0x170
[158483.745783]  [<ffffffffa03156c5>] xfs_buf_iodone_work+0x85/0xf0 [xfs]
[158483.745788]  [<ffffffff81083892>] process_one_work+0x182/0x450
[158483.745791]  [<ffffffff81084631>] worker_thread+0x121/0x410
[158483.745795]  [<ffffffff81084510>] ? rescuer_thread+0x3e0/0x3e0
[158483.745798]  [<ffffffff8108b302>] kthread+0xd2/0xf0
[158483.745801]  [<ffffffff8108b230>] ? kthread_create_on_node+0x1d0/0x1d0
[158483.745805]  [<ffffffff81725cbc>] ret_from_fork+0x7c/0xb0
[158483.745807]  [<ffffffff8108b230>] ? kthread_create_on_node+0x1d0/0x1d0
[158483.745810] XFS (sdb1): Corruption detected. Unmount and run xfs_repair
[158483.785242] XFS (sdb1): metadata I/O error: block 0x65d6ef5d0 ("xfs_trans_read_buf_map") error 117 numblks 8
[158483.909650] init: ceph-osd (ceph/0) main process (15534) killed by ABRT signal
[158483.909677] init: ceph-osd (ceph/0) respawning too fast, stopped

After running xfs_repair I restarted the OSD and it started recovery, but every ~30 minutes it crashes and restarts, probably when it hits some broken data.
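The repair-and-restart procedure described above looks roughly like this (a sketch only; the device name /dev/sdb1 and upstart job name come from the log above, while the mount point and OSD id 0 are assumptions):

```shell
# Stop the OSD so nothing touches the filesystem (upstart, as in the log)
sudo stop ceph-osd id=0

# xfs_repair must run on an unmounted device
sudo umount /dev/sdb1
sudo xfs_repair /dev/sdb1   # use -L only as a last resort: it zeroes the journal

# Remount, restart the OSD, and watch recovery progress
sudo mount /dev/sdb1 /var/lib/ceph/osd/ceph-0
sudo start ceph-osd id=0
ceph -w
```

Note that xfs_repair can only fix filesystem metadata; any object data that was already corrupted on disk remains corrupted, which is consistent with the OSD crashing during recovery afterwards.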

OSD log attached.
ceph version 0.79 (4c2d73a5095f527c3a2168deb5fa54b3c8991a6e)


Files

ceph-osd.log.gz (2.97 MB), uploaded by Pavel Veretennikov, 05/12/2014 07:38 AM
#1

Updated by Dmitry Smirnov almost 10 years ago

How is this related to Ceph?
Corruption on XFS may be a manifestation of hardware errors or a kernel bug.

I would close this bug as "invalid".

#2

Updated by Pavel Veretennikov almost 10 years ago

After a successful XFS recovery, Ceph is still crashing. Or can Ceph not recover by itself? Do I need to delete this OSD and create a new one?

#3

Updated by Sage Weil almost 10 years ago

  • Status changed from New to Rejected
  • Source changed from other to Community (user)

Recreate this OSD (after you confirm the cluster has other copies of the data :)
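The manual OSD recreation that this advice implies looked roughly like the following on releases of this era (a sketch under assumptions: osd.0, device /dev/sdb1, and the default data path; verify `ceph osd create` hands back the expected id before reusing it):

```shell
# Take the broken OSD out of the cluster and remove its records
ceph osd out 0
sudo stop ceph-osd id=0
ceph osd crush remove osd.0
ceph auth del osd.0
ceph osd rm 0

# Wipe the filesystem and build a fresh OSD with the same id
sudo mkfs.xfs -f /dev/sdb1
sudo mount /dev/sdb1 /var/lib/ceph/osd/ceph-0
ceph osd create                      # should return the freed id, 0
ceph-osd -i 0 --mkfs --mkkey
ceph auth add osd.0 osd 'allow *' mon 'allow rwx' \
    -i /var/lib/ceph/osd/ceph-0/keyring
ceph osd crush add osd.0 1.0 host=$(hostname -s)
sudo start ceph-osd id=0
```

As Sage notes, confirm the cluster holds other replicas of the data first (e.g. `ceph pg dump` shows no PGs whose only copy was on this OSD), since this procedure destroys everything on the device.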

