Bug #8335
Crash while recovering from XFS corruption
Status:
Rejected
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
Community (user)
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
There was an XFS corruption:
[158483.317151] ffff880e00d4e000: 90 55 4c a6 21 e3 0c 33 6d 22 7e 8a a7 71 7d 27  .UL.!..3m"~..q}'
[158483.395094] ffff880e00d4e010: a9 7a b7 45 f6 aa 68 a7 0f da 14 87 33 bb 22 6d  .z.E..h.....3."m
[158483.472947] ffff880e00d4e020: 50 1d 97 ac 36 fc bf ed aa ca ae 0e 75 1a 97 75  P...6.......u..u
[158483.550580] ffff880e00d4e030: 6e f9 96 9a 67 bd 6f bb a7 16 14 e6 cf 08 cc 45  n...g.o........E
[158483.628243] XFS (sdb1): Internal error xfs_attr3_leaf_read_verify at line 246 of file /build/buildd/linux-3.13.0/fs/xfs/xfs_attr_leaf.c.  Caller 0xffffffffa03156c5
[158483.745645] CPU: 5 PID: 1727 Comm: kworker/5:1H Not tainted 3.13.0-23-generic #45-Ubuntu
[158483.745647] Hardware name: Supermicro X9SRL-F/X9SRL-F, BIOS 1.0b 09/19/2012
[158483.745667] Workqueue: xfslogd xfs_buf_iodone_work [xfs]
[158483.745670]  0000000000000001 ffff8810075afd68 ffffffff81715384 ffff880d0a478800
[158483.745674]  ffff8810075afd80 ffffffffa031853b ffffffffa03156c5 ffff8810075afdb8
[158483.745678]  ffffffffa0318595 000000f6f6aafdb7 ffff8806f67af480 ffff880d0a478800
[158483.745682] Call Trace:
[158483.745689]  [<ffffffff81715384>] dump_stack+0x45/0x56
[158483.745711]  [<ffffffffa031853b>] xfs_error_report+0x3b/0x40 [xfs]
[158483.745725]  [<ffffffffa03156c5>] ? xfs_buf_iodone_work+0x85/0xf0 [xfs]
[158483.745737]  [<ffffffffa0318595>] xfs_corruption_error+0x55/0x80 [xfs]
[158483.745755]  [<ffffffffa033594d>] xfs_attr3_leaf_read_verify+0x6d/0xf0 [xfs]
[158483.745767]  [<ffffffffa03156c5>] ? xfs_buf_iodone_work+0x85/0xf0 [xfs]
[158483.745772]  [<ffffffff81097488>] ? finish_task_switch+0x128/0x170
[158483.745783]  [<ffffffffa03156c5>] xfs_buf_iodone_work+0x85/0xf0 [xfs]
[158483.745788]  [<ffffffff81083892>] process_one_work+0x182/0x450
[158483.745791]  [<ffffffff81084631>] worker_thread+0x121/0x410
[158483.745795]  [<ffffffff81084510>] ? rescuer_thread+0x3e0/0x3e0
[158483.745798]  [<ffffffff8108b302>] kthread+0xd2/0xf0
[158483.745801]  [<ffffffff8108b230>] ? kthread_create_on_node+0x1d0/0x1d0
[158483.745805]  [<ffffffff81725cbc>] ret_from_fork+0x7c/0xb0
[158483.745807]  [<ffffffff8108b230>] ? kthread_create_on_node+0x1d0/0x1d0
[158483.745810] XFS (sdb1): Corruption detected. Unmount and run xfs_repair
[158483.785242] XFS (sdb1): metadata I/O error: block 0x65d6ef5d0 ("xfs_trans_read_buf_map") error 117 numblks 8
[158483.909650] init: ceph-osd (ceph/0) main process (15534) killed by ABRT signal
[158483.909677] init: ceph-osd (ceph/0) respawning too fast, stopped
After running xfs_repair I restarted the OSD and it began recovery. But every ~30 minutes it crashes and restarts, probably when it hits some of the broken data. (The repair sequence is sketched below.)
OSD log attached.
ceph version 0.79 (4c2d73a5095f527c3a2168deb5fa54b3c8991a6e)
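For reference, the repair sequence described above looks roughly like this. This is only a sketch: the device /dev/sdb1 is taken from the kernel log, while the mount point and the OSD id (0, inferred from the "ceph/0" upstart instance in the log) are assumptions.

    # Stop the OSD (upstart job, as on Ubuntu 14.04)
    sudo stop ceph-osd id=0
    # Unmount the corrupted filesystem, as the kernel message instructs
    sudo umount /var/lib/ceph/osd/ceph-0
    # Repair the XFS filesystem on the OSD's data device
    sudo xfs_repair /dev/sdb1
    # Remount and restart the OSD; it then begins recovery
    sudo mount /dev/sdb1 /var/lib/ceph/osd/ceph-0
    sudo start ceph-osd id=0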
Updated by Dmitry Smirnov almost 10 years ago
How is this related to Ceph?
Corruption on XFS may be a manifestation of hardware errors or a kernel bug.
I would close this bug as "invalid".
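One way to check the hardware-error theory is to inspect the disk's SMART data and the kernel log. A sketch, assuming smartmontools is installed; /dev/sdb is taken from the kernel log above:

    # Overall SMART health verdict for the affected disk
    sudo smartctl -H /dev/sdb
    # Full attribute dump; watch Reallocated_Sector_Ct and pending sectors
    sudo smartctl -a /dev/sdb
    # Kernel-side I/O errors for the same device
    dmesg | grep -i -e sdb -e 'I/O error'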
Updated by Pavel Veretennikov almost 10 years ago
After a successful XFS repair, Ceph is still crashing. Or can Ceph not recover by itself? Do I need to delete this OSD and create a new one?
Updated by Sage Weil almost 10 years ago
- Status changed from New to Rejected
- Source changed from other to Community (user)
Recreate this OSD (after you confirm the cluster has other copies of the data :)
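The usual remove-and-recreate sequence on a cluster of this vintage looks roughly like this. A sketch, assuming the failed OSD is osd.0 (as suggested by the init messages); first confirm with ceph health that the data is fully replicated elsewhere:

    # Mark the OSD out so the cluster re-replicates its data away
    ceph osd out 0
    # Wait until `ceph -s` reports all PGs active+clean again, then:
    sudo stop ceph-osd id=0
    ceph osd crush remove osd.0    # drop it from the CRUSH map
    ceph auth del osd.0            # remove its cephx key
    ceph osd rm 0                  # remove the OSD record itself
    # Then prepare a fresh disk and create a new OSD,
    # e.g. with ceph-disk prepare on the replacement device.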