Bug #7474

closed

Kernel oops with cephfs [ceph_write_begin -> *x8 -> wait_on_page_read]

Added by Peter Waller about 10 years ago. Updated almost 8 years ago.

Status:
Won't Fix
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Community (user)
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
kceph
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I'm on Ubuntu 13.10 and I've installed the packages distributed with it (ceph-deploy 1.2.3-0ubuntu1 and `ceph` 0.67.4-0ubuntu2.2). I'm not sure whether it's appropriate to file this here or upstream; I'm happy to file it upstream if that's more appropriate.

Mostly out of educational interest, I was testing a distributed setup with two machines; let's call them `mon+mds+osd0` and `osd1`. I appreciate that co-locating the mon with an OSD might be a bad idea, but I think that is irrelevant to this bug.

I then mounted cephfs (`mon+mds+osd0:/`) on both machines and proceeded to untar a 1 GB archive onto it on `osd1`.
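For context, the setup amounted to something like the following. This is a dry-run sketch that only prints the commands; the mount point, archive path, and mount options are illustrative assumptions, not the exact values used.

```shell
# Dry-run sketch of the reproduction steps; hostnames, mount point,
# and archive path are placeholders, not the reporter's exact values.
MON_HOST="mon+mds+osd0"    # monitor host, as named above
MNT="/mnt/cephfs"          # hypothetical mount point
ARCHIVE="/tmp/archive.tar" # the ~1 GB tarball being extracted

# Kernel-client mount (would be run on both machines, as root):
echo "mount -t ceph ${MON_HOST}:/ ${MNT} -o name=admin"
# Extraction that triggered the hang (would be run on osd1):
echo "tar -C ${MNT} -xf ${ARCHIVE}"
```

The hang occurs in the kernel cephfs client's write path, so reproducing it requires the kernel mount (`mount -t ceph`) rather than the FUSE client.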

The untar ran for several minutes and then hung, with the machine stuck in iowait. Some time later the following stack trace appeared in the kernel log, which I couldn't find anywhere on the internet.

I would like to try Emperor, but I can't find packages for Ubuntu 13.10, and since this appears to be an issue with the cephfs client in the kernel, it is not clear that upgrading the userspace packages would help (nor can I find packages that look like they would upgrade the kernel client).

INFO: task tar:7339 blocked for more than 120 seconds.

tar D 0000000000000000 0 7339 751

ffff8800e8e71aa8 0000000000000246 ffff8800e8e71fd8 0000000000014580
ffff8800e8e71fd8 0000000000014580 ffff8800778f4650 ffff8800ef614e28
ffff8800e8e71b30 0000000000000002 ffffffff8113eaf0 ffff8800e8e71b20
Call Trace:
[<ffffffff8113eaf0>] ? wait_on_page_read+0x60/0x60
[<ffffffff816ea87d>] io_schedule+0x9d/0x130
[<ffffffff8113eafe>] sleep_on_page+0xe/0x20
[<ffffffff816e86cb>] __wait_on_bit_lock+0x5b/0xc0
[<ffffffff8109b8cc>] ? update_cfs_shares+0xac/0x100
[<ffffffff8113ec0a>] __lock_page+0x6a/0x70
[<ffffffff81085600>] ? wake_atomic_t_function+0x40/0x40
[<ffffffff8113f3f4>] find_lock_page+0x54/0x70
[<ffffffff8113fcbf>] grab_cache_page_write_begin+0x5f/0xd0
[<ffffffffa037c5ef>] ceph_write_begin+0x6f/0xc0 [ceph]
[<ffffffff8113ef23>] generic_file_buffered_write+0x103/0x270
[<ffffffffa0378b6c>] ceph_aio_write+0x8fc/0xaa0 [ceph]
[<ffffffff81065b42>] ? current_fs_time+0x12/0x60
[<ffffffff811c07d1>] ? touch_atime+0x71/0x140
[<ffffffff811a6590>] do_sync_write+0x80/0xb0
[<ffffffff811a6ccd>] vfs_write+0xbd/0x1e0
[<ffffffff811a6b97>] ? vfs_read+0xf7/0x170
[<ffffffff811a7709>] SyS_write+0x49/0xa0
[<ffffffff816f521d>] system_call_fastpath+0x1a/0x1f
xen_netfront: xennet: skb rides the rocket: 19 slots