Bug #7474: Kernel oops with cephfs [ceph_write_begin -> *x8 -> wait_on_page_read] - CephFS - Ceph

Actions

Copy link

Bug #7474

closed

Kernel oops with cephfs [ceph_write_begin -> *x8 -> wait_on_page_read]

Added by Peter Waller about 10 years ago. Updated almost 8 years ago.

Status:

Won't Fix

Priority:

Normal

Assignee:

Category:

Target version:

% Done:

Source:

Community (user)

Tags:

Backport:

Regression:

Severity:

3 - minor

Reviewed:

Affected Versions:

ceph-qa-suite:

Component(FS):

kceph

Labels (FS):

Pull request ID:

Crash signature (v1):

Crash signature (v2):

Description

I'm on Ubuntu 13.10 and I've installed the packages distributed with it (ceph-deploy 1.2.3-0ubuntu1 and `ceph` 0.67.4-0ubuntu2.2). I'm not sure whether it's appropriate to file it upstream or here, I'm happy to file upstream if appropriate.

Mostly for my educational interest I was testing a distributed setup with two machines, let's call them `mon+mds+osd0` and `osd1`. I appreciate that `mon+osd0` might be a bad idea but I think that is irrelevant to this bug.

I then mounted cephfs (`mon+mds+osd0`:/) on both machines and proceeded to untar a 1GB file onto it on osd1.

It proceeded for several minutes, and then hung, the machine was stuck in iowait. After a bit longer the following stack trace appeared, which I couldn't find anywhere on the internet.

I would like to try Emperor, but I can't find packages for Ubuntu 13.10, and since this appears to be an issue with cephfs which is in the kernel it is not clear that it would help (nor can I find packages which look like they would upgrade this).

INFO: task tar:7339 blocked for more than 120 seconds.

tar D 0000000000000000 0 7339 751

ffff8800e8e71aa8 0000000000000246 ffff8800e8e71fd8 0000000000014580
 ffff8800e8e71fd8 0000000000014580 ffff8800778f4650 ffff8800ef614e28
 ffff8800e8e71b30 0000000000000002 ffffffff8113eaf0 ffff8800e8e71b20
Call Trace:
 [&lt;ffffffff8113eaf0&gt;] ? wait_on_page_read+0x60/0x60
 [&lt;ffffffff816ea87d&gt;] io_schedule+0x9d/0x130
 [&lt;ffffffff8113eafe&gt;] sleep_on_page+0xe/0x20
 [&lt;ffffffff816e86cb&gt;] __wait_on_bit_lock+0x5b/0xc0
 [&lt;ffffffff8109b8cc&gt;] ? update_cfs_shares+0xac/0x100
 [&lt;ffffffff8113ec0a&gt;] __lock_page+0x6a/0x70
 [&lt;ffffffff81085600&gt;] ? wake_atomic_t_function+0x40/0x40
 [&lt;ffffffff8113f3f4&gt;] find_lock_page+0x54/0x70
 [&lt;ffffffff8113fcbf&gt;] grab_cache_page_write_begin+0x5f/0xd0
 [&lt;ffffffffa037c5ef&gt;] ceph_write_begin+0x6f/0xc0 [ceph]
 [&lt;ffffffff8113ef23&gt;] generic_file_buffered_write+0x103/0x270
 [&lt;ffffffffa0378b6c&gt;] ceph_aio_write+0x8fc/0xaa0 [ceph]
 [&lt;ffffffff81065b42&gt;] ? current_fs_time+0x12/0x60
 [&lt;ffffffff811c07d1&gt;] ? touch_atime+0x71/0x140
 [&lt;ffffffff811a6590&gt;] do_sync_write+0x80/0xb0
 [&lt;ffffffff811a6ccd&gt;] vfs_write+0xbd/0x1e0
 [&lt;ffffffff811a6b97&gt;] ? vfs_read+0xf7/0x170
 [&lt;ffffffff811a7709&gt;] SyS_write+0x49/0xa0

[&lt;ffffffff816f521d&gt;] system_call_fastpath+0x1a/0x1f
xen_netfront: xennet: skb rides the rocket: 19 slots

Actions

Copy link

Updated by Zheng Yan about 10 years ago

are you using 3.8 kernel? if you are, please try 3.12 or 3.13

Actions

Copy link

Updated by Greg Farnum about 10 years ago

Zheng, do you have a specific bug you think this is so we can close it out?

Actions

Copy link

Updated by Peter Waller about 10 years ago

I wasn't on 3.8, it was 3.11. Unfortunately I can't use the machines I was experimenting with for this purpose anymore so won't be able to try to reproduce it again for now. Probably by the time I next get a chance to try Ubuntu Trusty will be out. Apologies.

Actions

Copy link

Updated by Greg Farnum about 10 years ago

Status changed from New to Won't Fix

This looks like it's the writeback deadlock when trying to flush from the client to the OSD on a single memory-constrained host.

Actions

Copy link

Updated by Markus Blank-Burian almost 10 years ago

I have encountered the same issue on v3.12.17. Is there already a patch available for this one?

[Fri May 2 15:03:25 2014] SysRq : Show Blocked State
[Fri May 2 15:03:25 2014] task PC stack pid father
[Fri May 2 15:03:25 2014] moldyn2 D ffff8800decde370 0 4283 1 0x00000004
[Fri May 2 15:03:25 2014] ffff8800df3b7b00 0000000000000002 ffffffff81610430 ffff8800df3b7fd8
[Fri May 2 15:03:25 2014] ffff8800df3b7fd8 0000000000011b00 ffff8800decddf00 ffff88011bc11b00
[Fri May 2 15:03:25 2014] ffff8800decddf00 ffff8800df3b7ba0 0000000000000002 ffffffff810a33b0
[Fri May 2 15:03:25 2014] Call Trace:
[Fri May 2 15:03:25 2014] [<ffffffff810a33b0>] ? wait_on_page_read+0x37/0x37
[Fri May 2 15:03:25 2014] [<ffffffff813c4fa9>] schedule+0x60/0x62
[Fri May 2 15:03:25 2014] [<ffffffff813c5149>] io_schedule+0x5b/0x75
[Fri May 2 15:03:25 2014] [<ffffffff810a33b9>] sleep_on_page+0x9/0xd
[Fri May 2 15:03:25 2014] [<ffffffff813c3155>] __wait_on_bit_lock+0x41/0x89
[Fri May 2 15:03:25 2014] [<ffffffff810a346a>] __lock_page+0x64/0x66
[Fri May 2 15:03:25 2014] [<ffffffff81046909>] ? wake_atomic_t_function+0x28/0x28
[Fri May 2 15:03:25 2014] [<ffffffff810a393a>] ? find_get_page+0x64/0x70
[Fri May 2 15:03:25 2014] [<ffffffff810a3baf>] lock_page+0x19/0x1c
[Fri May 2 15:03:25 2014] [<ffffffff810a3c01>] find_lock_page+0x2e/0x50
[Fri May 2 15:03:25 2014] [<ffffffff810a4298>] grab_cache_page_write_begin+0x4e/0xb3
[Fri May 2 15:03:25 2014] [<ffffffffa05e9840>] ceph_write_begin+0x37/0x69 [ceph]
[Fri May 2 15:03:25 2014] [<ffffffff810a449f>] generic_file_buffered_write+0xf7/0x20b
[Fri May 2 15:03:25 2014] [<ffffffffa05e7135>] ceph_aio_write+0x6d0/0x809 [ceph]
[Fri May 2 15:03:25 2014] [<ffffffffa05e76f4>] ? ceph_aio_read+0x486/0x4fb [ceph]
[Fri May 2 15:03:25 2014] [<ffffffff81008bc4>] ? native_sched_clock+0x39/0x3b
[Fri May 2 15:03:25 2014] [<ffffffff81052e47>] ? sched_clock_local+0x12/0x72
[Fri May 2 15:03:25 2014] [<ffffffff810f0089>] do_sync_write+0x54/0x73
[Fri May 2 15:03:25 2014] [<ffffffff810f03ff>] vfs_write+0xad/0x113
[Fri May 2 15:03:25 2014] [<ffffffff810f0a50>] SyS_write+0x41/0x74
[Fri May 2 15:03:25 2014] [<ffffffff813c709b>] tracesys+0xdd/0xe2

Actions

Copy link

Updated by Peter Waller almost 10 years ago

There is an interesting article which details why this happens for nfs, and what they're doing to fix the problem there:

http://lwn.net/Articles/595652/

The answer at the moment for ceph is: don't run server and client on the same machine (in kernel space).

It sounds like it might work if you run the client with Fuse.

I'd love to see this work because I'd like to try out ceph as a cluster filesystem across machines which would consume that filesystem. But I suspect ceph assumes you have dedicated hardware for hosting the filesystem.

That's why it's marked as "WONTFIX".

Ceph devs, any chance this situation might ever change?

Actions

Copy link

Updated by Markus Blank-Burian almost 10 years ago

On deadlocked node, i have only the cephfs kernel client running and no OSD. Memory is plenty available and I did not see any OOM messages on the affected nodes.

Previously I had the Fuse-Client running, but I wanted to move to the kernel module because of fcntl locking support and better memory usage.

Actions

Copy link

Updated by Zheng Yan almost 10 years ago

is there hang OSD request ? (in /sys/kernel/debug/ceph/*/mdsc)

Actions

Copy link

Updated by Markus Blank-Burian almost 10 years ago

There are some outstanding osd requests, as i can see. Restarting the corresponding OSDs unfroze the hanging tasks.

kaa-2 0e37fa0d-fc15-4783-9f8c-d29c92fb13a1.client2559223 # cat mdsc
kaa-2 0e37fa0d-fc15-4783-9f8c-d29c92fb13a1.client2559223 # cat osdc
1088784 osd7 0.48759e27 100001f6901.0000001f write startsync
1312876 osd7 0.5291744a 1000021a47e.00000002 read

kaa-4 0e37fa0d-fc15-4783-9f8c-d29c92fb13a1.client2559229 # cat mdsc
kaa-4 0e37fa0d-fc15-4783-9f8c-d29c92fb13a1.client2559229 # cat osdc
290154 osd33 0.df63fc4b 1000020af22.00000003 write startsync
290155 osd33 0.df63fc4b 1000020af22.00000003 write startsync

Actions

Copy link

#10