Bug #17115: kernel panic when running IO with cephfs and resource pool becomes full - CephFS - Ceph

Bug #17115

closed

kernel panic when running IO with cephfs and resource pool becomes full

Added by Rohith Radhakrishnan over 7 years ago. Updated over 7 years ago.

Status: Resolved
Priority: High
Assignee:
Category:
Target version:
% Done:
Source: other
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions: Ceph - v10.2.2
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Steps:-

Create a data pool with a limited quota and start running IO from a client. Once the pool becomes full, the client kernel panics and the CephFS file system cannot be unmounted until the client is rebooted. Logs are attached.
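
The steps above could be scripted roughly as follows. This is only a sketch under assumptions: the pool name `cephfs_data` and mount point `/mnt/mycephfs` are taken from later comments in this thread, the quota value and the file name `fill.bin` are made up for illustration, and `DRY_RUN` defaults to 1 so the commands are printed rather than executed.

```shell
#!/bin/sh
# Reproduction sketch. DRY_RUN defaults to 1, so commands are only printed.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# repro_steps POOL MOUNTPOINT QUOTA_BYTES
repro_steps() {
    pool="$1"; mnt="$2"; quota="$3"
    # 1. Cap the data pool with a byte quota.
    run ceph osd pool set-quota "$pool" max_bytes "$quota"
    # 2. Write slightly more than the quota in 4 MiB blocks
    #    (fill.bin is a hypothetical file name).
    blocks=$(( quota / 4194304 + 2 ))
    run dd if=/dev/zero of="$mnt/fill.bin" bs=4M count="$blocks"
    # 3. Show the quota again and try to unmount.
    run ceph osd pool get-quota "$pool"
    run umount "$mnt"
}

# Example (prints the commands only):
#   repro_steps cephfs_data /mnt/mycephfs 1073741824
```

Running with `DRY_RUN=0` executes the commands for real, which on an affected client may leave `dd` and `umount` blocked as described in this report.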

Config:-
Linux Rack3-Ramp-2 4.4.0-34-generic #53-Ubuntu SMP Wed Jul 27 16:06:39 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux. Ubuntu 16.04.1

Files

file (107 KB) file

Rohith Radhakrishnan, 08/24/2016 01:12 PM

Updated by Zheng Yan over 7 years ago

These are warnings (write blocked for too long), not a panic. When the pool is full, OSD write requests are paused. If you add new storage to the cluster, the blocked writes will recover.
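
To check from a script whether the pool is still flagged as full (and so whether writes are expected to stay paused), something like the following may help. It is a sketch that assumes the health text contains the same `pool '<name>' is full` line that appears in the `ceph -w` output elsewhere in this thread; the helper name `pool_is_full` is hypothetical.

```shell
# pool_is_full POOL: reads cluster health text on stdin and succeeds
# if it contains the line "pool 'POOL' is full".
pool_is_full() {
    grep -qF "pool '$1' is full"
}

# Example usage (assumes `ceph health` prints the full-pool warning):
#   if ceph health | pool_is_full cephfs_data; then
#       echo "cephfs_data is still full; writes stay paused"
#   fi
```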

Updated by Rohith Radhakrishnan over 7 years ago

We increased the pool quota to a larger size, but the system is still in the same state.

Steps done:-

================================================================================
ceph osd pool get-quota cephfs_data
quotas for pool 'cephfs_data':
max objects: N/A
max bytes : 679GB
================================================================================
ceph osd pool set-quota cephfs_data max_bytes 1073741824000
set-quota max_bytes = 1073741824000 for pool cephfs_data
================================================================================
ceph osd pool get-quota cephfs_data
quotas for pool 'cephfs_data':
max objects: N/A
max bytes : 1000GB
================================================================================
ceph -w
cluster bc03d542-bff6-4cde-9a8d-c6a71b1648c3
health HEALTH_WARN
too many PGs per OSD (303 > max 300)
pool 'cephfs_data' is full
monmap e1: 1 mons at {rack2-client-3=10.242.42.216:6789/0}
election epoch 4, quorum 0 rack2-client-3
fsmap e5: 1/1/1 up {0=rack2-client-3=up:active}
osdmap e123: 32 osds: 32 up, 32 in
flags sortbitwise
pgmap v3434: 3262 pgs, 4 pools, 1025 GB data, 328 kobjects
2179 GB used, 221 TB / 223 TB avail
3262 active+clean

2016-08-24 21:40:39.100408 mon.0 [INF] pgmap v3437: 3262 pgs: 3262 active+clean; 1025 GB data, 2179 GB used, 221 TB / 223 TB avail
2016-08-24 21:40:39.164029 mon.0 [INF] pool 'cephfs_data' no longer full; removing FULL flag
2016-08-24 21:40:39.170770 mon.0 [INF] osdmap e125: 32 osds: 32 up, 32 in

================================================================================
sudo umount -f /mnt/mycephfs
umount: /mnt/mycephfs: target is busy
(In some cases useful info about processes that
use the device is found by lsof(8) or fuser(1).)
================================================================================
getfattr -n ceph.dir.layout /mnt/mycephfs
getfattr: /mnt/mycephfs: Input/output error
================================================================================

Also the dmesg warnings still exist and we cannot unmount the FS.
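
For reference, the quota figures in the steps above are consistent with each other: `max_bytes 1073741824000` is reported as `1000GB`, which matches a unit of 2^30 bytes. A small sketch of that arithmetic (the helper names are hypothetical, and 64-bit shell arithmetic is assumed):

```shell
# bytes_to_gib BYTES: whole number of GiB (2^30 bytes), rounded down.
bytes_to_gib() {
    echo $(( $1 / 1073741824 ))
}

# gib_to_bytes GIB: byte value to pass to `set-quota ... max_bytes`.
gib_to_bytes() {
    echo $(( $1 * 1073741824 ))
}

# bytes_to_gib 1073741824000   # prints 1000
# gib_to_bytes 1000            # prints 1073741824000
```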

Updated by Jeff Layton over 7 years ago

4.4.0 is pretty old at this point, and there are some fixes that may help this that have gone upstream since then. Is there any way for you to try a newer kernel on the client and let us know if this is reproducible there? Something from the v4.8-rc* series would be ideal...

Updated by Greg Farnum over 7 years ago

Turns out we don't actually test the kernel against full pools; see #9466 for updates on it.

Updated by Rohith Radhakrishnan over 7 years ago

4.4 is the latest for Ubuntu 14.04.5, but let me see if I can get hold of a 16.04 machine with a 4.8 kernel and try to reproduce it there. Will get back at the earliest.

Updated by Rohith Radhakrishnan over 7 years ago

@Greg Farnum: How should we proceed now? Is there still a need to test with a 4.8 kernel?

Updated by Greg Farnum over 7 years ago

It would be helpful; we're still surprised that this is a problem. Just noting that we don't include it in our nightlies and want to.

Updated by Rohith Radhakrishnan over 7 years ago

Reproduced with the kernel below (4.4.8-040408-generic):
rack6-client-5:~$ uname -a
Linux rack6-client-5 4.4.8-040408-generic #201604200335 SMP Wed Apr 20 07:37:30 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
ems@rack6-client-5:~$ cat /etc/issue
Ubuntu 16.04.1 LTS \n \l
================================================================================
The kernel warnings are a bit different but look related.
INFO: task umount:8907 blocked for more than 120 seconds.
[ 7800.358090] Not tainted 4.4.8-040408-generic #201604200335
[ 7800.359900] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 7800.362276] umount D ffff8803cea57ae8 0 8907 8906 0x00000000
[ 7800.362292] ffff8803cea57ae8 0000000000000002 ffff880ff7c36040 ffff880ff70ec4c0
[ 7800.362294] ffff8803cea58000 ffff880ffebd6b00 7fffffffffffffff ffffffff81802980
[ 7800.362295] ffff8803cea57c40 ffff8803cea57b00 ffffffff81802185 0000000000000000
[ 7800.362296] Call Trace:
[ 7800.362301] [<ffffffff81802980>] ? bit_wait+0x60/0x60
[ 7800.362303] [<ffffffff81802185>] schedule+0x35/0x80
[ 7800.362304] [<ffffffff818052a5>] schedule_timeout+0x1b5/0x270
[ 7800.362307] [<ffffffff8118d89b>] ? find_get_entries+0x12b/0x140
[ 7800.362309] [<ffffffff81802980>] ? bit_wait+0x60/0x60
[ 7800.362310] [<ffffffff818016d4>] io_schedule_timeout+0xa4/0x110
[ 7800.362311] [<ffffffff8180299b>] bit_wait_io+0x1b/0x70
[ 7800.362312] [<ffffffff8180252d>] __wait_on_bit+0x5d/0x90
[ 7800.362314] [<ffffffff8118a5fb>] wait_on_page_bit+0xcb/0xf0
[ 7800.362316] [<ffffffff810c2780>] ? autoremove_wake_function+0x40/0x40
[ 7800.362318] [<ffffffff8118a713>] __filemap_fdatawait_range+0xf3/0x160
[ 7800.362320] [<ffffffff8118d1e1>] filemap_fdatawait_keep_errors+0x21/0x30
[ 7800.362322] [<ffffffff8123708a>] sync_inodes_sb+0x16a/0x1f0
[ 7800.362325] [<ffffffff8123d58a>] sync_filesystem+0x5a/0xa0
[ 7800.362328] [<ffffffff8120bcc7>] generic_shutdown_super+0x27/0x100
[ 7800.362336] [<ffffffffc03ed137>] ceph_kill_sb+0x37/0x70 [ceph]
[ 7800.362338] [<ffffffff8120c1c3>] deactivate_locked_super+0x43/0x70
[ 7800.362352] [<ffffffff8120c69c>] deactivate_super+0x5c/0x60
[ 7800.362354] [<ffffffff81229a1f>] cleanup_mnt+0x3f/0x90
[ 7800.362356] [<ffffffff81229ab2>] __cleanup_mnt+0x12/0x20
[ 7800.362359] [<ffffffff8109d763>] task_work_run+0x73/0x90
[ 7800.362362] [<ffffffff81003242>] exit_to_usermode_loop+0xc2/0xd0
[ 7800.362364] [<ffffffff81003c6e>] syscall_return_slowpath+0x4e/0x60
[ 7800.362366] [<ffffffff818063d8>] int_ret_from_sys_call+0x25/0x8f

The error is hit when trying to unmount the CephFS file system after the pool becomes full.
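
The `D` next to `umount` in the trace is the uninterruptible-sleep task state. To see which other tasks on the client are stuck the same way, a filter over `ps` output along these lines could be used. This is a sketch only; the helper name `d_state_tasks` is hypothetical.

```shell
# d_state_tasks: reads `ps -eo pid,stat,comm` output on stdin and prints
# "PID COMMAND" for every task whose state starts with D
# (uninterruptible sleep, usually waiting on IO).
d_state_tasks() {
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $3 }'
}

# Example usage:
#   ps -eo pid,stat,comm | d_state_tasks
```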

Updated by Zheng Yan over 7 years ago

This is the expected behaviour (otherwise CephFS would need to silently drop some dirty data). Does the kernel stop printing these warnings once the pool is no longer full?

Updated by Rohith Radhakrishnan over 7 years ago

With this latest kernel, the warnings appear only when we try to unmount the FS, and the umount command hangs and fails. The same behavior is seen even after increasing the pool size. The only way out is to reboot the client; the warnings don't stop until reboot.

Updated by Zheng Yan over 7 years ago

I tried a pool quota on a 4.8-rc1 kernel. The kernel does recover from the hang when the quota is unset.

Updated by Rohith Radhakrishnan over 7 years ago

I am using 4.4.8-040408-generic.

Updated by Rohith Radhakrishnan over 7 years ago

Upgraded to 4.8.0-040800rc1-generic. Results are different now.

When the pool becomes full, the following messages are logged:
libceph: FULL or reached pool quota
[ 849.827461] __submit_request: 2420 callbacks suppressed
[ 849.827463] libceph: FULL or reached pool quota
[ 849.827848] libceph: FULL or reached pool quota
[ 849.828196] libceph: FULL or reached pool quota
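
To get a rough idea of how often the client hits the full condition, the messages above can be counted from saved `dmesg` output. This is a sketch with hypothetical helper names; it relies only on the message text shown in this comment.

```shell
# count_full_msgs: number of "FULL or reached pool quota" lines on stdin.
count_full_msgs() {
    grep -cF 'libceph: FULL or reached pool quota'
}

# suppressed_total: sum of the N in "N callbacks suppressed" lines on stdin,
# i.e. messages dropped by kernel log rate limiting.
suppressed_total() {
    awk '/callbacks suppressed/ {
             for (i = 2; i <= NF; i++)
                 if ($i == "callbacks") total += $(i - 1)
         }
         END { print total + 0 }'
}

# Example usage:
#   dmesg | count_full_msgs
#   dmesg | suppressed_total
```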

==================================================================================================================================
When unmounting the FS directly, we get the same errors as before:
umount D ffff9a343ecd7fc0 0 4249 4248 0x00000004
[ 484.358099] ffff9a3437eb8580 ffff9a342e1ba340 000000000000000c ffff9a34358e4000
[ 484.358102] ffff9a34358e3c10 7fffffffffffffff ffffffffa12158c0 ffff9a34358e3c90
[ 484.358106] 0000000000000000 ffffffffa1215131 0000000000000000 ffffffffa1218620
[ 484.358109] Call Trace:
[ 484.358119] [<ffffffffa12158c0>] ? bit_wait+0x60/0x60
[ 484.358122] [<ffffffffa1215131>] ? schedule+0x31/0x80
[ 484.358126] [<ffffffffa1218620>] ? schedule_timeout+0x2d0/0x490
[ 484.358130] [<ffffffffa0cf4ffc>] ? ktime_get+0x3c/0xb0
[ 484.358132] [<ffffffffa12158c0>] ? bit_wait+0x60/0x60
[ 484.358135] [<ffffffffa121497d>] ? io_schedule_timeout+0x9d/0x100
[ 484.358138] [<ffffffffa0cc1e36>] ? prepare_to_wait+0x56/0x80
[ 484.358141] [<ffffffffa12158d7>] ? bit_wait_io+0x17/0x60
[ 484.358143] [<ffffffffa12154a3>] ? __wait_on_bit+0x53/0x80
[ 484.358148] [<ffffffffa0d88248>] ? find_get_pages_tag+0x158/0x2e0
[ 484.358152] [<ffffffffa0d874cf>] ? wait_on_page_bit+0xbf/0xe0
[ 484.358155] [<ffffffffa0cc22d0>] ? wake_atomic_t_function+0x60/0x60
[ 484.358158] [<ffffffffa0d875d0>] ? __filemap_fdatawait_range+0xe0/0x140
[ 484.358164] [<ffffffffa0e3cbc7>] ? sync_inodes_sb+0x227/0x280
[ 484.358170] [<ffffffffa0e43667>] ? sync_filesystem+0x57/0xa0
[ 484.358175] [<ffffffffa0e119a2>] ? generic_shutdown_super+0x22/0xf0
[ 484.358192] [<ffffffffc07e211d>] ? ceph_kill_sb+0x2d/0x70 [ceph]
[ 484.358196] [<ffffffffa0e11e24>] ? deactivate_locked_super+0x34/0x60
[ 484.358199] [<ffffffffa0e3089b>] ? cleanup_mnt+0x3b/0x80
[ 484.358203] [<ffffffffa0c9d129>] ? task_work_run+0x79/0xa0
[ 484.358206] [<ffffffffa0c032aa>] ? exit_to_usermode_loop+0xba/0xc0
[ 484.358209] [<ffffffffa0c03b35>] ? syscall_return_slowpath+0x45/0x50
[ 484.358213] [<ffffffffa12195fe>] ? entry_SYSCALL_64_fastpath+0xa6/0xa8

==================================================================================================================================

After increasing the pool size I am able to mount and unmount properly, so the issue looks resolved with this build. Is there a way to make unmount faster in the above scenario, i.e. when the pool is full, without the kernel warnings being printed? As of now, the only way to unmount without warnings is to increase the pool size first. Otherwise the umount command hangs for some time, even though it eventually succeeds.
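
On the warnings themselves: the hung-task message quoted earlier in this thread says that writing 0 to `/proc/sys/kernel/hung_task_timeout_secs` disables it. A sketch of doing that only for the duration of one command is below. The helper names are hypothetical, writing the real file needs root, and this only silences the message: it does not make the blocked umount finish any sooner.

```shell
# The path is a variable so the helpers can be tried against a scratch file.
HUNG_TASK_FILE="${HUNG_TASK_FILE:-/proc/sys/kernel/hung_task_timeout_secs}"

get_hung_task_timeout() {
    cat "$HUNG_TASK_FILE"
}

# Writing 0 disables the "blocked for more than N seconds" message.
set_hung_task_timeout() {
    echo "$1" > "$HUNG_TASK_FILE"
}

# with_quiet_hung_tasks CMD...: run CMD with the warning disabled,
# then restore the previous timeout and return CMD's exit status.
with_quiet_hung_tasks() {
    old=$(get_hung_task_timeout)
    set_hung_task_timeout 0
    "$@"
    status=$?
    set_hung_task_timeout "$old"
    return "$status"
}

# Example usage (as root):
#   with_quiet_hung_tasks umount /mnt/mycephfs
```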
