Bug #17410
closed
rbd image stale/stuck (mapped and mounted)
Description
Hello!
We have a problem:
Three rbd images are formatted and mounted on the system.
Linux kernel 3.18.35-35.
Steps:
1. While we write/read data on the mounted rbd images, all operations look good.
2. We stop some OSDs -> write/read operations still complete normally.
3. But when we start the previously stopped OSDs again, the rbd images get stuck:
no read requests are processed, and flushing of dirty cache pages stalls.
After that, ceph -s does not show any I/O requests. All PGs are OK, health is OK.
I am attaching the sysrq -t output from the node where the rbd images are mounted.
Please help.
P.S. Kernel modules in use:
Module      Size    Used by
rbd         53251   6
libceph     141489  1 rbd
Ceph version: jewel 10.2.2
Files
Updated by Sergey Jerusalimov over 7 years ago
This looks exactly like http://www.spinics.net/lists/ceph-users/msg24111.html
Updated by Sergey Jerusalimov over 7 years ago
Can you backport https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=922dab6134178cae317ae00de86376cba59f3147 to the 3.18 kernel?
Updated by Sergey Jerusalimov over 7 years ago
When I check "cat /sys/kernel/debug/ceph/*/osdc" on the problem node, I see a static picture: the same requests stay stuck.
When I restart the OSDs listed in the "cat /sys/kernel/debug/ceph/*/osdc" output, all operations start working normally again.
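The check described above can be scripted to see which OSDs the stuck requests are waiting on. What follows is only a rough sketch. It assumes that each pending request line in the osdc file contains a token of the form osdN naming its target OSD. The exact layout differs between kernel versions, so the regular expression is an assumption, and the helper names count_requests_per_osd and read_osdc are made up for illustration. Reading debugfs normally requires root.

```python
import glob
import re
from collections import Counter

# Assumption: every pending request line carries an "osdN" token.
OSD_TOKEN = re.compile(r"\bosd(\d+)\b")


def count_requests_per_osd(text):
    """Return a Counter mapping OSD id -> number of lines that mention it."""
    counts = Counter()
    for line in text.splitlines():
        match = OSD_TOKEN.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def read_osdc(pattern="/sys/kernel/debug/ceph/*/osdc"):
    """Concatenate the contents of every osdc file matching the glob."""
    chunks = []
    for path in glob.glob(pattern):
        with open(path) as handle:
            chunks.append(handle.read())
    return "\n".join(chunks)
```

Calling count_requests_per_osd(read_osdc()) twice, a few seconds apart, and getting identical non-empty results would match the "static picture" described here.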
Updated by Jason Dillaman over 7 years ago
- Project changed from Ceph to Linux kernel client
Updated by Ilya Dryomov over 7 years ago
- Category set to libceph
- Status changed from New to Closed
- Assignee set to Ilya Dryomov
This bug is fixed in kernels 4.7 and above (and also in RHEL 7.3 based kernels, e.g. kernel-3.10.0-514.2.2.el7).
That commit is indeed part of the fix, but cannot be backported to 3.18 - see http://lxr.linux.no/linux+v4.9.3/Documentation/stable_kernel_rules.txt.
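To check whether a given client node already runs a kernel with the upstream fix, the release string can be compared against 4.7. This is a minimal sketch, not an official check: the helper names are hypothetical, and it only looks at the upstream major.minor number. Distribution kernels that carry the fix as a backport, such as the RHEL 7.3 kernel mentioned above, report an older number and would be classified as unfixed by this test.

```python
import platform
import re

FIXED_IN = (4, 7)  # first upstream release stated above to contain the fix


def parse_kernel_release(release):
    """Extract (major, minor) from a release string such as '3.18.35-35'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def has_upstream_fix(release=None):
    """True if the upstream version number is 4.7 or newer.

    Defaults to the running kernel; backported vendor kernels are not detected.
    """
    if release is None:
        release = platform.release()
    return parse_kernel_release(release) >= FIXED_IN
```

For the kernel from this report, has_upstream_fix("3.18.35-35") returns False.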