Bug #17410

closed

rbd image stale/stuck (mapped and mounted)

Added by Sergey Jerusalimov over 7 years ago. Updated over 7 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: libceph
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

Hello!

We have a problem:

three rbd images are formatted and mounted on the system
Linux kernel 3.18.35-35

Steps:
1. While we write/read data to the mounted rbd images, all operations look good.
2. We stopped some OSDs -> write/read operations still complete normally.
3. But when we start the previously stopped OSDs again, the rbd images get stuck:
no read requests are processed, and flushing of dirty page-cache pages stalls.

ceph -s doesn't show any I/O requests after that. All PGs are OK, health is OK.
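
One way to confirm that dirty pages are no longer being flushed is to sample the Dirty and Writeback counters in /proc/meminfo a few times. The sketch below is only an illustration: the helper names, the "never decreases" heuristic, and the 1024 kB floor are assumptions, not anything taken from this report.

```python
import re


def parse_meminfo_kb(text, fields=("Dirty", "Writeback")):
    """Return {field: value_in_kB} for the requested /proc/meminfo fields."""
    values = {}
    for line in text.splitlines():
        match = re.match(r"^(\w+):\s+(\d+)\s*kB", line)
        if match and match.group(1) in fields:
            values[match.group(1)] = int(match.group(2))
    return values


def flush_is_stalled(samples, min_dirty_kb=1024):
    """Heuristic (an assumption): writeback looks stalled when Dirty never
    decreases across consecutive samples and stays above a small floor."""
    dirty = [sample["Dirty"] for sample in samples]
    if len(dirty) < 2 or min(dirty) < min_dirty_kb:
        return False
    return all(later >= earlier for earlier, later in zip(dirty, dirty[1:]))
```

To use it, read /proc/meminfo every few seconds on the affected node, pass each snapshot through parse_meminfo_kb, and hand the collected list to flush_is_stalled. A True result only suggests a stall; it does not show which device is responsible.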

I am attaching the sysrq -t output from the node where the rbd images are mounted.

Please help.

P.S. Kernel modules in use:

Module Size Used by
rbd 53251 6
libceph 141489 1 rbd

jewel 10.2.2


Files

sysrq-s.gz (103 KB) Sysrq status Sergey Jerusalimov, 09/26/2016 08:52 PM
Actions #3

Updated by Sergey Jerusalimov over 7 years ago

When I inspect "cat /sys/kernel/debug/ceph/*/osdc" on the problem node, I see a static picture of stuck requests.
When I restart the OSDs listed in the "cat /sys/kernel/debug/ceph/*/osdc" output, all operations start working normally again.
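
The check described above can be sketched as a small parser that counts in-flight requests per OSD in the osdc output. This is a rough sketch: it assumes only that each request line contains an osdN token, the exact column layout differs between kernel versions, and the function name is hypothetical.

```python
import re
from collections import Counter

# Assumption: each in-flight request line carries a token such as "osd3".
OSD_TOKEN = re.compile(r"\bosd(\d+)\b")


def osds_with_inflight_requests(osdc_text):
    """Count request lines per OSD id in the text of a debugfs osdc file."""
    counts = Counter()
    for line in osdc_text.splitlines():
        match = OSD_TOKEN.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

To use it, read every file matching /sys/kernel/debug/ceph/*/osdc (this needs root and a mounted debugfs) and pass the contents to osds_with_inflight_requests. If the same OSD ids keep the same counts across several reads, those are the OSDs holding the stuck requests.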

Actions #4

Updated by Jason Dillaman over 7 years ago

  • Project changed from Ceph to Linux kernel client
Actions #5

Updated by Ilya Dryomov over 7 years ago

  • Category set to libceph
  • Status changed from New to Closed
  • Assignee set to Ilya Dryomov

This bug is fixed in kernels 4.7 and above (and also in RHEL 7.3 based kernels, e.g. kernel-3.10.0-514.2.2.el7).

That commit is indeed part of the fix, but cannot be backported to 3.18 - see http://lxr.linux.no/linux+v4.9.3/Documentation/stable_kernel_rules.txt.
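
As a rough illustration of the version note above, a client can compare its kernel release string against 4.7. The function names are hypothetical, and the check looks only at the upstream major.minor number, so it cannot recognise distribution kernels that carry the fix as a backport (such as the RHEL 7.3 kernel mentioned above).

```python
import re


def kernel_version_tuple(release):
    """Extract (major, minor) from a release string such as '3.18.35-35'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def has_upstream_fix(release, fixed_in=(4, 7)):
    """True if the upstream version is at least 4.7.

    Caveat: a backported distribution kernel (e.g. 3.10.0-514.2.2.el7)
    returns False here even though it contains the fix.
    """
    return kernel_version_tuple(release) >= fixed_in
```

On a live system the release string can be taken from platform.release() or uname -r.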
