Bug #8464

krbd: deadlock

Added by Zack Cerza almost 10 years ago. Updated over 9 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

http://pulpito.ceph.com/teuthology-2014-05-27_23:09:10-krbd-firefly-testing-basic-plana/277770/

ubuntu   16681  0.0  0.0      0     0 ?        Zl   May28   0:28 [ffsb] <defunct>

ubuntu@plana90:~$ sudo ls /proc/16681/fd
ubuntu@plana90:~$ sudo cat /proc/16681/stat
16681 (ffsb) Z 1 15015 15013 0 -1 4244492 455 176 0 0 160 2678 0 0 20 0 31 0 466989 0 0 18446744073709551615 0 0 0 0 0 0 0 0 0 18446744073709551615 0 0 17 2 0 0 506 0 0 0 0 0 0 0 0 0 15
ubuntu@plana90:~$ sudo ls /proc/15015
ls: cannot access /proc/15015: No such file or directory

The stat line shows a parent PID of 1 (field 4); 15015 is the process group ID (field 5), and it has no /proc/ entry. I don't know why the ffsb.sh process is not terminating. Maybe it has to do with the crash (dmesg output attached).
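
To make the stat line above easier to read, here is a minimal Python sketch that names its leading fields. It assumes the field order documented in proc(5): pid, comm, state, ppid, pgrp, session, and so on, with num_threads as field 20. The function name parse_proc_stat is hypothetical and is not part of teuthology or any tool mentioned in this report.

```python
# Stat line copied from the report above.
SAMPLE = (
    "16681 (ffsb) Z 1 15015 15013 0 -1 4244492 455 176 0 0 160 2678 0 0 20 0 31 0 "
    "466989 0 0 18446744073709551615 0 0 0 0 0 0 0 0 0 18446744073709551615 "
    "0 0 17 2 0 0 506 0 0 0 0 0 0 0 0 0 15"
)


def parse_proc_stat(line):
    """Name a few leading fields of a /proc/<pid>/stat line (layout per proc(5))."""
    # comm can itself contain spaces or parentheses, so split on the LAST ')'.
    head, _, tail = line.rpartition(")")
    pid_str, _, comm = head.partition("(")
    fields = tail.split()  # fields[0] is field 3 (state) in proc(5) numbering
    return {
        "pid": int(pid_str),
        "comm": comm,
        "state": fields[0],
        "ppid": int(fields[1]),
        "pgrp": int(fields[2]),
        "session": int(fields[3]),
        "num_threads": int(fields[17]),  # field 20 in proc(5) numbering
    }


info = parse_proc_stat(SAMPLE)
for key, value in info.items():
    print(f"{key:12} {value}")
```

Read this way, the sample gives state Z, ppid 1, pgrp 15015, session 15013 and num_threads 31. The thread count matches the "l" (multi-threaded) flag in the ps output.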

crash.txt View (30 KB) Zack Cerza, 05/29/2014 07:57 AM

kern-stripped.log View (498 KB) Ilya Dryomov, 05/29/2014 09:52 AM


Related issues

Duplicates Linux kernel client - Bug #8818: IO Hang on raw rbd device - Workqueue: ceph-msgr con_work [libceph] Resolved 07/11/2014

History

#1 Updated by Zack Cerza almost 10 years ago

  • Subject changed from Job hung during workunit suites/ffsb.sh with crashed to Job hung during workunit suites/ffsb.sh with crashed ceph process

#2 Updated by Sage Weil almost 10 years ago

Yeah, looks like it's just because the block I/O is hung. Also, this:

12432 ?        Ss     0:00 cron
18720 ?        S      0:00  \_ CRON
18722 ?        Ss     0:00      \_ /bin/sh -c test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )
18723 ?        S      0:00          \_ run-parts --report /etc/cron.daily
18829 ?        S      0:00              \_ /bin/bash /etc/cron.daily/mlocate
18835 ?        D      0:00                  \_ /usr/bin/updatedb.mlocate

which might be worth disabling somewhere for all teuthology runs...
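
The listing above shows updatedb.mlocate in state D (uninterruptible sleep), which is typically what a task blocked on hung block I/O looks like. As a rough sketch, every task in that state can be listed by reading the per-thread stat files under /proc/<pid>/task/<tid>/stat, as described in proc(5). The function name tasks_in_state and its proc_root parameter are hypothetical, introduced only for illustration.

```python
import glob
import os


def tasks_in_state(state="D", proc_root="/proc"):
    """Return sorted (pid, tid, comm) tuples for tasks whose stat state matches."""
    found = []
    pattern = os.path.join(proc_root, "[0-9]*", "task", "[0-9]*", "stat")
    for path in glob.glob(pattern):
        try:
            with open(path) as f:
                line = f.read()
        except OSError:
            continue  # the task exited between listing and reading
        # comm may contain spaces or parentheses, so split on the last ')'.
        head, _, tail = line.rpartition(")")
        fields = tail.split()
        if not fields or fields[0] != state:
            continue
        parts = path.split(os.sep)  # .../<pid>/task/<tid>/stat
        pid, tid = int(parts[-4]), int(parts[-2])
        comm = head.partition("(")[2]
        found.append((pid, tid, comm))
    return sorted(found)


if __name__ == "__main__":
    for pid, tid, comm in tasks_in_state("D"):
        print(f"pid={pid} tid={tid} comm={comm}")
```

Scanning per thread rather than per process matters for a case like the ffsb zombie: the thread-group leader can already be in state Z while other threads of the same process are still stuck in D.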

#3 Updated by Sage Weil almost 10 years ago

  • Project changed from teuthology to rbd
  • Subject changed from Job hung during workunit suites/ffsb.sh with crashed ceph process to krbd: deadlock
  • Assignee set to Ilya Dryomov
  • Priority changed from Normal to Urgent
  • Source changed from other to Q/A

#5 Updated by Ilya Dryomov over 9 years ago

I haven't seen this on nightly runs (the only place it seemed to pop up) in a while.

#6 Updated by Ilya Dryomov over 9 years ago

  • Project changed from rbd to Linux kernel client

#7 Updated by Ilya Dryomov over 9 years ago

  • Status changed from New to Resolved

OK, thanks everybody.

rbd: rework rbd_request_fn()
