Bug #461

closed

Hanging OSD during recovery

Added by Wido den Hollander over 13 years ago. Updated over 13 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
-
% Done:

0%


Description

While my cluster was recovering from a few OSD crashes, one of my OSDs hung.

root@node02:~# ps aux|grep cosd
root     14773  9.4 94.9 7929808 3849936 ?     Dsl  20:38   2:04 /usr/bin/cosd -i 1 -c /etc/ceph/ceph.conf
root     15490  0.0  0.0   7672   820 pts/0    D+   21:00   0:00 grep --color=auto cosd
root@node02:~#

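As you can see, the cosd process is using a lot of memory (94.9% of RAM, about 3.7 GiB resident) and is in state "D" (uninterruptible sleep), which usually means it is waiting for I/O.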
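To spot processes stuck like this without reading the whole `ps aux` listing, the output can be filtered on the STAT column (column 8), keeping only states that start with "D". This is a minimal sketch; the function name `d_state_procs` is hypothetical, and it assumes the standard procps `ps aux` column layout shown above.

```shell
# Hypothetical helper: read `ps aux` output on stdin and print the header
# plus every process whose STAT (column 8) starts with "D", i.e. tasks in
# uninterruptible sleep, which are usually blocked on I/O.
d_state_procs() {
    awk 'NR == 1 || $8 ~ /^D/'
}

# Usage on a live system:
#   ps aux | d_state_procs
```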

The logs show:

root@node02:~# date
Mon Oct  4 21:01:19 CEST 2010
root@node02:~# tail /var/log/ceph/osd.1.log
2010-10-04 20:44:55.539151 7f39ce8b1710 journal throttle: waited for ops
2010-10-04 20:44:55.595650 7f39ce8b1710 journal throttle: waited for ops
2010-10-04 20:44:55.689574 7f39ce8b1710 journal throttle: waited for ops
2010-10-04 20:44:55.799051 7f39ce8b1710 journal throttle: waited for ops
2010-10-04 20:44:55.828022 7f39ce8b1710 journal throttle: waited for ops
2010-10-04 20:44:55.859495 7f39ce8b1710 journal throttle: waited for ops
2010-10-04 20:44:55.977324 7f39ce8b1710 journal throttle: waited for ops
2010-10-04 20:44:56.007724 7f39ce8b1710 journal throttle: waited for ops
2010-10-04 20:44:56.068909 7f39ce8b1710 journal throttle: waited for ops
2010-10-04 20:44:56.126037 7f39ce8b1710 journal throttle: waited for ops
root@node02:~#

As you can see, the last log line was written at 20:44:56, so the OSD has been hanging for over 15 minutes now.
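The stall duration can be computed from the timestamp of the last log line rather than estimated by eye. This is a sketch under stated assumptions: `log_stall_seconds` is a hypothetical helper, it relies on GNU `date -d`, and it expects the "YYYY-MM-DD HH:MM:SS.micro" prefix seen in the osd.1.log excerpt above, interpreted in the local time zone.

```shell
# Hypothetical helper: seconds between the last line of a log file and a
# reference time (default: now). Assumes GNU date and a log line prefix of
# the form "YYYY-MM-DD HH:MM:SS.micro".
log_stall_seconds() {
    _log=$1
    _ref=${2:-now}
    # Take date and time from the last line, dropping the microseconds.
    _last=$(tail -n 1 "$_log" | awk '{ sub(/\..*/, "", $2); print $1, $2 }')
    echo $(( $(date -d "$_ref" +%s) - $(date -d "$_last" +%s) ))
}

# Usage:
#   log_stall_seconds /var/log/ceph/osd.1.log
```

With the excerpt above and a reference time of 21:01:19, the helper reports 983 seconds (about 16 minutes).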

Right now it is marked as "down" since it isn't responding to anything.

Killing the OSD doesn't work either; the process just keeps hanging.
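A process in uninterruptible sleep ("D") generally does not act on signals, including SIGKILL, until the kernel operation it is blocked in returns, which would be consistent with the kill having no effect. The sketch below reads the task state and the kernel wait channel from /proc to show where it is blocked. It assumes Linux, and `proc_wait_info` is a hypothetical name; /proc/PID/wchan may print "0" or be unreadable depending on kernel configuration.

```shell
# Hypothetical helper (Linux only): print the task state letter from
# /proc/PID/status (e.g. R, S, D) and the kernel function it is waiting
# in, from /proc/PID/wchan ("?" if that file cannot be read).
proc_wait_info() {
    _pid=$1
    _state=$(awk '/^State:/ { print $2 }' "/proc/$_pid/status")
    _wchan=$(cat "/proc/$_pid/wchan" 2>/dev/null)
    echo "$_state ${_wchan:-?}"
}

# Usage:
#   proc_wait_info 14773
```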
