Bug #2116

Repeated messages of "heartbeat_check: no heartbeat from"

Added by Wido den Hollander about 12 years ago. Updated about 12 years ago.

Status: Resolved
Priority: High
Assignee: -
Category: OSD
% Done: 0%
Source: Community (dev)

Description

As discussed on the mailing list, I gathered some logs.

Today I upgraded my whole cluster from 0.41 to 0.42.2.

Due to the on-disk format change, I reformatted my cluster and started it again.

Immediately after the OSDs started, the "no heartbeat from" messages began:

2012-02-28 16:30:31.951132    pg v708: 7920 pgs: 7920 active+clean; 8730 bytes data, 164 MB used, 74439 GB / 74520 GB avail
2012-02-28 16:30:31.980965   mds e4: 1/1/1 up {0=alpha=up:active}
2012-02-28 16:30:31.981010   osd e65: 40 osds: 40 up, 40 in
2012-02-28 16:30:31.981192   log 2012-02-28 16:30:30.344847 mon.0 [2a00:f10:11a:408::1]:6789/0 10728 : [INF] osd.5 [2a00:f10:11b:cef0:225:90ff:fe32:cf64]:6803/14395 failed (by osd.15 [2a00:f10:11b:cef0:225:90ff:fe33:49b0]:6809/12129)
2012-02-28 16:30:31.981314   mon e1: 3 mons at {pri=[2a00:f10:11b:cef0:230:48ff:fed3:b086]:6789/0,sec=[2a00:f10:11a:408::1]:6789/0,third=[2a00:f10:11a:409::1]:6789/0}
2012-02-28 16:30:33.680286   log 2012-02-28 16:30:33.533593 mon.0 [2a00:f10:11a:408::1]:6789/0 10729 : [INF] osd.0 [2a00:f10:11b:cef0:225:90ff:fe33:49fe]:6800/12003 failed (by osd.13 [2a00:f10:11b:cef0:225:90ff:fe33:49b0]:6803/11917)
2012-02-28 16:30:34.705162   log 2012-02-28 16:30:33.709207 mon.0 [2a00:f10:11a:408::1]:6789/0 10730 : [INF] osd.6 [2a00:f10:11b:cef0:225:90ff:fe32:cf64]:6806/14486 failed (by osd.9 [2a00:f10:11b:cef0:225:90ff:fe33:49f2]:6809/19176)
2012-02-28 16:30:34.705162   log 2012-02-28 16:30:33.968388 mon.0 [2a00:f10:11a:408::1]:6789/0 10731 : [INF] osd.30 [2a00:f10:11b:cef0:225:90ff:fe33:499c]:6806/29719 failed (by osd.37 [2a00:f10:11b:cef0:225:90ff:fe33:49a4]:6803/26068)
2012-02-28 16:30:35.737775   log 2012-02-28 16:30:35.346912 mon.0 [2a00:f10:11a:408::1]:6789/0 10732 : [INF] osd.5 [2a00:f10:11b:cef0:225:90ff:fe32:cf64]:6803/14395 failed (by osd.15 [2a00:f10:11b:cef0:225:90ff:fe33:49b0]:6809/12129)

As you can see, the cluster is completely fresh: no data at all and no I/O load.
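For context on where this message comes from: each OSD periodically pings its heartbeat peers and reports any peer whose last reply is older than a grace period. The sketch below is a minimal illustration of that check, not Ceph's actual implementation; the peer map, the 20-second grace (cf. the "osd heartbeat grace" option), and the sample values in main() are assumptions for illustration.

// Minimal sketch (NOT Ceph's actual code) of the kind of check that
// produces "heartbeat_check: no heartbeat from": each OSD remembers
// when it last heard from each peer and flags peers that have been
// silent longer than a grace period.
#include <chrono>
#include <iostream>
#include <map>

using Clock = std::chrono::steady_clock;

struct HeartbeatState {
  std::map<int, Clock::time_point> last_rx; // peer osd id -> last reply time
  std::chrono::seconds grace{20};           // cf. "osd heartbeat grace"

  // Called periodically; reports peers that have gone silent. In Ceph
  // the OSD would then report the peer as failed to the monitor, which
  // yields the "[INF] osd.N ... failed (by osd.M ...)" lines above.
  void heartbeat_check(Clock::time_point now) const {
    for (const auto& [osd, t] : last_rx) {
      if (now - t > grace)
        std::cout << "heartbeat_check: no heartbeat from osd." << osd << "\n";
    }
  }
};

int main() {
  HeartbeatState hb;
  const auto now = Clock::now();
  hb.last_rx[5]  = now - std::chrono::seconds(30); // silent too long -> reported
  hb.last_rx[15] = now;                            // healthy -> stays quiet
  hb.heartbeat_check(now);
}

On an idle, freshly formatted cluster a check like this should never fire, so the failure reports presumably point at the heartbeat traffic itself rather than at load.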

One thing that came to mind was a clock issue, i.e. clocks not being synchronized, but I verified the clocks on all machines and they are synchronized.

Information on all the OSDs:

Attached are my ceph.conf and a couple of hours of logs for osd.5 and osd.15.
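For completeness, the heartbeat-related knobs live in the [osd] section of ceph.conf. The snippet below shows the shape of such a section; the values are illustrative assumptions, not the ones from my attached file.

[osd]
        ; how often each OSD pings its heartbeat peers (illustrative value)
        osd heartbeat interval = 1
        ; seconds without a reply before a peer is reported as failed
        osd heartbeat grace = 20
        ; verbose logging of the kind used to capture the attached heartbeat logs
        debug osd = 20
        debug ms = 1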


Files

ceph.conf (6.67 KB) - My ceph configuration - Wido den Hollander, 02/28/2012 07:35 AM
osd.5.log_heartbeat.gz (28.8 MB) - Wido den Hollander, 02/28/2012 07:35 AM
osd.15.log_heartbeat.gz (25.5 MB) - Wido den Hollander, 02/28/2012 07:35 AM
osd.3.heartbeat.log.gz (32.2 MB) - Wido den Hollander, 03/01/2012 02:48 AM
osd.8.heartbeat.log.gz (26.2 MB) - Wido den Hollander, 03/01/2012 02:48 AM