Bug #374

closed

mon: osd with null addr added to map

Added by Sage Weil over 13 years ago. Updated over 13 years ago.

Status:
Can't reproduce
Priority:
Normal
Assignee:
-
Category:
Monitor
Target version:
% Done:

0%

Source:
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

On Wido's cluster, saw

10.08.23_14:15:20.398952 7f7f5a5ca710 log [INF] : osd1 [2001:16f8:10:2::c3c3:a24f]:6800/19228 boot
10.08.23_14:15:20.399163 7f7f5a5ca710 log [INF] : osd3 [2001:16f8:10:2::c3c3:2e3a]:6800/23848 boot
10.08.23_14:15:20.399255 7f7f5a5ca710 log [INF] : osd2 [2001:16f8:10:2::c3c3:4a8c]:6800/7865 boot

and shortly after

10.08.23_14:15:32.438157 7f7f5a5ca710 log [INF] : osd1 :/0 boot
10.08.23_14:15:32.438356 7f7f5a5ca710 log [INF] : osd3 :/0 boot
10.08.23_14:15:32.438500 7f7f5a5ca710 log [INF] : osd2 :/0 boot
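
If this shows up again, a quick scan of the cluster log for boots with a null address may help narrow the window. The following is only a minimal sketch: the regular expression and timestamp format are inferred from the six lines quoted above, not from any documented log format, and the names `BOOT_RE`, `parse_boot` and `null_addr_boots` are made up for illustration.

```python
import re
from datetime import datetime

# Assumed shape of a boot line, inferred only from the lines quoted in this report:
#   <timestamp> <thread id> log [INF] : osdN <addr> boot
BOOT_RE = re.compile(r"^(\S+) \S+ log \[INF\] : (osd\d+) (\S+) boot$")

NULL_ADDR = ":/0"  # how the empty address is printed in the log above


def parse_boot(line):
    """Return (timestamp, osd, addr) for a boot line, or None if it does not match."""
    m = BOOT_RE.match(line.strip())
    if m is None:
        return None
    # Assumed timestamp layout: yy.mm.dd_HH:MM:SS.microseconds
    ts = datetime.strptime(m.group(1), "%y.%m.%d_%H:%M:%S.%f")
    return ts, m.group(2), m.group(3)


def null_addr_boots(lines):
    """Return [(osd, seconds since that osd last booted with a real address)]."""
    last_good = {}
    found = []
    for line in lines:
        parsed = parse_boot(line)
        if parsed is None:
            continue
        ts, osd, addr = parsed
        if addr == NULL_ADDR:
            prev = last_good.get(osd)
            gap = (ts - prev).total_seconds() if prev is not None else None
            found.append((osd, gap))
        else:
            last_good[osd] = ts
    return found


LOG = """\
10.08.23_14:15:20.398952 7f7f5a5ca710 log [INF] : osd1 [2001:16f8:10:2::c3c3:a24f]:6800/19228 boot
10.08.23_14:15:20.399163 7f7f5a5ca710 log [INF] : osd3 [2001:16f8:10:2::c3c3:2e3a]:6800/23848 boot
10.08.23_14:15:20.399255 7f7f5a5ca710 log [INF] : osd2 [2001:16f8:10:2::c3c3:4a8c]:6800/7865 boot
10.08.23_14:15:32.438157 7f7f5a5ca710 log [INF] : osd1 :/0 boot
10.08.23_14:15:32.438356 7f7f5a5ca710 log [INF] : osd3 :/0 boot
10.08.23_14:15:32.438500 7f7f5a5ca710 log [INF] : osd2 :/0 boot
"""

RESULT = null_addr_boots(LOG.splitlines())
for osd, gap in RESULT:
    print(osd, gap)
```

On the lines above this reports all three OSDs, each roughly 12 seconds after its previous boot with a real address.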

Map sequence was:

10209:
osd1 in weight 1 up   (up_from 10207 up_thru 10196 down_at 10204 last_clean 6116-10203) [2001:16f8:10:2::c3c3:a24f]:6800/19228 [2001:16f8:10:2::c3c3:a24f]:6801/19228
osd2 in weight 1 up   (up_from 10207 up_thru 9970 down_at 10205 last_clean 6120-10204) [2001:16f8:10:2::c3c3:4a8c]:6800/7865 [2001:16f8:10:2::c3c3:4a8c]:6801/7865
osd3 in weight 1 up   (up_from 10207 up_thru 10202 down_at 10205 last_clean 6124-10204) [2001:16f8:10:2::c3c3:2e3a]:6800/23848 [2001:16f8:10:2::c3c3:2e3a]:6801/23848

10210:
osd1 in weight 1 down (up_from 10207 up_thru 10209 down_at 10210 last_clean 6116-10203)
osd2 in weight 1 down (up_from 10207 up_thru 10207 down_at 10210 last_clean 6120-10204)
osd3 in weight 1 down (up_from 10207 up_thru 10209 down_at 10210 last_clean 6124-10204)

10211:
osd1 in weight 1 up   (up_from 10211 up_thru 10209 down_at 10210 last_clean 6116-10203) :/0 [2001:16f8:10:2::c3c3:a24f]:6801/19228
osd2 in weight 1 up   (up_from 10211 up_thru 10207 down_at 10210 last_clean 6120-10204) :/0 [2001:16f8:10:2::c3c3:4a8c]:6801/7865
osd3 in weight 1 up   (up_from 10211 up_thru 10209 down_at 10210 last_clean 6124-10204) :/0 [2001:16f8:10:2::c3c3:2e3a]:6801/23848

The mon log shows no intervening failure report, so the mark down + mark up happened in the boot handler code.

Maybe the OSDs sent duplicate boot messages, and the monitor's checks of the current vs. pending map are off?
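
The map sequence above can also be checked mechanically, epoch by epoch, for OSDs that are marked up with a null address. This is a rough sketch only: the line layout is assumed from the map entries quoted in this report, and `OSD_RE`, `parse_osd_line` and `up_with_null_addr` are hypothetical helper names, not part of Ceph.

```python
import re

# Assumed layout of one osd entry in the map dump, based on the lines quoted above:
#   osdN in|out weight W up|down (key value ...) [addr ...]
OSD_RE = re.compile(
    r"^(osd\d+)\s+(in|out)\s+weight\s+(\S+)\s+(up|down)\s+\(([^)]*)\)\s*(.*)$"
)

NULL_ADDR = ":/0"  # how the empty address is printed in epoch 10211


def parse_osd_line(line):
    """Return (name, state, [addrs]) for an osd entry, or None if it does not match."""
    m = OSD_RE.match(line.strip())
    if m is None:
        return None
    name, _in_out, _weight, state, _info, rest = m.groups()
    return name, state, rest.split()


def up_with_null_addr(lines):
    """Return the names of OSDs that are up but list a null address."""
    bad = []
    for line in lines:
        parsed = parse_osd_line(line)
        if parsed is None:
            continue
        name, state, addrs = parsed
        if state == "up" and NULL_ADDR in addrs:
            bad.append(name)
    return bad


EPOCH_10209 = [
    "osd1 in weight 1 up   (up_from 10207 up_thru 10196 down_at 10204 last_clean 6116-10203) [2001:16f8:10:2::c3c3:a24f]:6800/19228 [2001:16f8:10:2::c3c3:a24f]:6801/19228",
]
EPOCH_10210 = [
    "osd1 in weight 1 down (up_from 10207 up_thru 10209 down_at 10210 last_clean 6116-10203)",
]
EPOCH_10211 = [
    "osd1 in weight 1 up   (up_from 10211 up_thru 10209 down_at 10210 last_clean 6116-10203) :/0 [2001:16f8:10:2::c3c3:a24f]:6801/19228",
    "osd2 in weight 1 up   (up_from 10211 up_thru 10207 down_at 10210 last_clean 6120-10204) :/0 [2001:16f8:10:2::c3c3:4a8c]:6801/7865",
    "osd3 in weight 1 up   (up_from 10211 up_thru 10209 down_at 10210 last_clean 6124-10204) :/0 [2001:16f8:10:2::c3c3:2e3a]:6801/23848",
]

for epoch, lines in ((10209, EPOCH_10209), (10210, EPOCH_10210), (10211, EPOCH_10211)):
    print(epoch, up_with_null_addr(lines))
```

Only epoch 10211 is flagged; in 10209 the OSDs are up with real addresses and in 10210 they are down with no addresses listed.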

Actions #1

Updated by Sage Weil over 13 years ago

  • Target version changed from v0.21.2 to v0.21.3
Actions #2

Updated by Sage Weil over 13 years ago

  • Target version changed from v0.21.3 to v0.21.4
Actions #3

Updated by Sage Weil over 13 years ago

  • Target version changed from v0.21.4 to v0.23
Actions #4

Updated by Sage Weil over 13 years ago

  • Status changed from New to Can't reproduce

Couldn't find anything with code inspection, and haven't been able to reproduce. Hopefully if/when this pops up again we'll have full monitor logs.
