Bug #3695

closed

monitor crashed after an upgrade in Monitor::timecheck

Added by Tamilarasi muthamizhan over 11 years ago. Updated over 11 years ago.

Status: Resolved
Priority: High
Category: -
Target version: -
% Done: 0%
Source: Q/A

Description

ceph version : 0.55.1-329-g01376d4 (01376d44d73189080d207f701fc7e38cf55c738d)

cluster:
burnupi15 [running osd.1, osd.2 and mon.a on argonaut]
burnupi19 [osd.3 and osd.4 are stopped]
burnupi20 [running osd.5, osd.6, mon.c and mds.a on v0.55.1]

Initially, argonaut was running on all three nodes. I had to bring the burnupi19 daemons down for another test, which left only burnupi15 and burnupi20 running.

I upgraded burnupi20 with no ceph auth entries in ceph.conf, and when I tried to access the rbd pool, it failed:

ubuntu@burnupi20:~$ rbd ls
rbd: couldn't connect to the cluster!
2012-12-28 17:35:14.798345 7f894f333780 -1 monclient(hunting): failed to open keyring: (2) No such file or directory
2012-12-28 17:35:14.798364 7f894f333780 0 librados: client.admin initialization error (2) No such file or directory

So I modified ceph.conf to set the auth settings to none, upgraded again, and restarted the ceph service:

[global]
auth service required = none
auth client required = none
auth cluster required = none

[osd]
osd journal size = 1000
filestore xattr use omap = true
osd min pg log entries = 10

[osd.1]
host = burnupi15

[osd.2]
host = burnupi15

[osd.3]
host = burnupi19

[osd.4]
host = burnupi19

[osd.5]
host = burnupi20

[osd.6]
host = burnupi20

[mon.a]
host = burnupi15
mon addr = 10.214.134.22:6789

[mon.b]
host = burnupi19
mon addr = 10.214.134.14:6789

[mon.c]
host = burnupi20
mon addr = 10.214.134.12:6789

[mds.a]
host = burnupi20

This crashed mon.c running on burnupi20:

mon/Monitor.cc: 2318: FAILED assert(is_leader())

 ceph version 0.55.1-329-g01376d4 (01376d44d73189080d207f701fc7e38cf55c738d)
 1: (Monitor::timecheck()+0x989) [0x48d669]
 2: (SafeTimer::timer_thread()+0x425) [0x5ed6d5]
 3: (SafeTimerThread::entry()+0xd) [0x5ee31d]
 4: (()+0x7e9a) [0x7f30ea7a6e9a]
 5: (clone()+0x6d) [0x7f30e8f5ccbd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---


Related issues: 1 (0 open, 1 closed)

Related to Ceph - Bug #3633: mon: clock drift errors not reported by ceph status (Resolved, Joao Eduardo Luis, 12/17/2012)

Actions #1

Updated by Ian Colle over 11 years ago

  • Assignee set to Joao Eduardo Luis
Actions #2

Updated by Joao Eduardo Luis over 11 years ago

  • Status changed from New to In Progress
  • Priority changed from Normal to High
Actions #3

Updated by Joao Eduardo Luis over 11 years ago

  • Status changed from In Progress to 7
Actions #4

Updated by Joao Eduardo Luis over 11 years ago

I've been unable to reproduce this bug, but the cause was pretty obvious, so I pushed a fix that should deal with this case (and a couple of other similar cases as well).
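
The fix itself is not quoted in this thread. As a rough illustration only, the failing pattern in the backtrace (a timer callback that asserts is_leader()) and one common way to make such a callback tolerate lost leadership might look like the sketch below. ToyMonitor, Role, and both method names are hypothetical stand-ins, not Ceph's Monitor class, and the guarded variant is an assumption about the general approach, not the contents of the actual patch.

```cpp
#include <cassert>

// Hypothetical stand-in for a monitor's election state; not Ceph code.
enum class Role { Leader, Peon, Electing };

struct ToyMonitor {
    Role role = Role::Electing;
    int rounds_started = 0;

    bool is_leader() const { return role == Role::Leader; }

    // Failing pattern: a timer callback that assumes leadership.
    // If the timer fires after the role has changed, the assert aborts.
    void timecheck_asserting() {
        assert(is_leader());
        ++rounds_started;
    }

    // Guarded pattern (assumed, not the actual patch): skip the round
    // when this monitor is not the leader. Returns true if a round started.
    bool timecheck_guarded() {
        if (!is_leader()) {
            return false;
        }
        ++rounds_started;
        return true;
    }
};
```

In this sketch, calling timecheck_guarded() while the role is Peon or Electing returns false and leaves rounds_started unchanged, whereas timecheck_asserting() in the same state would abort the process.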

Actions #5

Updated by Ian Colle over 11 years ago

Believed fixed by the patch for #3633:

684d4ba242b26828bd7927860226bfc8a0cfcc2b

Actions #6

Updated by Ian Colle over 11 years ago

  • Status changed from 7 to Resolved