Bug #3695: monitor crashed after an upgrade in Monitor::timecheck - Ceph - Ceph


Added by Tamilarasi muthamizhan over 11 years ago. Updated over 11 years ago.

Status: Resolved

Priority: High

Assignee: Joao Eduardo Luis

Source: Q/A

Description

ceph version : 0.55.1-329-g01376d4 (01376d44d73189080d207f701fc7e38cf55c738d)

cluster:
burnupi15 [running osd.1, osd.2 and mon.a on argonaut]
burnupi19 [osd.3 and osd.4 are stopped]
burnupi20 [running osd.5, osd.6, mon.c and mds.a on v0.55.1]

Initially, argonaut was running on all three nodes. I had to bring the burnupi19 daemons down for another test, which left only burnupi15 and burnupi20.

I upgraded burnupi20 with no ceph auth entries in ceph.conf, and when I tried to access the rbd pool, it errored out:

ubuntu@burnupi20:~$ rbd ls
rbd: couldn't connect to the cluster!
2012-12-28 17:35:14.798345 7f894f333780 -1 monclient(hunting): failed to open keyring: (2) No such file or directory
2012-12-28 17:35:14.798364 7f894f333780 0 librados: client.admin initialization error (2) No such file or directory

So I modified ceph.conf to set the auth settings to none, upgraded again, and restarted the ceph service:

[global]
auth service required = none
auth client required = none
auth cluster required = none

[osd]
osd journal size = 1000
filestore xattr use omap = true
osd min pg log entries = 10

[osd.1]
host = burnupi15

[osd.2]
host = burnupi15

[osd.3]
host = burnupi19

[osd.4]
host = burnupi19

[osd.5]
host = burnupi20

[osd.6]
host = burnupi20

[mon.a]
host = burnupi15
mon addr = 10.214.134.22:6789

[mon.b]
host = burnupi19
mon addr = 10.214.134.14:6789

[mon.c]
host = burnupi20
mon addr = 10.214.134.12:6789

[mds.a]
host = burnupi20

This crashed mon.c running on burnupi20:

mon/Monitor.cc: 2318: FAILED assert(is_leader())

 ceph version 0.55.1-329-g01376d4 (01376d44d73189080d207f701fc7e38cf55c738d)
 1: (Monitor::timecheck()+0x989) [0x48d669]
 2: (SafeTimer::timer_thread()+0x425) [0x5ed6d5]
 3: (SafeTimerThread::entry()+0xd) [0x5ee31d]
 4: (()+0x7e9a) [0x7f30ea7a6e9a]
 5: (clone()+0x6d) [0x7f30e8f5ccbd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---

Related issues 1 (0 open — 1 closed)


Updated by Ian Colle over 11 years ago

Assignee set to Joao Eduardo Luis


Updated by Joao Eduardo Luis over 11 years ago

Status changed from New to In Progress
Priority changed from Normal to High


Updated by Joao Eduardo Luis over 11 years ago

Status changed from In Progress to 7


Updated by Joao Eduardo Luis over 11 years ago

I've been unable to reproduce this bug, but the cause was pretty obvious, so I pushed a fix that should deal with this case (and a couple of other similar cases as well).
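For readers following along: the trace shows the assertion firing from the timer thread, which would be consistent with a time-check event that was scheduled while the monitor was leader and then fired after it had lost that role. That reading is an assumption; the comment above does not describe the actual patch. A toy Python sketch of this class of bug and one common guard follows. Every name in it (`ToyMonitor`, `win_election`, `lose_election`, `fire_timers`, `cancel_on_role_change`) is hypothetical and is not Ceph code.

```python
class ToyMonitor:
    """Toy stand-in for a monitor; none of these names are Ceph's API."""

    def __init__(self, cancel_on_role_change):
        self.leader = False
        self.pending = []  # callbacks a timer thread would fire later
        self.cancel_on_role_change = cancel_on_role_change
        self.rounds = 0    # number of completed time checks

    def is_leader(self):
        return self.leader

    def win_election(self):
        self.leader = True
        # Schedule a time check; it only makes sense while we are leader.
        self.pending.append(self.timecheck)

    def lose_election(self):
        self.leader = False
        if self.cancel_on_role_change:
            # Guard: drop leader-only events when the role changes.
            self.pending.clear()

    def timecheck(self):
        # Mirrors the failed check in the trace: only a leader may run this.
        if not self.is_leader():
            raise AssertionError("FAILED assert(is_leader())")
        self.rounds += 1

    def fire_timers(self):
        """Run every pending callback, as a timer thread would."""
        events, self.pending = self.pending, []
        for event in events:
            event()


def crashes(cancel_on_role_change):
    """Return True if a stale time-check event trips the assertion."""
    mon = ToyMonitor(cancel_on_role_change)
    mon.win_election()
    mon.lose_election()  # e.g. a new election after a daemon restart
    try:
        mon.fire_timers()
    except AssertionError:
        return True
    return False
```

In this sketch, `crashes(False)` leaves the stale event queued and trips the check, while `crashes(True)` cancels it when leadership is lost. Whether the real fix cancels the event, re-checks the role inside the callback, or does something else is not stated in this report.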


Updated by Ian Colle over 11 years ago

Believed fixed by the patch for #3633:

684d4ba242b26828bd7927860226bfc8a0cfcc2b


Updated by Ian Colle over 11 years ago

Status changed from 7 to Resolved
