Bug #7216

closed

ASSERT AuthMonitor::update_from_paxos on 0.72.2

Added by Grigory Gorelov over 10 years ago. Updated about 10 years ago.

Status:
Can't reproduce
Priority:
Urgent
Category:
Monitor
Target version:
-
% Done:
0%

Source:
Community (user)
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Greetings.

Today I restarted my cluster and found that all three monitors fail to start. The whole output is attached, but the key part, I believe, is:

mon.srv2@-1(probing).paxosservice(auth 251..288) refresh upgraded, format 0 -> 1
mon/AuthMonitor.cc: In function 'virtual void AuthMonitor::update_from_paxos(bool*)' ...
mon/AuthMonitor.cc: 153: FAILED assert(ret == 0)
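
For reference, a rough sketch of the kind of code path that trips this assert is below. Everything in it (MonStore, get_version, the simplified update_from_paxos) is an illustrative stand-in, not the actual 0.72.2 AuthMonitor source; the idea is that the refresh path reads each committed auth version back out of the monitor's key/value store and asserts that every read succeeds, so a version missing from the on-disk store aborts the monitor.

#include <cassert>
#include <cstdint>
#include <map>
#include <string>

using version_t = uint64_t;

// Simplified stand-in for the monitor's versioned key/value store.
struct MonStore {
  std::map<version_t, std::string> values;   // blobs keyed by paxos version

  // Returns 0 on success, nonzero if that version is missing from the store.
  int get_version(version_t v, std::string *out) const {
    auto it = values.find(v);
    if (it == values.end())
      return -1;
    *out = it->second;
    return 0;
  }
};

// Roughly the shape of AuthMonitor::update_from_paxos(): replay every
// version between the last one applied and the last one committed.
void update_from_paxos(const MonStore &store,
                       version_t keys_ver,          // last version applied
                       version_t last_committed) {  // last version committed
  while (keys_ver < last_committed) {
    std::string bl;
    int ret = store.get_version(keys_ver + 1, &bl);
    assert(ret == 0);   // the analogue of mon/AuthMonitor.cc:153
    // ... decode and apply the incremental here ...
    ++keys_ver;
  }
}

int main() {
  MonStore store;
  store.values[1] = "auth incremental 1";
  // version 2 is absent, e.g. a damaged or incomplete mon store
  store.values[3] = "auth incremental 3";

  update_from_paxos(store, 0, 3);   // aborts when version 2 cannot be read
  return 0;
}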

I've found that this problem has happened before, in different clusters with different users, but it was solved by patching the mon. I cannot do this with version 0.72.2.

All my data is in a coma now and I am very much hoping for help.
The output, monitor data, and ceph.conf are attached below.

Thank you.

My respects for the best distributed FS of this time =)


Files

ceph-mon.output.txt (11.2 KB) Grigory Gorelov, 01/23/2014 02:51 PM
ceph.conf (630 Bytes) Grigory Gorelov, 01/23/2014 02:51 PM
mon.srv2.tar.bz2 (290 KB) Grigory Gorelov, 01/23/2014 02:51 PM
Actions #1

Updated by Grigory Gorelov over 10 years ago

Also, I think I need to add that this was a clean install of version 0.72.2. No updates, no migrations.

Actions #2

Updated by Ian Colle over 10 years ago

  • Assignee set to Joao Eduardo Luis
Actions #3

Updated by Ian Colle over 10 years ago

  • Priority changed from Normal to Urgent
Actions #4

Updated by Joao Eduardo Luis over 10 years ago

  • Project changed from devops to Ceph
  • Category set to Monitor
  • Status changed from New to Need More Info

Is there a full log for this monitor, as well as for the other 2 monitors?

Actions #5

Updated by Grigory Gorelov over 10 years ago

I'm sorry to say there isn't. There is nothing related to ceph in /var/log.

Actions #6

Updated by Joao Eduardo Luis about 10 years ago

Unfortunately I've been unable to reproduce this locally.

Can you provide a list of the steps you took in order to trigger this? And ceph versions you might have upgraded from and to? You mentioned it was a clean 0.72.2 install, so I'm assuming you didn't have a cluster prior to that. Can you please confirm this?

Actions #7

Updated by Grigory Gorelov about 10 years ago

My steps were:

1. Installed ceph-0.72.2 on three servers.
2. Created some RBD images.
3. Ran qemu-kvm on them.
4. Rebooted one of the servers, and its monitor didn't start.
5. The other two monitors then failed to start as well.

If you cannot reproduce this, please tell me your configuration: ./configure flags, kernel, and versions, if possible.

I'll try to build the same environment and run the monitor in it.

Thank you.

Actions #8

Updated by Grigory Gorelov about 10 years ago

I've opened ssh for you:

<redacted>

Once you have logged in, you can ssh to those three servers:

ssh , pass "1" (server 1 mon data is destroyed due to experiments)
ssh , pass "1"
ssh , pass "1"

Actions #9

Updated by Grigory Gorelov about 10 years ago

I've reproduced the bug on a clean server:

1. Downloaded ceph-0.72.2.tar.gz and unpacked it
2. Installed snappy-1.1.0
3. Installed libedit-20130712.3.1
4. ./configure (with no extra flags)
5. make -j4
6. Copied mon.srv3 to /home/ceph_mon
7. Assigned 10.0.0.3/24 on eth0
8. Copied ceph.conf to /etc/ceph/ceph.conf
9. ceph-mon -i srv3 -d

And the assert occurred.

Actions #10

Updated by Joao Eduardo Luis about 10 years ago

Are you reusing a previous store, from a previously problematic cluster?

Actions #11

Updated by Grigory Gorelov about 10 years ago

No; "clean server" here means there is nothing on it except a Gentoo stage3 installation.

Actions #12

Updated by Grigory Gorelov about 10 years ago

I'm sorry to say that all my data is now considered lost. I like the Ceph architecture very much but cannot use it due to bugs. I will wait a few years for it to reach stability.

Thank you for your work; I hope Ceph will become the de facto standard in the distributed storage area.

Actions #13

Updated by Ian Colle about 10 years ago

  • Status changed from Need More Info to New
Actions #14

Updated by Joao Eduardo Luis about 10 years ago

  • Status changed from New to Can't reproduce