Bug #4521

mon: starting a new osd crashes all mon's

Added by Evan Felix about 11 years ago. Updated almost 11 years ago.

Status:
Can't reproduce
Priority:
High
Assignee:
Category:
Monitor
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

While using 0.59 today I wanted to add an osd back in that had crashed its disks. I had told ceph that the osd was lost, removed it, and re-created it with a new disk array. When I try to start the ceph-osd on the host, all the mons crash with:

-6> 2013-03-21 16:34:48.004035 7fe73abfd700  1 -- 130.20.232.86:6789/0 <== osd.2 130.20.232.88:6800/7474 1 ==== auth(proto 0 26 bytes epoch 4) v1 ==== 56+0+0 (2633074795 0 0) 0x7fe7200014b0 con 0x7fe72c002bc0
-5> 2013-03-21 16:34:48.004274 7fe73abfd700  1 -- 130.20.232.86:6789/0 --> 130.20.232.88:6800/7474 -- auth_reply(proto 2 0 Success) v1 -- ?+0 0x7fe70c069e50 con 0x7fe72c002bc0
-4> 2013-03-21 16:34:48.004770 7fe73abfd700  1 -- 130.20.232.86:6789/0 <== osd.2 130.20.232.88:6800/7474 2 ==== auth(proto 2 128 bytes epoch 0) v1 ==== 158+0+0 (1330793894 0 0) 0x7fe7200014b0 con 0x7fe72c002bc0
-3> 2013-03-21 16:34:48.005009 7fe73abfd700  1 -- 130.20.232.86:6789/0 --> 130.20.232.88:6800/7474 -- auth_reply(proto 2 0 Success) v1 -- ?+0 0x7fe70c069e50 con 0x7fe72c002bc0
-2> 2013-03-21 16:34:48.005398 7fe73abfd700  1 -- 130.20.232.86:6789/0 <== osd.2 130.20.232.88:6800/7474 3 ==== mon_subscribe({monmap=5+,osd_pg_creates=0,osdmap=5631}) v2 ==== 69+0+0 (3159699410 0 0) 0x7fe7200014b0 con 0x7fe72c002bc0
-1> 2013-03-21 16:34:48.005429 7fe73abfd700  5 mon.a@0(leader).osd e6446 send_incremental [5631..6446] to osd.2 130.20.232.88:6800/7474
 0> 2013-03-21 16:34:48.006491 7fe73abfd700 -1 mon/OSDMonitor.cc: In function 'void OSDMonitor::send_incremental(epoch_t, entity_inst_t&, bool)' thread 7fe73abfd700 time 2013-03-21 16:34:48.005527
mon/OSDMonitor.cc: 1441: FAILED assert(err == 0)
ceph version 0.59 (cbae6a435c62899f857775f66659de052fb0e759)
1: (OSDMonitor::send_incremental(unsigned int, entity_inst_t&, bool)+0x9fb) [0x4fa9fb]
2: (OSDMonitor::check_sub(Subscription*)+0xa0) [0x4fab80]
3: (Monitor::handle_subscribe(MMonSubscribe*)+0x77c) [0x48c33c]
4: (Monitor::_ms_dispatch(Message*)+0x1031) [0x4afee1]
5: (Monitor::ms_dispatch(Message*)+0x32) [0x4c8812]
6: (DispatchQueue::entry()+0x309) [0x6021d9]
7: (DispatchQueue::DispatchThread::entry()+0xd) [0x5f1e8d]
8: (()+0x7851) [0x7fe744886851]
9: (clone()+0x6d) [0x7fe7430a890d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Poking at the generated core file, it seems that the get_version('full', first, bl) call around line 1440 in OSDMonitor.cc returns -2 (ENOENT). I didn't track down exactly which key was being requested from the leveldb; there is not a lot of logging in that code.
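
To make the failure mode concrete, here is a minimal, self-contained sketch (not the actual Ceph source; the key layout and names are assumptions based on the trace above) of how a missing full_<epoch> key turns into this abort:

// Stand-alone illustration of the assert-on-failed-lookup pattern seen in the
// backtrace; std::map stands in for the monitor's leveldb-backed store.
#include <cassert>
#include <iostream>
#include <map>
#include <string>

static std::map<std::string, std::string> store;   // hypothetical key/value store

// Mimics OSDMonitor::get_version("full", epoch, bl): 0 on success, -2 (-ENOENT) if the key is missing.
int get_version(const std::string& prefix, unsigned epoch, std::string& out) {
  auto it = store.find(prefix + "_" + std::to_string(epoch));
  if (it == store.end())
    return -2;
  out = it->second;
  return 0;
}

int main() {
  store["full_6446"] = "latest full osdmap";  // only the latest full map survived the conversion
  std::string bl;
  unsigned first = 5632;                      // epoch the re-created osd subscribed from
  int err = get_version("full", first, bl);   // full_5632 is gone, so err == -2
  std::cout << "get_version returned " << err << std::endl;
  assert(err == 0);                           // corresponds to mon/OSDMonitor.cc:1441 FAILED assert(err == 0)
}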

gdb trace:
#4 0x00007fe742ff28a5 in raise () from /lib64/libc.so.6
#5 0x00007fe742ff4085 in abort () from /lib64/libc.so.6
#6 0x00007fe7438aba5d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib64/libstdc++.so.6
#7 0x00007fe7438a9be6 in ?? () from /usr/lib64/libstdc++.so.6
#8 0x00007fe7438a9c13 in std::terminate() () from /usr/lib64/libstdc++.so.6
#9 0x00007fe7438a9d0e in __cxa_throw () from /usr/lib64/libstdc++.so.6
#10 0x0000000000589839 in ceph::__ceph_assert_fail (assertion=0x17dfac0 "\001", file=0x7fe70c069e50 "\030\231KQx\vc",
    line=1441, func=0x6a9f60 "void OSDMonitor::send_incremental(epoch_t, entity_inst_t&, bool)") at common/assert.cc:77
#11 0x00000000004fa9fb in OSDMonitor::send_incremental (this=0xffffffa6, first=5632, dest=..., onetime=true)
at mon/OSDMonitor.cc:1441
#12 0x00000000004fab80 in OSDMonitor::check_sub (this=0x17fd9f0, sub=0x7fe70c047b20) at mon/OSDMonitor.cc:1493
#13 0x000000000048c33c in Monitor::handle_subscribe (this=0x17fc970, m=0x7fe7200014b0) at mon/Monitor.cc:3592
#14 0x00000000004afee1 in Monitor::_ms_dispatch (this=0x17fc970, m=0x7fe7200014b0) at mon/Monitor.cc:3048
#15 0x00000000004c8812 in Monitor::ms_dispatch (this=0x17fc970, m=0x7fe7200014b0) at mon/Monitor.h:1346
#16 0x00000000006021d9 in ms_deliver_dispatch (this=0x17eb448) at msg/Messenger.h:553
#17 DispatchQueue::entry (this=0x17eb448) at msg/DispatchQueue.cc:107
#18 0x00000000005f1e8d in DispatchQueue::DispatchThread::entry (this=<value optimized out>) at msg/DispatchQueue.h:85
#19 0x00007fe744886851 in start_thread () from /lib64/libpthread.so.0
#20 0x00007fe7430a890d in clone () from /lib64/libc.so.6

More logs and core files are available as needed.

This happened every time I tried to start the osd.

ceph-mon.a.log.gz (239 KB) Evan Felix, 03/22/2013 08:34 AM

ceph-mon.b.log.gz (5.41 KB) Evan Felix, 03/22/2013 08:34 AM

ceph-mon.c.log.gz (118 KB) Evan Felix, 03/22/2013 08:34 AM

bug4521.log View (9.68 KB) Evan Felix, 03/26/2013 12:19 AM

output.txt View (8.5 KB) Matt Anderson, 03/26/2013 11:38 AM

mon.bk.wip4521.tgz (19.5 MB) Evan Felix, 04/16/2013 04:51 PM

wip4521.fix2 (3.28 KB) Evan Felix, 04/16/2013 04:51 PM

wip4521.fix_debugA (2.71 KB) Evan Felix, 04/18/2013 09:25 AM

ceph-mon.a.log.gz (7.33 MB) Evan Felix, 04/18/2013 09:25 AM

mon.bk.wip4521A.tgz (19.3 MB) Evan Felix, 04/18/2013 09:25 AM

ceph-mon.a.log.gz (406 KB) Evan Felix, 04/19/2013 09:40 AM

mon.bk.wip4521B.tgz (20.1 MB) Evan Felix, 04/23/2013 10:55 AM


Related issues

Related to Ceph - Bug #4394: mon/OSDMonitor.cc: In function 'MOSDMap* OSDMonitor::build_incremental(epoch_t, epoch_t)' Can't reproduce 03/08/2013

Associated revisions

Revision f25f922b (diff)
Added by Joao Eduardo Luis almost 11 years ago

mon: Monitor: convert osdmap_full as well

Store conversion wasn't converting the osdmap_full/ versions, only the
incrementals under osdmap/ and the latest full version stashed. This
would lead to some serious problems during OSDMonitor's update_from_paxos
when the latest stashed didn't correspond to the first available
incremental.

Fixes: #4521

Signed-off-by: Joao Eduardo Luis <>

History

#1 Updated by Joao Eduardo Luis about 11 years ago

  • Subject changed from starting a new osd crashes all mon's to mon: starting a new osd crashes all mon's
  • Assignee set to Joao Eduardo Luis

Can you reproduce this with 'debug mon = 20', 'debug paxos = 20' and 'debug ms = 1' on all monitors, and point me to the log (or upload it to the bug) of the crashing monitors?
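
One way to apply those settings, assuming the standard ceph.conf layout, is to add the options to the [mon] section on each monitor host and then restart the monitors:

[mon]
    debug mon = 20
    debug paxos = 20
    debug ms = 1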

#2 Updated by Evan Felix about 11 years ago

Here are the logs. I'm not sure mon.b went down this run, but a and c did. I may not have waited long enough for mon.b to drop.

#3 Updated by Joao Eduardo Luis about 11 years ago

Thanks for the logs!

#4 Updated by Joao Eduardo Luis about 11 years ago

  • Status changed from New to In Progress

#5 Updated by Joao Eduardo Luis about 11 years ago

Evan, did this happen after a store conversion? The logs don't show it, but I suspect there might be some corner case that led to one of the osdmap full versions going missing (or it simply wasn't a full version and the conversion made some rather unfortunate assumption).

Would you be willing to do some forensics on mon.a's store?

I would appreciate it if you could attach a file to this bug with the output of the following commands:

(these can only be run while the monitor is not running)

ceph_test_store_tool /var/lib/ceph/mon/ceph-a/store.db list osdmap

ceph_test_store_tool /var/lib/ceph/mon/ceph-a/store.db get osdmap first_committed

ceph_test_store_tool /var/lib/ceph/mon/ceph-a/store.db get osdmap last_committed

ceph_test_store_tool /var/lib/ceph/mon/ceph-a/store.db get osdmap full_latest

Let me know how it goes. Also, feel free to find me on IRC whenever you can so we can sort this out.

#6 Updated by Evan Felix about 11 years ago

Log of commands attached.

This was following the update to 0.59.

#7 Updated by Matt Anderson about 11 years ago

Logs as per IRC. Thanks.

#8 Updated by Sage Weil almost 11 years ago

  • Priority changed from High to Urgent

#9 Updated by Joao Eduardo Luis almost 11 years ago

  • Status changed from In Progress to Fix Under Review

This was caused by a bug in the store conversion, introduced in 0.59, resulting from not converting the osdmap full versions.

Commit commit:b7fb4fef372f3225cfd148b24de03062ccbe1dbf fixes this in the monitor.

Commit commit:61c9a1917adeb1d325c92c061ab1a2f1eaedfd9a adds a tool to fix a converted monitor suffering from this bug, but it requires the old-format store (from before the conversion) to perform the actual fix. Without the old-format store the monitor remains unusable. There is a possible workaround, but we wouldn't be able to guarantee osdmap consistency or that the osds would keep working -- I was never able to reproduce this under such circumstances, so I can't tell for sure, but given that risk I don't believe it's a viable option.

Both commits are on wip-4521, pending review.

Matt tested the tool and confirmed that it worked. Thanks Matt!

#10 Updated by Evan Felix almost 11 years ago

Can you tell me if I am using this correctly:

root -# src/ceph_mon_kvstore_fix /data/mon/ /data/mon/
2013-04-12 16:26:51.438635 7f8f0dff4760 1 store(/data/mon/) mount
2013-04-12 16:26:51.593670 7f8f0dff4760 0 check_old_service_versions
2013-04-12 16:26:51.593812 7f8f0dff4760 0 service auth is okay
2013-04-12 16:26:51.596381 7f8f0dff4760 0 service logm is okay
2013-04-12 16:26:51.596421 7f8f0dff4760 0 service mdsmap is okay
2013-04-12 16:26:51.596451 7f8f0dff4760 0 service monmap is okay
2013-04-12 16:26:51.600124 7f8f0dff4760 0 service osdmap is okay
2013-04-12 16:26:51.603949 7f8f0dff4760 0 service pgmap is okay
2013-04-12 16:26:51.604025 7f8f0dff4760 0 check_new_service_versions
2013-04-12 16:26:51.604885 7f8f0dff4760 0 service auth is okay
2013-04-12 16:26:52.307240 7f8f0dff4760 0 service logm is okay
2013-04-12 16:26:52.307392 7f8f0dff4760 0 service mdsmap is okay
2013-04-12 16:26:52.307525 7f8f0dff4760 0 service monmap is okay
2013-04-12 16:26:52.319064 7f8f0dff4760 0 service osdmap is okay
2013-04-12 16:26:52.693840 7f8f0dff4760 0 service pgmap is okay
2013-04-12 16:26:52.693878 7f8f0dff4760 0 check old-format osdmap full versions
2013-04-12 16:26:52.693881 7f8f0dff4760 0 old-format available versions: [5632,6339]
2013-04-12 16:26:52.697838 7f8f0dff4760 0 move old-format osdmap full versions to kv store
2013-04-12 16:27:14.757845 7f8f0dff4760 0 check for gaps in kv store's osdmap's full versions
2013-04-12 16:27:14.775152 7f8f0dff4760 0 missing full ver 6340
2013-04-12 16:27:14.775193 7f8f0dff4760 0 missing full ver 6341
2013-04-12 16:27:14.775209 7f8f0dff4760 0 missing full ver 6342
2013-04-12 16:27:14.775230 7f8f0dff4760 0 missing full ver 6343
2013-04-12 16:27:14.775251 7f8f0dff4760 0 missing full ver 6344
2013-04-12 16:27:14.775272 7f8f0dff4760 0 missing full ver 6345
2013-04-12 16:27:14.787822 7f8f0dff4760 -1 there were some gaps in the full version history!
2013-04-12 16:27:14.787851 7f8f0dff4760 -1 something went wrong while fixing the store

It doesn't seem to help the situation.

#11 Updated by Joao Eduardo Luis almost 11 years ago

Evan, the fix was admittedly based on Matt's stores, and I apparently missed some other corner case.

Anyway, I just pushed a revamped version of the fix to the same branch.

This time, try to run it as 'ceph_mon_kvstore_fix --checks-only --i-am-sure /data/mon/ /data/mon/', which will check if the store is fixable according to some of the criteria I could come up with (which should be enough for pretty much every case). You may also run it without the '--checks-only', and it should first do a dry run of the fix, reading from both stores on disk but writing to none.

If you run into any trouble, please add '--debug-none 20' to the command's arguments. This will output a lot of information that will be useful for tracking down what may be going wrong.

When and if you feel comfortable, running with '--for-real' should finally perform the fix.

Let us know how it goes.

#12 Updated by Evan Felix almost 11 years ago

I've tried the new fix; attaching some files.

The debug-none argument didn't seem to have much effect.

#13 Updated by Joao Eduardo Luis almost 11 years ago

  • Status changed from Fix Under Review to In Progress

Ah! Got it. Thanks for attaching that log file.

It appears that your new store must have had some versions trimmed, and some 400 new versions added, since the conversion -- which in fact I find odd.

I'll work on a fix for this first thing tomorrow morning!

#14 Updated by Joao Eduardo Luis almost 11 years ago

  • Status changed from In Progress to 4

Evan, just pushed a rework of the fix, and it looks like your store ought to be fixed by it.

Let me know how it goes.

#15 Updated by Sage Weil almost 11 years ago

Fix is merged. The fix tool should go into next too once we hear from Evan!

#16 Updated by Evan Felix almost 11 years ago

I pulled the updates, compiled, and installed. I ran the fix (see attached log), started the new mon, and tried to start the new osd. The mon crashed (see attached log); the mon directory is also attached.

#17 Updated by Joao Eduardo Luis almost 11 years ago

Evan, is that store from before or after applying the fix? It doesn't seem fixed at all.

Also, when you have the chance, could you attach a log with higher debug levels for mon and ms? That would save me from going through some hoops to get your monitor up and running to obtain it myself (I'm assuming this is a fairly easy crash to trigger).

#18 Updated by Evan Felix almost 11 years ago

That store was captured after I ran the fix (log wip4521.fix_debugA) and started the mon, and then it crashed.

I will run again with more logging.

#19 Updated by Evan Felix almost 11 years ago

Debug for ms and mon at 20; log attached.

#20 Updated by Joao Eduardo Luis almost 11 years ago

Sage opened a bug for that one here: http://tracker.ceph.com/issues/4748

#21 Updated by Joao Eduardo Luis almost 11 years ago

Evan, after closer inspection I figured out that your bug is indeed different from Sage's.

Can you confirm you ran the fix with the '--for-real' option? Running the fix without this option will just run it in a mock in-memory store.

You can check if you're missing full versions by running the following:

$ ceph_test_store_tool store.db list osdmap | grep -v 'full\|last\|latest\|first' | cut -f2 -d':' > /tmp/versions

$ ceph_test_store_tool store.db list osdmap | grep 'full_' | grep -v 'latest' | cut -f2 -d'_' | diff - /tmp/versions

#22 Updated by Evan Felix almost 11 years ago

Current mon directory attached.

#23 Updated by Joao Eduardo Luis almost 11 years ago

The issue appears to be with leveldb's state, which is returning 'Invalid argument: not an sstable (bad magic number)' when we check for the iterator's status during 'lower_bound()'.

I haven't yet been able to figure out why this happens. Attempting a leveldb repair fixes the issue, but it looks like it loses a pgmap version in the process, so that might not be quite what we are looking for.

In any case, the key and the value are there, and they match what is expected. A possible workaround for this, until we figure out what is happening, is to base MonitorDBStore::exists() on MonitorDBStore::get() -- which is based on LevelDBStore::get() -- and avoid using the lower_bound() function. I'd really rather not do this, though.
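
A rough sketch of what that workaround could look like -- member names and signatures here are assumptions, not the real MonitorDBStore code:

// Hypothetical sketch: answer exists() with a point lookup (get) instead of
// positioning an iterator via lower_bound(), so a bad sstable hit during
// iteration doesn't abort the monitor.
bool MonitorDBStore::exists(const std::string& prefix, const std::string& key)
{
  bufferlist bl;
  int r = db->get(prefix, key, &bl);  // assumed LevelDBStore::get(): 0 if the key is present
  return r == 0;
}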

Sage, Greg, any thoughts?

#24 Updated by Samuel Just almost 11 years ago

  • Assignee changed from Joao Eduardo Luis to Samuel Just

#25 Updated by Samuel Just almost 11 years ago

Evan: what version of leveldb are you using?

#26 Updated by Samuel Just almost 11 years ago

  • Priority changed from Urgent to High

#27 Updated by Samuel Just almost 11 years ago

  • Assignee deleted (Samuel Just)

#28 Updated by Sage Weil almost 11 years ago

  • Assignee set to Samuel Just

The original conversion bug is fixed, and the fixer works for those who need it, modulo this leveldb thing. We should continue to track that down, but I don't see it as a cuttlefish release blocker.

It probably isn't necessary to include the fixer in master or any release. Users who hit this bug can pull the fixer from gitbuilder as needed.

#29 Updated by Evan Felix almost 11 years ago

Samuel Just wrote:

Evan: what version of leveldb are you using?

leveldb-1.7.0-2.el6.x86_64

#30 Updated by Evan Felix almost 11 years ago

This monitor (a) is now in a state where I cannot even start it up. I was planning on removing it, wiping the directory, and re-adding it to the system. Do you want me to do anything for debugging first? It seems to have lost its filesystem id and refuses to start. Would you like a final copy of the directory?

#31 Updated by Samuel Just almost 11 years ago

A final copy would be great; you can then go ahead and wipe. Thanks!

#32 Updated by Sage Weil almost 11 years ago

  • Status changed from 4 to Can't reproduce
