Bug #4837

mon: FAILED assert(!(sync_role & SYNC_ROLE_REQUESTER))

Added by Wido den Hollander almost 11 years ago. Updated almost 11 years ago.

Status: Resolved
Priority: Urgent
Assignee: Greg Farnum
Category: Monitor
Target version: 0.61
% Done: 0%
Source: other
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I just upgraded 3 monitors from 0.56.4 to 0.60 (next branch) and saw a monitor crash when I ran:

$ ceph osd unset noout

I upgraded the three mons first to 0.60 and after that was done I set noout:

$ ceph osd set noout

After all OSDs were up and running again I wanted to unset the flag, which caused the monitor to crash.

The logs showed:

   -11> 2013-04-26 21:33:07.030062 7f52043f5700  2 -- 10.23.24.8:6789/0 >> 10.23.24.58:6812/30099 pipe(0x312ff280 sd=7 :6789 s=2 pgs=117 cs=1 l=1).fault 0: Success
   -10> 2013-04-26 21:33:07.056466 7f52043f5700  1 -- 10.23.24.8:6789/0 >> :/0 pipe(0x2c9f8500 sd=7 :6789 s=0 pgs=0 cs=0 l=0).accept sd=7 10.23.24.57:53299/0
    -9> 2013-04-26 21:33:07.077770 7f52029e2700  1 -- 10.23.24.8:6789/0 >> :/0 pipe(0x389ab000 sd=30 :6789 s=0 pgs=0 cs=0 l=0).accept sd=30 10.23.24.58:56868/0
    -8> 2013-04-26 21:33:07.148088 7f5207315700  1 -- 10.23.24.8:6789/0 --> mon.1 10.23.24.9:6789/0 -- mon_sync( start ) v1 -- ?+0 0x26da580
    -7> 2013-04-26 21:33:07.148337 7f5207315700  1 -- 10.23.24.8:6789/0 <== mon.2 10.23.24.10:6789/0 119568 ==== mon_probe(reply 4a5a7836-1d75-4e09-b3dc-80e5ae13d681 name mon3 quorum 0,1,2 paxos( fc 175909 lc 175933 )) v4 ==== 579+0+0 (2730281254 0 0) 0x7a95900 con 0x278f9a0
    -6> 2013-04-26 21:33:07.148442 7f5207315700  1 -- 10.23.24.8:6789/0 <== mon.0 10.23.24.8:6789/0 0 ==== mon_sync( start_reply ) v1 ==== 0+0+0 (0 0 0) 0x2e7252c0 con 0x278f2c0
    -5> 2013-04-26 21:33:07.148541 7f5207315700  1 -- 10.23.24.8:6789/0 --> mon.0 10.23.24.8:6789/0 -- mon_sync( heartbeat ) v1 -- ?+0 0x2ef7f580
    -4> 2013-04-26 21:33:07.148586 7f5207315700  1 -- 10.23.24.8:6789/0 --> mon.0 10.23.24.8:6789/0 -- mon_sync( start_chunks ) v1 -- ?+0 0x2ef7f840
    -3> 2013-04-26 21:33:07.148622 7f5207315700  1 -- 10.23.24.8:6789/0 <== mon.0 10.23.24.8:6789/0 0 ==== mon_sync( heartbeat ) v1 ==== 0+0+0 (0 0 0) 0x2ef7f580 con 0x278f2c0
    -2> 2013-04-26 21:33:07.148667 7f5207315700  1 mon.mon1@0(synchronizing sync( requester state chunks )) e1 handle_sync_heartbeat ignored stray message mon_sync( heartbeat ) v1
    -1> 2013-04-26 21:33:07.148698 7f5207315700  1 -- 10.23.24.8:6789/0 <== mon.0 10.23.24.8:6789/0 0 ==== mon_sync( start_chunks ) v1 ==== 0+0+0 (0 0 0) 0x2ef7f840 con 0x278f2c0
     0> 2013-04-26 21:33:07.150441 7f5207315700 -1 mon/Monitor.cc: In function 'void Monitor::handle_sync_start_chunks(MMonSync*)' thread 7f5207315700 time 2013-04-26 21:33:07.148735
mon/Monitor.cc: 1136: FAILED assert(!(sync_role & SYNC_ROLE_REQUESTER))

 ceph version 0.60-669-ga2a23cc (a2a23ccd959f6e7ebe1533b27e7320902624523b)
 1: (Monitor::handle_sync_start_chunks(MMonSync*)+0x84d) [0x4b060d]
 2: (Monitor::handle_sync(MMonSync*)+0x2cb) [0x4c1b4b]
 3: (Monitor::_ms_dispatch(Message*)+0xd90) [0x4c2ae0]
 4: (Monitor::ms_dispatch(Message*)+0x32) [0x4d96f2]
 5: (DispatchQueue::entry()+0x3f1) [0x6aef01]
 6: (DispatchQueue::DispatchThread::entry()+0xd) [0x63d55d]
 7: (()+0x7e9a) [0x7f520c078e9a]
 8: (clone()+0x6d) [0x7f520a628ccd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
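
For readers not familiar with the monitor sync code, here is a minimal hypothetical sketch of the role-bitmask pattern this assert relies on. It is not the real Monitor code; the SYNC_ROLE_PROVIDER flag, the MonSyncState struct, and the method body are assumptions for illustration only.

// Hypothetical sketch only -- not the actual Ceph Monitor code. It just
// illustrates the role-bitmask pattern behind the failed assert: a monitor
// that is currently a sync *requester* (pulling a full copy of the store
// from a peer) must never be asked to start serving chunks as a provider.
#include <cassert>
#include <cstdint>

enum : uint8_t {
  SYNC_ROLE_NONE      = 0x0,
  SYNC_ROLE_REQUESTER = 0x1,  // we are fetching the store from someone else
  SYNC_ROLE_PROVIDER  = 0x2,  // we are serving our store to someone else
};

struct MonSyncState {
  uint8_t sync_role = SYNC_ROLE_NONE;

  void handle_sync_start_chunks() {
    // A requester is replacing its own store, so it has nothing consistent
    // to serve. Getting start_chunks while the requester bit is set means
    // the sync message routing went wrong upstream -- which is exactly
    // what this crash shows.
    assert(!(sync_role & SYNC_ROLE_REQUESTER));
    sync_role |= SYNC_ROLE_PROVIDER;
    // ... start sending store chunks to the peer ...
  }
};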

The full log is attached.

I set the target version to 0.61 since this was on the next branch, which will become the Cuttlefish release soon.


Files

ceph-mon.mon1.log.gz (291 KB) - monitor1 logs - Wido den Hollander, 04/26/2013 12:50 PM
ceph-mon.mon2.log.1.gz (5.96 MB) - Wido den Hollander, 04/26/2013 11:05 PM
ceph-mon.mon3.log.1.gz (3.06 MB) - Wido den Hollander, 04/26/2013 11:05 PM
ceph-mon.mon1.log.gz (19.5 KB) - mon1 April 28th - Wido den Hollander, 04/28/2013 09:46 AM
ceph-mon.mon2.log.gz (174 KB) - mon2 April 28th - Wido den Hollander, 04/28/2013 09:46 AM
ceph-mon.mon3.log.gz (672 KB) - mon3 April 28th - Wido den Hollander, 04/28/2013 09:46 AM
Actions #1

Updated by Greg Farnum almost 11 years ago

  • Assignee set to Greg Farnum
  • Priority changed from Normal to Urgent
Actions #2

Updated by Greg Farnum almost 11 years ago

  • Status changed from New to In Progress

Well this is different — the monitor is addressing sync requests to itself!

Actions #3

Updated by Greg Farnum almost 11 years ago

Okay, I see part of what's happening here. The sync infrastructure includes a separate forwarding mechanism, and that's broken somehow, so the monitors just ping these messages back and forth without end. Then something jars the sync state machine slightly out of whack, and the next one of these messages that comes in trips the assert.
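
To make the failure mode above concrete, here is a toy sketch (not the real Ceph forwarding code) under the assumption that a monitor which does not consider itself the leader forwards sync requests to whoever it believes the leader is; with two monitors holding inconsistent views of leadership, the request just bounces forever:

// Toy illustration only -- not the actual Ceph code. Each monitor that is
// not the leader forwards the sync "start" to its believed leader. If the
// two monitors' views of leadership point at each other, the request loops.
#include <iostream>
#include <string>

struct Mon {
  std::string name;
  Mon* believed_leader = nullptr;  // who this monitor thinks leads the quorum
  bool is_leader = false;

  void handle_sync_start(Mon& from, int hops) {
    if (hops > 5) {                // cut the illustration short
      std::cout << name << ": request from " << from.name
                << " is still bouncing after " << hops << " hops\n";
      return;
    }
    if (!is_leader) {
      // Forward to whoever we think is the leader -- no check that this
      // sends the message straight back where it came from.
      believed_leader->handle_sync_start(from, hops + 1);
      return;
    }
    std::cout << name << ": granting sync to " << from.name << "\n";
  }
};

int main() {
  Mon mon1{"mon1"}, mon2{"mon2"};
  mon1.believed_leader = &mon2;    // mon1 dropped out and thinks mon2 leads
  mon2.believed_leader = &mon1;    // mon2 still thinks mon1 is the leader
  mon2.handle_sync_start(mon1, 0); // mon1's sync request arrives at mon2
}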

Actions #4

Updated by Greg Farnum almost 11 years ago

Okay, so mon1, id 0, is the leader. Then, suddenly, he's probing and goes into syncing. There's no logging here that makes it apparent why, but that happens.

At a later time, he's pinging a MonSync "start" back and forth with mon.2. The only way this can happen is if mon1 sends his sync request to mon.2, but mon.2 thinks mon1 is the leader. There aren't any checks for this, and I haven't yet figured out how that can happen. Wido, do you have logs from the other monitors during this time?
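
As a purely hypothetical illustration of the kind of check that seems to be missing here (not the actual fix), a forwarding path could refuse to relay a sync start when doing so would just bounce it back; all names below are made up:

// Hypothetical sketch of a guard against the bounce described above --
// not the actual fix that eventually landed. Before forwarding a sync
// "start" toward the believed leader, make sure we are not sending it
// straight back to the original requester, or to ourselves.
#include <string>

struct MonRef {
  std::string name;
  bool operator==(const MonRef& other) const { return name == other.name; }
};

// Returns true if forwarding requester's sync start to believed_leader
// cannot create the kind of two-monitor loop described in this comment.
bool safe_to_forward(const MonRef& self, const MonRef& requester,
                     const MonRef& believed_leader) {
  if (believed_leader == self)      return false;  // would loop onto ourselves
  if (believed_leader == requester) return false;  // would bounce it right back
  return true;
}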

There are also a ton of messages coming in throughout which fail due to lack of privileges, or apparently bad networking. I'm not sure what's going on with that, but so far at least it appears not to be directly related.

Actions #5

Updated by Greg Farnum almost 11 years ago

  • Status changed from In Progress to Need More Info

I can't do any more without more logs, unfortunately. :(

In order to increase our odds of getting useful logs, I've added an assert which will catch the situation this monitor ended up in much sooner, and that should hopefully contain enough information for us to do something useful with it.

Actions #6

Updated by Wido den Hollander almost 11 years ago

I've added the logs from mon2 and mon3.

What I did notice is that mon1 has now crashed without anything in the logs. mon2 and mon3 are running just fine right now.

Actions #7

Updated by Wido den Hollander almost 11 years ago

So, I'm not sure if this is related, but since I'm experiencing this with the 'next' branch I'm reporting it here for now.

All 3 monitors are running now, but a "ceph -w" stalls forever.

The logs show me:

mon1

2013-04-28 18:31:00.910173 7f4c1596b700  0 -- 10.23.24.8:6789/0 >> 10.23.24.11:0/1018463894 pipe(0x1297da00 sd=49 :6789 s=0 pgs=0 cs=0 l=0).accept peer addr is really 10.23.24.11:0/1018463894 (socket is 10.23.24.11:35141/0)
2013-04-28 18:31:30.957769 7f4c14153700  0 -- 10.23.24.8:6789/0 >> 10.23.24.11:0/1018463894 pipe(0x14549c80 sd=41 :6789 s=0 pgs=0 cs=0 l=0).accept peer addr is really 10.23.24.11:0/1018463894 (socket is 10.23.24.11:35167/0)
2013-04-28 18:32:41.069028 7f4c15b6d700  0 -- 10.23.24.8:6789/0 >> 10.23.24.11:0/1018463894 pipe(0x1457e000 sd=53 :6789 s=0 pgs=0 cs=0 l=0).accept peer addr is really 10.23.24.11:0/1018463894 (socket is 10.23.24.11:35221/0)

mon2

2013-04-28 18:34:01.197386 7fcc7bf64700  0 -- 10.23.24.9:6789/0 >> 10.23.24.11:0/1018463894 pipe(0x3cd19780 sd=39 :6789 s=0 pgs=0 cs=0 l=0).accept peer addr is really 10.23.24.11:0/1018463894 (socket is 10.23.24.11:60483/0)
2013-04-28 18:34:21.229142 7fcc7e287700  0 -- 10.23.24.9:6789/0 >> 10.23.24.11:0/1018463894 pipe(0x2c74cc80 sd=42 :6789 s=0 pgs=0 cs=0 l=0).accept peer addr is really 10.23.24.11:0/1018463894 (socket is 10.23.24.11:60498/0)
2013-04-28 18:34:31.245462 7fcc7e085700  0 -- 10

mon3

2013-04-28 18:39:17.848429 7f987c075700  1 mon.mon3@2(probing) e1 discarding message auth(proto 0 27 bytes epoch 1) v1 and sending client elsewhere
2013-04-28 18:39:17.848527 7f987c075700  1 mon.mon3@2(probing) e1 discarding message mon_subscribe({monmap=0+,osdmap=334}) and sending client elsewhere
2013-04-28 18:39:20.914438 7f987c075700  0 mon.mon3@2(probing).data_health(2846) update_stats avail 87% total 53564924 used 3975368 avail 46868532
2013-04-28 18:39:21.704925 7f987945e700  0 -- 10.23.24.10:6789/0 >> 10.23.24.11:0/1018463894 pipe(0x9d19280 sd=45 :6789 s=0 pgs=0 cs=0 l=0).accept peer addr is really 10.23.24.11:0/1018463894 (socket is 10.23.24.11:40909/0)
2013-04-28 18:39:22.871360 7f987c075700  1 mon.mon3@2(probing) e1 discarding message auth(proto 0 27 bytes epoch 1) v1 and sending client elsewhere
2013-04-28 18:39:22.871457 7f987c075700  1 mon.mon3@2(probing) e1 discarding message auth(proto 0 27 bytes epoch 1) v1 and sending client elsewhere
2013-04-28 18:39:22.871491 7f987c075700  1 mon.mon3@2(probing) e1 discarding message auth(proto 0 27 bytes epoch 1) v1 and sending client elsewhere

Currently there is one Qemu VM running, but that's all.

So the monitors still aren't agreeing on much; they seem to be blocking.

After restarting the mons and running 'ceph -w' I got a bunch of messages on my screen from over 24 hours ago and then it was running fine again.

Going through the logs they don't seem to be agreeing on much yet, so my guess is they'll stop working again pretty soon.

All the logs are attached again.

Actions #8

Updated by Mike Dawson almost 11 years ago

Wido,

This sounds quite consistent with the things I am seeing. The assert you saw to start this bug report is quite similar to #4793. Plus, your note above about ceph -w hanging is consistent with my observations when I reported #4784 and #4815. Could you check the size of your leveldb on each monitor? In my case (as noted on #4815), my monitors quickly grew to 19GB, 36GB, and 36GB. Obviously, that isn't desirable.

Actions #9

Updated by Wido den Hollander almost 11 years ago

Mike Dawson wrote:

> Wido,
>
> This sounds quite consistent with the things I am seeing. The assert you saw to start this bug report is quite similar to #4793. Plus, your note above about ceph -w hanging is consistent with my observations when I reported #4784 and #4815. Could you check the size of your leveldb on each monitor? In my case (as noted on #4815), my monitors quickly grew to 19GB, 36GB, and 36GB. Obviously, that isn't desirable.

Weird that I didn't find #4793, since that does indeed seem similar.

My leveldb stores (store.db) are the following sizes:
mon1: 950MB
mon2: 1.5GB
mon3: 2.5GB

Actions #10

Updated by Greg Farnum almost 11 years ago

Possibility to check later today: peons commit to disk when they receive a propose, in a way that makes them return those values on probing, but the erstwhile leader doesn't commit until it receives responses; so the values it appears to be behind on were actually proposed, and it already has them?

Actions #11

Updated by Greg Farnum almost 11 years ago

  • Status changed from Need More Info to In Progress

Possibility not true. Worth a quick look even so.

Wido's crash logs didn't really have any new data, but they confirmed the election sequence, and in the review I figured out what's going on. See 41987f380f4fc72762486ad7ddd0ab63173dc5e3.
The problem is figuring out why fcaabf1a22723c571c10d402464071c6405607c0 didn't prevent it from getting into that state to begin with. The handle_last check is actually more generous than the one I added in the election. :/

Actions #12

Updated by Greg Farnum almost 11 years ago

  • Status changed from In Progress to Fix Under Review

Okay, wip-4837-election-syncing should address the assert properly. The other issue, with monitors not forming a quorum, comes down to one or more of the several other issues we've been working on around our abuse of leveldb.

Actions #13

Updated by Greg Farnum almost 11 years ago

  • Status changed from Fix Under Review to 7

Sage signed off on this; waiting for results from the rados multimon and monthrash suites before merging into next and marking resolved.

Actions #14

Updated by Greg Farnum almost 11 years ago

  • Status changed from 7 to Resolved

Merged into next in 3cf5824f60b15cdf4db4e895b4da0d6c964b9ed4. Forgot the "Fixes" tag, whoops.
