Bug #4519
mon: on auth/none/AuthNoneServiceHandler.h: FAILED assert(0) on v0.59 with auth 'none'
Description
Xiaoxi came to us with the following issue both on #ceph and on ceph-devel.
2013-03-21 14:22:20.989567 7fe589b03700 -1 auth/none/AuthNoneServiceHandler.h: In function 'virtual int AuthNoneServiceHandler::handle_request(ceph::buffer::list::iterator&, ceph::bufferlist&, uint64_t&, AuthCapsInfo&, uint64_t*)' thread 7fe589b03700 time 2013-03-21 14:22:20.987929
auth/none/AuthNoneServiceHandler.h: 35: FAILED assert(0)
 ceph version 0.59 (cbae6a435c62899f857775f66659de052fb0e759)
 1: /usr/bin/ceph-mon() [0x578d2d]
 2: (AuthMonitor::prep_auth(MAuth*, bool)+0x75e) [0x55fa7e]
 3: (AuthMonitor::preprocess_query(PaxosServiceMessage*)+0x17d) [0x56039d]
 4: (PaxosService::dispatch(PaxosServiceMessage*)+0x275) [0x4ecc25]
 5: (Context::complete(int)+0xa) [0x4bfe9a]
 6: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0xbc) [0x4c70ec]
 7: (Paxos::begin(ceph::buffer::list&)+0x9f1) [0x4e21b1]
 8: (Paxos::propose_queued()+0xdb) [0x4e249b]
 9: (Paxos::propose_new_value(ceph::buffer::list&, Context*)+0x128) [0x4e2828]
 10: (PaxosService::propose_pending()+0x292) [0x4ebfc2]
 11: (PaxosService::dispatch(PaxosServiceMessage*)+0x655) [0x4ed005]
 12: (Monitor::_ms_dispatch(Message*)+0x3a3) [0x4bebd3]
 13: (Monitor::ms_dispatch(Message*)+0x32) [0x4d9e82]
 14: (DispatchQueue::entry()+0x35b) [0x6899eb]
 15: (DispatchQueue::DispatchThread::entry()+0xd) [0x61ff0d]
 16: (()+0x7e9a) [0x7fe58f34de9a]
 17: (clone()+0x6d) [0x7fe58df9fccd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
This happened on a freshly mkcephfs'ed cluster, but it is easily reproducible on an existing cluster, provided you run v0.59 without cephx.
This appears to have been caused by 436e5be950154fdbbd9e1cfaf4267be6159249d5 (a fix for #4285), which, because it does not affect cephx, must have gone unnoticed until now.
Associated revisions
mon: AuthMonitor: delete auth_handler while increasing max_global_id
By not deleting and setting NULL the session's auth_handler, we could
hit a scenario in which we'd end up dispatching a previously-wait-listed
auth message and we wouldn't start its auth session.
This only happened when increasing max_global_id via Paxos (in which case
we would wait-list the message) and would only be noticeable when running
with cephx disabled.
Fixes: #4519
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
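The failure mode described in the commit message can be illustrated with a small, heavily simplified model. This is only a sketch and not the actual Ceph code: NoneHandler, Session, Result, replay_crashes and the two boolean parameters of prep_auth are hypothetical names made up for illustration, and the real assert(0) is modeled as an exception so that both outcomes can be observed. It assumes the handler is created on the first message and that a message is wait-listed while max_global_id is being increased.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical stand-in for the 'none' auth handler; not the real Ceph class.
struct NoneHandler {
  bool started = false;
  void start_session() { started = true; }
  // With auth 'none' no follow-up request is expected; the real handler hits
  // assert(0) here, which this sketch models as an exception.
  void handle_request() { throw std::logic_error("FAILED assert(0)"); }
};

struct Session {
  std::unique_ptr<NoneHandler> auth_handler;
};

enum class Result { Done, WaitListed };

// Simplified model of handling one auth message.
//   must_wait_for_paxos: max_global_id has to be increased first (assumed).
//   reset_on_wait:       apply the fix, i.e. drop the handler before waiting.
Result prep_auth(Session& s, bool must_wait_for_paxos, bool reset_on_wait) {
  const bool start = !s.auth_handler;  // no handler yet: a new auth session
  if (start)
    s.auth_handler = std::make_unique<NoneHandler>();

  if (must_wait_for_paxos) {
    if (reset_on_wait)
      s.auth_handler.reset();  // the fix: delete the handler and null it
    return Result::WaitListed;
  }

  if (start)
    s.auth_handler->start_session();
  else
    s.auth_handler->handle_request();  // stale handler: session never started
  return Result::Done;
}

// Replays a wait-listed message and reports whether the assert path was hit.
bool replay_crashes(bool reset_on_wait) {
  Session s;
  Result first = prep_auth(s, /*must_wait_for_paxos=*/true, reset_on_wait);
  assert(first == Result::WaitListed);
  (void)first;  // silence unused warnings when asserts are compiled out
  try {
    prep_auth(s, /*must_wait_for_paxos=*/false, reset_on_wait);
  } catch (const std::logic_error&) {
    return true;
  }
  return false;
}
```

In this model, replay_crashes(false) reaches handle_request on the stale handler, while replay_crashes(true) starts a fresh session when the wait-listed message is dispatched again.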
History
#1 Updated by Joao Eduardo Luis about 11 years ago
- Status changed from New to In Progress
I've come up with a fix and will push it shortly to wip-4519 for review.
#2 Updated by Joao Eduardo Luis about 11 years ago
pull request: https://github.com/ceph/ceph/pull/133
#3 Updated by Joao Eduardo Luis about 11 years ago
After some attempts at a successful pull request: https://github.com/ceph/ceph/pull/135
#4 Updated by Sage Weil about 11 years ago
- Status changed from In Progress to Resolved