Bug #4519

mon: on auth/none/AuthNoneServiceHandler.h: FAILED assert(0) on v0.59 with auth 'none'

Added by Joao Eduardo Luis about 11 years ago. Updated about 11 years ago.

Status: Resolved
Priority: Urgent
Assignee: Joao Eduardo Luis
Category: Monitor
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Xiaoxi came to us with the following issue both on #ceph and on ceph-devel.

2013-03-21 14:22:20.989567 7fe589b03700 -1 auth/none/AuthNoneServiceHandler.h: In function 'virtual int AuthNoneServiceHandler::handle_request(ceph::buffer::list::iterator&, ceph::bufferlist&, uint64_t&, AuthCapsInfo&, uint64_t*)' thread 7fe589b03700 time 2013-03-21 14:22:20.987929
auth/none/AuthNoneServiceHandler.h: 35: FAILED assert(0)

 ceph version 0.59 (cbae6a435c62899f857775f66659de052fb0e759)
 1: /usr/bin/ceph-mon() [0x578d2d]
 2: (AuthMonitor::prep_auth(MAuth*, bool)+0x75e) [0x55fa7e]
 3: (AuthMonitor::preprocess_query(PaxosServiceMessage*)+0x17d) [0x56039d]
 4: (PaxosService::dispatch(PaxosServiceMessage*)+0x275) [0x4ecc25]
 5: (Context::complete(int)+0xa) [0x4bfe9a]
 6: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0xbc) [0x4c70ec]
 7: (Paxos::begin(ceph::buffer::list&)+0x9f1) [0x4e21b1]
 8: (Paxos::propose_queued()+0xdb) [0x4e249b]
 9: (Paxos::propose_new_value(ceph::buffer::list&, Context*)+0x128) [0x4e2828]
 10: (PaxosService::propose_pending()+0x292) [0x4ebfc2]
 11: (PaxosService::dispatch(PaxosServiceMessage*)+0x655) [0x4ed005]
 12: (Monitor::_ms_dispatch(Message*)+0x3a3) [0x4bebd3]
 13: (Monitor::ms_dispatch(Message*)+0x32) [0x4d9e82]
 14: (DispatchQueue::entry()+0x35b) [0x6899eb]
 15: (DispatchQueue::DispatchThread::entry()+0xd) [0x61ff0d]
 16: (()+0x7e9a) [0x7fe58f34de9a]
 17: (clone()+0x6d) [0x7fe58df9fccd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

This happened on a freshly mkcephfs'ed cluster, but it's easily reproducible on an existing cluster, provided you run v0.59 without cephx.

This appears to have been caused by 436e5be950154fdbbd9e1cfaf4267be6159249d5 (a fix for #4285). Because the problem does not show up when cephx is enabled, it must have gone unnoticed until now.

Associated revisions

Revision 71ec9c6b (diff)
Added by Joao Eduardo Luis about 11 years ago

mon: AuthMonitor: delete auth_handler while increasing max_global_id

By not deleting and setting NULL the session's auth_handler, we could
hit a scenario in which we'd end up dispatching a previously-wait-listed
auth message and we wouldn't start its auth session.

This only happened when increasing max_global_id via Paxos (in which case
we would wait-list the message) and would only be noticeable when running
with cephx disabled.

Fixes: #4519

Signed-off-by: Joao Eduardo Luis <>
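
The scenario described in the commit message above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the actual Ceph code: `ToyNoneHandler`, `ToySession`, `Outcome`, and this `prep_auth` signature (including the `need_more_ids` and `apply_fix` flags) are hypothetical names invented for illustration. It assumes that a session is only started when the handler is first created, and that a later dispatch with an existing handler goes to the request path, which for auth 'none' is where the backtrace shows `assert(0)` firing.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the 'none' auth service handler.
struct ToyNoneHandler {
  bool started = false;
  void start_session() { started = true; }
};

// Hypothetical stand-in for a monitor session.
struct ToySession {
  std::unique_ptr<ToyNoneHandler> auth_handler;
};

enum class Outcome {
  Started,           // auth session was started normally
  WaitListed,        // message parked until max_global_id is increased
  UnexpectedRequest  // path where the real handler would hit assert(0)
};

// Simplified model of handling one auth message.
// need_more_ids: pretend max_global_id must be raised via Paxos first.
// apply_fix:     drop the handler before wait-listing, as the fix describes.
Outcome prep_auth(ToySession& s, bool need_more_ids, bool apply_fix) {
  bool start = false;
  if (!s.auth_handler) {
    s.auth_handler.reset(new ToyNoneHandler());
    start = true;  // a fresh handler means we should start a session
  }

  if (need_more_ids) {
    if (apply_fix) {
      // Delete the handler and leave the pointer null, so the re-dispatched
      // message creates a new handler and starts its session.
      s.auth_handler.reset();
    }
    return Outcome::WaitListed;
  }

  if (start) {
    s.auth_handler->start_session();
    return Outcome::Started;
  }

  // A handler already exists, so the message is treated as a follow-up
  // request. With auth 'none' no follow-up is expected.
  return Outcome::UnexpectedRequest;
}
```

In this model, dispatching a message twice (first wait-listed, then re-dispatched) ends in `UnexpectedRequest` without the fix and in `Started` with it. With cephx a follow-up request is a normal part of the exchange, which is consistent with the commit message's note that the problem was only noticeable with cephx disabled.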

History

#1 Updated by Joao Eduardo Luis about 11 years ago

  • Status changed from New to In Progress

I've come up with a fix and will push it shortly to wip-4519 for review.

#2 Updated by Joao Eduardo Luis about 11 years ago

#3 Updated by Joao Eduardo Luis about 11 years ago

After some attempts at a successful pull request: https://github.com/ceph/ceph/pull/135

#4 Updated by Sage Weil about 11 years ago

  • Status changed from In Progress to Resolved
