Bug #7093
osd: peering can send messages prior to auth
Status: Resolved
Priority: Normal
Assignee: Ian Colle
Category: OSD
Target version: -
% Done: 0%
Source: Q/A
Severity: 3 - minor
Description
We are still authenticating with the monitors:
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x00000000009061a9 in Wait (mutex=..., this=0x7fff71a29a70) at ./common/Cond.h:55
#2 MonClient::authenticate (this=0x7fff71a29660, timeout=0) at mon/MonClient.cc:448
#3 0x0000000000632927 in OSD::init (this=0x24ce000) at osd/OSD.cc:1228
#4 0x00000000005de24d in main (argc=<optimized out>, argv=<optimized out>) at ceph_osd.cc:468
but we already receive an osdmap:
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x00000000009411d2 in Wait (mutex=..., this=0x24ce538) at common/Cond.h:55
#2 ThreadPool::drain (this=0x24ce470, wq=0x24cefd0) at common/WorkQueue.cc:252
#3 0x00000000006344b4 in drain (this=0x24cefd0) at ./common/WorkQueue.h:153
#4 OSD::handle_osd_map (this=0x24ce000, m=0x2d68000) at osd/OSD.cc:5294
#5 0x0000000000636deb in OSD::_dispatch (this=0x24ce000, m=0x2d68000) at osd/OSD.cc:4647
#6 0x00000000006374ec in OSD::ms_dispatch (this=0x24ce000, m=0x2d68000) at osd/OSD.cc:4446
#7 0x00000000009fa199 in ms_deliver_dispatch (m=0x2d68000, this=0x248b000) at msg/Messenger.h:587
#8 DispatchQueue::entry (this=0x248b0e8) at msg/DispatchQueue.cc:123
#9 0x00000000009356cd in DispatchQueue::DispatchThread::entry (this=<optimized out>) at msg/DispatchQueue.h:104
#10 0x00007ff858e68e9a in start_thread (arg=0x7ff84add5700) at pthread_create.c:308
#11 0x00007ff8572213fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#12 0x0000000000000000 in ?? ()
This kicks a bunch of PG peering state machines. They see that the osdmap shows our rank as up and send messages. That up entry is not really us, though: we are restarting! That crashes with:
#0 0x00007ff858e70b7b in raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:42
#1 0x00000000008888be in reraise_fatal (signum=11) at global/signal_handler.cc:59
#2 handle_fatal_signal (signum=11) at global/signal_handler.cc:105
#3 <signal handler called>
#4 0x00000000005e85ab in OSD::ms_get_authorizer (this=0x24ce000, dest_type=4, authorizer=0x7ff83f2ba338, force_new=false) at osd/OSD.cc:4471
#5 0x000000000092e9fd in ms_deliver_get_authorizer (force_new=<optimized out>, peer_type=4, this=0x248bc00) at msg/Messenger.h:661
#6 SimpleMessenger::get_authorizer (this=0x248bc00, peer_type=4, force_new=false) at msg/SimpleMessenger.cc:356
#7 0x0000000000a15664 in Pipe::connect (this=0x252c500) at msg/Pipe.cc:883
#8 0x0000000000a1890d in Pipe::writer (this=0x252c500) at msg/Pipe.cc:1518
#9 0x0000000000a227fd in Pipe::Writer::entry (this=<optimized out>) at msg/Pipe.h:59
#10 0x00007ff858e68e9a in start_thread (arg=0x7ff83f2bb700) at pthread_create.c:308
#11 0x00007ff8572213fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#12 0x0000000000000000 in ?? ()
(gdb) f 4
#4 0x00000000005e85ab in OSD::ms_get_authorizer (this=0x24ce000, dest_type=4, authorizer=0x7ff83f2ba338, force_new=false) at osd/OSD.cc:4471
4471 osd/OSD.cc: No such file or directory.
(gdb) p monc->auth
$1 = (AuthClientHandler *) 0x0
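In frame #4, dest_type=4 is CEPH_ENTITY_TYPE_OSD, so this is a peering connection to another OSD. It is being built while monc->auth is still null. The following is a minimal C++ sketch of a defensive authorizer callback that reports failure instead of dereferencing the missing handler. All type and function names here (Authorizer, AuthHandlerModel, MonClientModel, get_authorizer) are hypothetical simplifications, not Ceph's real classes or the fix that was applied.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical stand-ins for Ceph's AuthClientHandler and MonClient; the
// names and members below are illustrative only, not Ceph's real API.
struct Authorizer {
  int dest_type;  // peer entity type the authorizer was built for
};

struct AuthHandlerModel {
  std::unique_ptr<Authorizer> build_authorizer(int dest_type) const {
    return std::unique_ptr<Authorizer>(new Authorizer{dest_type});
  }
};

struct MonClientModel {
  std::mutex lock;
  // Stays null until authentication with the monitors has completed,
  // mirroring `monc->auth == 0x0` in the gdb session above.
  std::unique_ptr<AuthHandlerModel> auth;
};

// Sketch of a get_authorizer-style callback: instead of dereferencing a
// null handler (the SIGSEGV in frame #4), it returns false so the caller
// learns that no authorizer is available yet.
bool get_authorizer(MonClientModel& monc, int dest_type,
                    std::unique_ptr<Authorizer>* out) {
  std::lock_guard<std::mutex> l(monc.lock);
  if (!monc.auth) {
    return false;  // not authenticated yet: no credentials to offer
  }
  *out = monc.auth->build_authorizer(dest_type);
  return *out != nullptr;
}
```

This guard only turns the crash into a failed connection attempt. The messages are still sent too early.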
This was on emperor, but the bug still exists in master.
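A null check in ms_get_authorizer would avoid this particular crash. The underlying ordering problem remains: an osdmap is acted on, and peering messages are sent, before authentication finishes. One possible approach is to park such messages until authentication completes and then replay them in order. The following is a minimal single-process sketch of that idea. GatedDispatcher, dispatch, and on_authenticated are hypothetical names, and this is not Ceph's dispatch code.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical, simplified model of a daemon's dispatcher: messages that
// would trigger peering (e.g. an incoming osdmap) are deferred until
// authentication has completed instead of being handled immediately.
class GatedDispatcher {
 public:
  // In a real daemon this would run on the messenger's dispatch thread.
  void dispatch(const std::string& msg) {
    std::lock_guard<std::mutex> l(lock_);
    if (!authenticated_) {
      waiting_.push_back(msg);  // defer: we cannot safely talk to peers yet
      return;
    }
    handle(msg);
  }

  // Called once authentication with the monitors has finished.
  void on_authenticated() {
    std::lock_guard<std::mutex> l(lock_);
    authenticated_ = true;
    while (!waiting_.empty()) {  // replay deferred messages in arrival order
      handle(waiting_.front());
      waiting_.pop_front();
    }
  }

  // Test-only accessor; not synchronized.
  const std::vector<std::string>& handled() const { return handled_; }

 private:
  void handle(const std::string& msg) { handled_.push_back(msg); }

  std::mutex lock_;
  bool authenticated_ = false;
  std::deque<std::string> waiting_;
  std::vector<std::string> handled_;
};
```

Deferring, rather than dropping, keeps the maps that arrive during startup. The daemon still processes them, just not before it can authenticate to its peers.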
ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2013-12-31_19:40:07-upgrade:small-master-testing-basic-plana/20548$ cat orig.config.yaml
archive_path: /var/lib/teuthworker/archive/teuthology-2013-12-31_19:40:07-upgrade:small-master-testing-basic-plana/20548
description: upgrade/small/rgw/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml 2-workload/s3tests.yaml 3-upgrade-sequence/upgrade-all.yaml 4-restart/restart.yaml 5-emperor-workload/final.yaml distro/ubuntu_12.04.yaml}
email: null
job_id: '20548'
kernel:
  kdb: true
  sha1: f48db1e9ac6f1578ab7efef9f66c70279e2f0cb5
last_in_suite: false
machine_type: plana
name: teuthology-2013-12-31_19:40:07-upgrade:small-master-testing-basic-plana
nuke-on-error: true
os_type: ubuntu
os_version: '12.04'
overrides:
  admin_socket:
    branch: master
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
      osd:
        debug ms: 1
        debug osd: 5
    log-whitelist:
    - slow request
    sha1: cae663af403af202df76ea4df84b43f919b4a541
  ceph-deploy:
    branch:
      dev: master
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
  install:
    ceph:
      sha1: cae663af403af202df76ea4df84b43f919b4a541
  s3tests:
    branch: master
  workunit:
    sha1: cae663af403af202df76ea4df84b43f919b4a541
owner: scheduled_teuthology@teuthology
roles:
- - mon.a
  - mds.a
  - osd.0
  - osd.1
- - mon.b
  - mon.c
  - osd.2
  - osd.3
- - client.0
tasks:
- chef: null
- clock.check: null
- install:
    branch: dumpling
- ceph:
    fs: xfs
- rgw:
  - client.0
- s3tests:
    client.0:
      force-branch: dumpling
      rgw_server: client.0
- install.upgrade:
    all:
      branch: emperor
- ceph.restart:
  - osd.0
  - osd.1
  - osd.2
  - osd.3
  - mon.a
  - mon.b
  - mon.c
  - mds.a
  - rgw.client.0
- s3tests:
    client.0:
      rgw_server: client.0
teuthology_branch: master
verbose: true