Bug #3703
osd: crash while encrypting
% Done:
0%
Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
logs: ubuntu@teuthology:/a/teuthology-2012-12-30_19:00:03-regression-next-testing-basic/32113
ubuntu@teuthology:/a/teuthology-2012-12-30_19:00:03-regression-next-testing-basic/32113$ cat config.yaml kernel: &id001 kdb: true sha1: 694eba0efb52f4954e2a954cd2ed967167056245 nuke-on-error: true overrides: ceph: conf: global: ms inject socket failures: 500 fs: btrfs log-whitelist: - slow request sha1: ec5288a3126749aa0def43c8c3cdb8a3b970d788 s3tests: branch: next workunit: sha1: ec5288a3126749aa0def43c8c3cdb8a3b970d788 roles: - - mon.a - mon.c - osd.0 - osd.1 - osd.2 - - mon.b - mds.a - osd.3 - osd.4 - osd.5 - - client.0 targets: ubuntu@plana44.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDYE0eu9E8TQwtUy89Wldp54VbNBEoO9XQf77eXXzzmNwYUFRrNX0mZV/I8GqyRJuMrPG8V4aZBthBHTtnEmQ6RAS7fVdthi/hEgwnM9cAqY3KX9mR5xJnHBc/fa5KLrnSr3Wrztf42PpQNEN5Tk55K6wWUlZOTHU3vE0j3kF+YQ5FeBhQbghztHPKFR8bOmZJp9TpbXgbvEM2RWr9bYtro1KuQOgrairyVVNWdAuwZuxSQT4soyHoSkY9JmeXKsNRAOamxH9w57mDC3PXui7r6Fp8OCWSK+GmlLTtPaZtulSCcucaZtpVae7F4s9JNxaRl5RxuUtwMRfgAHGlL2BZv ubuntu@plana50.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDVJ+lkgUdkr27WFzrmwSQU22m+pFIiqzhfcO4Hinu8A8uyP4FIephrEcq4Rrt4hp14Syb1pxXisV6UKwAZKikDoD1Wl0LSro4TzOs6HuMEhfvzdnISvyzE3f2w0cj1zE61rHFYfPNF14b9fkE3wBf2Vb4i6ReaN2/Yd12J/xO52tJH1lPxgsFoAIRMjdQMbfVwPU6kK9SY4ngt9iLjge6gZ0O9Jwe2vrgD6+LNoMY9qvNjgRvQdCTi85OQwitU0ZMZdGC0cQ/oNbKd+yW92rW9Wu6dcyKSisesRcm7lbtS6X2uUup+u3vWze7coT+Py3TdNW6nGpIg4muyvqHfSinz ubuntu@plana60.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC5sa91O7pjxDijkXWrqS3slUWFtQJZtC3hfN5cKt+oISIVRZoy4CNxFrOWhmXwKOZaIV7fS2PzmhZQe9jfA5iRP7G/opl/5sMImzTIUUz8Qrlqb7epmwYHkoE196uN45y8iW/IDwsD7o69+rOGIj0SgdT0nGBYXB27OhdECdumxHcST/vM8dFfIwwMienvhKKB+ghrS2bYa33DcfYHBQWdIPXrUf2brb/XRYDwVMg8W5uPyW8dx3GIONVxSWequnTO1m0FuMKB7BrVpH19+/H4DFOwj6vqSvvOYUWm3Go3FsQTvparxrg7/Rliee0RBWcBIE3D0gWk6r74HV0C4pnT tasks: - internal.lock_machines: 3 - internal.save_config: null - internal.check_lock: null - internal.connect: null - internal.check_conflict: null - kernel: *id001 - internal.base: null - internal.archive: null - internal.coredump: null - internal.syslog: null - internal.timer: null - chef: null - clock: null - ceph: null - rbd: all: image_size: 20480 - workunit: clients: all: - suites/ffsb.sh ubuntu@teuthology:/a/teuthology-2012-12-30_19:00:03-regression-next-testing-basic/32113$ cat summary.yaml ceph-sha1: ec5288a3126749aa0def43c8c3cdb8a3b970d788 client.0-kernel-sha1: 694eba0efb52f4954e2a954cd2ed967167056245 description: collection:kernel-rbd clusters:fixed-3.yaml fs:btrfs.yaml msgr-failures:many.yaml tasks:rbd_workunit_suites_ffsb.yaml duration: 22336.48376917839 failure_reason: 'Command failed with status 1: ''/tmp/cephtest/enable-coredump /tmp/cephtest/binary/usr/local/bin/ceph-coverage /tmp/cephtest/archive/coverage /tmp/cephtest/daemon-helper kill /tmp/cephtest/binary/usr/local/bin/ceph-osd -f -i 5 -c /tmp/cephtest/ceph.conf''' flavor: basic mon.a-kernel-sha1: 694eba0efb52f4954e2a954cd2ed967167056245 mon.b-kernel-sha1: 694eba0efb52f4954e2a954cd2ed967167056245 owner: scheduled_teuthology@teuthology success: false
History
#1 Updated by Josh Durgin about 11 years ago
- Subject changed from rbd_workunit_suites_ffsb task failed in nightlies with core dump to osd: crash while encrypting
This is an osd crash.
ceph version 0.55.1-377-gec5288a (ec5288a3126749aa0def43c8c3cdb8a3b970d788) 1: (ceph::BackTrace::BackTrace(int)+0x2d) [0x1069a77] 2: /tmp/cephtest/binary/usr/local/bin/ceph-osd() [0x10cef56] 3: (()+0xfcb0) [0x7f7063bfbcb0] 4: (CryptoKey::encrypt(CephContext*, ceph::buffer::list const&, ceph::buffer::list&, std::string&) const+0x133) [0x10db541] 5: (void encode_encrypt_enc_bl<CephXAuthorize>(CephContext*, CephXAuthorize const&, CryptoKey const&, ceph::buffer::list&, std::string&)+0xb7) [0x10d9770] 6: (int encode_encrypt<CephXAuthorize>(CephContext*, CephXAuthorize const&, CryptoKey const&, ceph::buffer::list&, std::string&)+0x51) [0x10d8661] 7: (CephXTicketHandler::build_authorizer(unsigned long)+0x159) [0x10d4ddb] 8: (CephXTicketManager::build_authorizer(unsigned int)+0x196) [0x10d5100] 9: (CephxClientHandler::build_authorizer(unsigned int)+0x131) [0x1262679] 10: (OSD::ms_get_authorizer(int, AuthAuthorizer**, bool)+0x1d0) [0xeb6066] 11: (Messenger::ms_deliver_get_authorizer(int, bool)+0x5c) [0x1192b82] 12: (SimpleMessenger::get_authorizer(int, bool)+0x29) [0x118eac7] 13: (Pipe::connect()+0x185f) [0x1284d41] 14: (Pipe::writer()+0x265) [0x128ba0f] 15: (Pipe::Writer::entry()+0x1c) [0x1291b72] 16: (Thread::_entry_func(void*)+0x23) [0x119b389] 17: (()+0x7e9a) [0x7f7063bf3e9a] 18: (clone()+0x6d) [0x7f7061d7e4bd]
This happened shortly after rotating keys were updated.
Additionally, there are a bunch of slow requests, and corrupted MOSDSubOp messages. A snippet from the log:
-177> 2012-12-31 02:52:53.944964 7f704d732700 2 -- 10.214.132.28:6807/6536 >> 10.214.132.34:6804/5881 pipe(0x24a7000 sd=29 :6807 pgs=0 cs=0 l=0).fault 0: Success -176> 2012-12-31 02:52:53.945610 7f704e23d700 -1 failed to decode message of type 76 v7: buffer::malformed_input: __PRETTY_FUNCTION__ decode past end of struct encoding -175> 2012-12-31 02:52:53.945638 7f704e23d700 2 -- 10.214.132.28:6807/6536 >> 10.214.132.34:6804/5881 pipe(0x24a7480 sd=27 :46151 pgs=26115371 cs=26198528 l=0).fault 0: Success -174> 2012-12-31 02:52:53.945666 7f704e23d700 0 -- 10.214.132.28:6807/6536 >> 10.214.132.34:6804/5881 pipe(0x24a7480 sd=27 :46151 pgs=26115371 cs=26198528 l=0).fault, initiating reconnect -173> 2012-12-31 02:52:53.946239 7f704d732700 1 -- 10.214.132.28:6807/6536 >> :/0 pipe(0x24a7240 sd=29 :6807 pgs=0 cs=0 l=0).accept sd=29 10.214.132.34:43585/0 -172> 2012-12-31 02:52:53.946701 7f704d732700 2 -- 10.214.132.28:6807/6536 >> 10.214.132.34:6804/5881 pipe(0x24a7240 sd=29 :6807 pgs=0 cs=0 l=0).fault 0: Success -171> 2012-12-31 02:52:53.947084 7f704e23d700 -1 failed to decode message of type 76 v7: buffer::malformed_input: __PRETTY_FUNCTION__ decode past end of struct encoding -170> 2012-12-31 02:52:53.947111 7f704e23d700 2 -- 10.214.132.28:6807/6536 >> 10.214.132.34:6804/5881 pipe(0x24a7480 sd=27 :46152 pgs=26115373 cs=26198530 l=0).fault 0: Success -169> 2012-12-31 02:52:53.947144 7f704e23d700 0 -- 10.214.132.28:6807/6536 >> 10.214.132.34:6804/5881 pipe(0x24a7480 sd=27 :46152 pgs=26115373 cs=26198530 l=0).fault, initiating reconnect -168> 2012-12-31 02:52:53.947610 7f704d732700 1 -- 10.214.132.28:6807/6536 >> :/0 pipe(0x39f6d80 sd=29 :6807 pgs=0 cs=0 l=0).accept sd=29 10.214.132.34:43586/0 -167> 2012-12-31 02:52:53.948140 7f704d732700 2 -- 10.214.132.28:6807/6536 >> 10.214.132.34:6804/5881 pipe(0x39f6d80 sd=29 :6807 pgs=0 cs=0 l=0).fault 0: Success -166> 2012-12-31 02:52:53.948524 7f704e23d700 -1 failed to decode message of type 76 v7: buffer::malformed_input: __PRETTY_FUNCTION__ decode past end of struct encoding -165> 2012-12-31 02:52:53.948556 7f704e23d700 2 -- 10.214.132.28:6807/6536 >> 10.214.132.34:6804/5881 pipe(0x24a7480 sd=27 :46153 pgs=26115375 cs=26198532 l=0).fault 0: Success -164> 2012-12-31 02:52:53.948584 7f704e23d700 0 -- 10.214.132.28:6807/6536 >> 10.214.132.34:6804/5881 pipe(0x24a7480 sd=27 :46153 pgs=26115375 cs=26198532 l=0).fault, initiating reconnect -163> 2012-12-31 02:52:53.948955 7f704d732700 1 -- 10.214.132.28:6807/6536 >> :/0 pipe(0x39f6b40 sd=29 :6807 pgs=0 cs=0 l=0).accept sd=29 10.214.132.34:43587/0 -162> 2012-12-31 02:52:53.949386 7f704d732700 0 -- 10.214.132.28:6807/6536 >> 10.214.132.34:6804/5881 pipe(0x39f6b40 sd=29 :6807 pgs=0 cs=0 l=0).accept connect_seq 26198533 vs existing 26198533 state co nnecting -161> 2012-12-31 02:52:53.949451 7f704d732700 2 -- 10.214.132.28:6807/6536 >> 10.214.132.34:6804/5881 pipe(0x39f6b40 sd=29 :6807 pgs=0 cs=0 l=0).fault 0: Success -160> 2012-12-31 02:52:53.950069 7f704e23d700 -1 failed to decode message of type 76 v7: buffer::malformed_input: __PRETTY_FUNCTION__ decode past end of struct encoding -159> 2012-12-31 02:52:53.950122 7f704e23d700 2 -- 10.214.132.28:6807/6536 >> 10.214.132.34:6804/5881 pipe(0x24a7480 sd=27 :46154 pgs=26115377 cs=26198534 l=0).fault 0: Success -158> 2012-12-31 02:52:53.950151 7f704e23d700 0 -- 10.214.132.28:6807/6536 >> 10.214.132.34:6804/5881 pipe(0x24a7480 sd=27 :46154 pgs=26115377 cs=26198534 l=0).fault, initiating reconnect -157> 2012-12-31 02:52:53.950722 7f704d732700 1 -- 10.214.132.28:6807/6536 >> :/0 pipe(0x39f6000 sd=29 :6807 pgs=0 cs=0 l=0).accept sd=29 10.214.132.34:43588/0 -156> 2012-12-31 02:52:53.950790 7f7054257700 10 monclient: tick -155> 2012-12-31 02:52:53.950817 7f7054257700 10 monclient: _check_auth_tickets getting new tickets! -154> 2012-12-31 02:52:53.950863 7f7054257700 10 monclient: _send_mon_message to mon.c at 10.214.132.34:6790/0 -153> 2012-12-31 02:52:53.950891 7f7054257700 1 -- 10.214.132.28:6806/6536 --> 10.214.132.34:6790/0 -- auth(proto 2 165 bytes epoch 0) v1 -- ?+0 0x3552e00 con 0x651d9a0 -152> 2012-12-31 02:52:53.950913 7f7054257700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2012-12-31 02:52:23.950911) -151> 2012-12-31 02:52:53.950924 7f7054257700 10 monclient: renew subs? (now: 2012-12-31 02:52:53.950924; renew after: 2012-12-31 02:55:03.949116) -- no -150> 2012-12-31 02:52:53.950937 7f7054257700 10 log_queue is 60 last_log 101789 sent 101729 num 60 unsent 60 sending 60 -149> 2012-12-31 02:52:53.950949 7f7054257700 10 will send 2012-12-31 02:52:44.086475 osd.5 10.214.132.28:6806/6536 101730 : [WRN] 6 slow requests, 6 included below; oldest blocked for > 20539.004821 se cs -148> 2012-12-31 02:52:53.950979 7f7054257700 10 will send 2012-12-31 02:52:44.086484 osd.5 10.214.132.28:6806/6536 101731 : [WRN] slow request 20539.004821 seconds old, received at 2012-12-30 21:10:25. 081570: osd_op(client.4137.1:422413 rb.0.1022.684a481a.0000000010e6 [write 3534848~4096] 2.f73b84f1) currently waiting for sub ops -147> 2012-12-31 02:52:53.950996 7f7054257700 10 will send 2012-12-31 02:52:44.086488 osd.5 10.214.132.28:6806/6536 101732 : [WRN] slow request 20539.003856 seconds old, received at 2012-12-30 21:10:25. 082535: osd_op(client.4137.1:422433 rb.0.1022.684a481a.0000000010e6 [write 2605056~4096] 2.f73b84f1) currently waiting for sub ops
#2 Updated by Ian Colle about 11 years ago
- Assignee set to Josh Durgin
#3 Updated by Sage Weil about 11 years ago
- Priority changed from Normal to High
#4 Updated by Ian Colle about 11 years ago
- Status changed from New to Can't reproduce