Project

General

Profile

Bug #3703

osd: crash while encrypting

Added by Tamilarasi muthamizhan about 11 years ago. Updated about 11 years ago.

Status:
Can't reproduce
Priority:
High
Assignee:
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

logs: ubuntu@teuthology:/a/teuthology-2012-12-30_19:00:03-regression-next-testing-basic/32113

ubuntu@teuthology:/a/teuthology-2012-12-30_19:00:03-regression-next-testing-basic/32113$ cat config.yaml kernel: &id001
  kdb: true
  sha1: 694eba0efb52f4954e2a954cd2ed967167056245
nuke-on-error: true
overrides:
  ceph:
    conf:
      global:
        ms inject socket failures: 500
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: ec5288a3126749aa0def43c8c3cdb8a3b970d788
  s3tests:
    branch: next
  workunit:
    sha1: ec5288a3126749aa0def43c8c3cdb8a3b970d788
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
targets:
  ubuntu@plana44.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDYE0eu9E8TQwtUy89Wldp54VbNBEoO9XQf77eXXzzmNwYUFRrNX0mZV/I8GqyRJuMrPG8V4aZBthBHTtnEmQ6RAS7fVdthi/hEgwnM9cAqY3KX9mR5xJnHBc/fa5KLrnSr3Wrztf42PpQNEN5Tk55K6wWUlZOTHU3vE0j3kF+YQ5FeBhQbghztHPKFR8bOmZJp9TpbXgbvEM2RWr9bYtro1KuQOgrairyVVNWdAuwZuxSQT4soyHoSkY9JmeXKsNRAOamxH9w57mDC3PXui7r6Fp8OCWSK+GmlLTtPaZtulSCcucaZtpVae7F4s9JNxaRl5RxuUtwMRfgAHGlL2BZv
  ubuntu@plana50.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDVJ+lkgUdkr27WFzrmwSQU22m+pFIiqzhfcO4Hinu8A8uyP4FIephrEcq4Rrt4hp14Syb1pxXisV6UKwAZKikDoD1Wl0LSro4TzOs6HuMEhfvzdnISvyzE3f2w0cj1zE61rHFYfPNF14b9fkE3wBf2Vb4i6ReaN2/Yd12J/xO52tJH1lPxgsFoAIRMjdQMbfVwPU6kK9SY4ngt9iLjge6gZ0O9Jwe2vrgD6+LNoMY9qvNjgRvQdCTi85OQwitU0ZMZdGC0cQ/oNbKd+yW92rW9Wu6dcyKSisesRcm7lbtS6X2uUup+u3vWze7coT+Py3TdNW6nGpIg4muyvqHfSinz
  ubuntu@plana60.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC5sa91O7pjxDijkXWrqS3slUWFtQJZtC3hfN5cKt+oISIVRZoy4CNxFrOWhmXwKOZaIV7fS2PzmhZQe9jfA5iRP7G/opl/5sMImzTIUUz8Qrlqb7epmwYHkoE196uN45y8iW/IDwsD7o69+rOGIj0SgdT0nGBYXB27OhdECdumxHcST/vM8dFfIwwMienvhKKB+ghrS2bYa33DcfYHBQWdIPXrUf2brb/XRYDwVMg8W5uPyW8dx3GIONVxSWequnTO1m0FuMKB7BrVpH19+/H4DFOwj6vqSvvOYUWm3Go3FsQTvparxrg7/Rliee0RBWcBIE3D0gWk6r74HV0C4pnT
tasks:
- internal.lock_machines: 3
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock: null
- ceph: null
- rbd:
    all:
      image_size: 20480
- workunit:
    clients:
      all:
      - suites/ffsb.sh
ubuntu@teuthology:/a/teuthology-2012-12-30_19:00:03-regression-next-testing-basic/32113$ cat summary.yaml 
ceph-sha1: ec5288a3126749aa0def43c8c3cdb8a3b970d788
client.0-kernel-sha1: 694eba0efb52f4954e2a954cd2ed967167056245
description: collection:kernel-rbd clusters:fixed-3.yaml fs:btrfs.yaml msgr-failures:many.yaml
  tasks:rbd_workunit_suites_ffsb.yaml
duration: 22336.48376917839
failure_reason: 'Command failed with status 1: ''/tmp/cephtest/enable-coredump /tmp/cephtest/binary/usr/local/bin/ceph-coverage
  /tmp/cephtest/archive/coverage /tmp/cephtest/daemon-helper kill /tmp/cephtest/binary/usr/local/bin/ceph-osd
  -f -i 5 -c /tmp/cephtest/ceph.conf'''
flavor: basic
mon.a-kernel-sha1: 694eba0efb52f4954e2a954cd2ed967167056245
mon.b-kernel-sha1: 694eba0efb52f4954e2a954cd2ed967167056245
owner: scheduled_teuthology@teuthology
success: false

History

#1 Updated by Josh Durgin about 11 years ago

  • Subject changed from rbd_workunit_suites_ffsb task failed in nightlies with core dump to osd: crash while encrypting

This is an osd crash.

 ceph version 0.55.1-377-gec5288a (ec5288a3126749aa0def43c8c3cdb8a3b970d788)
 1: (ceph::BackTrace::BackTrace(int)+0x2d) [0x1069a77]
 2: /tmp/cephtest/binary/usr/local/bin/ceph-osd() [0x10cef56]
 3: (()+0xfcb0) [0x7f7063bfbcb0]
 4: (CryptoKey::encrypt(CephContext*, ceph::buffer::list const&, ceph::buffer::list&, std::string&) const+0x133) [0x10db541]
 5: (void encode_encrypt_enc_bl<CephXAuthorize>(CephContext*, CephXAuthorize const&, CryptoKey const&, ceph::buffer::list&, std::string&)+0xb7) [0x10d9770]
 6: (int encode_encrypt<CephXAuthorize>(CephContext*, CephXAuthorize const&, CryptoKey const&, ceph::buffer::list&, std::string&)+0x51) [0x10d8661]
 7: (CephXTicketHandler::build_authorizer(unsigned long)+0x159) [0x10d4ddb]
 8: (CephXTicketManager::build_authorizer(unsigned int)+0x196) [0x10d5100]
 9: (CephxClientHandler::build_authorizer(unsigned int)+0x131) [0x1262679]
 10: (OSD::ms_get_authorizer(int, AuthAuthorizer**, bool)+0x1d0) [0xeb6066]
 11: (Messenger::ms_deliver_get_authorizer(int, bool)+0x5c) [0x1192b82]
 12: (SimpleMessenger::get_authorizer(int, bool)+0x29) [0x118eac7]
 13: (Pipe::connect()+0x185f) [0x1284d41]
 14: (Pipe::writer()+0x265) [0x128ba0f]
 15: (Pipe::Writer::entry()+0x1c) [0x1291b72]
 16: (Thread::_entry_func(void*)+0x23) [0x119b389]
 17: (()+0x7e9a) [0x7f7063bf3e9a]
 18: (clone()+0x6d) [0x7f7061d7e4bd]

This happened shortly after rotating keys were updated.

Additionally, there are a bunch of slow requests, and corrupted MOSDSubOp messages. A snippet from the log:

-177> 2012-12-31 02:52:53.944964 7f704d732700  2 -- 10.214.132.28:6807/6536 >> 10.214.132.34:6804/5881 pipe(0x24a7000 sd=29 :6807 pgs=0 cs=0 l=0).fault 0: Success
  -176> 2012-12-31 02:52:53.945610 7f704e23d700 -1 failed to decode message of type 76 v7: buffer::malformed_input: __PRETTY_FUNCTION__ decode past end of struct encoding
  -175> 2012-12-31 02:52:53.945638 7f704e23d700  2 -- 10.214.132.28:6807/6536 >> 10.214.132.34:6804/5881 pipe(0x24a7480 sd=27 :46151 pgs=26115371 cs=26198528 l=0).fault 0: Success
  -174> 2012-12-31 02:52:53.945666 7f704e23d700  0 -- 10.214.132.28:6807/6536 >> 10.214.132.34:6804/5881 pipe(0x24a7480 sd=27 :46151 pgs=26115371 cs=26198528 l=0).fault, initiating reconnect
  -173> 2012-12-31 02:52:53.946239 7f704d732700  1 -- 10.214.132.28:6807/6536 >> :/0 pipe(0x24a7240 sd=29 :6807 pgs=0 cs=0 l=0).accept sd=29 10.214.132.34:43585/0
  -172> 2012-12-31 02:52:53.946701 7f704d732700  2 -- 10.214.132.28:6807/6536 >> 10.214.132.34:6804/5881 pipe(0x24a7240 sd=29 :6807 pgs=0 cs=0 l=0).fault 0: Success
  -171> 2012-12-31 02:52:53.947084 7f704e23d700 -1 failed to decode message of type 76 v7: buffer::malformed_input: __PRETTY_FUNCTION__ decode past end of struct encoding
  -170> 2012-12-31 02:52:53.947111 7f704e23d700  2 -- 10.214.132.28:6807/6536 >> 10.214.132.34:6804/5881 pipe(0x24a7480 sd=27 :46152 pgs=26115373 cs=26198530 l=0).fault 0: Success
  -169> 2012-12-31 02:52:53.947144 7f704e23d700  0 -- 10.214.132.28:6807/6536 >> 10.214.132.34:6804/5881 pipe(0x24a7480 sd=27 :46152 pgs=26115373 cs=26198530 l=0).fault, initiating reconnect
  -168> 2012-12-31 02:52:53.947610 7f704d732700  1 -- 10.214.132.28:6807/6536 >> :/0 pipe(0x39f6d80 sd=29 :6807 pgs=0 cs=0 l=0).accept sd=29 10.214.132.34:43586/0
  -167> 2012-12-31 02:52:53.948140 7f704d732700  2 -- 10.214.132.28:6807/6536 >> 10.214.132.34:6804/5881 pipe(0x39f6d80 sd=29 :6807 pgs=0 cs=0 l=0).fault 0: Success
  -166> 2012-12-31 02:52:53.948524 7f704e23d700 -1 failed to decode message of type 76 v7: buffer::malformed_input: __PRETTY_FUNCTION__ decode past end of struct encoding
  -165> 2012-12-31 02:52:53.948556 7f704e23d700  2 -- 10.214.132.28:6807/6536 >> 10.214.132.34:6804/5881 pipe(0x24a7480 sd=27 :46153 pgs=26115375 cs=26198532 l=0).fault 0: Success
  -164> 2012-12-31 02:52:53.948584 7f704e23d700  0 -- 10.214.132.28:6807/6536 >> 10.214.132.34:6804/5881 pipe(0x24a7480 sd=27 :46153 pgs=26115375 cs=26198532 l=0).fault, initiating reconnect
  -163> 2012-12-31 02:52:53.948955 7f704d732700  1 -- 10.214.132.28:6807/6536 >> :/0 pipe(0x39f6b40 sd=29 :6807 pgs=0 cs=0 l=0).accept sd=29 10.214.132.34:43587/0
  -162> 2012-12-31 02:52:53.949386 7f704d732700  0 -- 10.214.132.28:6807/6536 >> 10.214.132.34:6804/5881 pipe(0x39f6b40 sd=29 :6807 pgs=0 cs=0 l=0).accept connect_seq 26198533 vs existing 26198533 state co
nnecting
  -161> 2012-12-31 02:52:53.949451 7f704d732700  2 -- 10.214.132.28:6807/6536 >> 10.214.132.34:6804/5881 pipe(0x39f6b40 sd=29 :6807 pgs=0 cs=0 l=0).fault 0: Success
  -160> 2012-12-31 02:52:53.950069 7f704e23d700 -1 failed to decode message of type 76 v7: buffer::malformed_input: __PRETTY_FUNCTION__ decode past end of struct encoding
  -159> 2012-12-31 02:52:53.950122 7f704e23d700  2 -- 10.214.132.28:6807/6536 >> 10.214.132.34:6804/5881 pipe(0x24a7480 sd=27 :46154 pgs=26115377 cs=26198534 l=0).fault 0: Success
  -158> 2012-12-31 02:52:53.950151 7f704e23d700  0 -- 10.214.132.28:6807/6536 >> 10.214.132.34:6804/5881 pipe(0x24a7480 sd=27 :46154 pgs=26115377 cs=26198534 l=0).fault, initiating reconnect
  -157> 2012-12-31 02:52:53.950722 7f704d732700  1 -- 10.214.132.28:6807/6536 >> :/0 pipe(0x39f6000 sd=29 :6807 pgs=0 cs=0 l=0).accept sd=29 10.214.132.34:43588/0
  -156> 2012-12-31 02:52:53.950790 7f7054257700 10 monclient: tick
  -155> 2012-12-31 02:52:53.950817 7f7054257700 10 monclient: _check_auth_tickets getting new tickets!
  -154> 2012-12-31 02:52:53.950863 7f7054257700 10 monclient: _send_mon_message to mon.c at 10.214.132.34:6790/0
  -153> 2012-12-31 02:52:53.950891 7f7054257700  1 -- 10.214.132.28:6806/6536 --> 10.214.132.34:6790/0 -- auth(proto 2 165 bytes epoch 0) v1 -- ?+0 0x3552e00 con 0x651d9a0
  -152> 2012-12-31 02:52:53.950913 7f7054257700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2012-12-31 02:52:23.950911)
  -151> 2012-12-31 02:52:53.950924 7f7054257700 10 monclient: renew subs? (now: 2012-12-31 02:52:53.950924; renew after: 2012-12-31 02:55:03.949116) -- no
  -150> 2012-12-31 02:52:53.950937 7f7054257700 10  log_queue is 60 last_log 101789 sent 101729 num 60 unsent 60 sending 60
  -149> 2012-12-31 02:52:53.950949 7f7054257700 10  will send 2012-12-31 02:52:44.086475 osd.5 10.214.132.28:6806/6536 101730 : [WRN] 6 slow requests, 6 included below; oldest blocked for > 20539.004821 se
cs
  -148> 2012-12-31 02:52:53.950979 7f7054257700 10  will send 2012-12-31 02:52:44.086484 osd.5 10.214.132.28:6806/6536 101731 : [WRN] slow request 20539.004821 seconds old, received at 2012-12-30 21:10:25.
081570: osd_op(client.4137.1:422413 rb.0.1022.684a481a.0000000010e6 [write 3534848~4096] 2.f73b84f1) currently waiting for sub ops
  -147> 2012-12-31 02:52:53.950996 7f7054257700 10  will send 2012-12-31 02:52:44.086488 osd.5 10.214.132.28:6806/6536 101732 : [WRN] slow request 20539.003856 seconds old, received at 2012-12-30 21:10:25.
082535: osd_op(client.4137.1:422433 rb.0.1022.684a481a.0000000010e6 [write 2605056~4096] 2.f73b84f1) currently waiting for sub ops

#2 Updated by Ian Colle about 11 years ago

  • Assignee set to Josh Durgin

#3 Updated by Sage Weil about 11 years ago

  • Priority changed from Normal to High

#4 Updated by Ian Colle about 11 years ago

  • Status changed from New to Can't reproduce

Also available in: Atom PDF