Bug #20164

libceph: got bad padding

Added by Jorge Pinilla almost 3 years ago. Updated about 1 year ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
libceph
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature:

Description

I have just installed a 3-node Ceph cluster. Each node acts as a monitor and runs 2 OSDs (1 SATA, 1 SSD).
I am getting a lot of messages like:
libceph: ceph_aes_decrypt got bad padding 31 on src len 32

Is that normal? What could be the problem?

gotbadpadding.JPG - a screenshot showing the problem (149 KB) Jorge Pinilla, 06/02/2017 09:33 AM

History

#1 Updated by Greg Farnum almost 3 years ago

  • Project changed from Ceph to Linux kernel client
  • Category deleted (OSD)

That's the kernel client. It probably means there is a mismatch between how your daemons and clients are configured for authentication, but I'm not sure.

#2 Updated by Jorge Pinilla almost 3 years ago

They are all running on VMware virtual machines with a Debian 8 installation. I am using the Jewel Ceph release.

What could be the problem?

#3 Updated by Ilya Dryomov over 2 years ago

Hi Jorge,

Do you still see this issue? My apologies for the late reply -- it went unnoticed under the "Support" tracker, which is rarely used.

#4 Updated by Ilya Dryomov over 2 years ago

  • Category set to libceph

#5 Updated by Chris Creswell over 2 years ago

I'm seeing this issue too at Lehigh University. We're also running Debian 8 in a VMware virtual machine, but with Ceph Infernalis rather than Jewel.

#6 Updated by Jorge Pinilla over 2 years ago

It's a kernel problem: Debian 8 ships kernel 3.16, while the Ceph RBD features require a 4.x kernel. I switched to Ubuntu Server and it's working without any problems.

#7 Updated by Ilya Dryomov over 2 years ago

  • Tracker changed from Support to Bug
  • Assignee set to Ilya Dryomov
  • Regression set to No
  • Severity set to 3 - minor

Let me see if I can reproduce with Debian 8 kernel.

#8 Updated by Timo Foerster over 2 years ago

I can confirm that this problem occurs randomly with kernel 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u:

Aug  4 08:52:50 hostname kernel: [150062.829836] libceph: ceph_aes_decrypt got bad padding 99 on src len 32
Aug  4 08:52:50 hostname kernel: [150062.836900] libceph: osd1 XX.XX.XX.113:6800 socket error on read 
Aug  4 09:50:15 hostname kernel: [153507.805909] libceph: ceph_aes_decrypt got bad padding -47 on src len 32
Aug  4 09:50:15 hostname kernel: [153507.806160] libceph: osd1 XX.XX.XX.113:6800 socket error on read

root@hostname:~# modinfo libceph
filename:       /lib/modules/3.16.0-4-amd64/kernel/net/ceph/libceph.ko
license:        GPL
description:    Ceph filesystem for Linux
author:         Patience Warnick <patience@newdream.net>
author:         Yehuda Sadeh <yehuda@hq.newdream.net>
author:         Sage Weil <sage@newdream.net>
depends:        libcrc32c
intree:         Y
vermagic:       3.16.0-4-amd64 SMP mod_unload modversions 

#9 Updated by Марк Коренберг almost 2 years ago

Exactly the same. Ceph Luminous 12.2.5 on server.

On client:
Debian 9
Linux musht 4.13.0-37-generic #42~16.04.1-Ubuntu SMP Wed Mar 7 16:03:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

#10 Updated by Марк Коренберг almost 2 years ago

[2981334.249321] libceph: ceph_aes_crypt got bad padding -24 on in_len 32
[2981334.249721] libceph: mds0 10.80.20.100:6800 bad authorize reply
[2981334.249762] libceph: ceph_aes_crypt got bad padding 55 on in_len 32
[2981334.250086] libceph: mds0 10.80.20.100:6800 bad authorize reply
[2981858.537426] libceph: ceph_aes_crypt got bad padding -90 on in_len 32
[2981858.537856] libceph: mds0 10.80.20.100:6800 bad authorize reply
[2981858.537910] libceph: ceph_aes_crypt got bad padding 0 on in_len 32
[2981858.538255] libceph: mds0 10.80.20.100:6800 bad authorize reply
[2982382.824775] libceph: ceph_aes_crypt got bad padding 36 on in_len 32
[2982382.825203] libceph: mds0 10.80.20.100:6800 bad authorize reply
[2982382.825248] libceph: ceph_aes_crypt got bad padding -77 on in_len 32
[2982382.825580] libceph: mds0 10.80.20.100:6800 bad authorize reply
[2982907.113056] libceph: ceph_aes_crypt got bad padding 24 on in_len 32
[2982907.113481] libceph: mds0 10.80.20.100:6800 bad authorize reply
[2982907.113525] libceph: mds0 10.80.20.100:6800 bad authorize reply
[2983431.400208] libceph: ceph_aes_crypt got bad padding -13 on in_len 32
[2983431.400613] libceph: mds0 10.80.20.100:6800 bad authorize reply
[2983431.401875] libceph: ceph_aes_crypt got bad padding -4 on in_len 32
[2983431.402143] libceph: mds0 10.80.20.100:6800 bad authorize reply
[2983955.687826] libceph: ceph_aes_crypt got bad padding -35 on in_len 32
[2983955.688284] libceph: mds0 10.80.20.100:6800 bad authorize reply
[2983955.689737] libceph: ceph_aes_crypt got bad padding 33 on in_len 32
[2983955.690025] libceph: mds0 10.80.20.100:6800 bad authorize reply
[2984479.975754] libceph: ceph_aes_crypt got bad padding -26 on in_len 32
[2984479.976188] libceph: mds0 10.80.20.100:6800 bad authorize reply
[2984479.976233] libceph: ceph_aes_crypt got bad padding -85 on in_len 32
[2984479.976573] libceph: mds0 10.80.20.100:6800 bad authorize reply
[2985004.263246] libceph: ceph_aes_crypt got bad padding -90 on in_len 32

#11 Updated by Марк Коренберг almost 2 years ago

After a client reboot, the problem disappears.

#12 Updated by Ilya Dryomov almost 2 years ago

How often do you see it? Can you share more of the kernel log?

#13 Updated by Xiaoxi Chen about 1 year ago

We hit this as well on RHEL 7.4; not just a single client but a whole batch of our kernel clients ran into this situation.

One thing worth mentioning is that I had done an MDS failover.
----- Log for mds.17 ---
2019-01-30 17:26:22.088342 7f7d83762700 1 mds.7.cache handle_mds_failure mds.17 : recovery peers are 0,1,2,3,4,5,6,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
2019-01-30 17:26:22.097176 7f7d83762700 1 mds.lvsaz01calmds17 Updating MDS map to version 38684 from mon.1
2019-01-30 17:26:22.097281 7f7d83762700 1 mds.7.cache handle_mds_failure mds.17 : recovery peers are 0,1,2,3,4,5,6,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
2019-01-30 17:26:23.103167 7f7d83762700 1 mds.lvsaz01calmds17 Updating MDS map to version 38685 from mon.1
2019-01-30 17:26:23.103415 7f7d83762700 1 mds.7.75 recovery set is 0,1,2,3,4,5,6,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
2019-01-30 17:26:23.111149 7f7d85703700 0 -- 10.199.72.135:6800/2488908091 >> 10.196.149.53:6800/2120387906 conn(0x555f3d6e1000 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg: challenging authorizer
2019-01-30 17:26:23.112364 7f7d85703700 0 -- 10.199.72.135:6800/2488908091 >> 10.196.149.53:6800/2120387906 conn(0x555f3d6e1000 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept connect_seq 0 vs existing csq=0 existing_state=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY
2019-01-30 17:26:24.107851 7f7d83762700 1 mds.lvsaz01calmds17 Updating MDS map to version 38686 from mon.1
2019-01-30 17:26:26.802300 7f7d83762700 1 mds.lvsaz01calmds17 Updating MDS map to version 38688 from mds.13
2019-01-30 17:26:55.806541 7f7d83762700 1 mds.lvsaz01calmds17 Updating MDS map to version 38689 from mon.1
2019-01-30 17:26:56.818474 7f7d83762700 1 mds.lvsaz01calmds17 Updating MDS map to version 38690 from mon.1
2019-01-30 17:26:57.857935 7f7d83762700 1 mds.lvsaz01calmds17 Updating MDS map to version 38691 from mon.1
2019-01-30 17:26:58.860218 7f7d83762700 1 mds.lvsaz01calmds17 Updating MDS map to version 38692 from mon.1
2019-01-30 17:27:00.878781 7f7d83762700 1 mds.lvsaz01calmds17 Updating MDS map to version 38693 from mon.1
2019-01-30 17:27:01.881912 7f7d83762700 1 mds.lvsaz01calmds17 Updating MDS map to version 38694 from mon.1
2019-01-30 17:27:04.895504 7f7d83762700 1 mds.lvsaz01calmds17 Updating MDS map to version 38695 from mon.1
2019-01-30 17:27:05.900146 7f7d83762700 1 mds.lvsaz01calmds17 Updating MDS map to version 38696 from mon.1
2019-01-30 17:27:08.913311 7f7d83762700 1 mds.lvsaz01calmds17 Updating MDS map to version 38697 from mon.1
2019-01-30 17:27:09.927762 7f7d83762700 1 mds.lvsaz01calmds17 Updating MDS map to version 38698 from mon.1
2019-01-30 17:27:11.128704 7f7d85703700 0 auth: could not find secret_id=2755
2019-01-30 17:27:11.128710 7f7d85703700 0 cephx: verify_authorizer could not get service secret for service mds secret_id=2755
2019-01-30 17:27:11.128714 7f7d85703700 0 -- 10.199.72.135:6800/2488908091 >> 10.134.16.26:0/1329496440 conn(0x555f1ff89800 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg: got bad authorizer
2019-01-30 17:27:11.147970 7f7d85703700 0 auth: could not find secret_id=2594
2019-01-30 17:27:11.148065 7f7d85703700 0 cephx: verify_authorizer could not get service secret for service mds secret_id=2594
2019-01-30 17:27:11.148109 7f7d85703700 0 -- 10.199.72.135:6800/2488908091 >> 10.161.4.16:0/3328959822 conn(0x555f0051d000 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg: got bad authorizer
2019-01-30 17:27:11.179411 7f7d85703700 0 auth: could not find secret_id=2430
2019-01-30 17:27:11.179437 7f7d85703700 0 cephx: verify_authorizer could not get service secret for service mds secret_id=2430
2019-01-30 17:27:11.179449 7f7d85703700 0 -- 10.199.72.135:6800/2488908091 >> 10.134.147.93:0/1208465764 conn(0x555eff62c000 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg: got bad authorizer

...
2019-01-30 18:10:02.238241 7f7d85703700 0 auth: could not find secret_id=2757
2019-01-30 18:10:02.238252 7f7d85703700 0 cephx: verify_authorizer could not get service secret for service mds secret_id=2757
2019-01-30 18:10:02.238267 7f7d85703700 0 -- 10.199.72.135:6800/2488908091 >> 10.134.20.13:0/909978330 conn(0x555f00e17000 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg: got bad authorizer
2019-01-30 18:10:02.541794 7f7d85703700 0 -- 10.199.72.135:6800/2488908091 >> 10.134.135.122:0/2655060682 conn(0x555f6b28c000 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg: got bad authorizer
2019-01-30 18:10:02.690319 7f7d85703700 0 auth: do not have service unknown, i am mds
2019-01-30 18:10:02.690323 7f7d85703700 0 cephx: verify_authorizer could not get service secret for service unknown secret_id=0
2019-01-30 18:10:02.690326 7f7d85703700 0 -- 10.199.72.135:6800/2488908091 >> 10.161.4.57:0/897553654 conn(0x555f57263000 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg: got bad authorizer

---------------- kernel version -----------------

filename: /lib/modules/3.10.0-693.11.1.el7.x86_64/kernel/net/ceph/libceph.ko.xz
license: GPL
description: Ceph core library
author: Patience Warnick <>
author: Yehuda Sadeh <>
author: Sage Weil <>
rhelversion: 7.4
srcversion: 2A5F276D5DB24F60F0176E6
depends: libcrc32c,dns_resolver
intree: Y
vermagic: 3.10.0-693.11.1.el7.x86_64 SMP mod_unload modversions
signer: CentOS Linux kernel signing key
sig_key: 61:B8:E8:7B:84:11:84:F6:2F:80:D6:07:79:AB:69:2A:49:D8:3B:AF
sig_hashalgo: sha256

---- Kernel log- -------

[Wed Jan 30 17:26:23 2019] libceph: ceph_aes_crypt got bad padding 54 on in_len 32
[Wed Jan 30 17:26:23 2019] libceph: mds17 10.196.149.53:6800 bad authorize reply
[Wed Jan 30 17:26:23 2019] libceph: ceph_aes_crypt got bad padding -7 on in_len 32
[Wed Jan 30 17:26:23 2019] libceph: mds17 10.196.149.53:6800 bad authorize reply
[Wed Jan 30 17:26:24 2019] libceph: ceph_aes_crypt got bad padding 68 on in_len 32
[Wed Jan 30 17:26:24 2019] libceph: mds17 10.196.149.53:6800 bad authorize reply
[Wed Jan 30 17:26:26 2019] libceph: ceph_aes_crypt got bad padding -24 on in_len 32
[Wed Jan 30 17:26:26 2019] libceph: mds17 10.196.149.53:6800 bad authorize reply
[Wed Jan 30 17:26:30 2019] libceph: ceph_aes_crypt got bad padding -83 on in_len 32
[Wed Jan 30 17:26:30 2019] libceph: mds17 10.196.149.53:6800 bad authorize reply
[Wed Jan 30 17:26:38 2019] libceph: mds17 10.196.149.53:6800 bad authorize reply
[Wed Jan 30 17:26:54 2019] libceph: ceph_aes_crypt got bad padding 89 on in_len 32
[Wed Jan 30 17:26:54 2019] libceph: mds17 10.196.149.53:6800 bad authorize reply
[Wed Jan 30 17:27:10 2019] libceph: mds27 10.199.74.128:6800 socket closed (con state OPEN)
[Wed Jan 30 17:27:10 2019] libceph: mds31 10.199.74.126:6800 socket closed (con state OPEN)
[Wed Jan 30 17:27:10 2019] libceph: mds29 10.199.74.129:6800 socket closed (con state OPEN)
[Wed Jan 30 17:27:10 2019] libceph: mds19 10.199.74.134:6800 socket closed (con state OPEN)
[Wed Jan 30 17:27:10 2019] libceph: mds6 10.199.72.137:6800 socket closed (con state OPEN)
[Wed Jan 30 17:27:10 2019] libceph: mds4 10.199.150.18:6800 socket closed (con state OPEN)
[Wed Jan 30 17:27:10 2019] libceph: mds7 10.199.72.135:6800 socket closed (con state OPEN)
[Wed Jan 30 17:27:10 2019] libceph: mds25 10.196.219.42:6800 socket closed (con state OPEN)
[Wed Jan 30 17:27:10 2019] libceph: mds26 10.196.219.49:6800 socket closed (con state OPEN)
[Wed Jan 30 17:27:10 2019] libceph: mds24 10.196.129.97:6800 socket closed (con state OPEN)
[Wed Jan 30 17:27:10 2019] libceph: mds8 10.196.165.86:6800 socket closed (con state OPEN)
[Wed Jan 30 17:27:10 2019] libceph: mds15 10.196.149.46:6800 socket closed (con state OPEN)
[Wed Jan 30 17:27:10 2019] libceph: mds22 10.196.149.51:6800 socket closed (con state OPEN)
[Wed Jan 30 17:27:10 2019] libceph: mds23 10.196.129.94:6800 socket closed (con state OPEN)
[Wed Jan 30 17:27:10 2019] libceph: mds20 10.196.149.47:6800 socket closed (con state OPEN)
[Wed Jan 30 17:27:10 2019] libceph: mds1 10.196.165.85:6800 socket closed (con state OPEN)
[Wed Jan 30 17:27:10 2019] libceph: mds5 10.199.72.139:6800 socket closed (con state OPEN)
[Wed Jan 30 17:27:10 2019] libceph: mds30 10.196.219.56:6800 socket closed (con state OPEN)
[Wed Jan 30 17:27:10 2019] libceph: mds10 10.199.75.180:6800 socket closed (con state OPEN)
[Wed Jan 30 17:27:10 2019] libceph: mds13 10.199.74.130:6800 socket closed (con state OPEN)
[Wed Jan 30 17:27:10 2019] libceph: mds28 10.196.149.49:6800 socket closed (con state OPEN)
[Wed Jan 30 17:27:10 2019] libceph: mds21 10.199.74.135:6800 socket closed (con state OPEN)
[Wed Jan 30 17:27:10 2019] libceph: mds9 10.199.75.178:6800 socket closed (con state OPEN)
[Wed Jan 30 17:27:10 2019] libceph: mds18 10.199.74.127:6800 socket closed (con state OPEN)
[Wed Jan 30 17:27:10 2019] libceph: mds12 10.199.72.133:6800 socket closed (con state OPEN)
[Wed Jan 30 17:27:10 2019] libceph: mds3 10.196.165.84:6800 socket closed (con state OPEN)
[Wed Jan 30 17:27:10 2019] libceph: mds11 10.199.74.132:6800 socket closed (con state OPEN)
[Wed Jan 30 17:27:10 2019] libceph: mds16 10.199.75.179:6800 socket closed (con state OPEN)
[Wed Jan 30 17:27:10 2019] libceph: mds14 10.196.129.93:6800 socket closed (con state OPEN)
[Wed Jan 30 17:27:10 2019] libceph: mds2 10.199.72.136:6800 socket closed (con state OPEN)
[Wed Jan 30 17:27:10 2019] libceph: mds0 10.199.72.143:6800 socket closed (con state OPEN)
[Wed Jan 30 17:27:14 2019] ceph: mds17 recovery completed
[Wed Jan 30 17:27:14 2019] libceph: ceph_aes_crypt got bad padding -22 on in_len 32
[Wed Jan 30 17:27:14 2019] libceph: mds3 10.196.165.84:6800 bad authorize reply
[Wed Jan 30 17:27:14 2019] libceph: ceph_aes_crypt got bad padding -57 on in_len 32
[Wed Jan 30 17:27:14 2019] libceph: mds2 10.199.72.136:6800 bad authorize reply
[Wed Jan 30 17:27:14 2019] libceph: ceph_aes_crypt got bad padding -96 on in_len 32
[Wed Jan 30 17:27:14 2019] libceph: mds5 10.199.72.139:6800 bad authorize reply
[Wed Jan 30 17:27:14 2019] libceph: ceph_aes_crypt got bad padding 77 on in_len 32
[Wed Jan 30 17:27:14 2019] libceph: mds4 10.199.150.18:6800 bad authorize reply
[Wed Jan 30 17:27:14 2019] libceph: ceph_aes_crypt got bad padding 39 on in_len 32
[Wed Jan 30 17:27:14 2019] libceph: mds9 10.199.75.178:6800 bad authorize reply
[Wed Jan 30 17:27:14 2019] libceph: mds0 10.199.72.143:6800 bad authorize reply
[Wed Jan 30 17:27:14 2019] libceph: ceph_aes_crypt got bad padding 76 on in_len 32
[Wed Jan 30 17:27:14 2019] libceph: mds1 10.196.165.85:6800 bad authorize reply
[Wed Jan 30 17:27:14 2019] libceph: ceph_aes_crypt got bad padding -84 on in_len 32
[Wed Jan 30 17:27:14 2019] libceph: mds7 10.199.72.135:6800 bad authorize reply
[Wed Jan 30 17:27:14 2019] libceph: mds8 10.196.165.86:6800 bad authorize reply
[Wed Jan 30 17:27:14 2019] libceph: ceph_aes_crypt got bad padding 34 on in_len 32
[Wed Jan 30 17:27:14 2019] libceph: mds14 10.196.129.93:6800 bad authorize reply
[Wed Jan 30 17:27:14 2019] libceph: ceph_aes_crypt got bad padding -40 on in_len 32
[Wed Jan 30 17:27:14 2019] libceph: mds12 10.199.72.133:6800 bad authorize reply
[Wed Jan 30 17:27:14 2019] libceph: ceph_aes_crypt got bad padding -25 on in_len 32
[Wed Jan 30 17:27:14 2019] libceph: mds6 10.199.72.137:6800 bad authorize reply
[Wed Jan 30 17:27:14 2019] libceph: ceph_aes_crypt got bad padding 106 on in_len 32
[Wed Jan 30 17:27:14 2019] libceph: mds15 10.196.149.46:6800 bad authorize reply
[Wed Jan 30 17:27:14 2019] libceph: ceph_aes_crypt got bad padding -70 on in_len 32
[Wed Jan 30 17:27:14 2019] libceph: mds20 10.196.149.47:6800 bad authorize reply
[Wed Jan 30 17:27:14 2019] libceph: ceph_aes_crypt got bad padding 108 on in_len 32
[Wed Jan 30 17:27:14 2019] libceph: mds11 10.199.74.132:6800 bad authorize reply
[Wed Jan 30 17:27:14 2019] libceph: ceph_aes_crypt got bad padding -103 on in_len 32

#14 Updated by Ilya Dryomov about 1 year ago

  • Status changed from New to In Progress
  • Priority changed from Normal to Urgent

I think I know the cause of "got bad padding" errors, but only in conjunction with "bad authorize reply" errors (i.e. https://tracker.ceph.com/issues/20164#note-10 and https://tracker.ceph.com/issues/20164#note-13). I'm working on a fix.

The ones without "bad authorize reply" errors are still a mystery, but they were seen on a very old kernel.
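For context on what the check is doing: the number printed after "got bad padding" is the last byte of the decrypted buffer, interpreted as a PKCS#7-style pad length (and printed as a signed value, hence negatives like -47). Decrypting with a stale or mismatched ticket key yields effectively random plaintext, so that byte almost never falls in the valid 1..16 range. Here is a minimal stdlib-only sketch of such a check, with hypothetical names (not the actual kernel code):

```python
def check_pkcs7_padding(buf: bytes, block_size: int = 16) -> int:
    """Return the payload length, or raise if the padding is invalid.

    Mirrors the kind of check performed after AES-CBC decryption:
    the last byte gives the pad length, and every pad byte must
    equal that length.
    """
    if not buf or len(buf) % block_size:
        raise ValueError("bad length")
    pad = buf[-1]
    # A pad length outside 1..block_size is what produces the
    # "got bad padding <pad> on in_len <len>" style of error.
    if pad < 1 or pad > block_size:
        raise ValueError(f"got bad padding {pad} on in_len {len(buf)}")
    if buf[-pad:] != bytes([pad]) * pad:
        raise ValueError(f"got bad padding {pad} on in_len {len(buf)}")
    return len(buf) - pad

# A correctly padded block passes:
assert check_pkcs7_padding(b"A" * 12 + bytes([4]) * 4) == 12

# Decrypting with the wrong key gives pseudo-random bytes, whose
# last byte (here 99) is rarely a valid pad length:
try:
    check_pkcs7_padding(b"A" * 15 + bytes([99]))
except ValueError as e:
    print(e)  # got bad padding 99 on in_len 16
```

This also explains why the reported pad values look random from one message to the next: each failed decryption produces a different garbage last byte.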

#15 Updated by Ilya Dryomov about 1 year ago

  • Status changed from In Progress to Fix Under Review

[PATCH] libceph: handle an empty authorize reply

#16 Updated by Ilya Dryomov about 1 year ago

I managed to reproduce the "got bad padding" errors, but not without a code change to make the tickets go and stay stale (I also had to switch to cephx v1 signatures, since cephx v2 signatures are encrypted a bit differently -- no leading len and ceph_x_encrypt_header).

[  326.799859] libceph: osd1 192.168.122.1:6805 socket closed (con state OPEN)
[  326.805441] libceph: osd1 192.168.122.1:6805 bad authorize reply
[  327.953339] libceph: osd1 192.168.122.1:6805 socket closed (con state NEGOTIATING)
[  328.913565] libceph: osd1 192.168.122.1:6805 socket closed (con state NEGOTIATING)
[  330.962607] libceph: ceph_aes_crypt got bad padding -69 on in_len 32
[  330.964987] libceph: osd1 192.168.122.1:6805 bad authorize reply
[  334.931686] libceph: ceph_aes_crypt got bad padding 20 on in_len 32
[  334.934471] libceph: osd1 192.168.122.1:6805 bad authorize reply
[  343.185876] libceph: osd1 192.168.122.1:6805 socket closed (con state CONNECTING)
[  360.083842] libceph: osd1 192.168.122.1:6805 socket closed (con state CONNECTING)
[  392.339815] libceph: ceph_aes_crypt got bad padding 26 on in_len 32
[  392.342255] libceph: osd1 192.168.122.1:6805 bad authorize reply
[  460.946420] libceph: ceph_aes_crypt got bad padding 42 on in_len 32
[  460.947916] libceph: osd1 192.168.122.1:6805 bad authorize reply
[  592.020349] libceph: ceph_aes_crypt got bad padding -14 on in_len 32
[  592.022388] libceph: osd1 192.168.122.1:6805 bad authorize reply

Xiaoxi, I'm not sure how your system got into that state, but this patch should help. Instead of spinning on a bad authorize reply, libceph will invalidate its tickets and reconnect:

[  774.077718] libceph: osd1 192.168.122.1:6805 socket closed (con state OPEN)
[  774.086984] libceph: osd1 192.168.122.1:6805 connect authorization failure
[  774.091717] libceph: mon0 192.168.122.1:40100 session established
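The behaviour change described above can be modelled as a small retry loop. Everything below is an illustrative sketch with made-up names, not the actual patch:

```python
class Monitor:
    """Stand-in for a Ceph monitor that issues cephx tickets."""
    def __init__(self):
        self.epoch = 0
    def issue_ticket(self):
        return self.epoch
    def rotate_keys(self):
        # Simulates service key rotation, which makes old tickets stale.
        self.epoch += 1

class Daemon:
    """Stand-in for an OSD/MDS that validates client authorizers."""
    def __init__(self, monitor):
        self.monitor = monitor
    def authorize(self, ticket):
        return ticket == self.monitor.epoch

class KernelClient:
    """Old behaviour: retry the same stale ticket forever ("spinning on
    a bad authorize reply"). Fixed behaviour: invalidate the cached
    ticket and fetch a fresh one before reconnecting."""
    def __init__(self, monitor):
        self.monitor = monitor
        self.ticket = monitor.issue_ticket()
    def connect(self, daemon, max_tries=3, invalidate_on_failure=True):
        for _ in range(max_tries):
            if daemon.authorize(self.ticket):
                return True
            if invalidate_on_failure:
                # The fix: drop the stale ticket and get a new one.
                self.ticket = self.monitor.issue_ticket()
        return False

mon = Monitor()
client = KernelClient(mon)
mon.rotate_keys()                 # the client's cached ticket is now stale
osd = Daemon(mon)
assert not client.connect(osd, invalidate_on_failure=False)  # spins, fails
assert client.connect(osd)        # invalidates, re-authenticates, succeeds
```

The key design point is that a bad authorize reply is treated as evidence the cached ticket is unusable, so the client goes back to the monitor rather than retrying the doomed handshake.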

#17 Updated by Ilya Dryomov about 1 year ago

  • Status changed from Fix Under Review to 15

#18 Updated by Ilya Dryomov about 1 year ago

  • Status changed from 15 to Pending Backport

#20 Updated by Ilya Dryomov about 1 year ago

In 4.9.161, 4.14.104, 4.19.26, 4.20.13.

#21 Updated by Ilya Dryomov about 1 year ago

  • Status changed from Pending Backport to Resolved

In 3.18.137, 4.4.177.
