Project

General

Profile

Actions

Bug #38128

closed

msgr: unexpected "handle_cephx_auth got bad authorizer, auth_reply_len=0"

Added by Patrick Donnelly about 5 years ago. Updated about 5 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

From the client:

2019-01-31 15:05:47.538 7f963effd700  1 -- v1:172.21.15.198:0/3613562614 <== mds.0 v2:172.21.15.198:6834/560627122 4 ==== client_caps(flush_ack ino 0x1000000002c 47 seq 0 tid 66 caps=pAsLsXsFsc dirty=Fw wanted=- follows 1 size 0/0 mtime 0.000000) v11 ==== 252+0+0 (1811838955 0 0) 0x7f964800dde0 con 0x7f9644003730
2019-01-31 15:05:47.538 7f9658541700  1 -- v1:172.21.15.198:0/3613562614 >> [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] conn(0x7f9644003730 msgr2=0x7f96440026b0 :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read_bulk peer close file descriptor 20
2019-01-31 15:05:47.538 7f963effd700 10 client.5290  mds.0 seq now 2
2019-01-31 15:05:47.538 7f9658541700  1 -- v1:172.21.15.198:0/3613562614 >> [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] conn(0x7f9644003730 msgr2=0x7f96440026b0 :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read_until read failed
2019-01-31 15:05:47.538 7f9658541700  1 --2- v1:172.21.15.198:0/3613562614 >> [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] conn(0x7f9644003730 0x7f96440026b0 :-1 s=READY pgs=13 cs=0 l=0).handle_read_frame_length_and_tag read frame length and tag failed r=-1 ((1) Operation not permitted)
2019-01-31 15:05:47.538 7f963effd700  5 client.5290 handle_cap_flush_ack mds.0 cleaned Fw on 0x1000000002c.head(faked_ino=0 ref=1 ll_ref=0 cap_refs={4=0,1024=0,4096=0,8192=0} open={2=0} mode=100664 size=51/4194304 nlink=1 btime=2019-01-31 15:04:19.814908 mtime=2019-01-31 15:04:19.817580 ctime=2019-01-31 15:04:19.817580 caps=pAsLsXsFsc(0=pAsLsXsFsc) flushing_caps=Fw objectset[0x1000000002c ts 0/0 objects 1 dirty_or_tx 0] 0x7f9644032d40) with Fw
2019-01-31 15:05:47.538 7f963effd700  5 client.5290   flushing_caps Fw -> -
2019-01-31 15:05:47.538 7f963effd700 10 client.5290  0x1000000002c.head(faked_ino=0 ref=1 ll_ref=0 cap_refs={4=0,1024=0,4096=0,8192=0} open={2=0} mode=100664 size=51/4194304 nlink=1 btime=2019-01-31 15:04:19.814908 mtime=2019-01-31 15:04:19.817580 ctime=2019-01-31 15:04:19.817580 caps=pAsLsXsFsc(0=pAsLsXsFsc) objectset[0x1000000002c ts 0/0 objects 1 dirty_or_tx 0] 0x7f9644032d40) !flushing
2019-01-31 15:05:47.538 7f963effd700 10 client.5290 put_inode on 0x1000000002c.head(faked_ino=0 ref=1 ll_ref=0 cap_refs={4=0,1024=0,4096=0,8192=0} open={2=0} mode=100664 size=51/4194304 nlink=1 btime=2019-01-31 15:04:19.814908 mtime=2019-01-31 15:04:19.817580 ctime=2019-01-31 15:04:19.817580 caps=pAsLsXsFsc(0=pAsLsXsFsc) objectset[0x1000000002c ts 0/0 objects 1 dirty_or_tx 0] 0x7f9644032d40)
2019-01-31 15:05:47.538 7f963effd700 15 inode.put on 0x7f9644032d40 0x1000000002c.head now 0
2019-01-31 15:05:47.538 7f9658541700  1 --2- v1:172.21.15.198:0/3613562614 >> [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] conn(0x7f9644003730 0x7f96440026b0 :-1 s=READY pgs=13 cs=0 l=0)._fault initiating reconnect
2019-01-31 15:05:47.538 7f963effd700 10 client.5290 remove_cap mds.0 on 0x1000000002c.head(faked_ino=0 ref=0 ll_ref=0 cap_refs={4=0,1024=0,4096=0,8192=0} open={2=0} mode=100664 size=51/4194304 nlink=1 btime=2019-01-31 15:04:19.814908 mtime=2019-01-31 15:04:19.817580 ctime=2019-01-31 15:04:19.817580 caps=pAsLsXsFsc(0=pAsLsXsFsc) objectset[0x1000000002c ts 0/0 objects 1 dirty_or_tx 0] 0x7f9644032d40)
2019-01-31 15:05:47.538 7f963effd700 15 client.5290 remove_cap last one, closing snaprealm 0x7f9644001cd0
2019-01-31 15:05:47.538 7f963effd700 20 client.5290 put_snap_realm 0x1 0x7f9644001cd0 21 -> 20
2019-01-31 15:05:47.538 7f963effd700 10 client.5290 put_inode deleting 0x1000000002c.head(faked_ino=0 ref=0 ll_ref=0 cap_refs={4=0,1024=0,4096=0,8192=0} open={2=0} mode=100664 size=51/4194304 nlink=1 btime=2019-01-31 15:04:19.814908 mtime=2019-01-31 15:04:19.817580 ctime=2019-01-31 15:04:19.817580 caps=- objectset[0x1000000002c ts 0/0 objects 1 dirty_or_tx 0] 0x7f9644032d40)
2019-01-31 15:05:47.538 7f9658541700  1 --2- v1:172.21.15.198:0/3613562614 >> [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] conn(0x7f9644003730 0x7f96440026b0 :-1 s=CONNECTING pgs=13 cs=1 l=0)._banner_exchange_handle_peer_banner peer_type=2 supported=0 required=0
2019-01-31 15:05:47.538 7f9658541700  1 --2- v1:172.21.15.198:0/3613562614 >> [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] conn(0x7f9644003730 0x7f96440026b0 :-1 s=CONNECTING pgs=13 cs=1 l=0).handle_auth_bad_auth authentication failed error code=1 error message=Bad Authorizer
2019-01-31 15:05:47.538 7f9658541700  1 --2- v1:172.21.15.198:0/3613562614 >> [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] conn(0x7f9644003730 0x7f96440026b0 :-1 s=START_CONNECT pgs=13 cs=1 l=0)._fault waiting 0.200000
2019-01-31 15:05:47.742 7f9658541700  1 --2- v1:172.21.15.198:0/3613562614 >> [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] conn(0x7f9644003730 0x7f96440026b0 :-1 s=CONNECTING pgs=13 cs=1 l=0)._banner_exchange_handle_peer_banner peer_type=2 supported=0 required=0
2019-01-31 15:05:47.742 7f9658541700  1 --2- v1:172.21.15.198:0/3613562614 >> [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] conn(0x7f9644003730 0x7f96440026b0 :-1 s=CONNECTING pgs=13 cs=1 l=0).handle_auth_done authentication done, flags=0
2019-01-31 15:05:47.742 7f9658541700  1 --2- v1:172.21.15.198:0/3613562614 >> [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] conn(0x7f9644003730 0x7f96440026b0 :-1 s=CONNECTING pgs=13 cs=1 l=0).handle_session_reset received session reset
2019-01-31 15:05:47.742 7f9658541700  1 --2- v1:172.21.15.198:0/3613562614 >> [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] conn(0x7f9644003730 0x7f96440026b0 :-1 s=CONNECTING pgs=13 cs=1 l=0).reset_session

Log: /ceph/teuthology-archive/pdonnell-2019-01-31_14:25:32-fs-wip-pdonnell-testing-20190131.014926-distro-basic-smithi/3530585/remote/smithi198/log/ceph-client.0.16222.log.gz

From the MDS:

2019-01-31 15:05:47.538 7f53f12a4700  0 -- [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] >> v2:172.21.15.198:0/3613562614 conn(0x56321c907180 msgr2=0x56321c902800 :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read_until injecting socket failure
2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.42 send_message_client_counted client.5290 seq 3 client_caps(flush_ack ino 0x1000000002d 48 seq 0 tid 67 caps=pAsLsXsFsc dirty=Fw wanted=- follows 1 size 0/0 mtime 0.000000) v11
2019-01-31 15:05:47.538 7f53eea9f700  1 -- [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] --> v2:172.21.15.198:0/3613562614 -- client_caps(flush_ack ino 0x1000000002d 48 seq 0 tid 67 caps=pAsLsXsFsc dirty=Fw wanted=- follows 1 size 0/0 mtime 0.000000) v11 -- 0x56321d608880 con 0x56321c907180
2019-01-31 15:05:47.538 7f53f12a4700  1 -- [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] >> v2:172.21.15.198:0/3613562614 conn(0x56321c907180 msgr2=0x56321c902800 :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read_bulk peer close file descriptor 33
2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.locker  head inode [inode 0x1000000002d [c,head] /client.0/tmp/102/16 auth v36 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.820790 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f7c00]
2019-01-31 15:05:47.538 7f53f12a4700  1 -- [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] >> v2:172.21.15.198:0/3613562614 conn(0x56321c907180 msgr2=0x56321c902800 :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read_until read failed
2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.cache pick_inode_snap follows 1 on [inode 0x1000000002d [c,head] /client.0/tmp/102/16 auth v36 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.820790 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f7c00]
2019-01-31 15:05:47.538 7f53f12a4700  1 --2- [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] >> v2:172.21.15.198:0/3613562614 conn(0x56321c907180 0x56321c902800 :-1 s=READY pgs=103 cs=0 l=0).handle_read_frame_length_and_tag read frame length and tag failed r=-1 ((1) Operation not permitted)
2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.locker  follows 1 retains pAsLsXsFsc dirty - on [inode 0x1000000002d [c,head] /client.0/tmp/102/16 auth v36 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.820790 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f7c00]
2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.locker _do_cap_update dirty - issued pAsLsXsFsc wanted - on [inode 0x1000000002d [c,head] /client.0/tmp/102/16 auth v36 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.820790 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f7c00]
2019-01-31 15:05:47.538 7f53eea9f700 20 mds.0.locker inode is file
2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.locker eval 2496 [inode 0x1000000002d [c,head] /client.0/tmp/102/16 auth v36 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.820790 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f7c00]
2019-01-31 15:05:47.538 7f53f12a4700  1 --2- [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] >> v2:172.21.15.198:0/3613562614 conn(0x56321c907180 0x56321c902800 :-1 s=READY pgs=103 cs=0 l=0)._fault server, going to standby, even though i have stuff queued
2019-01-31 15:05:47.538 7f53eea9f700  7 mds.0.locker file_eval wanted= loner_wanted= other_wanted=  filelock=(ifile sync) on [inode 0x1000000002d [c,head] /client.0/tmp/102/16 auth v36 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.820790 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f7c00]
2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.locker simple_eval (iauth sync) on [inode 0x1000000002d [c,head] /client.0/tmp/102/16 auth v36 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.820790 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f7c00]
2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.locker simple_eval (ilink sync) on [inode 0x1000000002d [c,head] /client.0/tmp/102/16 auth v36 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.820790 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f7c00]
2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.locker simple_eval (ixattr sync) on [inode 0x1000000002d [c,head] /client.0/tmp/102/16 auth v36 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.820790 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f7c00]
2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.locker eval done
2019-01-31 15:05:47.538 7f53eea9f700 10 MDSInternalContextBase::complete: 18C_MDS_RetryMessage
2019-01-31 15:05:47.538 7f53eea9f700  7 mds.0.locker handle_client_caps  on 0x1000000002e tid 68 follows 1 op update flags 0x0
2019-01-31 15:05:47.538 7f53eea9f700 20 mds.0.42 get_session have 0x56321d608400 client.5290 v2:172.21.15.198:0/3613562614 state open
2019-01-31 15:05:47.538 7f53eea9f700  7 mds.0.locker handle_client_caps already flushed tid 68 for client.5290
2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.42 send_message_client_counted client.5290 seq 4 client_caps(flush_ack ino 0x1000000002e 49 seq 0 tid 68 caps=pAsLsXsFsc dirty=Fw wanted=- follows 1 size 0/0 mtime 0.000000) v11
2019-01-31 15:05:47.538 7f53eea9f700  1 -- [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] --> v2:172.21.15.198:0/3613562614 -- client_caps(flush_ack ino 0x1000000002e 49 seq 0 tid 68 caps=pAsLsXsFsc dirty=Fw wanted=- follows 1 size 0/0 mtime 0.000000) v11 -- 0x56321d608d00 con 0x56321c907180
2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.locker  head inode [inode 0x1000000002e [c,head] /client.0/tmp/102/17 auth v38 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.824211 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f8300]
2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.cache pick_inode_snap follows 1 on [inode 0x1000000002e [c,head] /client.0/tmp/102/17 auth v38 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.824211 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f8300]
2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.locker  follows 1 retains pAsLsXsFsc dirty - on [inode 0x1000000002e [c,head] /client.0/tmp/102/17 auth v38 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.824211 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f8300]
2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.locker _do_cap_update dirty - issued pAsLsXsFsc wanted - on [inode 0x1000000002e [c,head] /client.0/tmp/102/17 auth v38 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.824211 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f8300]
2019-01-31 15:05:47.538 7f53eea9f700 20 mds.0.locker inode is file
2019-01-31 15:05:47.538 7f53f1aa5700  1 --2- [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] >>  conn(0x56321c905b00 0x56321c900a00 :-1 s=NONE pgs=0 cs=0 l=0).accept
2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.locker eval 2496 [inode 0x1000000002e [c,head] /client.0/tmp/102/17 auth v38 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.824211 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f8300]
2019-01-31 15:05:47.538 7f53eea9f700  7 mds.0.locker file_eval wanted= loner_wanted= other_wanted=  filelock=(ifile sync) on [inode 0x1000000002e [c,head] /client.0/tmp/102/17 auth v38 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.824211 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f8300]
2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.locker simple_eval (iauth sync) on [inode 0x1000000002e [c,head] /client.0/tmp/102/17 auth v38 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.824211 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f8300]
2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.locker simple_eval (ilink sync) on [inode 0x1000000002e [c,head] /client.0/tmp/102/17 auth v38 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.824211 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f8300]
2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.locker simple_eval (ixattr sync) on [inode 0x1000000002e [c,head] /client.0/tmp/102/17 auth v38 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.824211 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f8300]
2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.locker eval done
2019-01-31 15:05:47.538 7f53eea9f700 10 MDSInternalContextBase::complete: 18C_MDS_RetryMessage
2019-01-31 15:05:47.538 7f53eea9f700  7 mds.0.locker handle_client_caps  on 0x1000000002f tid 69 follows 1 op update flags 0x0
2019-01-31 15:05:47.538 7f53eea9f700 20 mds.0.42 get_session have 0x56321d608400 client.5290 v2:172.21.15.198:0/3613562614 state open
2019-01-31 15:05:47.538 7f53eea9f700  7 mds.0.locker handle_client_caps already flushed tid 69 for client.5290
2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.42 send_message_client_counted client.5290 seq 5 client_caps(flush_ack ino 0x1000000002f 50 seq 0 tid 69 caps=pAsLsXsFsc dirty=Fw wanted=- follows 1 size 0/0 mtime 0.000000) v11
2019-01-31 15:05:47.538 7f53f12a4700  1 --2- [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] >>  conn(0x56321c905b00 0x56321c900a00 :-1 s=ACCEPTING pgs=0 cs=0 l=0)._banner_exchange_handle_peer_banner peer_type=8 supported=0 required=0
2019-01-31 15:05:47.538 7f53eea9f700  1 -- [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] --> v2:172.21.15.198:0/3613562614 -- client_caps(flush_ack ino 0x1000000002f 50 seq 0 tid 69 caps=pAsLsXsFsc dirty=Fw wanted=- follows 1 size 0/0 mtime 0.000000) v11 -- 0x56321d609180 con 0x56321c907180
2019-01-31 15:05:47.538 7f53f1aa5700  1 -- [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] <== osd.3 v2:172.21.15.198:6826/13492 16 ==== osd_op_reply(26 100.00000000 [omap-get-header,omap-get-vals,getxattr] v0'0 uv1 ondisk = 0) v8 ==== 240+0+5089 (1026197720 0 1847883836) 0x56321d625500 con 0x56321d606d80
2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.locker  head inode [inode 0x1000000002f [c,head] /client.0/tmp/102/18 auth v40 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.827300 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f8a00]
2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.cache pick_inode_snap follows 1 on [inode 0x1000000002f [c,head] /client.0/tmp/102/18 auth v40 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.827300 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f8a00]
2019-01-31 15:05:47.538 7f53f12a4700  1 verify authorizer, authorizer_data.length()=0
2019-01-31 15:05:47.538 7f53e8a93700 10 MDSIOContextBase::complete: 21C_IO_Dir_OMAP_Fetched
2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.locker  follows 1 retains pAsLsXsFsc dirty - on [inode 0x1000000002f [c,head] /client.0/tmp/102/18 auth v40 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.827300 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f8a00]
2019-01-31 15:05:47.538 7f53f12a4700  0 --2- [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] >>  conn(0x56321c905b00 0x56321c900a00 :-1 s=ACCEPTING pgs=0 cs=0 l=0).handle_cephx_auth got bad authorizer, auth_reply_len=0

Log: /ceph/teuthology-archive/pdonnell-2019-01-31_14:25:32-fs-wip-pdonnell-testing-20190131.014926-distro-basic-smithi/3530585/remote/smithi198/log/ceph-mds.a.log.gz

Bad authorizer seems unexpected in the last line of the MDS log. Note that the MDS injected a socket failure to trigger the connection reconnect.


Related issues 1 (0 open1 closed)

Related to Messengers - Bug #38118: msgr2: race connection results in hangResolved01/30/2019

Actions
Actions #1

Updated by Patrick Donnelly about 5 years ago

What makes this stranger is the connection mysteriously reconnects. I don't see how in the logs. Additionally, the client doesn't see a few dozen of the messages the MDS sent between.

First message the MDS sends:

2019-01-31 15:05:47.538 7f53eea9f700  1 -- [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] --> v2:172.21.15.198:0/3613562614 -- client_caps(flush_ack ino 0x1000000002c 47 seq 0 tid 66 caps=pAsLsXsFsc dirty=Fw wanted=- follows 1 size 0/0 mtime 0.000000) v11 -- 0x56321c906d00 con 0x56321c907180

and is received:

2019-01-31 15:05:47.538 7f963effd700  1 -- v1:172.21.15.198:0/3613562614 <== mds.0 v2:172.21.15.198:6834/560627122 4 ==== client_caps(flush_ack ino 0x1000000002c 47 seq 0 tid 66 caps=pAsLsXsFsc dirty=Fw wanted=- follows 1 size 0/0 mtime 0.000000) v11 ==== 252+0+0 (1811838955 0 0) 0x7f964800dde0 con 0x7f9644003730

Then there are several other client_caps messages sent. And then the MDS sends:

2019-01-31 15:05:47.554 7f53ea296700 10 mds.0.42 send_message_client client.5290 v2:172.21.15.198:0/3613562614 client_reply(???:174 = 0 (0) Success unsafe) v1
2019-01-31 15:05:47.538 7f963effd700 10 client.5290 put_inode deleting 0x1000000002c.head(faked_ino=0 ref=0 ll_ref=0 cap_refs={4=0,1024=0,4096=0,8192=0} open={2=0} mode=100664 size=51/4194304 nlink=1 btime=2019-01-31 15:04:19.814908 mtime=2019-01-31 15:04:19.817580 ctime=2019-01-31 15:04:19.817580 caps=- objectset[0x1000000002c ts 0/0 objects 1 dirty_or_tx 0] 0x7f9644032d40)
2019-01-31 15:05:47.538 7f9658541700  1 --2- v1:172.21.15.198:0/3613562614 >> [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] conn(0x7f9644003730 0x7f96440026b0 :-1 s=CONNECTING pgs=13 cs=1 l=0)._banner_exchange_handle_peer_banner peer_type=2 supported=0 required=0
2019-01-31 15:05:47.538 7f9658541700  1 --2- v1:172.21.15.198:0/3613562614 >> [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] conn(0x7f9644003730 0x7f96440026b0 :-1 s=CONNECTING pgs=13 cs=1 l=0).handle_auth_bad_auth authentication failed error code=1 error message=Bad Authorizer
2019-01-31 15:05:47.538 7f9658541700  1 --2- v1:172.21.15.198:0/3613562614 >> [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] conn(0x7f9644003730 0x7f96440026b0 :-1 s=START_CONNECT pgs=13 cs=1 l=0)._fault waiting 0.200000
2019-01-31 15:05:47.742 7f9658541700  1 --2- v1:172.21.15.198:0/3613562614 >> [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] conn(0x7f9644003730 0x7f96440026b0 :-1 s=CONNECTING pgs=13 cs=1 l=0)._banner_exchange_handle_peer_banner peer_type=2 supported=0 required=0
2019-01-31 15:05:47.742 7f9658541700  1 --2- v1:172.21.15.198:0/3613562614 >> [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] conn(0x7f9644003730 0x7f96440026b0 :-1 s=CONNECTING pgs=13 cs=1 l=0).handle_auth_done authentication done, flags=0
2019-01-31 15:05:47.742 7f9658541700  1 --2- v1:172.21.15.198:0/3613562614 >> [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] conn(0x7f9644003730 0x7f96440026b0 :-1 s=CONNECTING pgs=13 cs=1 l=0).handle_session_reset received session reset
2019-01-31 15:05:47.742 7f9658541700  1 --2- v1:172.21.15.198:0/3613562614 >> [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] conn(0x7f9644003730 0x7f96440026b0 :-1 s=CONNECTING pgs=13 cs=1 l=0).reset_session
2019-01-31 15:05:47.742 7f963effd700  0 client.5290 ms_handle_remote_reset on v2:172.21.15.198:6834/560627122
2019-01-31 15:05:47.742 7f963effd700  1 -- v1:172.21.15.198:0/3613562614 --> [v2:172.21.15.198:3300/0,v1:172.21.15.198:6789/0] -- mon_subscribe({osdmap=32}) v3 -- 0x7f965000b110 con 0x7f96440134a0
2019-01-31 15:05:47.742 7f963effd700  1 client.5290 reset from mds we were open; mark session as stale
2019-01-31 15:05:47.742 7f9658541700  1 --2- v1:172.21.15.198:0/3613562614 >> [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] conn(0x7f9644003730 0x7f96440026b0 :-1 s=READY pgs=17 cs=0 l=0).ready entity=mds.0 cookie=7749724510b65b3 in_seq=0 out_seq=0
2019-01-31 15:05:47.742 7f963effd700 10 client.5290 ms_handle_connect on v2:172.21.15.198:6834/560627122
2019-01-31 15:05:47.742 7f963effd700  1 -- v1:172.21.15.198:0/3613562614 <== mon.1 v2:172.21.15.198:3300/0 12 ==== osd_map(32..33 src has 1..33) v4 ==== 622+0+0 (1569331327 0 0) 0x7f964c00cef0 con 0x7f96440134a0
2019-01-31 15:05:48.250 7f963ffff700  1 -- v1:172.21.15.198:0/3613562614 --> [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] -- client_cap_release(1) v2 -- 0x7f9630016e80 con 0x7f9644003730
2019-01-31 15:05:48.250 7f963ffff700 20 client.5290 trim_cache size 4 max 16384
2019-01-31 15:05:49.250 7f963ffff700 20 client.5290 trim_cache size 4 max 16384
2019-01-31 15:05:49.590 7f963effd700  1 -- v1:172.21.15.198:0/3613562614 <== mds.0 v2:172.21.15.198:6834/560627122 1 ==== client_caps(grant ino 0x1000000003f 67 seq 2 caps=pAsxLsXsxFsxcrwb dirty=- wanted=pAsxXsxFxwb follows 0 size 0/4194304 ts 1/18446744073709551615 mtime 2019-01-31 15:04:19.875612) v11 ==== 252+0+0 (512396912 0 0) 0x7f964800dde0 con 0x7f9644003730
2019-01-31 15:05:49.590 7f963effd700  5 client.5290 set_cap_epoch_barrier epoch = 33
2019-01-31 15:05:49.590 7f963effd700 10 client.5290  mds.0 seq now 3
2019-01-31 15:05:49.590 7f963effd700  5 client.5290 handle_caps don't have vino 0x1000000003f.head, dropping
2019-01-31 15:05:49.594 7f963effd700  1 -- v1:172.21.15.198:0/3613562614 <== mds.0 v2:172.21.15.198:6834/560627122 2 ==== client_reply(???:174 = 0 (0) Success safe) v1 ==== 27+0+0 (1943119121 0 0) 0x7f964800dde0 con 0x7f9644003730

After the mysterious reconnection, the client receives a "grant ino" message. I don't see where the MDS even sent that. Then the client receives the safe reply. All of the other cap_flush/cap_revokes are lost.

Actions #2

Updated by Sage Weil about 5 years ago

  • Related to Bug #38118: msgr2: race connection results in hang added
Actions #3

Updated by Greg Farnum about 5 years ago

  • Project changed from Ceph to CephFS

Hopefully fixed by that related bug? Otherwise move it back into the msgr queue.

Actions #4

Updated by Patrick Donnelly about 5 years ago

  • Status changed from New to Resolved
  • Assignee set to Sage Weil
Actions

Also available in: Atom PDF