Bug #38128
closedmsgr: unexpected "handle_cephx_auth got bad authorizer, auth_reply_len=0"
0%
Description
From the client:
2019-01-31 15:05:47.538 7f963effd700 1 -- v1:172.21.15.198:0/3613562614 <== mds.0 v2:172.21.15.198:6834/560627122 4 ==== client_caps(flush_ack ino 0x1000000002c 47 seq 0 tid 66 caps=pAsLsXsFsc dirty=Fw wanted=- follows 1 size 0/0 mtime 0.000000) v11 ==== 252+0+0 (1811838955 0 0) 0x7f964800dde0 con 0x7f9644003730 2019-01-31 15:05:47.538 7f9658541700 1 -- v1:172.21.15.198:0/3613562614 >> [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] conn(0x7f9644003730 msgr2=0x7f96440026b0 :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read_bulk peer close file descriptor 20 2019-01-31 15:05:47.538 7f963effd700 10 client.5290 mds.0 seq now 2 2019-01-31 15:05:47.538 7f9658541700 1 -- v1:172.21.15.198:0/3613562614 >> [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] conn(0x7f9644003730 msgr2=0x7f96440026b0 :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read_until read failed 2019-01-31 15:05:47.538 7f9658541700 1 --2- v1:172.21.15.198:0/3613562614 >> [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] conn(0x7f9644003730 0x7f96440026b0 :-1 s=READY pgs=13 cs=0 l=0).handle_read_frame_length_and_tag read frame length and tag failed r=-1 ((1) Operation not permitted) 2019-01-31 15:05:47.538 7f963effd700 5 client.5290 handle_cap_flush_ack mds.0 cleaned Fw on 0x1000000002c.head(faked_ino=0 ref=1 ll_ref=0 cap_refs={4=0,1024=0,4096=0,8192=0} open={2=0} mode=100664 size=51/4194304 nlink=1 btime=2019-01-31 15:04:19.814908 mtime=2019-01-31 15:04:19.817580 ctime=2019-01-31 15:04:19.817580 caps=pAsLsXsFsc(0=pAsLsXsFsc) flushing_caps=Fw objectset[0x1000000002c ts 0/0 objects 1 dirty_or_tx 0] 0x7f9644032d40) with Fw 2019-01-31 15:05:47.538 7f963effd700 5 client.5290 flushing_caps Fw -> - 2019-01-31 15:05:47.538 7f963effd700 10 client.5290 0x1000000002c.head(faked_ino=0 ref=1 ll_ref=0 cap_refs={4=0,1024=0,4096=0,8192=0} open={2=0} mode=100664 size=51/4194304 nlink=1 btime=2019-01-31 15:04:19.814908 mtime=2019-01-31 15:04:19.817580 ctime=2019-01-31 15:04:19.817580 caps=pAsLsXsFsc(0=pAsLsXsFsc) objectset[0x1000000002c ts 0/0 objects 1 dirty_or_tx 0] 0x7f9644032d40) !flushing 2019-01-31 15:05:47.538 7f963effd700 10 client.5290 put_inode on 0x1000000002c.head(faked_ino=0 ref=1 ll_ref=0 cap_refs={4=0,1024=0,4096=0,8192=0} open={2=0} mode=100664 size=51/4194304 nlink=1 btime=2019-01-31 15:04:19.814908 mtime=2019-01-31 15:04:19.817580 ctime=2019-01-31 15:04:19.817580 caps=pAsLsXsFsc(0=pAsLsXsFsc) objectset[0x1000000002c ts 0/0 objects 1 dirty_or_tx 0] 0x7f9644032d40) 2019-01-31 15:05:47.538 7f963effd700 15 inode.put on 0x7f9644032d40 0x1000000002c.head now 0 2019-01-31 15:05:47.538 7f9658541700 1 --2- v1:172.21.15.198:0/3613562614 >> [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] conn(0x7f9644003730 0x7f96440026b0 :-1 s=READY pgs=13 cs=0 l=0)._fault initiating reconnect 2019-01-31 15:05:47.538 7f963effd700 10 client.5290 remove_cap mds.0 on 0x1000000002c.head(faked_ino=0 ref=0 ll_ref=0 cap_refs={4=0,1024=0,4096=0,8192=0} open={2=0} mode=100664 size=51/4194304 nlink=1 btime=2019-01-31 15:04:19.814908 mtime=2019-01-31 15:04:19.817580 ctime=2019-01-31 15:04:19.817580 caps=pAsLsXsFsc(0=pAsLsXsFsc) objectset[0x1000000002c ts 0/0 objects 1 dirty_or_tx 0] 0x7f9644032d40) 2019-01-31 15:05:47.538 7f963effd700 15 client.5290 remove_cap last one, closing snaprealm 0x7f9644001cd0 2019-01-31 15:05:47.538 7f963effd700 20 client.5290 put_snap_realm 0x1 0x7f9644001cd0 21 -> 20 2019-01-31 15:05:47.538 7f963effd700 10 client.5290 put_inode deleting 0x1000000002c.head(faked_ino=0 ref=0 ll_ref=0 cap_refs={4=0,1024=0,4096=0,8192=0} open={2=0} mode=100664 size=51/4194304 nlink=1 btime=2019-01-31 15:04:19.814908 mtime=2019-01-31 15:04:19.817580 ctime=2019-01-31 15:04:19.817580 caps=- objectset[0x1000000002c ts 0/0 objects 1 dirty_or_tx 0] 0x7f9644032d40) 2019-01-31 15:05:47.538 7f9658541700 1 --2- v1:172.21.15.198:0/3613562614 >> [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] conn(0x7f9644003730 0x7f96440026b0 :-1 s=CONNECTING pgs=13 cs=1 l=0)._banner_exchange_handle_peer_banner peer_type=2 supported=0 required=0 2019-01-31 15:05:47.538 7f9658541700 1 --2- v1:172.21.15.198:0/3613562614 >> [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] conn(0x7f9644003730 0x7f96440026b0 :-1 s=CONNECTING pgs=13 cs=1 l=0).handle_auth_bad_auth authentication failed error code=1 error message=Bad Authorizer 2019-01-31 15:05:47.538 7f9658541700 1 --2- v1:172.21.15.198:0/3613562614 >> [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] conn(0x7f9644003730 0x7f96440026b0 :-1 s=START_CONNECT pgs=13 cs=1 l=0)._fault waiting 0.200000 2019-01-31 15:05:47.742 7f9658541700 1 --2- v1:172.21.15.198:0/3613562614 >> [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] conn(0x7f9644003730 0x7f96440026b0 :-1 s=CONNECTING pgs=13 cs=1 l=0)._banner_exchange_handle_peer_banner peer_type=2 supported=0 required=0 2019-01-31 15:05:47.742 7f9658541700 1 --2- v1:172.21.15.198:0/3613562614 >> [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] conn(0x7f9644003730 0x7f96440026b0 :-1 s=CONNECTING pgs=13 cs=1 l=0).handle_auth_done authentication done, flags=0 2019-01-31 15:05:47.742 7f9658541700 1 --2- v1:172.21.15.198:0/3613562614 >> [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] conn(0x7f9644003730 0x7f96440026b0 :-1 s=CONNECTING pgs=13 cs=1 l=0).handle_session_reset received session reset 2019-01-31 15:05:47.742 7f9658541700 1 --2- v1:172.21.15.198:0/3613562614 >> [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] conn(0x7f9644003730 0x7f96440026b0 :-1 s=CONNECTING pgs=13 cs=1 l=0).reset_session
Log: /ceph/teuthology-archive/pdonnell-2019-01-31_14:25:32-fs-wip-pdonnell-testing-20190131.014926-distro-basic-smithi/3530585/remote/smithi198/log/ceph-client.0.16222.log.gz
From the MDS:
2019-01-31 15:05:47.538 7f53f12a4700 0 -- [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] >> v2:172.21.15.198:0/3613562614 conn(0x56321c907180 msgr2=0x56321c902800 :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read_until injecting socket failure 2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.42 send_message_client_counted client.5290 seq 3 client_caps(flush_ack ino 0x1000000002d 48 seq 0 tid 67 caps=pAsLsXsFsc dirty=Fw wanted=- follows 1 size 0/0 mtime 0.000000) v11 2019-01-31 15:05:47.538 7f53eea9f700 1 -- [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] --> v2:172.21.15.198:0/3613562614 -- client_caps(flush_ack ino 0x1000000002d 48 seq 0 tid 67 caps=pAsLsXsFsc dirty=Fw wanted=- follows 1 size 0/0 mtime 0.000000) v11 -- 0x56321d608880 con 0x56321c907180 2019-01-31 15:05:47.538 7f53f12a4700 1 -- [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] >> v2:172.21.15.198:0/3613562614 conn(0x56321c907180 msgr2=0x56321c902800 :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read_bulk peer close file descriptor 33 2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.locker head inode [inode 0x1000000002d [c,head] /client.0/tmp/102/16 auth v36 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.820790 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f7c00] 2019-01-31 15:05:47.538 7f53f12a4700 1 -- [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] >> v2:172.21.15.198:0/3613562614 conn(0x56321c907180 msgr2=0x56321c902800 :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read_until read failed 2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.cache pick_inode_snap follows 1 on [inode 0x1000000002d [c,head] /client.0/tmp/102/16 auth v36 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.820790 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f7c00] 2019-01-31 15:05:47.538 7f53f12a4700 1 --2- [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] >> v2:172.21.15.198:0/3613562614 conn(0x56321c907180 0x56321c902800 :-1 s=READY pgs=103 cs=0 l=0).handle_read_frame_length_and_tag read frame length and tag failed r=-1 ((1) Operation not permitted) 2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.locker follows 1 retains pAsLsXsFsc dirty - on [inode 0x1000000002d [c,head] /client.0/tmp/102/16 auth v36 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.820790 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f7c00] 2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.locker _do_cap_update dirty - issued pAsLsXsFsc wanted - on [inode 0x1000000002d [c,head] /client.0/tmp/102/16 auth v36 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.820790 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f7c00] 2019-01-31 15:05:47.538 7f53eea9f700 20 mds.0.locker inode is file 2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.locker eval 2496 [inode 0x1000000002d [c,head] /client.0/tmp/102/16 auth v36 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.820790 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f7c00] 2019-01-31 15:05:47.538 7f53f12a4700 1 --2- [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] >> v2:172.21.15.198:0/3613562614 conn(0x56321c907180 0x56321c902800 :-1 s=READY pgs=103 cs=0 l=0)._fault server, going to standby, even though i have stuff queued 2019-01-31 15:05:47.538 7f53eea9f700 7 mds.0.locker file_eval wanted= loner_wanted= other_wanted= filelock=(ifile sync) on [inode 0x1000000002d [c,head] /client.0/tmp/102/16 auth v36 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.820790 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f7c00] 2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.locker simple_eval (iauth sync) on [inode 0x1000000002d [c,head] /client.0/tmp/102/16 auth v36 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.820790 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f7c00] 2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.locker simple_eval (ilink sync) on [inode 0x1000000002d [c,head] /client.0/tmp/102/16 auth v36 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.820790 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f7c00] 2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.locker simple_eval (ixattr sync) on [inode 0x1000000002d [c,head] /client.0/tmp/102/16 auth v36 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.820790 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f7c00] 2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.locker eval done 2019-01-31 15:05:47.538 7f53eea9f700 10 MDSInternalContextBase::complete: 18C_MDS_RetryMessage 2019-01-31 15:05:47.538 7f53eea9f700 7 mds.0.locker handle_client_caps on 0x1000000002e tid 68 follows 1 op update flags 0x0 2019-01-31 15:05:47.538 7f53eea9f700 20 mds.0.42 get_session have 0x56321d608400 client.5290 v2:172.21.15.198:0/3613562614 state open 2019-01-31 15:05:47.538 7f53eea9f700 7 mds.0.locker handle_client_caps already flushed tid 68 for client.5290 2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.42 send_message_client_counted client.5290 seq 4 client_caps(flush_ack ino 0x1000000002e 49 seq 0 tid 68 caps=pAsLsXsFsc dirty=Fw wanted=- follows 1 size 0/0 mtime 0.000000) v11 2019-01-31 15:05:47.538 7f53eea9f700 1 -- [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] --> v2:172.21.15.198:0/3613562614 -- client_caps(flush_ack ino 0x1000000002e 49 seq 0 tid 68 caps=pAsLsXsFsc dirty=Fw wanted=- follows 1 size 0/0 mtime 0.000000) v11 -- 0x56321d608d00 con 0x56321c907180 2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.locker head inode [inode 0x1000000002e [c,head] /client.0/tmp/102/17 auth v38 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.824211 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f8300] 2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.cache pick_inode_snap follows 1 on [inode 0x1000000002e [c,head] /client.0/tmp/102/17 auth v38 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.824211 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f8300] 2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.locker follows 1 retains pAsLsXsFsc dirty - on [inode 0x1000000002e [c,head] /client.0/tmp/102/17 auth v38 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.824211 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f8300] 2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.locker _do_cap_update dirty - issued pAsLsXsFsc wanted - on [inode 0x1000000002e [c,head] /client.0/tmp/102/17 auth v38 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.824211 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f8300] 2019-01-31 15:05:47.538 7f53eea9f700 20 mds.0.locker inode is file 2019-01-31 15:05:47.538 7f53f1aa5700 1 --2- [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] >> conn(0x56321c905b00 0x56321c900a00 :-1 s=NONE pgs=0 cs=0 l=0).accept 2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.locker eval 2496 [inode 0x1000000002e [c,head] /client.0/tmp/102/17 auth v38 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.824211 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f8300] 2019-01-31 15:05:47.538 7f53eea9f700 7 mds.0.locker file_eval wanted= loner_wanted= other_wanted= filelock=(ifile sync) on [inode 0x1000000002e [c,head] /client.0/tmp/102/17 auth v38 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.824211 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f8300] 2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.locker simple_eval (iauth sync) on [inode 0x1000000002e [c,head] /client.0/tmp/102/17 auth v38 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.824211 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f8300] 2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.locker simple_eval (ilink sync) on [inode 0x1000000002e [c,head] /client.0/tmp/102/17 auth v38 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.824211 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f8300] 2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.locker simple_eval (ixattr sync) on [inode 0x1000000002e [c,head] /client.0/tmp/102/17 auth v38 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.824211 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f8300] 2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.locker eval done 2019-01-31 15:05:47.538 7f53eea9f700 10 MDSInternalContextBase::complete: 18C_MDS_RetryMessage 2019-01-31 15:05:47.538 7f53eea9f700 7 mds.0.locker handle_client_caps on 0x1000000002f tid 69 follows 1 op update flags 0x0 2019-01-31 15:05:47.538 7f53eea9f700 20 mds.0.42 get_session have 0x56321d608400 client.5290 v2:172.21.15.198:0/3613562614 state open 2019-01-31 15:05:47.538 7f53eea9f700 7 mds.0.locker handle_client_caps already flushed tid 69 for client.5290 2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.42 send_message_client_counted client.5290 seq 5 client_caps(flush_ack ino 0x1000000002f 50 seq 0 tid 69 caps=pAsLsXsFsc dirty=Fw wanted=- follows 1 size 0/0 mtime 0.000000) v11 2019-01-31 15:05:47.538 7f53f12a4700 1 --2- [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] >> conn(0x56321c905b00 0x56321c900a00 :-1 s=ACCEPTING pgs=0 cs=0 l=0)._banner_exchange_handle_peer_banner peer_type=8 supported=0 required=0 2019-01-31 15:05:47.538 7f53eea9f700 1 -- [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] --> v2:172.21.15.198:0/3613562614 -- client_caps(flush_ack ino 0x1000000002f 50 seq 0 tid 69 caps=pAsLsXsFsc dirty=Fw wanted=- follows 1 size 0/0 mtime 0.000000) v11 -- 0x56321d609180 con 0x56321c907180 2019-01-31 15:05:47.538 7f53f1aa5700 1 -- [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] <== osd.3 v2:172.21.15.198:6826/13492 16 ==== osd_op_reply(26 100.00000000 [omap-get-header,omap-get-vals,getxattr] v0'0 uv1 ondisk = 0) v8 ==== 240+0+5089 (1026197720 0 1847883836) 0x56321d625500 con 0x56321d606d80 2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.locker head inode [inode 0x1000000002f [c,head] /client.0/tmp/102/18 auth v40 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.827300 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f8a00] 2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.cache pick_inode_snap follows 1 on [inode 0x1000000002f [c,head] /client.0/tmp/102/18 auth v40 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.827300 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f8a00] 2019-01-31 15:05:47.538 7f53f12a4700 1 verify authorizer, authorizer_data.length()=0 2019-01-31 15:05:47.538 7f53e8a93700 10 MDSIOContextBase::complete: 21C_IO_Dir_OMAP_Fetched 2019-01-31 15:05:47.538 7f53eea9f700 10 mds.0.locker follows 1 retains pAsLsXsFsc dirty - on [inode 0x1000000002f [c,head] /client.0/tmp/102/18 auth v40 dirtyparent s=51 n(v0 rc2019-01-31 15:04:19.827300 b51 1=1+0) (iversion lock) caps={5290=pAsLsXsFsc/-@0} | caps=1 dirtyparent=1 dirty=1 0x56321d6f8a00] 2019-01-31 15:05:47.538 7f53f12a4700 0 --2- [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] >> conn(0x56321c905b00 0x56321c900a00 :-1 s=ACCEPTING pgs=0 cs=0 l=0).handle_cephx_auth got bad authorizer, auth_reply_len=0
Log: /ceph/teuthology-archive/pdonnell-2019-01-31_14:25:32-fs-wip-pdonnell-testing-20190131.014926-distro-basic-smithi/3530585/remote/smithi198/log/ceph-mds.a.log.gz
Bad authorizer seems unexpected in the last line of the MDS log. Note that the MDS injected a socket failure to trigger the connection reconnect.
Updated by Patrick Donnelly about 5 years ago
What makes this stranger is the connection mysteriously reconnects. I don't see how in the logs. Additionally, the client doesn't see a few dozen of the messages the MDS sent between.
First message the MDS sends:
2019-01-31 15:05:47.538 7f53eea9f700 1 -- [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] --> v2:172.21.15.198:0/3613562614 -- client_caps(flush_ack ino 0x1000000002c 47 seq 0 tid 66 caps=pAsLsXsFsc dirty=Fw wanted=- follows 1 size 0/0 mtime 0.000000) v11 -- 0x56321c906d00 con 0x56321c907180
and is received:
2019-01-31 15:05:47.538 7f963effd700 1 -- v1:172.21.15.198:0/3613562614 <== mds.0 v2:172.21.15.198:6834/560627122 4 ==== client_caps(flush_ack ino 0x1000000002c 47 seq 0 tid 66 caps=pAsLsXsFsc dirty=Fw wanted=- follows 1 size 0/0 mtime 0.000000) v11 ==== 252+0+0 (1811838955 0 0) 0x7f964800dde0 con 0x7f9644003730
Then there are several other client_caps messages sent. And then the MDS sends:
2019-01-31 15:05:47.554 7f53ea296700 10 mds.0.42 send_message_client client.5290 v2:172.21.15.198:0/3613562614 client_reply(???:174 = 0 (0) Success unsafe) v1
2019-01-31 15:05:47.538 7f963effd700 10 client.5290 put_inode deleting 0x1000000002c.head(faked_ino=0 ref=0 ll_ref=0 cap_refs={4=0,1024=0,4096=0,8192=0} open={2=0} mode=100664 size=51/4194304 nlink=1 btime=2019-01-31 15:04:19.814908 mtime=2019-01-31 15:04:19.817580 ctime=2019-01-31 15:04:19.817580 caps=- objectset[0x1000000002c ts 0/0 objects 1 dirty_or_tx 0] 0x7f9644032d40) 2019-01-31 15:05:47.538 7f9658541700 1 --2- v1:172.21.15.198:0/3613562614 >> [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] conn(0x7f9644003730 0x7f96440026b0 :-1 s=CONNECTING pgs=13 cs=1 l=0)._banner_exchange_handle_peer_banner peer_type=2 supported=0 required=0 2019-01-31 15:05:47.538 7f9658541700 1 --2- v1:172.21.15.198:0/3613562614 >> [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] conn(0x7f9644003730 0x7f96440026b0 :-1 s=CONNECTING pgs=13 cs=1 l=0).handle_auth_bad_auth authentication failed error code=1 error message=Bad Authorizer 2019-01-31 15:05:47.538 7f9658541700 1 --2- v1:172.21.15.198:0/3613562614 >> [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] conn(0x7f9644003730 0x7f96440026b0 :-1 s=START_CONNECT pgs=13 cs=1 l=0)._fault waiting 0.200000 2019-01-31 15:05:47.742 7f9658541700 1 --2- v1:172.21.15.198:0/3613562614 >> [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] conn(0x7f9644003730 0x7f96440026b0 :-1 s=CONNECTING pgs=13 cs=1 l=0)._banner_exchange_handle_peer_banner peer_type=2 supported=0 required=0 2019-01-31 15:05:47.742 7f9658541700 1 --2- v1:172.21.15.198:0/3613562614 >> [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] conn(0x7f9644003730 0x7f96440026b0 :-1 s=CONNECTING pgs=13 cs=1 l=0).handle_auth_done authentication done, flags=0 2019-01-31 15:05:47.742 7f9658541700 1 --2- v1:172.21.15.198:0/3613562614 >> [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] conn(0x7f9644003730 0x7f96440026b0 :-1 s=CONNECTING pgs=13 cs=1 l=0).handle_session_reset received session reset 2019-01-31 15:05:47.742 7f9658541700 1 --2- v1:172.21.15.198:0/3613562614 >> [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] conn(0x7f9644003730 0x7f96440026b0 :-1 s=CONNECTING pgs=13 cs=1 l=0).reset_session 2019-01-31 15:05:47.742 7f963effd700 0 client.5290 ms_handle_remote_reset on v2:172.21.15.198:6834/560627122 2019-01-31 15:05:47.742 7f963effd700 1 -- v1:172.21.15.198:0/3613562614 --> [v2:172.21.15.198:3300/0,v1:172.21.15.198:6789/0] -- mon_subscribe({osdmap=32}) v3 -- 0x7f965000b110 con 0x7f96440134a0 2019-01-31 15:05:47.742 7f963effd700 1 client.5290 reset from mds we were open; mark session as stale 2019-01-31 15:05:47.742 7f9658541700 1 --2- v1:172.21.15.198:0/3613562614 >> [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] conn(0x7f9644003730 0x7f96440026b0 :-1 s=READY pgs=17 cs=0 l=0).ready entity=mds.0 cookie=7749724510b65b3 in_seq=0 out_seq=0 2019-01-31 15:05:47.742 7f963effd700 10 client.5290 ms_handle_connect on v2:172.21.15.198:6834/560627122 2019-01-31 15:05:47.742 7f963effd700 1 -- v1:172.21.15.198:0/3613562614 <== mon.1 v2:172.21.15.198:3300/0 12 ==== osd_map(32..33 src has 1..33) v4 ==== 622+0+0 (1569331327 0 0) 0x7f964c00cef0 con 0x7f96440134a0 2019-01-31 15:05:48.250 7f963ffff700 1 -- v1:172.21.15.198:0/3613562614 --> [v2:172.21.15.198:6834/560627122,v1:172.21.15.198:6835/560627122] -- client_cap_release(1) v2 -- 0x7f9630016e80 con 0x7f9644003730 2019-01-31 15:05:48.250 7f963ffff700 20 client.5290 trim_cache size 4 max 16384 2019-01-31 15:05:49.250 7f963ffff700 20 client.5290 trim_cache size 4 max 16384 2019-01-31 15:05:49.590 7f963effd700 1 -- v1:172.21.15.198:0/3613562614 <== mds.0 v2:172.21.15.198:6834/560627122 1 ==== client_caps(grant ino 0x1000000003f 67 seq 2 caps=pAsxLsXsxFsxcrwb dirty=- wanted=pAsxXsxFxwb follows 0 size 0/4194304 ts 1/18446744073709551615 mtime 2019-01-31 15:04:19.875612) v11 ==== 252+0+0 (512396912 0 0) 0x7f964800dde0 con 0x7f9644003730 2019-01-31 15:05:49.590 7f963effd700 5 client.5290 set_cap_epoch_barrier epoch = 33 2019-01-31 15:05:49.590 7f963effd700 10 client.5290 mds.0 seq now 3 2019-01-31 15:05:49.590 7f963effd700 5 client.5290 handle_caps don't have vino 0x1000000003f.head, dropping 2019-01-31 15:05:49.594 7f963effd700 1 -- v1:172.21.15.198:0/3613562614 <== mds.0 v2:172.21.15.198:6834/560627122 2 ==== client_reply(???:174 = 0 (0) Success safe) v1 ==== 27+0+0 (1943119121 0 0) 0x7f964800dde0 con 0x7f9644003730
After the mysterious reconnection, the client receives a "grant ino" message. I don't see where the MDS even sent that. Then the client receives the safe reply. All of the other cap_flush/cap_revokes are lost.
Updated by Sage Weil about 5 years ago
- Related to Bug #38118: msgr2: race connection results in hang added
Updated by Greg Farnum about 5 years ago
- Project changed from Ceph to CephFS
Hopefully fixed by that related bug? Otherwise move it back into the msgr queue.
Updated by Patrick Donnelly about 5 years ago
- Status changed from New to Resolved
- Assignee set to Sage Weil