Project

General

Profile

Actions

Bug #8379

closed

cephx: clock skew causes verify_reply decode error

Added by jian li almost 10 years ago. Updated over 7 years ago.

Status:
Can't reproduce
Priority:
High
Assignee:
-
Category:
cephx
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

linux: 3.11.0-12-generic #19-Ubuntu SMP Wed Oct 9 16:20:46 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
4 osd and 1 mon
ceph version: firefly
the following is cepf.conf
[global]
fsid = 49a9d6de-0583-4941-b6ee-7007482ddbcf
mon_initial_members = estack4-HP-Z220-SFF-Workstation
mon_host = 192.168.12.5
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true

the details error message after run cli rados bench p rbd 10 write
Maintaining 16 concurrent writes of 4194304 bytes for up to 10 seconds or 0 objects
Object prefix: benchmark_data_estack3-HP-Z220-SFF-Workstati_16356
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
0 0 0 02014-05-17 10:46:54.492303 7f163c3a9700 0 cephx: verify_reply couldn't decrypt with error: error decoding block for decryption
2014-05-17 10:46:54.492310 7f163c3a9700 0 -
192.168.12.4:0/1016356 >> 192.168.12.4:6804/15397 pipe(0xce5fd0 sd=4 :48528 s=1 pgs=0 cs=0 l=1 c=0xce6240).failed verifying authorize reply
2014-05-17 10:46:54.492351 7f163c3a9700 0 -- 192.168.12.4:0/1016356 >> 192.168.12.4:6804/15397 pipe(0xce5fd0 sd=4 :48528 s=1 pgs=0 cs=0 l=1 c=0xce6240).fault
2014-05-17 10:46:54.492700 7f163c3a9700 0 cephx: verify_reply couldn't decrypt with error: error decoding block for decryption
2014-05-17 10:46:54.492715 7f163c3a9700 0 -- 192.168.12.4:0/1016356 >> 192.168.12.4:6804/15397 pipe(0xce5fd0 sd=4 :48529 s=1 pgs=0 cs=0 l=1 c=0xce6240).failed verifying authorize reply
0 0 - 0
2014-05-17 10:46:54.518700 7f163c2a8700 0 cephx: verify_reply couldn't decrypt with error: error decoding block for decryption
2014-05-17 10:46:54.518701 7f163c3a9700 0 cephx: verify_reply couldn't decrypt with error: error decoding block for decryption
2014-05-17 10:46:54.518707 7f163c2a8700 0 -- 192.168.12.4:0/1016356 >> 192.168.12.3:6800/15005 pipe(0xce7200 sd=5 :51696 s=1 pgs=0 cs=0 l=1 c=0xce7470).failed verifying authorize reply
2014-05-17 10:46:54.518710 7f163c3a9700 0 -- 192.168.12.4:0/1016356 >> 192.168.12.4:6804/15397 pipe(0xce5fd0 sd=4 :48533 s=1 pgs=0 cs=0 l=1 c=0xce6240).failed verifying authorize reply
2014-05-17 10:46:54.518729 7f163c2a8700 0 -- 192.168.12.4:0/1016356 >> 192.168.12.3:6800/15005 pipe(0xce7200 sd=5 :51696 s=1 pgs=0 cs=0 l=1 c=0xce7470).fault

estack@estack2-HP-Z220-SFF-Workstation:/var/log/ceph$ ceph health
2014-05-17 11:48:39.381253 7f51376bf700 1 -- :/0 messenger.start
2014-05-17 11:48:39.381734 7f51376bf700 1 -- :/1016928 --> 192.168.12.5:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f5130024a30 con 0x7f51300245e0
2014-05-17 11:48:39.382311 7f5134748700 1 -- 192.168.12.3:0/1016928 learned my addr 192.168.12.3:0/1016928
2014-05-17 11:48:39.383252 7f513574a700 1 -- 192.168.12.3:0/1016928 <== mon.0 192.168.12.5:6789/0 1 ==== mon_map v1 ==== 221+0+0 (149619065 0 0) 0x7f511c000ad0 con 0x7f51300245e0
2014-05-17 11:48:39.383351 7f513574a700 1 -- 192.168.12.3:0/1016928 <== mon.0 192.168.12.5:6789/0 2 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 33+0+0 (4013688572 0 0) 0x7f511c000fc0 con 0x7f51300245e0
2014-05-17 11:48:39.383526 7f513574a700 1 -- 192.168.12.3:0/1016928 --> 192.168.12.5:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- ?+0 0x7f5120001330 con 0x7f51300245e0
2014-05-17 11:48:39.384319 7f513574a700 1 -- 192.168.12.3:0/1016928 <== mon.0 192.168.12.5:6789/0 3 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 206+0+0 (2213948709 0 0) 0x7f511c000fc0 con 0x7f51300245e0
2014-05-17 11:48:39.384482 7f513574a700 1 -- 192.168.12.3:0/1016928 --> 192.168.12.5:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- ?+0 0x7f5120001e30 con 0x7f51300245e0
2014-05-17 11:48:39.385188 7f513574a700 1 -- 192.168.12.3:0/1016928 <== mon.0 192.168.12.5:6789/0 4 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 393+0+0 (2549800467 0 0) 0x7f511c000fc0 con 0x7f51300245e0
2014-05-17 11:48:39.385310 7f513574a700 1 -- 192.168.12.3:0/1016928 --> 192.168.12.5:6789/0 -- mon_subscribe({monmap=0+}) v2 -- ?+0 0x7f5130024c40 con 0x7f51300245e0
2014-05-17 11:48:39.385468 7f51376bf700 1 -- 192.168.12.3:0/1016928 --> 192.168.12.5:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 0x7f5130025a80 con 0x7f51300245e0
2014-05-17 11:48:39.385483 7f51376bf700 1 -- 192.168.12.3:0/1016928 --> 192.168.12.5:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 0x7f51300210d0 con 0x7f51300245e0
2014-05-17 11:48:39.386057 7f513574a700 1 -- 192.168.12.3:0/1016928 <== mon.0 192.168.12.5:6789/0 5 ==== mon_map v1 ==== 221+0+0 (149619065 0 0) 0x7f511c0012b0 con 0x7f51300245e0
2014-05-17 11:48:39.386139 7f513574a700 1 -- 192.168.12.3:0/1016928 <== mon.0 192.168.12.5:6789/0 6 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (187160225 0 0) 0x7f511c001550 con 0x7f51300245e0
2014-05-17 11:48:39.386216 7f513574a700 1 -- 192.168.12.3:0/1016928 <== mon.0 192.168.12.5:6789/0 7 ==== osd_map(148..148 src has 1..148) v3 ==== 4080+0+0 (2292252469 0 0) 0x7f511c001230 con 0x7f51300245e0
2014-05-17 11:48:39.386359 7f513574a700 1 -- 192.168.12.3:0/1016928 <== mon.0 192.168.12.5:6789/0 8 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (187160225 0 0) 0x7f511c0027f0 con 0x7f51300245e0
2014-05-17 11:48:39.386380 7f513574a700 1 -- 192.168.12.3:0/1016928 <== mon.0 192.168.12.5:6789/0 9 ==== osd_map(148..148 src has 1..148) v3 ==== 4080+0+0 (2292252469 0 0) 0x7f511c0039a0 con 0x7f51300245e0
2014-05-17 11:48:39.386392 7f513574a700 1 -- 192.168.12.3:0/1016928 <== mon.0 192.168.12.5:6789/0 10 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (187160225 0 0) 0x7f511c003bb0 con 0x7f51300245e0
2014-05-17 11:48:39.387767 7f51376bf700 1 -- 192.168.12.3:0/1016928 --> 192.168.12.5:6789/0 -- mon_command({"prefix": "get_command_descriptions"} v 0) v1 -- ?+0 0x7f5130016570 con 0x7f51300245e0
2014-05-17 11:48:39.391012 7f513574a700 1 -- 192.168.12.3:0/1016928 <== mon.0 192.168.12.5:6789/0 11 ==== mon_command_ack([{"prefix": "get_command_descriptions"}]=0 v0) v1 ==== 72+0+29582 (1092875540 0 1116028220) 0x7f511c001bc0 con 0x7f51300245e0
2014-05-17 11:48:39.445616 7f51376bf700 1 -- 192.168.12.3:0/1016928 --> 192.168.12.5:6789/0 -- mon_command({"prefix": "health"} v 0) v1 -- ?+0 0x7f5130024c40 con 0x7f51300245e0
2014-05-17 11:48:39.446479 7f513574a700 1 -- 192.168.12.3:0/1016928 <== mon.0 192.168.12.5:6789/0 12 ==== mon_command_ack([{"prefix": "health"}]=0 v0) v1 ==== 54+0+10 (2555570491 0 3610226052) 0x7f511c000fc0 con 0x7f51300245e0
HEALTH_OK
2014-05-17 11:48:39.449378 7f51376bf700 1 -- 192.168.12.3:0/1016928 mark_down 0x7f51300245e0 -- 0x7f5130024370
2014-05-17 11:48:39.449439 7f51376bf700 1 -- 192.168.12.3:0/1016928 mark_down_all
2014-05-17 11:48:39.449555 7f51376bf700 1 -- 192.168.12.3:0/1016928 shutdown complete.

it works again after I restart the OSD, But it does not work again when I do not use any ceph operations for a period of time? i need to restart every osd again then it works

please help to look at that issue, thanks!


Files

ceph_log.zip (3.52 MB) ceph_log.zip jian li, 05/20/2014 05:58 AM
Actions

Also available in: Atom PDF