Project

General

Profile

Actions

Bug #52621

closed

cephx: verify_authorizer could not decrypt ticket info: error: bad magic in decode_decrypt

Added by Neha Ojha over 2 years ago. Updated almost 2 years ago.

Status:
Can't reproduce
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
pacific, octopus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

failure_reason: '"2021-09-12T05:50:52.189786+0000 osd.0 (osd.0) 1 : cluster [WRN]
  Monitor daemon marked osd.0 down, but it is still running" in cluster log'
2021-09-12T06:21:03.558 INFO:tasks.ceph.ceph_manager.ceph:PG 1.5 is not active+clean
2021-09-12T06:21:03.558 INFO:tasks.ceph.ceph_manager.ceph:{'pgid': '1.5', 'version': "0'0", 'reported_seq': 29, 'reported_epoch': 38, 'state': 'peering', 'last_fresh': '2021-09-12T06:00:54.056552+0000', 'last_change': '2021-09-12T06:00:54.056552+0000', 'last_active': '2021-09-12T05:50:24.316802+0000', 'last_peered': '2021-09-12T05:48:29.487726+0000', 'last_clean': '2021-09-12T05:48:29.487726+0000', 'last_became_active': '2021-09-12T05:48:29.487452+0000', 'last_became_peered': '2021-09-12T05:48:29.487452+0000', 'last_unstale': '2021-09-12T06:00:54.056552+0000', 'last_undegraded': '2021-09-12T06:00:54.056552+0000', 'last_fullsized': '2021-09-12T06:00:54.056552+0000', 'mapping_epoch': 38, 'log_start': "0'0", 'ondisk_log_start': "0'0", 'created': 10, 'last_epoch_clean': 11, 'parent': '0.0', 'parent_split_bits': 0, 'last_scrub': "0'0", 'last_scrub_stamp': '2021-09-12T05:48:28.464377+0000', 'last_deep_scrub': "0'0", 'last_deep_scrub_stamp': '2021-09-12T05:48:28.464377+0000', 'last_clean_scrub_stamp': '2021-09-12T05:48:28.464377+0000', 'log_size': 0, 'ondisk_log_size': 0, 'stats_invalid': False, 'dirty_stats_invalid': False, 'omap_stats_invalid': False, 'hitset_stats_invalid': False, 'hitset_bytes_stats_invalid': False, 'pin_stats_invalid': False, 'manifest_stats_invalid': False, 'snaptrimq_len': 0, 'stat_sum': {'num_bytes': 0, 'num_objects': 0, 'num_object_clones': 0, 'num_object_copies': 0, 'num_objects_missing_on_primary': 0, 'num_objects_missing': 0, 'num_objects_degraded': 0, 'num_objects_misplaced': 0, 'num_objects_unfound': 0, 'num_objects_dirty': 0, 'num_whiteouts': 0, 'num_read': 0, 'num_read_kb': 0, 'num_write': 0, 'num_write_kb': 0, 'num_scrub_errors': 0, 'num_shallow_scrub_errors': 0, 'num_deep_scrub_errors': 0, 'num_objects_recovered': 0, 'num_bytes_recovered': 0, 'num_keys_recovered': 0, 'num_objects_omap': 0, 'num_objects_hit_set_archive': 0, 'num_bytes_hit_set_archive': 0, 'num_flush': 0, 'num_flush_kb': 0, 'num_evict': 0, 'num_evict_kb': 0, 'num_promote': 0, 'num_flush_mode_high': 0, 'num_flush_mode_low': 0, 'num_evict_mode_some': 0, 'num_evict_mode_full': 0, 'num_objects_pinned': 0, 'num_legacy_snapsets': 0, 'num_large_omap_objects': 0, 'num_objects_manifest': 0, 'num_omap_bytes': 0, 'num_omap_keys': 0, 'num_objects_repaired': 0}, 'up': [2, 0], 'acting': [2, 0], 'avail_no_missing': [], 'object_location_counts': [], 'blocked_by': [0], 'up_primary': 2, 'acting_primary': 2, 'purged_snaps': []}
2021-09-12T06:21:03.558 INFO:tasks.ceph.ceph_manager.ceph:PG 1.2 is not active+clean
2021-09-12T06:21:03.559 INFO:tasks.ceph.ceph_manager.ceph:{'pgid': '1.2', 'version': "0'0", 'reported_seq': 12, 'reported_epoch': 35, 'state': 'active+undersized', 'last_fresh': '2021-09-12T05:50:52.253386+0000', 'last_change': '2021-09-12T05:50:49.935156+0000', 'last_active': '2021-09-12T05:50:52.253386+0000', 'last_peered': '2021-09-12T05:50:52.253386+0000', 'last_clean': '2021-09-12T05:48:28.464377+0000', 'last_became_active': '2021-09-12T05:50:49.935156+0000', 'last_became_peered': '2021-09-12T05:50:49.935156+0000', 'last_unstale': '2021-09-12T05:50:52.253386+0000', 'last_undegraded': '2021-09-12T05:50:52.253386+0000', 'last_fullsized': '2021-09-12T05:50:49.933578+0000', 'mapping_epoch': 32, 'log_start': "0'0", 'ondisk_log_start': "0'0", 'created': 10, 'last_epoch_clean': 11, 'parent': '0.0', 'parent_split_bits': 0, 'last_scrub': "0'0", 'last_scrub_stamp': '2021-09-12T05:48:28.464377+0000', 'last_deep_scrub': "0'0", 'last_deep_scrub_stamp': '2021-09-12T05:48:28.464377+0000', 'last_clean_scrub_stamp': '2021-09-12T05:48:28.464377+0000', 'log_size': 0, 'ondisk_log_size': 0, 'stats_invalid': False, 'dirty_stats_invalid': False, 'omap_stats_invalid': False, 'hitset_stats_invalid': False, 'hitset_bytes_stats_invalid': False, 'pin_stats_invalid': False, 'manifest_stats_invalid': False, 'snaptrimq_len': 0, 'stat_sum': {'num_bytes': 0, 'num_objects': 0, 'num_object_clones': 0, 'num_object_copies': 0, 'num_objects_missing_on_primary': 0, 'num_objects_missing': 0, 'num_objects_degraded': 0, 'num_objects_misplaced': 0, 'num_objects_unfound': 0, 'num_objects_dirty': 0, 'num_whiteouts': 0, 'num_read': 0, 'num_read_kb': 0, 'num_write': 0, 'num_write_kb': 0, 'num_scrub_errors': 0, 'num_shallow_scrub_errors': 0, 'num_deep_scrub_errors': 0, 'num_objects_recovered': 0, 'num_bytes_recovered': 0, 'num_keys_recovered': 0, 'num_objects_omap': 0, 'num_objects_hit_set_archive': 0, 'num_bytes_hit_set_archive': 0, 'num_flush': 0, 'num_flush_kb': 0, 'num_evict': 0, 'num_evict_kb': 0, 'num_promote': 0, 'num_flush_mode_high': 0, 'num_flush_mode_low': 0, 'num_evict_mode_some': 0, 'num_evict_mode_full': 0, 'num_objects_pinned': 0, 'num_legacy_snapsets': 0, 'num_large_omap_objects': 0, 'num_objects_manifest': 0, 'num_omap_bytes': 0, 'num_omap_keys': 0, 'num_objects_repaired': 0}, 'up': [1], 'acting': [1], 'avail_no_missing': ['1'], 'object_location_counts': [], 'blocked_by': [], 'up_primary': 1, 'acting_primary': 1, 'purged_snaps': []}
2021-09-12T06:21:03.559 INFO:tasks.ceph.ceph_manager.ceph:PG 1.1 is not active+clean
2021-09-12T06:21:03.559 INFO:tasks.ceph.ceph_manager.ceph:{'pgid': '1.1', 'version': "0'0", 'reported_seq': 29, 'reported_epoch': 38, 'state': 'peering', 'last_fresh': '2021-09-12T06:00:54.056613+0000', 'last_change': '2021-09-12T06:00:54.056613+0000', 'last_active': '2021-09-12T05:50:24.316836+0000', 'last_peered': '2021-09-12T05:48:29.487783+0000', 'last_clean': '2021-09-12T05:48:29.487783+0000', 'last_became_active': '2021-09-12T05:48:29.487463+0000', 'last_became_peered': '2021-09-12T05:48:29.487463+0000', 'last_unstale': '2021-09-12T06:00:54.056613+0000', 'last_undegraded': '2021-09-12T06:00:54.056613+0000', 'last_fullsized': '2021-09-12T06:00:54.056613+0000', 'mapping_epoch': 38, 'log_start': "0'0", 'ondisk_log_start': "0'0", 'created': 10, 'last_epoch_clean': 11, 'parent': '0.0', 'parent_split_bits': 0, 'last_scrub': "0'0", 'last_scrub_stamp': '2021-09-12T05:48:28.464377+0000', 'last_deep_scrub': "0'0", 'last_deep_scrub_stamp': '2021-09-12T05:48:28.464377+0000', 'last_clean_scrub_stamp': '2021-09-12T05:48:28.464377+0000', 'log_size': 0, 'ondisk_log_size': 0, 'stats_invalid': False, 'dirty_stats_invalid': False, 'omap_stats_invalid': False, 'hitset_stats_invalid': False, 'hitset_bytes_stats_invalid': False, 'pin_stats_invalid': False, 'manifest_stats_invalid': False, 'snaptrimq_len': 0, 'stat_sum': {'num_bytes': 0, 'num_objects': 0, 'num_object_clones': 0, 'num_object_copies': 0, 'num_objects_missing_on_primary': 0, 'num_objects_missing': 0, 'num_objects_degraded': 0, 'num_objects_misplaced': 0, 'num_objects_unfound': 0, 'num_objects_dirty': 0, 'num_whiteouts': 0, 'num_read': 0, 'num_read_kb': 0, 'num_write': 0, 'num_write_kb': 0, 'num_scrub_errors': 0, 'num_shallow_scrub_errors': 0, 'num_deep_scrub_errors': 0, 'num_objects_recovered': 0, 'num_bytes_recovered': 0, 'num_keys_recovered': 0, 'num_objects_omap': 0, 'num_objects_hit_set_archive': 0, 'num_bytes_hit_set_archive': 0, 'num_flush': 0, 'num_flush_kb': 0, 'num_evict': 0, 'num_evict_kb': 0, 'num_promote': 0, 'num_flush_mode_high': 0, 'num_flush_mode_low': 0, 'num_evict_mode_some': 0, 'num_evict_mode_full': 0, 'num_objects_pinned': 0, 'num_legacy_snapsets': 0, 'num_large_omap_objects': 0, 'num_objects_manifest': 0, 'num_omap_bytes': 0, 'num_omap_keys': 0, 'num_objects_repaired': 0}, 'up': [2, 0], 'acting': [2, 0], 'avail_no_missing': [], 'object_location_counts': [], 'blocked_by': [0], 'up_primary': 2, 'acting_primary': 2, 'purged_snaps': []}
2021-09-12T06:21:03.560 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_b135a4f82b02b966053efba0501e6a47c9d2d793/teuthology/contextutil.py", line 33, in nested
    yield vars
  File "/home/teuthworker/src/github.com_ceph_ceph_9989418f7d3c0967385a5f8693f1cf837d4449ed/qa/tasks/ceph.py", line 1904, in task
    ctx.managers[config['cluster']].wait_for_clean()
  File "/home/teuthworker/src/github.com_ceph_ceph_9989418f7d3c0967385a5f8693f1cf837d4449ed/qa/tasks/ceph_manager.py", line 2706, in wait_for_clean
    'wait_for_clean: failed before timeout expired'

After the osd was restarted

2021-09-12T05:50:13.697+0000 7fdaad830080  0 ceph version 16.2.5-562-g9989418f (9989418f7d3c0967385a5f8693f1cf837d4449ed) pacific (stable), process ceph-osd, pid 30517

we start seeing "could not decrypt ticket info" messages like this

2021-09-14T19:04:25.138+0000 7f3eecf73700  0 cephx: verify_authorizer could not decrypt ticket info: error: bad magic in decode_decrypt, 3620733988050280416 != 18374858748799134293
2021-09-14T19:04:25.138+0000 7f3eecf73700  1 --2- [v2:172.21.15.46:6800/30574,v1:172.21.15.46:6801/30574] >>  conn(0x5590b5140400 0x5590b5107700 secure :-1 s=AUTH_ACCEPTING pgs=0 cs=0 l=1 rev1=1 rx=0 tx=0)._auth_bad_method auth_method 2 r (13) Permission denied, allowed_methods [2], allowed_modes [1,2]
2021-09-14T19:04:25.138+0000 7f3eecf73700  1 -- [v2:172.21.15.46:6800/30574,v1:172.21.15.46:6801/30574] >>  conn(0x5590b5140400 msgr2=0x5590b5107700 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_bulk peer close file descriptor 53
2021-09-14T19:04:25.138+0000 7f3eecf73700  1 -- [v2:172.21.15.46:6800/30574,v1:172.21.15.46:6801/30574] >>  conn(0x5590b5140400 msgr2=0x5590b5107700 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_until read failed
2021-09-14T19:04:25.138+0000 7f3eecf73700  1 --2- [v2:172.21.15.46:6800/30574,v1:172.21.15.46:6801/30574] >>  conn(0x5590b5140400 0x5590b5107700 secure :-1 s=AUTH_ACCEPTING pgs=0 cs=0 l=1 rev1=1 rx=0 tx=0).handle_read_frame_preamble_main read frame preamble failed r=-1 ((1) Operation not permitted)

followed by

2021-09-12T05:50:44.167+0000 7fda9e170700 -1 osd.0 31 heartbeat_check: no reply from 172.21.15.89:6814 osd.1 ever on either front or back, first ping sent 2021-09-12T05:50:24.060428+0000 (oldest deadline 2021-09-12T05:50:44.060428+0000)

/a/teuthology-2021-09-12_03:31:01-rados-pacific-distro-basic-smithi/6385742


Related issues 1 (1 open0 closed)

Related to mgr - Bug #51815: ceph pg dump times out due to no active mgr Pending BackportSage Weil

Actions
Actions #1

Updated by Neha Ojha over 2 years ago

I reran the test 10 times in https://pulpito.ceph.com/nojha-2021-09-14_18:44:41-rados:singleton-pacific-distro-basic-smithi/ - but don't see the same exact symptoms.

https://pulpito.ceph.com/nojha-2021-09-14_18:44:41-rados:singleton-pacific-distro-basic-smithi/6390071/ is particularly interesting because here we see similar "ceph_decode_ticket could not decrypt ticket info. error:bad magic in decode_decrypt" errors in the mon logs, after the mgr is restarted, leading to "ceph pg dump" timing out. We need to figure out whether these errors are just a red herring or are responsible for these issues.

2021-09-14T19:02:41.885+0000 7f2c202c0700 10 cephx server mgr.x: handle_request get_auth_session_key for mgr.x

2021-09-14T19:02:41.885+0000 7f2c202c0700 20 cephx server mgr.x:  checking key: req.key=e720f3934ea13e18 expected_key=e720f3934ea13e18

2021-09-14T19:02:41.885+0000 7f2c202c0700 20 cephx server mgr.x:  checking old_ticket: secret_id=2 len=96, old_ticket_may_be_omitted=0

2021-09-14T19:02:41.885+0000 7f2c202c0700  0 cephx: ceph_decode_ticket could not decrypt ticket info. error:bad magic in decode_decrypt, 12470617909805444553 != 18374858748799134293

2021-09-14T19:02:41.885+0000 7f2c202c0700  0 cephx server mgr.x:  attempt to reclaim global_id 10015 using bad ticket

2021-09-14T19:02:41.885+0000 7f2c202c0700  0 cephx server mgr.x:  could not verify old ticket

2021-09-14T19:02:41.885+0000 7f2c202c0700  1 --2- [v2:172.21.15.19:3300/0,v1:172.21.15.19:6789/0] >>  conn(0x55d171302400 0x55d17118c000 secure :-1 s=AUTH_ACCEPTING_MORE pgs=0 cs=0 l=1 rev1=1 rx=0 tx=0)._auth_bad_method auth_method 2 r (13) Permission denied, allowed_methods [2], allowed_modes [2,1]

2021-09-14T19:02:41.885+0000 7f2c202c0700  1 -- [v2:172.21.15.19:3300/0,v1:172.21.15.19:6789/0] >>  conn(0x55d171302400 msgr2=0x55d17118c000 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_bulk peer close file descriptor 38

2021-09-14T19:02:41.885+0000 7f2c202c0700  1 -- [v2:172.21.15.19:3300/0,v1:172.21.15.19:6789/0] >>  conn(0x55d171302400 msgr2=0x55d17118c000 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_until read failed

2021-09-14T19:02:41.885+0000 7f2c202c0700  1 --2- [v2:172.21.15.19:3300/0,v1:172.21.15.19:6789/0] >>  conn(0x55d171302400 0x55d17118c000 secure :-1 s=AUTH_ACCEPTING_MORE pgs=0 cs=0 l=1 rev1=1 rx=0 tx=0).handle_read_frame_preamble_main read frame preamble failed r=-1 ((1) Operation not permitted)

2021-09-14T19:02:41.885+0000 7f2c202c0700  1 --2- [v2:172.21.15.19:3300/0,v1:172.21.15.19:6789/0] >>  conn(0x55d171302400 0x55d17118c000 secure :-1 s=AUTH_ACCEPTING_MORE pgs=0 cs=0 l=1 rev1=1 rx=0 tx=0).stop

2021-09-14T19:02:41.885+0000 7f2c1b2b6700  1 -- [v2:172.21.15.19:3300/0,v1:172.21.15.19:6789/0] reap_dead start
Actions #2

Updated by Neha Ojha over 2 years ago

  • Subject changed from Monitor daemon marked osd down, but it is still running to cephx: verify_authorizer could not decrypt ticket info: error: bad magic in decode_decrypt
  • Priority changed from Normal to High

/a/yuriw-2021-09-16_18:23:18-rados-wip-yuri2-testing-2021-09-16-0923-distro-basic-smithi/6393474

Actions #3

Updated by Neha Ojha over 2 years ago

  • Related to Bug #51815: ceph pg dump times out due to no active mgr added
Actions #4

Updated by Neha Ojha over 2 years ago

  • Backport set to pacific, octopus
ceph-mon.a.log.gz:2021-12-09T22:02:35.412+0000 7fadb9966700  0 cephx: ceph_decode_ticket could not decrypt ticket info. error:bad magic in decode_decrypt, 9013085832078629057 != 18374858748799134293
ceph-mon.b.log.gz:2021-12-09T22:02:35.412+0000 7f5515313700  0 cephx: ceph_decode_ticket could not decrypt ticket info. error:bad magic in decode_decrypt, 9013085832078629057 != 18374858748799134293
ceph-mon.c.log.gz:2021-12-09T22:02:35.412+0000 7f9f92f38700  0 cephx: ceph_decode_ticket could not decrypt ticket info. error:bad magic in decode_decrypt, 9013085832078629057 != 18374858748799134293

/a/yuriw-2021-12-09_16:11:17-rados-wip-yuri7-testing-2021-12-08-1714-octopus-distro-default-smithi/6555674/

Actions #5

Updated by Neha Ojha over 2 years ago

  • Status changed from New to Can't reproduce
Actions #6

Updated by Matan Breizman almost 2 years ago

/a/yuriw-2022-07-15_19:06:53-rados-wip-yuri-testing-2022-07-15-0950-octopus-distro-default-smithi/6932687

Actions

Also available in: Atom PDF