Project

General

Profile

Actions

Bug #65616

open

pybind/mgr/snap_schedule: 1m scheduled snaps not reliably executed (RuntimeError: The following counters failed to be set on mds daemons: {'mds_server.req_rmsnap_latency.avgcount'})

Added by Patrick Donnelly 11 days ago. Updated 3 days ago.

Status:
Triaged
Priority:
High
Category:
Correctness/Safety
Target version:
% Done:

0%

Source:
Tags:
Backport:
squid,reef
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
mgr/snap_schedule
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Check timestamps:

2024-04-21T03:58:00.663+0000 7f48c60ef700 10 client.15615 send_request client_request(unknown.0:3 mksnap #0x1//scheduled-2024-04-21-03_58_00_UTC 2024-04-21T03:58:00.664205+0000 caller_uid=0, caller_gid=0{}) to mds.1
2024-04-21T03:58:00.663+0000 7f48c60ef700  1 -- 172.21.15.80:0/477720029 --> [v2:172.21.15.121:6836/2901487594,v1:172.21.15.121:6837/2901487594] -- client_request(unknown.0:3 mksnap #0x1//scheduled-2024-04-21-03_58_00_UTC 2024-04-21T03:58:00.664205+0000 caller_uid=0, caller_gid=0{}) -- 0x564333896000 con 0x5643330f5a80
2024-04-21T03:58:00.667+0000 7f48c60ef700 10 client.15615 send_request client_request(unknown.0:3 mksnap #0x1//scheduled-2024-04-21-03_58_00_UTC 2024-04-21T03:58:00.664205+0000 FWD=1 caller_uid=0, caller_gid=0{}) to mds.1
2024-04-21T03:58:00.667+0000 7f48c60ef700  1 -- 172.21.15.80:0/477720029 --> [v2:172.21.15.121:6836/2901487594,v1:172.21.15.121:6837/2901487594] -- client_request(unknown.0:3 mksnap #0x1//scheduled-2024-04-21-03_58_00_UTC 2024-04-21T03:58:00.664205+0000 FWD=1 caller_uid=0, caller_gid=0{}) -- 0x564333896380 con 0x5643330f5a80
2024-04-21T03:58:00.667+0000 7f48c60ef700 10 client.15615 send_request client_request(unknown.0:3 mksnap #0x1//scheduled-2024-04-21-03_58_00_UTC 2024-04-21T03:58:00.664205+0000 FWD=2 caller_uid=0, caller_gid=0{}) to mds.0
2024-04-21T03:58:00.667+0000 7f48c60ef700  1 -- 172.21.15.80:0/477720029 --> [v2:172.21.15.121:6838/3852179462,v1:172.21.15.121:6839/3852179462] -- client_request(unknown.0:3 mksnap #0x1//scheduled-2024-04-21-03_58_00_UTC 2024-04-21T03:58:00.664205+0000 FWD=2 caller_uid=0, caller_gid=0{}) -- 0x564333896700 con 0x5643330f5180
2024-04-21T03:59:00.741+0000 7f48bd81e700 10 client.15615 send_request client_request(unknown.0:9 mksnap #0x1//scheduled-2024-04-21-03_59_00_UTC 2024-04-21T03:59:00.742383+0000 caller_uid=0, caller_gid=0{}) to mds.0
2024-04-21T03:59:00.741+0000 7f48bd81e700  1 -- 172.21.15.80:0/477720029 --> [v2:172.21.15.121:6838/3852179462,v1:172.21.15.121:6839/3852179462] -- client_request(unknown.0:9 mksnap #0x1//scheduled-2024-04-21-03_59_00_UTC 2024-04-21T03:59:00.742383+0000 caller_uid=0, caller_gid=0{}) -- 0x564333a3aa80 con 0x5643330f5180
2024-04-21T04:05:01.828+0000 7f48c60ef700 10 client.25274 send_request client_request(unknown.0:3 mksnap #0x1//scheduled-2024-04-21-04_01_50_UTC 2024-04-21T04:05:01.829370+0000 caller_uid=0, caller_gid=0{}) to mds.2
2024-04-21T04:05:01.828+0000 7f48c60ef700  1 -- 172.21.15.80:0/1089533755 --> [v2:172.21.15.186:6838/4097534309,v1:172.21.15.186:6839/4097534309] -- client_request(unknown.0:3 mksnap #0x1//scheduled-2024-04-21-04_01_50_UTC 2024-04-21T04:05:01.829370+0000 caller_uid=0, caller_gid=0{}) -- 0x564333d78e00 con 0x5643325e7b00
2024-04-21T04:05:01.831+0000 7f48c60ef700 10 client.25274 send_request client_request(unknown.0:3 mksnap #0x1//scheduled-2024-04-21-04_01_50_UTC 2024-04-21T04:05:01.829370+0000 FWD=1 caller_uid=0, caller_gid=0{}) to mds.2
2024-04-21T04:05:01.831+0000 7f48c60ef700  1 -- 172.21.15.80:0/1089533755 --> [v2:172.21.15.186:6838/4097534309,v1:172.21.15.186:6839/4097534309] -- client_request(unknown.0:3 mksnap #0x1//scheduled-2024-04-21-04_01_50_UTC 2024-04-21T04:05:01.829370+0000 FWD=1 caller_uid=0, caller_gid=0{}) -- 0x564333d79180 con 0x5643325e7b00
2024-04-21T04:05:01.855+0000 7f48c60ef700 10 client.25274 send_request client_request(unknown.0:3 mksnap #0x1//scheduled-2024-04-21-04_01_50_UTC 2024-04-21T04:05:01.829370+0000 FWD=2 caller_uid=0, caller_gid=0{}) to mds.0
2024-04-21T04:05:01.855+0000 7f48c60ef700  1 -- 172.21.15.80:0/1089533755 --> [v2:172.21.15.121:6838/3852179462,v1:172.21.15.121:6839/3852179462] -- client_request(unknown.0:3 mksnap #0x1//scheduled-2024-04-21-04_01_50_UTC 2024-04-21T04:05:01.829370+0000 FWD=2 caller_uid=0, caller_gid=0{}) -- 0x564333d79500 con 0x564332f3ad80
2024-04-21T04:06:00.304+0000 7f48e1ce6700 10 client.25274 send_request client_request(unknown.0:11 mksnap #0x1//scheduled-2024-04-21-04_06_00_UTC 2024-04-21T04:06:00.305508+0000 caller_uid=0, caller_gid=0{}) to mds.0
2024-04-21T04:06:00.304+0000 7f48e1ce6700  1 -- 172.21.15.80:0/1089533755 --> [v2:172.21.15.121:6838/3852179462,v1:172.21.15.121:6839/3852179462] -- client_request(unknown.0:11 mksnap #0x1//scheduled-2024-04-21-04_06_00_UTC 2024-04-21T04:06:00.305508+0000 caller_uid=0, caller_gid=0{}) -- 0x564333c16a80 con 0x564332f3ad80
2024-04-21T04:07:00.374+0000 7f48c60ef700 10 client.25274 send_request client_request(unknown.0:20 mksnap #0x1//scheduled-2024-04-21-04_07_00_UTC 2024-04-21T04:07:00.375406+0000 caller_uid=0, caller_gid=0{}) to mds.0
2024-04-21T04:07:00.374+0000 7f48c60ef700  1 -- 172.21.15.80:0/1089533755 --> [v2:172.21.15.121:6838/3852179462,v1:172.21.15.121:6839/3852179462] -- client_request(unknown.0:20 mksnap #0x1//scheduled-2024-04-21-04_07_00_UTC 2024-04-21T04:07:00.375406+0000 caller_uid=0, caller_gid=0{}) -- 0x564333c0d500 con 0x564332f3ad80
2024-04-21T04:08:00.459+0000 7f48e1ce6700 10 client.25274 send_request client_request(unknown.0:30 mksnap #0x1//scheduled-2024-04-21-04_08_00_UTC 2024-04-21T04:08:00.460160+0000 caller_uid=0, caller_gid=0{}) to mds.0
2024-04-21T04:08:00.459+0000 7f48e1ce6700  1 -- 172.21.15.80:0/1089533755 --> [v2:172.21.15.121:6838/3852179462,v1:172.21.15.121:6839/3852179462] -- client_request(unknown.0:30 mksnap #0x1//scheduled-2024-04-21-04_08_00_UTC 2024-04-21T04:08:00.460160+0000 caller_uid=0, caller_gid=0{}) -- 0x564333f51180 con 0x564332f3ad80

From: /teuthology/pdonnell-2024-04-20_23:33:17-fs-wip-pdonnell-testing-20240420.180737-debug-distro-default-smithi/7665987/remote/smithi080/log/a63c2bca-ff91-11ee-bc93-c7b262605968/ceph-mgr.x.log.gz

Symptom:

2024-04-21T04:08:16.806 WARNING:tasks.check_counter:Counter 'mds.exported' not found on daemon mds.l
2024-04-21T04:08:16.807 WARNING:tasks.check_counter:Counter 'mds.imported' not found on daemon mds.l
2024-04-21T04:08:16.807 WARNING:tasks.check_counter:Counter 'mds_cache.dir_update' not found on daemon mds.l
2024-04-21T04:08:16.807 WARNING:tasks.check_counter:Counter 'mds_cache.dir_update_receipt' not found on daemon mds.l
2024-04-21T04:08:16.807 WARNING:tasks.check_counter:Counter 'mds.root_rsnaps' not found on daemon mds.l
2024-04-21T04:08:16.807 WARNING:tasks.check_counter:Counter 'mds_server.req_mksnap_latency.avgcount' not found on daemon mds.l
2024-04-21T04:08:16.807 WARNING:tasks.check_counter:Counter 'mds_server.req_rmsnap_latency.avgcount' not found on daemon mds.l
2024-04-21T04:08:16.807 WARNING:tasks.check_counter:Counter 'mds.dir_split' not found on daemon mds.l
2024-04-21T04:08:16.808 ERROR:teuthology.run_tasks:Manager failed: check-counter
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_teuthology_9481b1d62f50e7d0a4f3dd83adf6945b08d5ff17/teuthology/run_tasks.py", line 154, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/home/teuthworker/src/git.ceph.com_teuthology_9481b1d62f50e7d0a4f3dd83adf6945b08d5ff17/teuthology/task/__init__.py", line 132, in __exit__
    self.end()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_0b60fd01511511bc020e1a45638ede6ead9e38ec/qa/tasks/check_counter.py", line 125, in end
    raise RuntimeError("The following counters failed to be set " 
RuntimeError: The following counters failed to be set on mds daemons: {'mds_server.req_rmsnap_latency.avgcount'}

Related issues 1 (1 open0 closed)

Related to CephFS - Fix #65617: qa: increase debugging for snap_schedulePending BackportPatrick Donnelly

Actions
Actions #1

Updated by Patrick Donnelly 11 days ago

  • Related to Fix #65617: qa: increase debugging for snap_schedule added
Actions #2

Updated by Venky Shankar 3 days ago

  • Status changed from New to Triaged
  • Assignee set to Milind Changire
Actions #3

Updated by Patrick Donnelly 3 days ago

  • Subject changed from pybind/mgr/snap_schedule: 1m scheduled snaps not reliably executed to pybind/mgr/snap_schedule: 1m scheduled snaps not reliably executed (RuntimeError: The following counters failed to be set on mds daemons: {'mds_server.req_rmsnap_latency.avgcount'})
  • Description updated (diff)
Actions

Also available in: Atom PDF