Bug #55377

closed

kclient: mds revoke Fwb caps stuck after the kclient tries writeback once

Added by Xiubo Li about 2 years ago. Updated almost 2 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

Seen here: https://pulpito.ceph.com/vshankar-2022-04-07_05:07:33-fs-master-testing-default-smithi/6780578/

It's an upgrade test. Before the upgrade, cephadm disables standby-replay and reduces max_mds to 1, then waits for a single active MDS by checking the mdsmap. The test above has 2 active MDSs, each configured with a standby-replay MDS daemon. Disabling standby-replay goes fine: the MDSs transition to standby. However, after setting `max_mds = 1', one of the active MDSs is stuck in `up:stopping' and never transitions to `down:stopped'. The test fails after hitting "max job timeout".
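The wait condition described above can be sketched as a small predicate over the MDS map. This is a hypothetical illustration, not cephadm's actual code: it assumes a simplified map shape (daemon name mapped to its `rank` and `state`, with standbys at rank -1), and the function name `single_active_reached` is made up. It shows why a rank stuck in `up:stopping` keeps the wait from ever completing.

```python
from typing import Mapping

# Hypothetical, simplified MDS map: daemon name -> {"rank": int, "state": str}.
# Standby daemons are assumed to carry rank -1. Not cephadm's real data model.
MdsInfo = Mapping[str, Mapping[str, object]]


def single_active_reached(info: MdsInfo) -> bool:
    """Return True once exactly one rank is up:active and no rank is
    still shrinking (up:stopping)."""
    # Only consider daemons holding a rank; standbys are ignored.
    states = [str(d["state"]) for d in info.values() if int(d["rank"]) >= 0]
    active = states.count("up:active")
    stopping = states.count("up:stopping")
    return active == 1 and stopping == 0


# State seen in the failing run: rank 1 never leaves up:stopping,
# so the predicate stays False until the job times out.
stuck = {
    "cephfs.smithi116.vbikdi": {"rank": 0, "state": "up:active"},
    "cephfs.smithi176.ttvthc": {"rank": 1, "state": "up:stopping"},
}
print(single_active_reached(stuck))
```

Once rank 1 reaches `down:stopped` and drops out of the ranked set, only rank 0 remains `up:active` and the predicate returns True.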

Problematic MDS log: ./remote/smithi176/log/aa4d8d7c-b63b-11ec-8c36-001a4aab830c/ceph-mds.cephfs.smithi176.ttvthc.log.gz

2022-04-07T06:41:43.092+0000 7fe6f18dc700 10 mds.cephfs.smithi176.ttvthc my gid is 24327
2022-04-07T06:41:43.092+0000 7fe6f18dc700 10 mds.cephfs.smithi176.ttvthc map says I am mds.1.8 state up:stopping
2022-04-07T06:41:43.092+0000 7fe6f18dc700 10 mds.cephfs.smithi176.ttvthc msgr says I am [v2:172.21.15.176:6824/2074966087,v1:172.21.15.176:6825/2074966087]
2022-04-07T06:41:43.092+0000 7fe6f18dc700 10 mds.cephfs.smithi176.ttvthc handle_mds_map: handling map as rank 1
2022-04-07T06:41:43.092+0000 7fe6f18dc700 10 notify_mdsmap: mds.metrics
2022-04-07T06:41:43.092+0000 7fe6f18dc700 10 notify_mdsmap: mds.metrics: rank0 is mds.cephfs.smithi116.vbikdi
2022-04-07T06:41:43.092+0000 7fe6ed0d3700  7 mds.1.8 mds has 1 queued contexts
2022-04-07T06:41:43.092+0000 7fe6ed0d3700 10 mds.1.8  finish 0x55c7560da940

The MDS seems to be waiting for some event to reach completion (maybe exporting a dir?).


Related issues: 1 (0 open, 1 closed)

Copied from CephFS - Bug #55240: mds: stuck 2 seconds and keeps retrying to find ino from auth MDS (Resolved, Xiubo Li)
