Bug #58617

open

mds: "Failed to authpin,subtree is being exported" results in large number of blocked requests

Added by zhikuo du over 1 year ago. Updated 8 months ago.

Status: Triaged
Priority: Normal
Assignee:
Category: Correctness/Safety
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport: pacific,quincy
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS): multimds
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Problem: a cluster running Octopus 15.2.16 accumulates a large number of blocked requests. The slow-request warnings logged for the blocked requests are:

2023-01-02T15:59:10.078+0800 7f55b1734700  0 log_channel(cluster) log [WRN] : slow request 15364.865004 seconds old, received at 2023-01-02T11:43:05.214763+0800: client_request(client.450609:59264338 lookup #0x40004e86a72/halo_ce_grace_F4284.spec.pt 2023-01-02T11:43:05.153256+0800 caller_uid=0, caller_gid=0{}) currently failed to authpin, subtree is being exported
2023-01-02T15:59:10.078+0800 7f55b1734700 0 log_channel(cluster) log [WRN] : slow request 15360.774800 seconds old, received at 2023-01-02T11:43:09.304967+0800: client_request(client.450609:59265051 lookup #0x40004e86a72/halo_ce_grace_F3233.wav 2023-01-02T11:43:09.243256+0800 caller_uid=0, caller_gid=0{}) currently failed to authpin, subtree is being exported

Eventually, many requests are blocked for hours. We can restore the cluster by restarting the affected MDS.
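
To see how many requests are stuck behind the same directory, the slow-request warnings above can be tallied with a short script. This is only a minimal sketch, not a Ceph tool: the regular expression is inferred from the two sample lines in this report, and the names `SLOW_RE`, `parse_slow_request` and `tally` are made up for illustration.

```python
import re
from collections import Counter

# Layout inferred from the two sample warnings in this report only
# (an assumption, not taken from the Ceph source).
SLOW_RE = re.compile(
    r"slow request (?P<age>\d+(?:\.\d+)?) seconds old, "
    r"received at (?P<received>\S+): "
    r"client_request\((?P<client>client\.\d+):(?P<tid>\d+) "
    r"(?P<op>\w+) (?P<path>#0x[0-9a-f]+\S*) .*?\) "
    r"currently (?P<reason>.+)$"
)

SAMPLE_LINES = [
    "2023-01-02T15:59:10.078+0800 7f55b1734700  0 log_channel(cluster) log [WRN] : "
    "slow request 15364.865004 seconds old, received at 2023-01-02T11:43:05.214763+0800: "
    "client_request(client.450609:59264338 lookup "
    "#0x40004e86a72/halo_ce_grace_F4284.spec.pt "
    "2023-01-02T11:43:05.153256+0800 caller_uid=0, caller_gid=0{}) "
    "currently failed to authpin, subtree is being exported",
    "2023-01-02T15:59:10.078+0800 7f55b1734700 0 log_channel(cluster) log [WRN] : "
    "slow request 15360.774800 seconds old, received at 2023-01-02T11:43:09.304967+0800: "
    "client_request(client.450609:59265051 lookup "
    "#0x40004e86a72/halo_ce_grace_F3233.wav "
    "2023-01-02T11:43:09.243256+0800 caller_uid=0, caller_gid=0{}) "
    "currently failed to authpin, subtree is being exported",
]


def parse_slow_request(line):
    """Return a dict of fields for one slow-request warning, or None."""
    m = SLOW_RE.search(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec["age"] = float(rec["age"])
    # The hex number right after '#' is the base inode of the request path.
    rec["base_ino"] = rec["path"][1:].split("/", 1)[0]
    return rec


def tally(lines):
    """Count warnings per (base inode, reason) and track the oldest age."""
    counts = Counter()
    oldest = {}
    for line in lines:
        rec = parse_slow_request(line)
        if rec is None:
            continue  # not a slow-request warning
        key = (rec["base_ino"], rec["reason"])
        counts[key] += 1
        oldest[key] = max(oldest.get(key, 0.0), rec["age"])
    return counts, oldest


if __name__ == "__main__":
    counts, oldest = tally(SAMPLE_LINES)
    for key, n in counts.items():
        print(key, "count:", n, "oldest age (s):", oldest[key])
```

Grouping by base inode and reason makes it easy to confirm that the blocked requests all target the same directory and all wait on the same condition.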

The relevant log output from the exporting MDS (mds.11):

2023-01-02T18:38:32.319+0800 7f55b3738700 10 mds.11.mig show_exporting  exporting to 8: (6) warning 0x40004e86a72.001001100* [dir 0x40004e86a72.001001100* /data/46f/732/03237764b1b2b824550ff4e750/data/vits_data/generate/wavs/ [2,head] auth{0=2,1=1,2=1,3=1,6=1,8=2,10=1} v=116159 cv=116158/116158 dir_auth=11,11 state=1610875907|complete|frozentree|exporting f(v1690 m2022-12-16T15:05:36.373759+0800 1929=1929+0) n(v900 rc2022-12-16T15:05:  36.373759+0800 b822676763 1929=1929+0) hs=1929+0,ss=0+0 | ptrwaiter=1 request=0 child=1 frozen=1 subtree=1 importing=0 replicated=1 dirty=1 waiter=1 authpin=0 0x563e5ee71200]
2023-01-02T18:38:33.103+0800 7f55b3738700 10 mds.11.mig show_exporting exporting to 8: (6) warning 0x40004e86a72.001001100* [dir 0x40004e86a72.001001100* /data/46f/732/03237764b1b2b824550ff4e750/data/vits_data/generate/wavs/ [2,head] auth{0=2,1=1,2=1,3=1,6=1,8=2,10=1} v=116159 cv=116158/116158 dir_auth=11,11 state=1610875907|complete|frozentree|exporting f(v1690 m2022-12-16T15:05:36.373759+0800 1929=1929+0) n(v900 rc2022-12-16T15:05: 36.373759+0800 b822676763 1929=1929+0) hs=1929+0,ss=0+0 | ptrwaiter=1 request=0 child=1 frozen=1 subtree=1 importing=0 replicated=1 dirty=1 waiter=1 authpin=0 0x563e5ee71200]
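
The two `show_exporting` lines above are dense, so a small parser helps when comparing successive dumps. The sketch below pulls out the exporting rank, the target rank, the export state, the state flags and the trailing counters. The field layout is inferred from the lines in this report alone, and `HEAD_RE`, `STATE_RE` and `parse_show_exporting` are hypothetical names, not part of Ceph.

```python
import re

# Layout inferred from the show_exporting lines in this report (assumption).
HEAD_RE = re.compile(
    r"mds\.(?P<rank>\d+)\.mig show_exporting\s+exporting to (?P<peer>\d+): "
    r"\((?P<state_num>\d+)\) (?P<state>\w+) (?P<dirfrag>\S+)"
)
STATE_RE = re.compile(r"state=(?P<bits>\d+)(?P<flags>(?:\|\w+)*)")

SAMPLE = (
    "2023-01-02T18:38:33.103+0800 7f55b3738700 10 mds.11.mig show_exporting "
    "exporting to 8: (6) warning 0x40004e86a72.001001100* "
    "[dir 0x40004e86a72.001001100* "
    "/data/46f/732/03237764b1b2b824550ff4e750/data/vits_data/generate/wavs/ "
    "[2,head] auth{0=2,1=1,2=1,3=1,6=1,8=2,10=1} v=116159 cv=116158/116158 "
    "dir_auth=11,11 state=1610875907|complete|frozentree|exporting "
    "f(v1690 m2022-12-16T15:05:36.373759+0800 1929=1929+0) "
    "n(v900 rc2022-12-16T15:05: 36.373759+0800 b822676763 1929=1929+0) "
    "hs=1929+0,ss=0+0 | ptrwaiter=1 request=0 child=1 frozen=1 subtree=1 "
    "importing=0 replicated=1 dirty=1 waiter=1 authpin=0 0x563e5ee71200]"
)


def parse_show_exporting(line):
    """Extract the key fields from one show_exporting line, or return None."""
    head = HEAD_RE.search(line)
    state = STATE_RE.search(line)
    if head is None or state is None:
        return None
    # Counters such as frozen=1 follow the last " | " separator.
    _, sep, tail = line.rpartition(" | ")
    counters = {}
    if sep:
        counters = {k: int(v) for k, v in re.findall(r"(\w+)=(\d+)", tail)}
    return {
        "rank": int(head["rank"]),
        "peer": int(head["peer"]),
        "state_num": int(head["state_num"]),
        "state": head["state"],
        "dirfrag": head["dirfrag"],
        "flags": [f for f in state["flags"].split("|") if f],
        "counters": counters,
    }


if __name__ == "__main__":
    info = parse_show_exporting(SAMPLE)
    print(info["rank"], "->", info["peer"], info["state"], info["flags"])
    print(info["counters"])
```

If two dumps taken some time apart yield the same `dirfrag`, `peer` and `state`, the export has made no progress in between, which is what both lines above show for the export from mds.11 to mds.8.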

After reading the migration code, I think the cause is as follows:
When one or more CEPH_SESSION_FLUSHMSG or MSG_MDS_EXPORTDIRNOTIFY messages are lost, for example because the session is reset or the underlying connection is reconnected (I think this is our situation), the export of the directory never completes and the directory stays frozen forever.
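
The suspected failure mode can be illustrated with a toy model. This is not Ceph code and it does not reproduce the real migration state machine: `ToyExport`, its methods and the state name `done` are hypothetical, and the model simply assumes that the exporter waits for one acknowledgement per bystander and never resends or times out. Under those assumptions, a single dropped message leaves the directory frozen.

```python
from dataclasses import dataclass, field


@dataclass
class ToyExport:
    """Toy model of one directory export (hypothetical, not Ceph code)."""
    bystanders: set
    state: str = "warning"   # waiting for acknowledgements, as in the log above
    frozen: bool = True      # the subtree is frozen while the export runs
    pending_acks: set = field(init=False, default_factory=set)

    def __post_init__(self):
        # One acknowledgement is expected from every bystander rank.
        self.pending_acks = set(self.bystanders)

    def on_ack(self, rank):
        """Record an acknowledgement; finish once none are outstanding."""
        self.pending_acks.discard(rank)
        if not self.pending_acks:
            self.state = "done"
            self.frozen = False

    def handle_client_request(self):
        """Client requests cannot make progress while the subtree is frozen."""
        return "blocked" if self.frozen else "served"


def run(bystanders, lost):
    """Deliver acks from all bystanders except those whose message is lost."""
    exp = ToyExport(bystanders=set(bystanders))
    for rank in sorted(bystanders):
        if rank in lost:
            continue  # dropped, e.g. on a connection reset, and never resent
        exp.on_ack(rank)
    return exp


if __name__ == "__main__":
    ok = run({0, 1, 2}, lost=set())
    stuck = run({0, 1, 2}, lost={1})
    print("no loss :", ok.state, ok.handle_client_request())
    print("one lost:", stuck.state, stuck.handle_client_request(),
          "still waiting for", stuck.pending_acks)
```

In the toy model the only ways out of the stuck state are a resend or a timeout on the outstanding acknowledgement, or discarding the exporter's state altogether, which would be consistent with the observation that restarting the affected MDS restores the cluster.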
