Bug #57985

open

mds: warning `clients failing to advance oldest client/flush tid` seen with some workloads

Added by Venky Shankar over 1 year ago. Updated 6 months ago.

Status: Pending Backport
Priority: Normal
Assignee:
Category: Correctness/Safety
Target version:
% Done: 100%
Source: Q/A
Tags: backport_processed
Backport: pacific,quincy,reef
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): Client, MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

https://bugzilla.redhat.com/show_bug.cgi?id=2134709

Generally seen when the MDS is under heavy I/O load. Interestingly, the client ID in the warning message is, more often than not, the ceph-mgr daemon (the libcephfs instance in ceph-mgr), even though that libcephfs instance is not the one doing heavy I/O (there may be an indirect relation somehow). This needs an RCA/explanation. Furthermore, there may be bugs lurking in client tid management that could cause these warnings.
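For context, a simplified model of the mechanism behind this warning may help. This sketch assumes that each client sends the oldest tid it still has in flight (oldest_client_tid) with every request, that the MDS remembers completed request tids per session and trims them up to that value, and that it warns once too many completed entries accumulate; by assumption the same idea applies to cap flush tids. `Session`, `handle_request`, `MAX_COMPLETED` and the warning text are hypothetical names for illustration, not the actual MDS code, configuration, or health message.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical stand-in for the MDS limit on remembered completed requests.
MAX_COMPLETED = 4


@dataclass
class Session:
    """Simplified, hypothetical model of an MDS-side client session."""

    client_id: str
    # tids of requests the MDS has completed but must still remember
    # (e.g. to recognise a resent request).
    completed: set[int] = field(default_factory=set)

    def trim(self, oldest_client_tid: int) -> None:
        # Anything older than the client's oldest in-flight tid can be forgotten.
        self.completed = {t for t in self.completed if t >= oldest_client_tid}


def handle_request(session: Session, tid: int, oldest_client_tid: int) -> str | None:
    """Process one request; return a warning string if trimming has stalled."""
    session.trim(oldest_client_tid)
    session.completed.add(tid)
    if len(session.completed) > MAX_COMPLETED:
        # Hypothetical message text, not the exact MDS health message.
        return f"{session.client_id} failing to advance oldest client tid"
    return None


# Healthy client: its oldest in-flight tid advances with every request.
healthy = Session("client.A")
print([handle_request(healthy, t, oldest_client_tid=t) for t in range(1, 8)])

# Stuck client: tid 1 is never retired on the client, so oldest stays at 1.
stuck = Session("client.B")
print([handle_request(stuck, t, oldest_client_tid=1) for t in range(1, 8)])
```

In this model the healthy session never holds more than one completed entry, while the stuck session starts warning on its fifth request. A client whose oldest tid stays pinned trips the warning even if it issues only a modest number of requests, which is one way a lightly loaded client could end up named in the warning.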


Related issues: 5 (1 open, 4 closed)

Related to CephFS - Backport #63339: reef: mds: warning `clients failing to advance oldest client/flush tid` seen with some workloads (In Progress, Xiubo Li)
Copied to CephFS - Backport #59021: quincy: mds: warning `clients failing to advance oldest client/flush tid` seen with some workloads (Resolved, Venky Shankar)
Copied to CephFS - Backport #59022: pacific: mds: warning `clients failing to advance oldest client/flush tid` seen with some workloads (Duplicate, Venky Shankar)
Copied to CephFS - Backport #59023: pacific: mds: warning `clients failing to advance oldest client/flush tid` seen with some workloads (Resolved, Venky Shankar)
Copied to CephFS - Backport #59024: quincy: mds: warning `clients failing to advance oldest client/flush tid` seen with some workloads (Duplicate, Venky Shankar)
#1

Updated by Venky Shankar over 1 year ago

  • Status changed from New to Triaged
  • Assignee set to Ramana Raja
#2

Updated by Venky Shankar over 1 year ago

  • Assignee changed from Ramana Raja to Venky Shankar
#3

Updated by Venky Shankar over 1 year ago

I've started looking into this. The ceph-mgr daemon is not always the client reported in the warning; the earlier observations may have been a coincidence. The tid management in the client and the MDS also looks sane. For better debuggability, we could dump the oldest_tid that each client maintains, which would give some insight into which request was not acknowledged by the MDS.
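The debugging aid proposed above amounts to exposing a client's oldest in-flight tid together with the request it belongs to. A minimal sketch, assuming a hypothetical `Client` class that records each request's tid when it is sent and drops it when the MDS acknowledges it; the class, its methods, and the output format are illustrative and are not the libcephfs API.

```python
from __future__ import annotations

import itertools


class Client:
    """Hypothetical client-side bookkeeping of unacknowledged MDS requests."""

    def __init__(self) -> None:
        self._tids = itertools.count(1)     # monotonically increasing tids
        self.inflight: dict[int, str] = {}  # tid -> description of the op

    def send(self, op: str) -> int:
        tid = next(self._tids)
        self.inflight[tid] = op
        return tid

    def ack(self, tid: int) -> None:
        # The MDS has acknowledged the request; it no longer pins oldest_tid.
        self.inflight.pop(tid, None)

    def oldest_tid(self) -> int | None:
        return min(self.inflight, default=None)

    def dump_oldest(self) -> str:
        """What a debug dump could report: the request holding oldest_tid back."""
        tid = self.oldest_tid()
        if tid is None:
            return "no requests in flight"
        return f"oldest_tid={tid} op={self.inflight[tid]}"


client = Client()
mkdir_tid = client.send("mkdir /a")
create_tid = client.send("create /a/f")
unlink_tid = client.send("unlink /b")
client.ack(create_tid)
client.ack(unlink_tid)
print(client.dump_oldest())  # the unacknowledged mkdir request pins oldest_tid
```

Here the dump prints `oldest_tid=1 op=mkdir /a`: later requests were acknowledged, but the single unacknowledged mkdir keeps the oldest tid from advancing, which is exactly the request an investigator would want to identify.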

#4

Updated by Venky Shankar over 1 year ago

#5

Updated by Venky Shankar about 1 year ago

Enforce (stricter) client-id check in client limit test - https://github.com/ceph/ceph/pull/49844
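The difference between a loose and a stricter client-id check can be illustrated as follows. This is a hypothetical sketch: the function names and the `client.<id>` strings are assumptions, and it does not reproduce the actual test changes in the linked pull request. A loose check passes whenever any client is flagged, so an unrelated client (such as the ceph-mgr libcephfs instance) being reported would still satisfy it; the stricter check requires the client under test to be the one named.

```python
from __future__ import annotations


def loose_check(flagged: list[str]) -> bool:
    # Passes if any client was flagged, even one unrelated to the test.
    return len(flagged) > 0


def strict_check(flagged: list[str], expected: str) -> bool:
    # Passes only if the client driving the test is among those flagged.
    return expected in flagged


# Hypothetical ids: the test's own client vs. an unrelated client.
test_client, other_client = "client.4711", "client.4305"
flagged = [other_client]
print(loose_check(flagged), strict_check(flagged, test_client))
```

With only the unrelated client flagged, the loose check passes while the strict one fails, so the stricter form catches a warning that names the wrong client.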

#6

Updated by Venky Shankar about 1 year ago

  • Status changed from Triaged to Pending Backport
#7

Updated by Backport Bot about 1 year ago

  • Copied to Backport #59021: quincy: mds: warning `clients failing to advance oldest client/flush tid` seen with some workloads added
#8

Updated by Backport Bot about 1 year ago

  • Copied to Backport #59022: pacific: mds: warning `clients failing to advance oldest client/flush tid` seen with some workloads added
#9

Updated by Backport Bot about 1 year ago

  • Copied to Backport #59023: pacific: mds: warning `clients failing to advance oldest client/flush tid` seen with some workloads added
#10

Updated by Backport Bot about 1 year ago

  • Tags set to backport_processed
#11

Updated by Backport Bot about 1 year ago

  • Copied to Backport #59024: quincy: mds: warning `clients failing to advance oldest client/flush tid` seen with some workloads added
#12

Updated by Konstantin Shalygin 8 months ago

  • Status changed from Pending Backport to Resolved
  • % Done changed from 0 to 100
#13

Updated by Xiubo Li 6 months ago

Venky, this should also be backported to reef; its absence caused the failures reported in https://tracker.ceph.com/issues/63339.

#14

Updated by Venky Shankar 6 months ago

  • Related to Backport #63339: reef: mds: warning `clients failing to advance oldest client/flush tid` seen with some workloads added
#15

Updated by Venky Shankar 6 months ago

  • Status changed from Resolved to Pending Backport
  • Target version changed from v18.0.0 to v19.0.0
  • Backport changed from pacific,quincy to pacific,quincy,reef