Bug #57985
open
mds: warning `clients failing to advance oldest client/flush tid` seen with some workloads
Added by Venky Shankar over 1 year ago.
Updated 6 months ago.
Category: Correctness/Safety
Backport: pacific,quincy,reef
Component(FS): Client, MDS
Description
https://bugzilla.redhat.com/show_bug.cgi?id=2134709
Generally seen when the MDS is heavily loaded with I/O. Interestingly, the client ID in the warning message is (more often than not) the ceph-mgr daemon (the libcephfs instance in ceph-mgr), although that libcephfs instance is not the one doing heavy I/O. There seems to be some indirect relation, which needs an RCA/explanation. Furthermore, there may be bugs lurking in client tid management that could cause these warnings.
- Status changed from New to Triaged
- Assignee set to Ramana Raja
- Assignee changed from Ramana Raja to Venky Shankar
Venky Shankar wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=2134709
Generally seen when the MDS is heavily loaded with I/O. Interestingly, the client ID in the warning message is (more often than not) the ceph-mgr daemon (the libcephfs instance in ceph-mgr), although that libcephfs instance is not the one doing heavy I/O. There seems to be some indirect relation, which needs an RCA/explanation. Furthermore, there may be bugs lurking in client tid management that could cause these warnings.
I've started to look into this. It is not always the ceph-mgr daemon that gets reported in the warning; maybe that was just a coincidence. The tid management in the client and the MDS looks sane too. For better debuggability, we could dump the oldest_tid that each client maintains. That would give some insight into which request was not acknowledged by the MDS.
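To illustrate why dumping the oldest tid would help, here is a minimal Python sketch. It is a toy model, not Ceph code: `ToyClient`, `dump_oldest`, and the request descriptions are hypothetical names invented for this example. It assumes only that a client assigns increasing tids and treats the smallest unacknowledged one as its oldest tid.

```python
class ToyClient:
    """Hypothetical model of a client tracking unacknowledged request tids."""

    def __init__(self):
        self.next_tid = 1
        self.inflight = {}  # tid -> description of the pending request

    def send(self, description):
        # Assign the next tid and remember the request until it is acked.
        tid = self.next_tid
        self.next_tid += 1
        self.inflight[tid] = description
        return tid

    def ack(self, tid):
        # Forget a request once the server has acknowledged it.
        self.inflight.pop(tid, None)

    def oldest_tid(self):
        # Assumption: with nothing in flight, the oldest tid is the next
        # tid that would be issued.
        return min(self.inflight) if self.inflight else self.next_tid

    def dump_oldest(self):
        # The proposed debugging aid: report the oldest tid together with
        # the request that is holding it back (None if nothing is pending).
        tid = self.oldest_tid()
        return {"oldest_tid": tid, "request": self.inflight.get(tid)}


client = ToyClient()
stuck = client.send("setattr /volumes/a")   # never acknowledged
for i in range(5):
    client.ack(client.send(f"lookup file{i}"))  # acknowledged promptly

print(client.dump_oldest())
```

In this toy model one unacknowledged request pins the oldest tid at 1, however many later requests complete. The dump names that request directly, which is the kind of insight the proposed debug output is meant to provide.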
- Status changed from Triaged to Pending Backport
- Copied to Backport #59021: quincy: mds: warning `clients failing to advance oldest client/flush tid` seen with some workloads added
- Copied to Backport #59022: pacific: mds: warning `clients failing to advance oldest client/flush tid` seen with some workloads added
- Copied to Backport #59023: pacific: mds: warning `clients failing to advance oldest client/flush tid` seen with some workloads added
- Tags set to backport_processed
- Copied to Backport #59024: quincy: mds: warning `clients failing to advance oldest client/flush tid` seen with some workloads added
- Status changed from Pending Backport to Resolved
- % Done changed from 0 to 100
- Related to Backport #63339: reef: mds: warning `clients failing to advance oldest client/flush tid` seen with some workloads added
- Status changed from Resolved to Pending Backport
- Target version changed from v18.0.0 to v19.0.0
- Backport changed from pacific,quincy to pacific,quincy,reef