Bug #64988

closed

qa: fs:workloads mgr client evicted indicated by "cluster [WRN] evicting unresponsive client smithi042:x (15288), after 303.306 seconds"

Added by Patrick Donnelly about 2 months ago. Updated 29 days ago.

Status:
Resolved
Priority:
High
Category:
Testing
Target version:
% Done:
0%

Source:
Q/A
Tags:
backport_processed
Backport:
squid,reef
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
qa-failure
Pull request ID:
Crash signature (v1):
Crash signature (v2):


Related issues 3 (1 open, 2 closed)

Related to CephFS - Bug #64985: qa: mgr logs do not include client debugging (Pending Backport, Patrick Donnelly)
Copied to CephFS - Backport #65092: reef: qa: fs:workloads mgr client evicted indicated by "cluster [WRN] evicting unresponsive client smithi042:x (15288), after 303.306 seconds" (Resolved, Patrick Donnelly)
Copied to CephFS - Backport #65093: squid: qa: fs:workloads mgr client evicted indicated by "cluster [WRN] evicting unresponsive client smithi042:x (15288), after 303.306 seconds" (Resolved, Patrick Donnelly)
Actions #1

Updated by Patrick Donnelly about 2 months ago

  • Related to Bug #64985: qa: mgr logs do not include client debugging added
Actions #2

Updated by Patrick Donnelly about 1 month ago

  • Status changed from New to In Progress
  • Assignee set to Patrick Donnelly

Okay, so as expected this is a non-issue:

2024-03-20T18:59:44.324+0000 7ff1adba6700  1 -- 172.21.15.42:0/4057698876 <== mon.0 v2:172.21.15.42:3300/0 2621 ==== mgrmap(e 19) ==== 137871+0+0 (secure 0 0 0) 0x55bdef6bef00 con 0x55bdec7ec400
2024-03-20T18:59:44.324+0000 7ff1adba6700 10 mgr ms_dispatch2 active mgrmap(e 19)
2024-03-20T18:59:44.324+0000 7ff1adba6700  4 mgr handle_mgr_map received map epoch 19
2024-03-20T18:59:44.324+0000 7ff1adba6700  4 mgr handle_mgr_map active in map: 1 active is 14150
2024-03-20T18:59:44.324+0000 7ff1adba6700  1 mgr handle_mgr_map respawning because set of enabled modules changed!
2024-03-20T18:59:44.324+0000 7ff1adba6700  1 mgr respawn  e: '/usr/bin/ceph-mgr'
2024-03-20T18:59:44.324+0000 7ff1adba6700  1 mgr respawn  0: '/usr/bin/ceph-mgr'
2024-03-20T18:59:44.324+0000 7ff1adba6700  1 mgr respawn  1: '-n'
2024-03-20T18:59:44.324+0000 7ff1adba6700  1 mgr respawn  2: 'mgr.x'
2024-03-20T18:59:44.324+0000 7ff1adba6700  1 mgr respawn  3: '-f'
2024-03-20T18:59:44.324+0000 7ff1adba6700  1 mgr respawn  4: '--setuser'
2024-03-20T18:59:44.324+0000 7ff1adba6700  1 mgr respawn  5: 'ceph'
2024-03-20T18:59:44.324+0000 7ff1adba6700  1 mgr respawn  6: '--setgroup'
2024-03-20T18:59:44.324+0000 7ff1adba6700  1 mgr respawn  7: 'ceph'
2024-03-20T18:59:44.324+0000 7ff1adba6700  1 mgr respawn  8: '--default-log-to-file=false'
2024-03-20T18:59:44.324+0000 7ff1adba6700  1 mgr respawn  9: '--default-log-to-journald=true'
2024-03-20T18:59:44.324+0000 7ff1adba6700  1 mgr respawn  10: '--default-log-to-stderr=false'
2024-03-20T18:59:44.325+0000 7ff1adba6700  1 mgr respawn respawning with exe /usr/bin/ceph-mgr
2024-03-20T18:59:44.325+0000 7ff1adba6700  1 mgr respawn  exe_path /proc/self/exe

/teuthology/pdonnell-2024-03-20_18:16:52-fs-wip-batrick-testing-20240320.145742-distro-default-smithi/7612921/remote/smithi042/log/6efffee4-e6ea-11ee-95c9-87774f69a715/ceph-mgr.x.log.gz

The mgr modules changed so it rebooted and the client instance got evicted.

I'll work on a fix.
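
For anyone triaging a similar run, here is a minimal sketch of how the respawn event in the log excerpt above could be picked out of a mgr log. The function name `find_respawns` and the regular expression are illustrative assumptions based only on the line format shown in this comment; they are not part of Ceph or teuthology.

```python
import re
from datetime import datetime

# Assumed line layout, taken from the excerpt above:
#   <timestamp> <thread id> <debug level> mgr handle_mgr_map respawning because <reason>
RESPAWN_RE = re.compile(
    r"^(?P<ts>\S+)\s+\S+\s+\d+\s+"
    r"mgr handle_mgr_map respawning because (?P<reason>.+)$"
)


def find_respawns(lines):
    """Return (timestamp, reason) for every mgr respawn line found."""
    events = []
    for line in lines:
        match = RESPAWN_RE.match(line.strip())
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S.%f%z")
            events.append((ts, match.group("reason")))
    return events


# Two lines copied from the log excerpt in this comment.
sample = [
    "2024-03-20T18:59:44.324+0000 7ff1adba6700  4 mgr handle_mgr_map active in map: 1 active is 14150",
    "2024-03-20T18:59:44.324+0000 7ff1adba6700  1 mgr handle_mgr_map respawning because set of enabled modules changed!",
]
events = find_respawns(sample)
for ts, reason in events:
    print(ts.isoformat(), reason)
```

For a compressed log such as the ceph-mgr.x.log.gz referenced above, the lines could be read with `gzip.open(path, "rt")` and passed to the same function.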

Actions #3

Updated by Patrick Donnelly about 1 month ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 56354
Actions #4

Updated by Greg Farnum about 1 month ago

The mgr modules changed so it rebooted and the client instance got evicted.

o_0

Shouldn’t we do a polite unmount when rebooting? Leaving a hanging client session from the manager seems real bad…
I guess when the monitor fails it over, it does a blocklist entry so the mds cleans up faster? Otherwise there’d be disasters there, too.

Actions #5

Updated by Patrick Donnelly about 1 month ago

Greg Farnum wrote:

The mgr modules changed so it rebooted and the client instance got evicted.

o_0

Shouldn’t we do a polite unmount when rebooting? Leaving a hanging client session from the manager seems real bad…
I guess when the monitor fails it over, it does a blocklist entry so the mds cleans up faster? Otherwise there’d be disasters there, too.

It's not really a big deal and is unlikely to happen in production. Again, it only happens when a failover occurs between the time the session is established and the time the beacon with the client addr is sent to the mons. The mgr doesn't do anything with the mount until it has acknowledgement.

https://github.com/ceph/ceph/pull/51169/files#diff-50ab66411d9293d402a15e00ed6843a4d37889c616873e69534e609c210f72ec
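
To compare an eviction warning against a mgr respawn when checking whether a run hit this window, the fields of the cluster warning from the issue title can be extracted as in the sketch below. The pattern is inferred from that single warning line, and the helper name `parse_eviction` and the dictionary keys are hypothetical.

```python
import re

# Pattern inferred from the warning text in the issue title (an assumption,
# not an official description of the log format).
EVICT_RE = re.compile(
    r"evicting unresponsive client (?P<name>\S+) \((?P<id>\d+)\), "
    r"after (?P<secs>[\d.]+) seconds"
)


def parse_eviction(line):
    """Return the client name, the number in parentheses, and the
    reported seconds, or None if the line is not an eviction warning."""
    match = EVICT_RE.search(line)
    if match is None:
        return None
    return {
        "client": match.group("name"),
        "id": int(match.group("id")),
        "seconds": float(match.group("secs")),
    }


warning = (
    "cluster [WRN] evicting unresponsive client smithi042:x (15288), "
    "after 303.306 seconds"
)
info = parse_eviction(warning)
print(info)
```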

Actions #6

Updated by Patrick Donnelly about 1 month ago

  • Status changed from Fix Under Review to Pending Backport
Actions #7

Updated by Backport Bot about 1 month ago

  • Copied to Backport #65092: reef: qa: fs:workloads mgr client evicted indicated by "cluster [WRN] evicting unresponsive client smithi042:x (15288), after 303.306 seconds" added
Actions #8

Updated by Backport Bot about 1 month ago

  • Copied to Backport #65093: squid: qa: fs:workloads mgr client evicted indicated by "cluster [WRN] evicting unresponsive client smithi042:x (15288), after 303.306 seconds" added
Actions #9

Updated by Backport Bot about 1 month ago

  • Tags set to backport_processed
Actions #10

Updated by Patrick Donnelly 29 days ago

  • Status changed from Pending Backport to Resolved