Bug #64988
qa: fs:workloads mgr client evicted indicated by "cluster [WRN] evicting unresponsive client smithi042:x (15288), after 303.306 seconds"
Description
Updated by Patrick Donnelly about 2 months ago
- Related to Bug #64985: qa: mgr logs do not include client debugging added
Updated by Patrick Donnelly about 1 month ago
- Status changed from New to In Progress
- Assignee set to Patrick Donnelly
Okay, so as expected this is a non-issue:
2024-03-20T18:59:44.324+0000 7ff1adba6700 1 -- 172.21.15.42:0/4057698876 <== mon.0 v2:172.21.15.42:3300/0 2621 ==== mgrmap(e 19) ==== 137871+0+0 (secure 0 0 0) 0x55bdef6bef00 con 0x55bdec7ec400
2024-03-20T18:59:44.324+0000 7ff1adba6700 10 mgr ms_dispatch2 active mgrmap(e 19)
2024-03-20T18:59:44.324+0000 7ff1adba6700 4 mgr handle_mgr_map received map epoch 19
2024-03-20T18:59:44.324+0000 7ff1adba6700 4 mgr handle_mgr_map active in map: 1 active is 14150
2024-03-20T18:59:44.324+0000 7ff1adba6700 1 mgr handle_mgr_map respawning because set of enabled modules changed!
2024-03-20T18:59:44.324+0000 7ff1adba6700 1 mgr respawn e: '/usr/bin/ceph-mgr'
2024-03-20T18:59:44.324+0000 7ff1adba6700 1 mgr respawn 0: '/usr/bin/ceph-mgr'
2024-03-20T18:59:44.324+0000 7ff1adba6700 1 mgr respawn 1: '-n'
2024-03-20T18:59:44.324+0000 7ff1adba6700 1 mgr respawn 2: 'mgr.x'
2024-03-20T18:59:44.324+0000 7ff1adba6700 1 mgr respawn 3: '-f'
2024-03-20T18:59:44.324+0000 7ff1adba6700 1 mgr respawn 4: '--setuser'
2024-03-20T18:59:44.324+0000 7ff1adba6700 1 mgr respawn 5: 'ceph'
2024-03-20T18:59:44.324+0000 7ff1adba6700 1 mgr respawn 6: '--setgroup'
2024-03-20T18:59:44.324+0000 7ff1adba6700 1 mgr respawn 7: 'ceph'
2024-03-20T18:59:44.324+0000 7ff1adba6700 1 mgr respawn 8: '--default-log-to-file=false'
2024-03-20T18:59:44.324+0000 7ff1adba6700 1 mgr respawn 9: '--default-log-to-journald=true'
2024-03-20T18:59:44.324+0000 7ff1adba6700 1 mgr respawn 10: '--default-log-to-stderr=false'
2024-03-20T18:59:44.325+0000 7ff1adba6700 1 mgr respawn respawning with exe /usr/bin/ceph-mgr
2024-03-20T18:59:44.325+0000 7ff1adba6700 1 mgr respawn exe_path /proc/self/exe
/teuthology/pdonnell-2024-03-20_18:16:52-fs-wip-batrick-testing-20240320.145742-distro-default-smithi/7612921/remote/smithi042/log/6efffee4-e6ea-11ee-95c9-87774f69a715/ceph-mgr.x.log.gz
The mgr modules changed so it rebooted and the client instance got evicted.
I'll work on a fix.
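For triaging similar runs, here is a minimal sketch of scanning a ceph-mgr log for the respawn line quoted above. The function name `find_module_respawns` and the regular expression are illustrative assumptions derived only from the log lines in this comment; they are not part of Ceph or teuthology.

```python
import re
from datetime import datetime

# Assumed line layout, inferred from the excerpt above:
#   <timestamp> <thread id (hex)> <debug level> <message>
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{4})\s+"
    r"(?P<thread>[0-9a-f]+)\s+(?P<level>-?\d+)\s+(?P<msg>.*)$"
)

# Message text copied from the mgr log excerpt in this ticket.
RESPAWN_MARKER = "respawning because set of enabled modules changed"


def find_module_respawns(lines):
    """Return the timestamps of mgr respawns caused by a module-set change.

    Hypothetical helper: `lines` is any iterable of log lines (str).
    Lines that do not match the assumed layout are skipped.
    """
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and RESPAWN_MARKER in match.group("msg"):
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S.%f%z")
            hits.append(ts)
    return hits
```

The input could be, for example, the lines of the gzipped mgr log opened with `gzip.open(path, "rt")`. Comparing the returned timestamps with the time of the "evicting unresponsive client" warning is one way to check whether an eviction followed a respawn.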
Updated by Patrick Donnelly about 1 month ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 56354
Updated by Greg Farnum about 1 month ago
> The mgr modules changed so it rebooted and the client instance got evicted.
o_0
Shouldn’t we do a polite unmount when rebooting? Leaving a hanging client session from the manager seems real bad…
I guess when the monitor fails it over, it does a blocklist entry so the mds cleans up faster? Otherwise there’d be disasters there, too.
Updated by Patrick Donnelly about 1 month ago
Greg Farnum wrote:
> > The mgr modules changed so it rebooted and the client instance got evicted.
> o_0
> Shouldn’t we do a polite unmount when rebooting? Leaving a hanging client session from the manager seems real bad…
> I guess when the monitor fails it over, it does a blocklist entry so the mds cleans up faster? Otherwise there’d be disasters there, too.
It's not really a big deal and unlikely to happen in production. Again, it only happens when a failover occurs between when the session is established and the beacon with the client addr is sent to the mons. The mgr doesn't do anything with the mount until it has acknowledgement**.
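The window described here can be pictured with a toy model. This is only a sketch under stated assumptions, not Ceph code: the state names, the function `cleanup_on_failover`, and the returned strings are all hypothetical, and the blocklist behavior on failover is taken from the discussion in this thread.

```python
from enum import Enum, auto


class MgrClientState(Enum):
    """Hypothetical states of the mgr's CephFS client, for illustration only."""
    SESSION_OPEN = auto()   # MDS session established, client addr not yet reported
    BEACON_SENT = auto()    # beacon carrying the client addr has reached the mons


def cleanup_on_failover(state: MgrClientState) -> str:
    """Toy model of how a stale mgr client session is cleaned up after failover.

    Assumption: the mons can only blocklist client addresses they have
    already been told about via the beacon.
    """
    if state is MgrClientState.BEACON_SENT:
        # The mons know the address, so it can be blocklisted at failover
        # and the MDS can drop the session promptly.
        return "blocklisted"
    # The address was never reported: the MDS only notices once the
    # session becomes unresponsive and evicts it after the timeout.
    return "evicted_after_timeout"
```

In this model, the warning in the ticket title corresponds to a failover that happens while the client is still in `SESSION_OPEN`.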
** Actually only after https://github.com/ceph/ceph/pull/51169 is merged. See:
Updated by Patrick Donnelly about 1 month ago
- Status changed from Fix Under Review to Pending Backport
Updated by Backport Bot about 1 month ago
- Copied to Backport #65092: reef: qa: fs:workloads mgr client evicted indicated by "cluster [WRN] evicting unresponsive client smithi042:x (15288), after 303.306 seconds" added
Updated by Backport Bot about 1 month ago
- Copied to Backport #65093: squid: qa: fs:workloads mgr client evicted indicated by "cluster [WRN] evicting unresponsive client smithi042:x (15288), after 303.306 seconds" added
Updated by Patrick Donnelly 29 days ago
- Status changed from Pending Backport to Resolved