Bug #50834
MDS heartbeat timed out while executing MDCache::start_files_to_recover()
Description
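This issue occurs on v14.2.19 and was also seen on v14.2.16. It was also discussed on the dev mailing list: https://lists.ceph.io/hyperkitty/list/dev@ceph.io/thread/VFY3A6CLBYLJ3MZSVCQA2Q5BTMFSHHZD/#YRK275TY52ZTZMDVIXMLOUHMH34D2QUK
In the log below, the MDS enters start_files_to_recover() at 23:04:54.465 and does not log completion until 23:05:41.408. During that window the internal 'MDSRank' heartbeat times out (after 15), beacons to the monitors are skipped, and the MDS is blacklisted and respawns.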
2021-05-12 23:04:54.464 7f4f58b79700 1 mds.1.2305 handle_mds_map state change up:rejoin --> up:clientreplay
2021-05-12 23:04:54.464 7f4f58b79700 1 mds.1.2305 recovery_done -- successful recovery!
2021-05-12 23:04:54.464 7f4f58b79700 1 mds.1.cache start_recovered_truncates
2021-05-12 23:04:54.465 7f4f58b79700 1 mds.1.cache start_recovered_truncates done
2021-05-12 23:04:54.465 7f4f58b79700 1 mds.1.cache start_files_to_recover()
2021-05-12 23:05:13.452 7f4f56374700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2021-05-12 23:05:13.453 7f4f56374700 0 mds.beacon.mds003 Skipping beacon heartbeat to monitors (last acked 4.00015s ago); MDS internal heartbeat is not healthy!
2021-05-12 23:05:13.953 7f4f56374700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2021-05-12 23:05:13.953 7f4f56374700 0 mds.beacon.mds003 Skipping beacon heartbeat to monitors (last acked 4.50116s ago); MDS internal heartbeat is not healthy!
2021-05-12 23:05:14.453 7f4f56374700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2021-05-12 23:05:14.453 7f4f56374700 0 mds.beacon.mds003 Skipping beacon heartbeat to monitors (last acked 5.00118s ago); MDS internal heartbeat is not healthy!
2021-05-12 23:05:14.953 7f4f56374700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
omitting ...
2021-05-12 23:05:41.408 7f4f58b79700 1 mds.1.cache start_files_to_recover done
2021-05-12 23:05:41.408 7f4f58b79700 1 mds.1.2305 recovery_done -- successful recovery! done
2021-05-12 23:05:41.408 7f4f58b79700 1 mds.1.2305 clientreplay_start
2021-05-12 23:05:41.408 7f4f58b79700 1 mds.beacon.mds003 MDS connection to Monitors appears to be laggy; 31.9562s since last acked beacon
2021-05-12 23:05:41.411 7f4f53b6f700 1 heartbeat_map reset_timeout 'MDSRank' had timed out after 15
2021-05-12 23:05:41.412 7f4f5236c700 -1 MDSIOContextBase: blacklisted! Restarting...
2021-05-12 23:05:41.412 7f4f5236c700 1 mds.mds003 respawn!
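The length of the stall can be read directly from the timestamps in the log above. The following is a minimal sketch, not part of the original report, that computes it from three lines copied from the log. The helper names (timestamp, find) and the GRACE_SECONDS constant are made up for illustration; the value 15 is taken from the "had timed out after 15" message rather than from any configuration lookup.

```python
from datetime import datetime

# Excerpt copied from the log above; only the lines needed for the arithmetic.
LOG = """\
2021-05-12 23:04:54.465 7f4f58b79700 1 mds.1.cache start_files_to_recover()
2021-05-12 23:05:13.452 7f4f56374700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2021-05-12 23:05:41.408 7f4f58b79700 1 mds.1.cache start_files_to_recover done
"""

# Assumption: taken from "had timed out after 15" in the log, in seconds.
GRACE_SECONDS = 15


def timestamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS.mmm' field of a log line."""
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S.%f")


def find(lines, needle):
    """Return the timestamp of the first line containing `needle`."""
    for line in lines:
        if needle in line:
            return timestamp(line)
    raise ValueError(f"no log line contains {needle!r}")


lines = LOG.splitlines()
start = find(lines, "start_files_to_recover()")
end = find(lines, "start_files_to_recover done")
stall = (end - start).total_seconds()

print(f"start_files_to_recover ran for {stall:.3f}s")
print(f"exceeds heartbeat grace of {GRACE_SECONDS}s: {stall > GRACE_SECONDS}")
```

With the timestamps above, the call takes roughly 46.9 seconds, about three times the 15-second timeout reported by heartbeat_map.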
Related issues
History
#1 Updated by Patrick Donnelly almost 3 years ago
- Project changed from Ceph to CephFS
- Status changed from New to Fix Under Review
- Assignee set to Yongseok Oh
- Target version set to v17.0.0
- Source set to Community (dev)
- Backport set to pacific,octopus
- Pull request ID set to 41358
- Component(FS) MDS added
#2 Updated by Patrick Donnelly almost 3 years ago
- Status changed from Fix Under Review to Pending Backport
#3 Updated by Backport Bot almost 3 years ago
- Copied to Backport #50913: pacific: MDS heartbeat timed out between during executing MDCache::start_files_to_recover() added
#4 Updated by Backport Bot almost 3 years ago
- Copied to Backport #50914: octopus: MDS heartbeat timed out between during executing MDCache::start_files_to_recover() added
#5 Updated by Backport Bot over 1 year ago
- Tags set to backport_processed
#6 Updated by Konstantin Shalygin over 1 year ago
- Status changed from Pending Backport to Resolved
- Tags deleted (backport_processed)