Bug #50834

MDS heartbeat timed out between during executing MDCache::start_files_to_recover()

Added by Yongsoek Oh about 1 month ago. Updated 30 days ago.

Status: Pending Backport
Priority: Normal
Assignee:
Category: -
Target version:
% Done: 0%
Source: Community (dev)
Tags:
Backport: pacific,octopus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This issue occurs with v14.2.19 (and also with v14.2.16). It has also been discussed on the mailing list: https://lists.ceph.io/hyperkitty/list/dev@ceph.io/thread/VFY3A6CLBYLJ3MZSVCQA2Q5BTMFSHHZD/#YRK275TY52ZTZMDVIXMLOUHMH34D2QUK

2021-05-12 23:04:54.464 7f4f58b79700 1 mds.1.2305 handle_mds_map state change up:rejoin --> up:clientreplay
2021-05-12 23:04:54.464 7f4f58b79700 1 mds.1.2305 recovery_done -- successful recovery!
2021-05-12 23:04:54.464 7f4f58b79700 1 mds.1.cache start_recovered_truncates
2021-05-12 23:04:54.465 7f4f58b79700 1 mds.1.cache start_recovered_truncates done
2021-05-12 23:04:54.465 7f4f58b79700 1 mds.1.cache start_files_to_recover()
2021-05-12 23:05:13.452 7f4f56374700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2021-05-12 23:05:13.453 7f4f56374700 0 mds.beacon.mds003 Skipping beacon heartbeat to monitors (last acked 4.00015s ago); MDS internal heartbeat is not healthy!
2021-05-12 23:05:13.953 7f4f56374700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2021-05-12 23:05:13.953 7f4f56374700 0 mds.beacon.mds003 Skipping beacon heartbeat to monitors (last acked 4.50116s ago); MDS internal heartbeat is not healthy!
2021-05-12 23:05:14.453 7f4f56374700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2021-05-12 23:05:14.453 7f4f56374700 0 mds.beacon.mds003 Skipping beacon heartbeat to monitors (last acked 5.00118s ago); MDS internal heartbeat is not healthy!
2021-05-12 23:05:14.953 7f4f56374700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15

omitting ...

2021-05-12 23:05:41.408 7f4f58b79700 1 mds.1.cache start_files_to_recover done
2021-05-12 23:05:41.408 7f4f58b79700 1 mds.1.2305 recovery_done -- successful recovery! done
2021-05-12 23:05:41.408 7f4f58b79700 1 mds.1.2305 clientreplay_start
2021-05-12 23:05:41.408 7f4f58b79700 1 mds.beacon.mds003 MDS connection to Monitors appears to be laggy; 31.9562s since last acked beacon
2021-05-12 23:05:41.411 7f4f53b6f700 1 heartbeat_map reset_timeout 'MDSRank' had timed out after 15
2021-05-12 23:05:41.412 7f4f5236c700 -1 MDSIOContextBase: blacklisted! Restarting...
2021-05-12 23:05:41.412 7f4f5236c700 1 mds.mds003 respawn!
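
To put a number on the stall shown in the log above, the following is a minimal sketch that measures the time between the "start_files_to_recover()" and "start_files_to_recover done" lines and compares it with the heartbeat timeout. The two log lines are copied verbatim from the excerpt. The helper names (parse_ts, stall_seconds) are hypothetical, and the sketch assumes that the "15" in "had timed out after 15" is in seconds.

```python
from datetime import datetime

# Two lines copied verbatim from the log excerpt in the description.
START_LINE = "2021-05-12 23:04:54.465 7f4f58b79700 1 mds.1.cache start_files_to_recover()"
DONE_LINE = "2021-05-12 23:05:41.408 7f4f58b79700 1 mds.1.cache start_files_to_recover done"

# Taken from "'MDSRank' had timed out after 15"; assumed to be seconds.
HEARTBEAT_TIMEOUT_S = 15


def parse_ts(line: str) -> datetime:
    """Parse the leading 'YYYY-MM-DD HH:MM:SS.mmm' timestamp of a log line."""
    date_part, time_part = line.split()[:2]
    return datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S.%f")


def stall_seconds(start_line: str, done_line: str) -> float:
    """Return the wall-clock time between two log lines, in seconds."""
    return (parse_ts(done_line) - parse_ts(start_line)).total_seconds()


elapsed = stall_seconds(START_LINE, DONE_LINE)
print(
    f"start_files_to_recover ran for {elapsed:.3f}s, "
    f"{elapsed / HEARTBEAT_TIMEOUT_S:.1f}x the {HEARTBEAT_TIMEOUT_S}s heartbeat timeout"
)
```

Under these assumptions the function ran for roughly 47 seconds on the same thread (7f4f58b79700), about three times the heartbeat timeout, which is consistent with the skipped beacons and the final "31.9562s since last acked beacon" message before the respawn.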


Related issues

Copied to CephFS - Backport #50913: pacific: MDS heartbeat timed out between during executing MDCache::start_files_to_recover() New
Copied to CephFS - Backport #50914: octopus: MDS heartbeat timed out between during executing MDCache::start_files_to_recover() New

History

#1 Updated by Patrick Donnelly about 1 month ago

  • Project changed from Ceph to CephFS
  • Status changed from New to Fix Under Review
  • Assignee set to Yongsoek Oh
  • Target version set to v17.0.0
  • Source set to Community (dev)
  • Backport set to pacific,octopus
  • Pull request ID set to 41358
  • Component(FS) MDS added

#2 Updated by Patrick Donnelly 30 days ago

  • Status changed from Fix Under Review to Pending Backport

#3 Updated by Backport Bot 30 days ago

  • Copied to Backport #50913: pacific: MDS heartbeat timed out between during executing MDCache::start_files_to_recover() added

#4 Updated by Backport Bot 30 days ago

  • Copied to Backport #50914: octopus: MDS heartbeat timed out between during executing MDCache::start_files_to_recover() added
