Bug #13067

closed

MDSRank unhealthy on hammer -> infernalis upgrade

Added by Sage Weil over 8 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2015-09-11 21:58:47.474680 7f3230796700  1 mds.0.2 handle_mds_map i am now mds.0.2
2015-09-11 21:58:47.474683 7f3230796700  1 mds.0.2 handle_mds_map state change up:rejoin --> up:active
2015-09-11 21:58:47.474692 7f3230796700  1 mds.0.2 recovery_done -- successful recovery!
2015-09-11 21:58:47.474822 7f3230796700  1 mds.0.2 active_start
2015-09-11 21:58:47.474843 7f3230796700  1 mds.0.2 cluster recovered.
2015-09-11 22:12:26.319204 7f3230796700  0 monclient: hunting for new mon
2015-09-11 22:12:46.434194 7f322d68f700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2015-09-11 22:12:46.434209 7f322d68f700  1 mds.beacon.a _send skipping beacon, heartbeat map not healthy
2015-09-11 22:12:50.119904 7f323279a700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2015-09-11 22:12:50.434380 7f322d68f700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2015-09-11 22:12:50.434389 7f322d68f700  1 mds.beacon.a _send skipping beacon, heartbeat map not healthy
2015-09-11 22:12:54.434563 7f322d68f700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2015-09-11 22:12:54.434579 7f322d68f700  1 mds.beacon.a _send skipping beacon, heartbeat map not healthy
2015-09-11 22:12:55.120096 7f323279a700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2015-09-11 22:12:58.434754 7f322d68f700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2015-09-11 22:12:58.434770 7f322d68f700  1 mds.beacon.a _send skipping beacon, heartbeat map not healthy
2015-09-11 22:13:00.120301 7f323279a700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2015-09-11 22:13:02.434936 7f322d68f700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2015-09-11 22:13:02.434952 7f322d68f700  1 mds.beacon.a _send skipping beacon, heartbeat map not healthy
2015-09-11 22:13:05.120505 7f323279a700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2015-09-11 22:13:06.435116 7f322d68f700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 15

Meanwhile, the mon says:

2015-09-11T15:27:52.700 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN mds a is laggy
2015-09-11T15:27:59.702 INFO:teuthology.orchestra.run.vpm110:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph health'
2015-09-11T15:27:59.928 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN mds a is laggy
2015-09-11T15:28:06.929 INFO:teuthology.orchestra.run.vpm110:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph health'
2015-09-11T15:28:07.177 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN mds a is laggy
2015-09-11T15:28:14.177 INFO:teuthology.orchestra.run.vpm110:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph health'

/a/sage-2015-09-11_14:28:20-upgrade:hammer-x-master---basic-vps/1050825

Actions #1

Updated by John Spray over 8 years ago

Hmm, kinda interesting that it's happening at the point in the log where the mons are restarted. Something in monc blocking perhaps?
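
To put a number on that timing, here is a rough sketch that measures the gap between the monclient starting to hunt and the first 'MDSRank' heartbeat timeout. It uses three lines copied from the MDS log in the description; the helper names `parse_ts` and `first_ts` are made up for this illustration and are not part of Ceph or teuthology.

```python
from datetime import datetime

# Three lines copied verbatim from the MDS log in the description.
LOG = """\
2015-09-11 22:12:26.319204 7f3230796700  0 monclient: hunting for new mon
2015-09-11 22:12:46.434194 7f322d68f700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2015-09-11 22:12:46.434209 7f322d68f700  1 mds.beacon.a _send skipping beacon, heartbeat map not healthy
"""


def parse_ts(line):
    # Hypothetical helper: the first 26 characters of each line are the
    # timestamp, e.g. "2015-09-11 22:12:26.319204".
    return datetime.strptime(line[:26], "%Y-%m-%d %H:%M:%S.%f")


def first_ts(lines, needle):
    # Hypothetical helper: timestamp of the first line containing `needle`,
    # or None if no line matches.
    for line in lines:
        if needle in line:
            return parse_ts(line)
    return None


lines = LOG.splitlines()
hunt = first_ts(lines, "hunting for new mon")
timeout = first_ts(lines, "had timed out")
gap = (timeout - hunt).total_seconds()
print(f"first heartbeat timeout {gap:.1f}s after monclient started hunting")
```

On these lines it reports a gap of about 20.1 seconds. That only shows the two events are close together in the log; it does not show where the MDSRank thread is actually stuck.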

Actions #2

Updated by Zheng Yan over 8 years ago

  • Assignee set to Zheng Yan
Actions #3

Updated by Zheng Yan over 8 years ago

  • Status changed from New to Fix Under Review
Actions #4

Updated by Zheng Yan over 8 years ago

  • Status changed from Fix Under Review to 7
Actions #5

Updated by Zheng Yan over 8 years ago

  • Status changed from 7 to Resolved