Bug #43514

closed

qa: test setUp may cause spurious MDS_INSUFFICIENT_STANDBY

Added by Patrick Donnelly over 4 years ago. Updated about 4 years ago.

Status: Resolved
Priority: Normal
Category: Correctness/Safety
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport: nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): qa-suite
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2020-01-07T16:46:12.328 INFO:teuthology.orchestra.run:waiting for 300
2020-01-07T16:46:12.340 INFO:tasks.ceph.mds.a:Stopped
2020-01-07T16:46:12.340 DEBUG:teuthology.parallel:result is None
2020-01-07T16:46:12.373 INFO:tasks.ceph.mds.b:Stopped
2020-01-07T16:46:12.373 DEBUG:teuthology.parallel:result is None
2020-01-07T16:46:12.373 INFO:teuthology.orchestra.run.smithi093:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph mds fail a
2020-01-07T16:46:12.374 INFO:teuthology.orchestra.run.smithi093:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph mds fail b
...
2020-01-07T16:46:12.705 INFO:teuthology.orchestra.run.smithi093.stderr:2020-01-07T16:46:12.697+0000 7f40cf974700  1 -- 172.21.15.93:0/3162667050 --> [v2:172.21.15.93:3300/0,v1:172.21.15.93:6789/0] -- mon_command({"prefix": "mds fail", "role_or_gid": "b"} v 0) v1 -- 0x7f40c809be80 con 0x7f40c8136950
2020-01-07T16:46:12.706 INFO:teuthology.orchestra.run.smithi093.stderr:2020-01-07T16:46:12.698+0000 7fee475ec700  1 -- 172.21.15.93:0/3550511259 --> [v2:172.21.15.106:3300/0,v1:172.21.15.106:6789/0] -- mon_command({"prefix": "mds fail", "role_or_gid": "a"} v 0) v1 -- 0x7fee40098b80 con 0x7fee4007ebc0
2020-01-07T16:46:13.685 INFO:tasks.ceph.mon.a.smithi093.stderr:2020-01-07T16:46:13.682+0000 7fb42c1c8700 -1 log_channel(cluster) log [ERR] : Health check failed: 1 filesystem is offline (MDS_ALL_DOWN)
...
2020-01-07T16:52:08.538 INFO:tasks.ceph:Checking cluster log for badness...
2020-01-07T16:52:08.538 INFO:teuthology.orchestra.run.smithi093:> sudo egrep '\[ERR\]|\[WRN\]|\[SEC\]' /var/log/ceph/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | egrep -v 'overall HEALTH_' | egrep -v '\(FS_DEGRADED\)' | egrep -v '\(MDS_FAILED\)' | egrep -v '\(MDS_DEGRADED\)' | egrep -v '\(FS_WITH_FAILED_MDS\)' | egrep -v '\(MDS_DAMAGE\)' | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | egrep -v '\(FS_INLINE_DATA_DEPRECATED\)' | egrep -v 'overall HEALTH_' | egrep -v '\(OSD_DOWN\)' | egrep -v '\(OSD_' | egrep -v 'but it is still running' | egrep -v 'is not responding' | head -n 1
2020-01-07T16:52:08.585 INFO:teuthology.orchestra.run.smithi093.stdout:2020-01-07T16:46:13.683438+0000 mon.a (mon.0) 805 : cluster [WRN] Health check failed: insufficient standby MDS daemons available (MDS_INSUFFICIENT_STANDBY)

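It's not necessary to fail each MDS individually during test setUp: the daemons restart on their own once they are removed from the MDSMap. Failing them one by one, as the "mds fail a" and "mds fail b" commands in the log above do, can raise the MDS_INSUFFICIENT_STANDBY warning. The cluster log check does not ignore that warning, so the run is flagged spuriously.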
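To illustrate why the warning trips the run, here is a minimal Python sketch of the "Checking cluster log for badness" step. It is not the teuthology implementation; it only mirrors the egrep pipeline from the log above. The function name first_bad_line and the sample cluster_log lines are hypothetical, and the sample messages are abbreviated from the excerpt.

```python
import re

# Ignore patterns copied from the egrep -v chain in the log (duplicates dropped).
IGNORELIST = [
    r"\(MDS_ALL_DOWN\)",
    r"\(MDS_UP_LESS_THAN_MAX\)",
    r"overall HEALTH_",
    r"\(FS_DEGRADED\)",
    r"\(MDS_FAILED\)",
    r"\(MDS_DEGRADED\)",
    r"\(FS_WITH_FAILED_MDS\)",
    r"\(MDS_DAMAGE\)",
    r"\(FS_INLINE_DATA_DEPRECATED\)",
    r"\(OSD_DOWN\)",
    r"\(OSD_",
    r"but it is still running",
    r"is not responding",
]

# Same selection as the first egrep: errors, warnings, security events.
BADNESS = re.compile(r"\[ERR\]|\[WRN\]|\[SEC\]")


def first_bad_line(lines, ignorelist=IGNORELIST):
    """Return the first [ERR]/[WRN]/[SEC] line matching no ignore pattern.

    Hypothetical helper: the equivalent of the pipeline ending in `head -n 1`.
    Returns None when every flagged line is ignored.
    """
    ignore = [re.compile(p) for p in ignorelist]
    for line in lines:
        if not BADNESS.search(line):
            continue
        if any(p.search(line) for p in ignore):
            continue
        return line
    return None


# Sample lines (assumed cluster-log form, messages taken from the excerpt).
cluster_log = [
    "mon.a (mon.0) : cluster [ERR] Health check failed: "
    "1 filesystem is offline (MDS_ALL_DOWN)",
    "mon.a (mon.0) 805 : cluster [WRN] Health check failed: "
    "insufficient standby MDS daemons available (MDS_INSUFFICIENT_STANDBY)",
]

print(first_bad_line(cluster_log))
```

With the ignorelist shown, the MDS_ALL_DOWN line is filtered out but the MDS_INSUFFICIENT_STANDBY line is returned, which matches what the log check reported. Adding a pattern for that warning would hide it, but that only masks the symptom; the description argues for not triggering it during setUp at all.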


Related issues: 1 (0 open, 1 closed)

Copied to CephFS - Backport #43568: nautilus: qa: test setUp may cause spurious MDS_INSUFFICIENT_STANDBY (Resolved, Nathan Cutler)
Actions #1

Updated by Patrick Donnelly over 4 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 32532
Actions #2

Updated by Patrick Donnelly over 4 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #3

Updated by Nathan Cutler over 4 years ago

  • Copied to Backport #43568: nautilus: qa: test setUp may cause spurious MDS_INSUFFICIENT_STANDBY added
Actions #4

Updated by Nathan Cutler about 4 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
