Bug #20622

closed

mds: takeover mds stuck in up:replay after thrashing rank 0

Added by Patrick Donnelly almost 7 years ago. Updated almost 7 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
Correctness/Safety
Target version:
-
% Done:
0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2017-07-07T23:10:20.597 INFO:teuthology.orchestra.run.smithi183.stdout:{"epoch":94,"compat":{"compat":{},"ro_compat":{},"incompat":{"feature_1":"base v0.20","feature_2":"client writeable ranges","feature_3":"default file layouts on dirs","feature_4":"dir inode in separate object","feature_5":"mds uses versioned encoding","feature_6":"dirfrag is stored in omap","feature_8":"file layout v2"}},"feature_flags":{"enable_multiple":false,"ever_enabled_multiple":false},"standbys":[],"filesystems":[{"mdsmap":{"epoch":94,"flags":14,"ever_allowed_features":10,"explicitly_allowed_features":10,"created":"2017-07-07 21:50:01.445427","modified":"2017-07-07 21:50:01.445427","tableserver":0,"root":0,"session_timeout":60,"session_autoclose":300,"max_file_size":1099511627776,"last_failure":0,"last_failure_osd_epoch":70,"compat":{"compat":{},"ro_compat":{},"incompat":{"feature_1":"base v0.20","feature_2":"client writeable ranges","feature_3":"default file layouts on dirs","feature_4":"dir inode in separate object","feature_5":"mds uses versioned encoding","feature_6":"dirfrag is stored in omap","feature_8":"file layout v2"}},"max_mds":1,"in":[0],"up":{"mds_0":4551},"failed":[],"damaged":[],"stopped":[],"info":{"gid_4551":{"gid":4551,"name":"b-s-a","rank":0,"incarnation":94,"state":"up:replay","state_seq":1,"addr":"172.21.15.183:6812/4224425049","standby_for_rank":0,"standby_for_fscid":1,"standby_for_name":"a","standby_replay":false,"export_targets":[],"features":1152323339925389307}},"data_pools":[3],"metadata_pool":2,"enabled":true,"fs_name":"cephfs","balancer":"","standby_count_wanted":0},"id":1}]}
2017-07-07T23:10:20.606 INFO:tasks.mds_thrash.fs.[cephfs]:mds.a down, removed from mdsmap
2017-07-07T23:10:20.614 INFO:tasks.mds_thrash.fs.[cephfs]:waiting for mds cluster to stabilize...

From: /ceph/teuthology-archive/pdonnell-2017-07-07_20:24:01-fs-wip-pdonnell-20170706-distro-basic-smithi/1372252/teuthology.log

MDS log: /ceph/teuthology-archive/pdonnell-2017-07-07_20:24:01-fs-wip-pdonnell-20170706-distro-basic-smithi/1372252/remote/smithi183/log/ceph-mds.b-s-a.log.gz
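
For triage, the stuck daemon can be picked out of the FSMap dump above programmatically. The following is a minimal sketch, assuming the JSON payload begins at the first `{` on the teuthology log line. The helper name `mds_states` is hypothetical, and the sample is a trimmed copy of the dump above that keeps only the keys the helper reads.

```python
import json


def mds_states(log_line):
    """Map each MDS daemon name to (rank, state) from a logged FSMap dump."""
    # Assumption: the teuthology prefix contains no '{', so the JSON
    # payload starts at the first '{' on the line.
    fsmap = json.loads(log_line[log_line.index("{"):])
    states = {}
    for fs in fsmap["filesystems"]:
        for info in fs["mdsmap"]["info"].values():
            states[info["name"]] = (info["rank"], info["state"])
    return states


# Trimmed copy of the dump in the description (most keys omitted).
sample = (
    '2017-07-07T23:10:20.597 INFO:teuthology.orchestra.run.smithi183.stdout:'
    '{"epoch":94,"standbys":[],"filesystems":[{"mdsmap":{"epoch":94,'
    '"max_mds":1,"in":[0],"up":{"mds_0":4551},"failed":[],'
    '"info":{"gid_4551":{"gid":4551,"name":"b-s-a","rank":0,'
    '"state":"up:replay"}}},"id":1}]}'
)

print(mds_states(sample))  # {'b-s-a': (0, 'up:replay')}
```

Here rank 0 is held by `b-s-a` in `up:replay`, and `standbys` is empty, so no other daemon is available to take over.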

Actions #1

Updated by Patrick Donnelly almost 7 years ago

Zheng, please take a look.

Actions #2

Updated by Zheng Yan almost 7 years ago

  • Priority changed from Urgent to Normal

3 of 6 OSDs failed, but there is no clue as to why they failed.

2017-07-07 23:01:50.480385 mon.b mon.0 172.21.15.34:6789/0 789 : cluster [INF] HEALTH_ERR; 3 osds down; 3 pgs are stuck inactive for more than 300 seconds; 7 pgs peering; 4 pgs stale; 3 pgs stuck inactive; 3 pgs stuck unclean
2017-07-07 23:02:11.483590 mon.b mon.0 172.21.15.34:6789/0 790 : cluster [WRN] MDS health message (mds.0): 67 slow requests are blocked > 30 sec
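
The "3 osds down" count can also be pulled out of the cluster log mechanically when scanning many runs. This is a rough sketch, assuming the health summary keeps the `cluster [LEVEL] HEALTH_x; item; item` shape shown in the lines above. The name `parse_health` and the regular expression are illustrative and not part of Ceph.

```python
import re

# Assumed shape: "... cluster [INF] HEALTH_ERR; 3 osds down; 7 pgs peering; ..."
HEALTH_RE = re.compile(
    r"cluster \[(?P<level>\w+)\] (?P<status>HEALTH_\w+); (?P<detail>.*)"
)


def parse_health(line):
    """Return status, OSDs-down count, and summary items, or None if no match."""
    m = HEALTH_RE.search(line)
    if m is None:
        return None
    items = [part.strip() for part in m.group("detail").split(";")]
    osds_down = None
    for item in items:
        d = re.fullmatch(r"(\d+) osds down", item)
        if d:
            osds_down = int(d.group(1))
    return {"status": m.group("status"), "osds_down": osds_down, "items": items}


health_line = (
    "2017-07-07 23:01:50.480385 mon.b mon.0 172.21.15.34:6789/0 789 : "
    "cluster [INF] HEALTH_ERR; 3 osds down; 3 pgs are stuck inactive for more "
    "than 300 seconds; 7 pgs peering; 4 pgs stale; 3 pgs stuck inactive; "
    "3 pgs stuck unclean"
)
mds_line = (
    "2017-07-07 23:02:11.483590 mon.b mon.0 172.21.15.34:6789/0 790 : "
    "cluster [WRN] MDS health message (mds.0): 67 slow requests are blocked > 30 sec"
)

result = parse_health(health_line)
print(result["status"], result["osds_down"])  # HEALTH_ERR 3
print(parse_health(mds_line))                 # None
```

The MDS slow-request warning does not carry a `HEALTH_` summary, so the sketch returns `None` for it.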

Actions #3

Updated by Zheng Yan almost 7 years ago

  • Status changed from New to Closed