Bug #12776

qa: standby MDS not shutting down, "reached maximum tries (50) after waiting for 300 seconds"

Added by Greg Farnum almost 4 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Low
Assignee: -
Category: Testing
Target version: -
Start date: 08/25/2015
Due date:
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:

Description

http://pulpito.ceph.com/teuthology-2015-08-17_23:04:01-fs-master---basic-multi/1020415/

The standby MDS does not appear to have ever been shut down. Logging this for posterity in case it recurs.

Associated revisions

Revision f420fe46 (diff)
Added by John Spray almost 4 years ago

mds: fix shutdown while in standby

Fixes: #12776
Signed-off-by: John Spray <>

History

#1 Updated by John Spray almost 4 years ago

It's getting the signal, but not making it through shutdown:

2015-08-20 19:41:55.531580 d0aa700 10 mds.beacon.a handle_mds_beacon up:standby seq 444 rtt 0.009156
2015-08-20 19:41:57.671119 10bb4700 -1 mds.a *** got signal Terminated ***
2015-08-20 19:41:57.675505 10bb4700  1 mds.a suicide.  wanted state up:standby
2015-08-20 19:41:57.713564 10bb4700 10 mds.beacon.a set_want_state: up:standby -> down:dne
2015-08-20 19:41:59.522483 103b3700 10 mds.beacon.a _send down:dne seq 445
2015-08-20 19:41:59.522808 103b3700  1 -- 10.214.132.9:6808/29038 --> 10.214.132.9:6789/0 -- mdsbeacon(4125/a down:dne seq 445 v7) v4 -- ?+0 0x10c25c00 con 0x408c160
2015-08-20 19:41:59.612962 d0aa700  1 -- 10.214.132.9:6808/29038 <== mon.0 10.214.132.9:6789/0 469 ==== mdsbeacon(4125/a down:dne seq 445 v7) v4 ==== 113+0+0 (312469458 0 0) 0x10c1f800 con 0x408c160
2015-08-20 19:41:59.613305 d0aa700 10 mds.beacon.a handle_mds_beacon down:dne seq 445 rtt 0.090705
2015-08-20 19:41:59.887211 d0aa700  1 -- 10.214.132.9:6808/29038 <== mon.0 10.214.132.9:6789/0 470 ==== mdsmap(e 8) v1 ==== 668+0+0 (2392490937 0 0) 0x40718c0 con 0x408c160
2015-08-20 19:41:59.888392 d0aa700 10 mds.a  stopping, discarding mdsmap(e 8) v1
2015-08-20 19:41:59.904605 d0aa700  1 -- 10.214.132.9:6808/29038 <== mon.0 10.214.132.9:6789/0 471 ==== mdsbeacon(4125/a down:dne seq 445 v8) v4 ==== 113+0+0 (54781704 0 0) 0x10c1f500 con 0x408c160
2015-08-20 19:41:59.905728 d0aa700 10 mds.beacon.a handle_mds_beacon down:dne seq 445 dne
2015-08-20 19:42:03.522645 103b3700 10 mds.beacon.a _send down:dne seq 446
2015-08-20 19:42:03.522933 103b3700  1 -- 10.214.132.9:6808/29038 --> 10.214.132.9:6789/0 -- mdsbeacon(4125/a down:dne seq 446 v7) v4 -- ?+0 0x10c25900 con 0x408c160
2015-08-20 19:42:03.542928 d0aa700  1 -- 10.214.132.9:6808/29038 <== mon.0 10.214.132.9:6789/0 472 ==== mdsmap(e 8) v1 ==== 668+0+0 (2392490937 0 0) 0x4070480 con 0x408c160
2015-08-20 19:42:03.543320 d0aa700 10 mds.a  stopping, discarding mdsmap(e 8) v1
2015-08-20 19:42:07.522827 103b3700 10 mds.beacon.a _send down:dne seq 447
2015-08-20 19:42:07.523097 103b3700  1 -- 10.214.132.9:6808/29038 --> 10.214.132.9:6789/0 -- mdsbeacon(4125/a down:dne seq 447 v7) v4 -- ?+0 0x10c25600 con 0x408c160
2015-08-20 19:42:07.530683 d0aa700  1 -- 10.214.132.9:6808/29038 <== mon.0 10.214.132.9:6789/0 473 ==== mdsmap(e 8) v1 ==== 668+0+0 (2392490937 0 0) 0x4070900 con 0x408c160
2015-08-20 19:42:07.530995 d0aa700 10 mds.a  stopping, discarding mdsmap(e 8) v1
2015-08-20 19:42:11.523014 103b3700 10 mds.beacon.a _send down:dne seq 448
2015-08-20 19:42:11.523463 103b3700  1 -- 10.214.132.9:6808/29038 --> 10.214.132.9:6789/0 -- mdsbeacon(4125/a down:dne seq 448 v7) v4 -- ?+0 0x10c25300 con 0x408c160
2015-08-20 19:42:11.531122 d0aa700  1 -- 10.214.132.9:6808/29038 <== mon.0 10.214.132.9:6789/0 474 ==== mdsmap(e 8) v1 ==== 668+0+0 (2392490937 0 0) 0x4071200 con 0x408c160
2015-08-20 19:42:11.531435 d0aa700 10 mds.a  stopping, discarding mdsmap(e 8) v1
2015-08-20 19:42:15.523201 103b3700 10 mds.beacon.a _send down:dne seq 449
2015-08-20 19:42:15.523491 103b3700  1 -- 10.214.132.9:6808/29038 --> 10.214.132.9:6789/0 -- mdsbeacon(4125/a down:dne seq 449 v7) v4 -- ?+0 0x10c25000 con 0x408c160
2015-08-20 19:42:15.531264 d0aa700  1 -- 10.214.132.9:6808/29038 <== mon.0 10.214.132.9:6789/0 475 ==== mdsmap(e 8) v1 ==== 668+0+0 (2392490937 0 0) 0x4071200 con 0x408c160
2015-08-20 19:42:15.531578 d0aa700 10 mds.a  stopping, discarding mdsmap(e 8) v1
2015-08-20 19:42:19.523385 103b3700 10 mds.beacon.a _send down:dne seq 450
2015-08-20 19:42:19.523676 103b3700  1 -- 10.214.132.9:6808/29038 --> 10.214.132.9:6789/0 -- mdsbeacon(4125/a down:dne seq 450 v7) v4 -- ?+0 0x10c24d00 con 0x408c160
2015-08-20 19:42:19.531301 d0aa700  1 -- 10.214.132.9:6808/29038 <== mon.0 10.214.132.9:6789/0 476 ==== mdsmap(e 8) v1 ==== 668+0+0 (2392490937 0 0) 0x4070900 con 0x408c160
2015-08-20 19:42:19.531617 d0aa700 10 mds.a  stopping, discarding mdsmap(e 8) v1
2015-08-20 19:42:23.523557 103b3700 10 mds.beacon.a _send down:dne seq 451
2015-08-20 19:42:23.523835 103b3700  1 -- 10.214.132.9:6808/29038 --> 10.214.132.9:6789/0 -- mdsbeacon(4125/a down:dne seq 451 v7) v4 -- ?+0 0x10c24a00 con 0x408c160
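
To make the pattern in the log above easier to see, here is a minimal Python sketch that condenses such an excerpt into the signal received, the down:dne beacon sequence numbers sent afterwards, and the ones the monitor echoed back. The function name summarize_shutdown and the regular expressions are illustrative assumptions fitted to the lines quoted in this ticket; they are not part of Ceph or its tooling, and other log levels or versions may need different patterns.

```python
import re

# Patterns fitted to the log lines quoted in this ticket (an assumption,
# not an official description of the MDS log format).
SIGNAL_RE = re.compile(r"\*\*\* got signal (\w+) \*\*\*")
SEND_RE = re.compile(r"mds\.beacon\.\S+ _send (\S+) seq (\d+)")
ACK_RE = re.compile(r"mds\.beacon\.\S+ handle_mds_beacon (\S+) seq (\d+)")


def summarize_shutdown(lines):
    """Collect down:dne beacon activity that follows a termination signal."""
    signal_name = None
    sent = []
    acked = set()
    for line in lines:
        m = SIGNAL_RE.search(line)
        if m:
            signal_name = m.group(1)
            continue
        if signal_name is None:
            continue  # only look at traffic after the signal arrived
        m = SEND_RE.search(line)
        if m and m.group(1) == "down:dne":
            sent.append(int(m.group(2)))
            continue
        m = ACK_RE.search(line)
        if m and m.group(1) == "down:dne":
            acked.add(int(m.group(2)))
    return {"signal": signal_name, "sent": sent, "acked": sorted(acked)}


# A few lines copied verbatim from the excerpt above.
SAMPLE = [
    "2015-08-20 19:41:57.671119 10bb4700 -1 mds.a *** got signal Terminated ***",
    "2015-08-20 19:41:59.522483 103b3700 10 mds.beacon.a _send down:dne seq 445",
    "2015-08-20 19:41:59.613305 d0aa700 10 mds.beacon.a handle_mds_beacon down:dne seq 445 rtt 0.090705",
    "2015-08-20 19:41:59.905728 d0aa700 10 mds.beacon.a handle_mds_beacon down:dne seq 445 dne",
    "2015-08-20 19:42:03.522645 103b3700 10 mds.beacon.a _send down:dne seq 446",
    "2015-08-20 19:42:07.522827 103b3700 10 mds.beacon.a _send down:dne seq 447",
]

if __name__ == "__main__":
    print(summarize_shutdown(SAMPLE))
```

Run against the full excerpt, a summary like this would show the daemon still sending down:dne beacons every four seconds after seq 445 had already been echoed back, which matches the "not making it through shutdown" observation.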

#2 Updated by John Spray almost 4 years ago

Actually, I just tried sending SIGTERM to a standby mds here, and it's getting stuck too.
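
A check along these lines can be scripted. The following is a minimal Python sketch, assuming the daemon is available as a subprocess.Popen handle; the helper name terminate_and_wait and the timeout value are illustrative assumptions, not something taken from the Ceph tree or the QA suite.

```python
import signal
import subprocess
import sys


def terminate_and_wait(proc, timeout):
    """Send SIGTERM to proc and report whether it exits within timeout seconds."""
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        return False  # still running: the process is stuck in shutdown
    return True


if __name__ == "__main__":
    # Stand-in child process; a real check would target the standby MDS instead.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print("exited cleanly:", terminate_and_wait(child, timeout=10))
```

A False return from the helper corresponds to the stuck behaviour described here: the signal was delivered, but the process did not exit within the allotted time.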

#3 Updated by John Spray almost 4 years ago

  • Status changed from New to Need Review

#4 Updated by Zheng Yan over 3 years ago

  • Status changed from Need Review to Resolved
