Bug #8651 (closed)

Crashing MDS in an active-active MDS setup

Added by Kenneth Waegeman almost 10 years ago. Updated over 9 years ago.

Status: Won't Fix
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Severity: 2 - major

Description

Two active MDSs; one crashes while writing 4 rsync streams to the filesystem with cephko (the ceph kernel client).

{ "mdsmap": { "epoch": 1428,
"flags": 0,
"created": "2014-06-17 16:18:31.237442",
"modified": "2014-06-24 10:55:46.211994",
"tableserver": 0,
"root": 0,
"session_timeout": 60,
"session_autoclose": 300,
"max_file_size": 1099511627776,
"last_failure": 1419,
"last_failure_osd_epoch": 814,
"compat": { "compat": {},
"ro_compat": {},
"incompat": { "feature_1": "base v0.20",
"feature_2": "client writeable ranges",
"feature_3": "default file layouts on dirs",
"feature_4": "dir inode in separate object",
"feature_5": "mds uses versioned encoding",
"feature_6": "dirfrag is stored in omap",
"feature_8": "no anchor table"}},
"max_mds": 2,
"in": [
0,
1],
"up": { "mds_0": 23863,
"mds_1": 24119},
"failed": [],
"stopped": [],
"info": { "gid_23863": { "gid": 23863,
"name": "ceph002.cubone.os",
"rank": 0,
"incarnation": 147,
"state": "up:active",
"state_seq": 82,
"addr": "10.141.8.181:6833\/49442",
"standby_for_rank": -1,
"standby_for_name": "",
"export_targets": []},
"gid_24119": { "gid": 24119,
"name": "ceph001.cubone.os",
"rank": 1,
"incarnation": 34,
"state": "up:active",
"state_seq": 19,
"addr": "10.141.8.180:6831\/37285",
"standby_for_rank": -1,
"standby_for_name": "",
"export_targets": []}},
"data_pools": [
0],
"metadata_pool": 1},
"mdsmap_first_committed": 751,
"mdsmap_last_committed": 1428}

   -69> 2014-06-24 10:49:07.213947 7f76628c6700 10 monclient(hunting): renew_subs
   -68> 2014-06-24 10:49:07.213952 7f76628c6700  1 -- 10.141.8.182:6832/43239 <== osd.10 10.141.8.181:6800/24877 10005 ==== osd_op_reply(70278 1000009f90c.00000000 [??? 1~0,omap-set-header 0~222,omap-set-vals 0~6417,omap-rm-keys 0~433] v811'232250 uv232250 ondisk = 0) v6 ==== 313+0+0 (4229282455 0 0) 0x2624b0c80 con 0x375a3c0
   -67> 2014-06-24 10:49:07.214029 7f76628c6700  5 mds.1.33 is_laggy 79.848239 > 15 since last acked beacon
   -66> 2014-06-24 10:49:07.214035 7f76628c6700  5 mds.1.33 initiating monitor reconnect; maybe we're not the slow one
   -65> 2014-06-24 10:49:07.214037 7f76628c6700 10 monclient(hunting): _reopen_session rank -1 name
   -64> 2014-06-24 10:49:07.214038 7f76628c6700  1 -- 10.141.8.182:6832/43239 mark_down 0x1c277aaa0 -- 0x2624b0f00
   -63> 2014-06-24 10:49:07.214055 7f765a67c700  2 -- 10.141.8.182:6832/43239 >> 10.141.8.181:6789/0 pipe(0x2624b0f00 sd=103 :0 s=4 pgs=0 cs=0 l=1 c=0x1c277aaa0).connect couldn't read banner, (0) Success
   -62> 2014-06-24 10:49:07.214066 7f76628c6700 10 monclient(hunting): picked mon.ceph001 con 0xb74461a0 addr 10.141.8.180:6789/0
   -61> 2014-06-24 10:49:07.214086 7f76628c6700 10 monclient(hunting): _send_mon_message to mon.ceph001 at 10.141.8.180:6789/0
   -60> 2014-06-24 10:49:07.214091 7f76628c6700  1 -- 10.141.8.182:6832/43239 --> 10.141.8.180:6789/0 -- auth(proto 0 42 bytes epoch 1) v1 -- ?+0 0x21d300b40 con 0xb74461a0
   -59> 2014-06-24 10:49:07.214101 7f76628c6700 10 monclient(hunting): renew_subs
   -58> 2014-06-24 10:49:07.214081 7f765a67c700  3 -- 10.141.8.182:6832/43239 >> 10.141.8.181:6789/0 pipe(0x2624b0f00 sd=103 :0 s=4 pgs=0 cs=0 l=1 c=0x1c277aaa0).connect fault, but state = closed != connecting, stopping
   -57> 2014-06-24 10:49:07.214111 7f76628c6700  1 -- 10.141.8.182:6832/43239 <== osd.21 10.141.8.182:6803/26308 10015 ==== osd_op_reply(65181 1000006b4f2.00000000 [??? 1~0,omap-set-header 0~222,omap-set-vals 0~399] v811'87242 uv87242 ondisk = 0) v6 ==== 271+0+0 (2860133359 0 0) 0x1b57fd000 con 0x3759fa0
   -56> 2014-06-24 10:49:07.214136 7f76628c6700  5 mds.1.33 is_laggy 79.848346 > 15 since last acked beacon
   -55> 2014-06-24 10:49:07.214141 7f76628c6700  5 mds.1.33 initiating monitor reconnect; maybe we're not the slow one
   -54> 2014-06-24 10:49:07.214143 7f76628c6700 10 monclient(hunting): _reopen_session rank -1 name
   -53> 2014-06-24 10:49:07.214152 7f76628c6700  1 -- 10.141.8.182:6832/43239 mark_down 0xb74461a0 -- 0x1b57fcd80
   -52> 2014-06-24 10:49:07.214179 7f76628c6700 10 monclient(hunting): picked mon.ceph003 con 0xb7447380 addr 10.141.8.182:6789/0
   -51> 2014-06-24 10:49:07.214167 7f765986e700  2 -- 10.141.8.182:6832/43239 >> 10.141.8.180:6789/0 pipe(0x1b57fcd80 sd=130 :0 s=4 pgs=0 cs=0 l=1 c=0xb74461a0).connect couldn't read banner, (0) Success
   -50> 2014-06-24 10:49:07.214189 7f76628c6700 10 monclient(hunting): _send_mon_message to mon.ceph003 at 10.141.8.182:6789/0
   -49> 2014-06-24 10:49:07.214194 7f76628c6700  1 -- 10.141.8.182:6832/43239 --> 10.141.8.182:6789/0 -- auth(proto 0 42 bytes epoch 1) v1 -- ?+0 0x21d303f00 con 0xb7447380
   -48> 2014-06-24 10:49:07.214206 7f76628c6700 10 monclient(hunting): renew_subs
   -47> 2014-06-24 10:49:07.214214 7f76628c6700  1 -- 10.141.8.182:6832/43239 <== osd.10 10.141.8.181:6800/24877 10006 ==== osd_op_reply(70249 200000ab00b.00000000 [??? 1~0,omap-set-header 0~222,omap-set-vals 0~196281,omap-rm-keys 0~13579] v811'250306 uv250306 ondisk = 0) v6 ==== 313+0+0 (1741933792 0 0) 0x26258fd00 con 0x375a3c0
   -46> 2014-06-24 10:49:07.214191 7f765986e700  3 -- 10.141.8.182:6832/43239 >> 10.141.8.180:6789/0 pipe(0x1b57fcd80 sd=130 :0 s=4 pgs=0 cs=0 l=1 c=0xb74461a0).connect fault, but state = closed != connecting, stopping
   -45> 2014-06-24 10:49:07.216106 7f76628c6700  5 mds.1.33 is_laggy 79.850316 > 15 since last acked beacon
   -44> 2014-06-24 10:49:07.216117 7f76628c6700  5 mds.1.33 initiating monitor reconnect; maybe we're not the slow one
   -43> 2014-06-24 10:49:07.216118 7f76628c6700 10 monclient(hunting): _reopen_session rank -1 name
   -42> 2014-06-24 10:49:07.216121 7f76628c6700  1 -- 10.141.8.182:6832/43239 mark_down 0xb7447380 -- 0x2624b0c80
   -41> 2014-06-24 10:49:07.216169 7f7658f65700  2 -- 10.141.8.182:6832/43239 >> 10.141.8.182:6789/0 pipe(0x2624b0c80 sd=103 :50609 s=4 pgs=20558 cs=1 l=1 c=0xb7447380).reader couldn't read tag, (0) Success
   -40> 2014-06-24 10:49:07.216186 7f76628c6700 10 monclient(hunting): picked mon.ceph001 con 0x13dcd4ba0 addr 10.141.8.180:6789/0
   -39> 2014-06-24 10:49:07.216209 7f76628c6700 10 monclient(hunting): _send_mon_message to mon.ceph001 at 10.141.8.180:6789/0
   -38> 2014-06-24 10:49:07.216199 7f7658f65700  2 -- 10.141.8.182:6832/43239 >> 10.141.8.182:6789/0 pipe(0x2624b0c80 sd=103 :50609 s=4 pgs=20558 cs=1 l=1 c=0xb7447380).fault (0) Success
   -37> 2014-06-24 10:49:07.216216 7f76628c6700  1 -- 10.141.8.182:6832/43239 --> 10.141.8.180:6789/0 -- auth(proto 0 42 bytes epoch 1) v1 -- ?+0 0x21d304140 con 0x13dcd4ba0
   -36> 2014-06-24 10:49:07.216224 7f76628c6700 10 monclient(hunting): renew_subs
   -35> 2014-06-24 10:49:07.216232 7f76628c6700  5 mds.1.33 ms_handle_connect on 10.141.8.182:6789/0
   -34> 2014-06-24 10:49:07.216241 7f76628c6700  1 -- 10.141.8.182:6832/43239 <== osd.21 10.141.8.182:6803/26308 10016 ==== osd_op_reply(65216 1000006b900.00000000 [??? 1~0,omap-set-header 0~222,omap-set-vals 0~400] v811'93660 uv93660 ondisk = 0) v6 ==== 271+0+0 (2766708515 0 0) 0x1b57fd280 con 0x3759fa0
   -33> 2014-06-24 10:49:07.216280 7f76628c6700  5 mds.1.33 is_laggy 79.850489 > 15 since last acked beacon
   -32> 2014-06-24 10:49:07.216287 7f76628c6700  5 mds.1.33 initiating monitor reconnect; maybe we're not the slow one
   -31> 2014-06-24 10:49:07.216288 7f76628c6700 10 monclient(hunting): _reopen_session rank -1 name
   -30> 2014-06-24 10:49:07.216292 7f76628c6700  1 -- 10.141.8.182:6832/43239 mark_down 0x13dcd4ba0 -- 0x1b57fd000
   -29> 2014-06-24 10:49:07.216322 7f76628c6700 10 monclient(hunting): picked mon.ceph003 con 0x2c119a20 addr 10.141.8.182:6789/0
   -28> 2014-06-24 10:49:07.216332 7f76628c6700 10 monclient(hunting): _send_mon_message to mon.ceph003 at 10.141.8.182:6789/0
   -27> 2014-06-24 10:49:07.216310 7f765986e700  2 -- 10.141.8.182:6832/43239 >> 10.141.8.180:6789/0 pipe(0x1b57fd000 sd=130 :0 s=4 pgs=0 cs=0 l=1 c=0x13dcd4ba0).connect couldn't read banner, (0) Success
   -26> 2014-06-24 10:49:07.216341 7f76628c6700  1 -- 10.141.8.182:6832/43239 --> 10.141.8.182:6789/0 -- auth(proto 0 42 bytes epoch 1) v1 -- ?+0 0x21d3072c0 con 0x2c119a20
   -25> 2014-06-24 10:49:07.216354 7f76628c6700 10 monclient(hunting): renew_subs
   -24> 2014-06-24 10:49:07.216350 7f765986e700  3 -- 10.141.8.182:6832/43239 >> 10.141.8.180:6789/0 pipe(0x1b57fd000 sd=130 :0 s=4 pgs=0 cs=0 l=1 c=0x13dcd4ba0).connect fault, but state = closed != connecting, stopping
   -23> 2014-06-24 10:49:07.216361 7f76628c6700  1 -- 10.141.8.182:6832/43239 <== osd.10 10.141.8.181:6800/24877 10007 ==== osd_op_reply(70317 100000a0f12.00000000 [??? 1~0,omap-set-header 0~222,omap-set-vals 0~1594] v811'289159 uv289159 ondisk = 0) v6 ==== 271+0+0 (1696876278 0 0) 0x26258fa80 con 0x375a3c0
   -22> 2014-06-24 10:49:07.216408 7f76628c6700  5 mds.1.33 is_laggy 79.850617 > 15 since last acked beacon
   -21> 2014-06-24 10:49:07.216411 7f76628c6700  5 mds.1.33 initiating monitor reconnect; maybe we're not the slow one
   -20> 2014-06-24 10:49:07.216412 7f76628c6700 10 monclient(hunting): _reopen_session rank -1 name
   -19> 2014-06-24 10:49:07.216414 7f76628c6700  1 -- 10.141.8.182:6832/43239 mark_down 0x2c119a20 -- 0x26258fd00
   -18> 2014-06-24 10:49:07.216432 7f76628c6700 10 monclient(hunting): picked mon.ceph002 con 0x2c11d540 addr 10.141.8.181:6789/0
   -17> 2014-06-24 10:49:07.216438 7f76628c6700 10 monclient(hunting): _send_mon_message to mon.ceph002 at 10.141.8.181:6789/0
   -16> 2014-06-24 10:49:07.216440 7f76628c6700  1 -- 10.141.8.182:6832/43239 --> 10.141.8.181:6789/0 -- auth(proto 0 42 bytes epoch 1) v1 -- ?+0 0x21ca686c0 con 0x2c11d540
   -15> 2014-06-24 10:49:07.216445 7f76628c6700 10 monclient(hunting): renew_subs
   -14> 2014-06-24 10:49:07.216451 7f76628c6700  1 -- 10.141.8.182:6832/43239 <== osd.1 10.141.8.180:6803/28178 7803 ==== osd_op_reply(26839 1000009992c.00000000 [??? 1~0,omap-set-header 0~222,omap-set-vals 0~4859,omap-rm-keys 0~371] v813'77416 uv77416 ondisk = 0) v6 ==== 313+0+0 (2412616183 0 0) 0x25bb8f800 con 0x375b2e0
   -13> 2014-06-24 10:49:07.216497 7f76628c6700  5 mds.1.33 is_laggy 79.850707 > 15 since last acked beacon
   -12> 2014-06-24 10:49:07.216503 7f76628c6700  5 mds.1.33 initiating monitor reconnect; maybe we're not the slow one
   -11> 2014-06-24 10:49:07.216504 7f76628c6700 10 monclient(hunting): _reopen_session rank -1 name
   -10> 2014-06-24 10:49:07.216507 7f76628c6700  1 -- 10.141.8.182:6832/43239 mark_down 0x2c11d540 -- 0x1b57fd280
    -9> 2014-06-24 10:49:07.216484 7f7658f65700  2 -- 10.141.8.182:6832/43239 >> 10.141.8.182:6789/0 pipe(0x26258fd00 sd=103 :0 s=4 pgs=0 cs=0 l=1 c=0x2c119a20).connect couldn't read banner, (0) Success
    -8> 2014-06-24 10:49:07.216534 7f7658f65700  3 -- 10.141.8.182:6832/43239 >> 10.141.8.182:6789/0 pipe(0x26258fd00 sd=103 :0 s=4 pgs=0 cs=0 l=1 c=0x2c119a20).connect fault, but state = closed != connecting, stopping
    -7> 2014-06-24 10:49:07.216530 7f765986e700  2 -- 10.141.8.182:6832/43239 >> 10.141.8.181:6789/0 pipe(0x1b57fd280 sd=130 :0 s=4 pgs=0 cs=0 l=1 c=0x2c11d540).connect couldn't read banner, (0) Success
    -6> 2014-06-24 10:49:07.216551 7f76628c6700 10 monclient(hunting): picked mon.ceph001 con 0x2c11d960 addr 10.141.8.180:6789/0
    -5> 2014-06-24 10:49:07.216569 7f76628c6700 10 monclient(hunting): _send_mon_message to mon.ceph001 at 10.141.8.180:6789/0
    -4> 2014-06-24 10:49:07.216577 7f76628c6700  1 -- 10.141.8.182:6832/43239 --> 10.141.8.180:6789/0 -- auth(proto 0 42 bytes epoch 1) v1 -- ?+0 0x21ca6ca40 con 0x2c11d960
    -3> 2014-06-24 10:49:07.216554 7f765986e700  3 -- 10.141.8.182:6832/43239 >> 10.141.8.181:6789/0 pipe(0x1b57fd280 sd=130 :0 s=4 pgs=0 cs=0 l=1 c=0x2c11d540).connect fault, but state = closed != connecting, stopping
    -2> 2014-06-24 10:49:07.216587 7f76628c6700 10 monclient(hunting): renew_subs
    -1> 2014-06-24 10:49:07.216594 7f76628c6700  1 -- 10.141.8.182:6832/43239 <== osd.0 10.141.8.180:6800/26720 7386 ==== osd_op_reply(69777 200000a8488.00000000 [??? 1~0,omap-set-header 0~222,omap-set-vals 0~7618,omap-rm-keys 0~512] v0'0 uv0 ondisk = -108 ((108) Cannot send after transport endpoint shutdown)) v6 ==== 313+0+0 (1160011566 0 0) 0x258993480 con 0x3759e40
     0> 2014-06-24 10:49:07.235120 7f76628c6700 -1 mds/CDir.cc: In function 'virtual void C_Dir_Committed::finish(int)' thread 7f76628c6700 time 2014-06-24 10:49:07.219091
mds/CDir.cc: 1767: FAILED assert(r == 0)

ceph version 0.81 (8de9501df275a5fe29f2c64cb44f195130e4a8fc)
 1: /usr/bin/ceph-mds() [0x75d361]
 2: (Context::complete(int)+0x9) [0x56a829]
 3: (C_Gather::sub_finish(Context*, int)+0x227) [0x56c3a7]
 4: (C_Gather::C_GatherSub::finish(int)+0x12) [0x56c4b2]
 5: (Context::complete(int)+0x9) [0x56a829]
 6: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xf56) [0x7cf306]
 7: (MDS::handle_core_message(Message*)+0xb27) [0x58c447]
 8: (MDS::_dispatch(Message*)+0x45) [0x58c655]
 9: (MDS::ms_dispatch(Message*)+0xc3) [0x58e0a3]
 10: (DispatchQueue::entry()+0x509) [0x9a9409]
 11: (DispatchQueue::DispatchThread::entry()+0xd) [0x8c205d]
 12: (()+0x7df3) [0x7f766750fdf3]
 13: (clone()+0x6d) [0x7f76663e63dd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-mds.ceph003.cubone.os.log
 ceph version 0.81 (8de9501df275a5fe29f2c64cb44f195130e4a8fc)
 1: /usr/bin/ceph-mds() [0x7f4b76]
 2: (()+0xf130) [0x7f7667517130]
 3: (gsignal()+0x39) [0x7f7666325989]
 4: (abort()+0x148) [0x7f7666327098]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f7666c389d5]
 6: (()+0x5e946) [0x7f7666c36946]
 7: (()+0x5e973) [0x7f7666c36973]
 8: (()+0x5eb9f) [0x7f7666c36b9f]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1ef) [0x8daddf]
 10: /usr/bin/ceph-mds() [0x75d361]
 11: (Context::complete(int)+0x9) [0x56a829]
 12: (C_Gather::sub_finish(Context*, int)+0x227) [0x56c3a7]
 13: (C_Gather::C_GatherSub::finish(int)+0x12) [0x56c4b2]
 14: (Context::complete(int)+0x9) [0x56a829]
 15: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xf56) [0x7cf306]
 16: (MDS::handle_core_message(Message*)+0xb27) [0x58c447]
 17: (MDS::_dispatch(Message*)+0x45) [0x58c655]
 18: (MDS::ms_dispatch(Message*)+0xc3) [0x58e0a3]
 19: (DispatchQueue::entry()+0x509) [0x9a9409]
 20: (DispatchQueue::DispatchThread::entry()+0xd) [0x8c205d]
 21: (()+0x7df3) [0x7f766750fdf3]
 22: (clone()+0x6d) [0x7f76663e63dd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
     0> 2014-06-24 10:49:07.490111 7f76628c6700 -1 *** Caught signal (Aborted) **
 in thread 7f76628c6700

 ceph version 0.81 (8de9501df275a5fe29f2c64cb44f195130e4a8fc)
 1: /usr/bin/ceph-mds() [0x7f4b76]
 2: (()+0xf130) [0x7f7667517130]
 3: (gsignal()+0x39) [0x7f7666325989]
 4: (abort()+0x148) [0x7f7666327098]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f7666c389d5]
 6: (()+0x5e946) [0x7f7666c36946]
 7: (()+0x5e973) [0x7f7666c36973]
 8: (()+0x5eb9f) [0x7f7666c36b9f]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1ef) [0x8daddf]
 10: /usr/bin/ceph-mds() [0x75d361]
 11: (Context::complete(int)+0x9) [0x56a829]
 12: (C_Gather::sub_finish(Context*, int)+0x227) [0x56c3a7]
 13: (C_Gather::C_GatherSub::finish(int)+0x12) [0x56c4b2]
 14: (Context::complete(int)+0x9) [0x56a829]
 15: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xf56) [0x7cf306]
 16: (MDS::handle_core_message(Message*)+0xb27) [0x58c447]
 17: (MDS::_dispatch(Message*)+0x45) [0x58c655]
 18: (MDS::ms_dispatch(Message*)+0xc3) [0x58e0a3]
 19: (DispatchQueue::entry()+0x509) [0x9a9409]
 20: (DispatchQueue::DispatchThread::entry()+0xd) [0x8c205d]
 21: (()+0x7df3) [0x7f766750fdf3]
 22: (clone()+0x6d) [0x7f76663e63dd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
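
For anyone reading the backtrace: the last OSD reply in the dump ("-1>" above) comes back with -108 (ESHUTDOWN, "Cannot send after transport endpoint shutdown"), consistent with the MDS having been blacklisted (see the comments below), and the directory-commit completion callback then trips "FAILED assert(r == 0)" at mds/CDir.cc:1767, aborting the daemon. Below is a minimal, self-contained C++ sketch of that completion-callback pattern; it is illustrative only, not the actual Ceph classes (Context and C_Dir_Committed here are simplified stand-ins).

// Simplified sketch of the completion-callback chain in the backtrace
// (Context::complete -> C_Gather::sub_finish -> C_Dir_Committed::finish).
// NOT the real Ceph code; the names only mirror the report to show why a
// non-zero reply takes the whole daemon down.
#include <cassert>
#include <cerrno>
#include <iostream>

// Minimal stand-in for Ceph's Context: a one-shot completion callback.
struct Context {
  virtual ~Context() = default;
  void complete(int r) {        // invoked with the operation's return code
    finish(r);
    delete this;
  }
protected:
  virtual void finish(int r) = 0;
};

// Stand-in for C_Dir_Committed: fires when a directory-object commit is acknowledged.
struct C_Dir_Committed : Context {
protected:
  void finish(int r) override {
    // The code at mds/CDir.cc:1767 does the equivalent of this assert, so any error
    // reply -- here -ESHUTDOWN (-108) after blacklisting -- aborts the process.
    assert(r == 0);
    std::cout << "dirfrag committed\n";
  }
};

int main() {
  (new C_Dir_Committed)->complete(0);           // normal path: commit acknowledged
  (new C_Dir_Committed)->complete(-ESHUTDOWN);  // mirrors the "-1>" reply: assert fails, abort
}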

Files

ceph-log.tar.bz (121 KB), Kenneth Waegeman, 06/24/2014 02:15 AM
#1

Updated by Greg Farnum almost 10 years ago

  • Status changed from New to Need More Info

The MDS got blacklisted, presumably because it got overloaded and stopped heartbeating the monitor or its MDS peers. Restart it and things should go fine. Unless you've done so and this is a repeating issue, in which case maybe you've found some other problem?

Keep in mind that multiple active MDSes are significantly less stable than a single MDS is, and you might just break your filesystem in a way that can't be recovered. Once you get things quiescent after restart, you might want to just stop the second one while your data is still accessible. :)
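
(The repeated "is_laggy 79.8... > 15 since last acked beacon" lines in the log are the MDS noticing exactly this: its beacons had gone unacknowledged far longer than the 15-second grace, by which point the monitors had already replaced and blacklisted it. For illustration only, here is a rough C++ sketch of that kind of check; BeaconState and its fields are invented for this sketch, and the 15 s figure is simply taken from the "> 15" in the log, not from the real MDS code.)

// Rough illustration of a beacon "laggy" check like the one producing the log
// lines above. Not the real MDS implementation.
#include <chrono>
#include <iostream>

using Clock = std::chrono::steady_clock;

struct BeaconState {
  Clock::time_point last_acked = Clock::now();  // when the monitors last acked a beacon
  double grace_seconds = 15.0;                  // grace before we consider ourselves laggy

  bool is_laggy() const {
    double since = std::chrono::duration<double>(Clock::now() - last_acked).count();
    if (since > grace_seconds) {
      std::cout << "is_laggy " << since << " > " << grace_seconds
                << " since last acked beacon\n";
      return true;
    }
    return false;
  }
};

int main() {
  BeaconState b;
  b.last_acked = Clock::now() - std::chrono::seconds(80);  // ~80 s without an ack, as in the log
  if (b.is_laggy())
    std::cout << "initiating monitor reconnect; maybe we're not the slow one\n";
}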

#2

Updated by Sage Weil over 9 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (1)
#3

Updated by Sage Weil over 9 years ago

  • Description updated (diff)
#4

Updated by Sage Weil over 9 years ago

  • Status changed from Need More Info to Won't Fix

This MDS got blacklisted. There is an open issue somewhere to make the shutdown more friendly, but the behavior is correct.
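
As a sketch of what a "more friendly" shutdown could look like: the commit callback could recognise the blacklist/shutdown error and shut down cleanly instead of assert-aborting. This is illustrative only; Daemon and DirCommitCallback below are invented names, and this is not the actual change tracked in that other issue.

// Illustrative only: one way a commit callback could treat the -108 reply as
// "we were blacklisted, shut down cleanly" rather than hitting assert(r == 0).
#include <cerrno>
#include <cstdlib>
#include <iostream>

struct Daemon {                                   // hypothetical stand-in for the MDS
  void respawn() {
    std::cout << "blacklisted: shutting down cleanly for restart\n";
    std::exit(0);
  }
};

struct DirCommitCallback {
  Daemon *mds;
  void finish(int r) {
    if (r == -ESHUTDOWN) {                        // the -108 seen in the log
      mds->respawn();                             // friendly shutdown instead of abort
      return;
    }
    if (r != 0) {
      std::cerr << "unexpected commit error: " << r << "\n";
      std::abort();                               // still fatal for genuinely unexpected errors
    }
    std::cout << "dirfrag committed\n";
  }
};

int main() {
  Daemon mds;
  DirCommitCallback cb{&mds};
  cb.finish(-ESHUTDOWN);                          // simulates the reply that crashed the MDS
}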
