Actions
Bug #7485
closedKilling MDS during 'creating' breaks subsequent startup (no snaptable)
% Done:
0%
Source:
other
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Pretty easy to reproduce: start MDS for first time on fresh cluster (I'm using vstart here), ctrl-c it promptly, try and start it up again, hit assertion on reading snaptable. Sure enough, rados ls says the object isn't there.
-50> 2014-02-20 06:17:44.869883 7fefe834e700 1 mds.0.7 handle_mds_map i am now mds.0.7 -49> 2014-02-20 06:17:44.869886 7fefe834e700 1 mds.0.7 handle_mds_map state change up:standby --> up:replay -48> 2014-02-20 06:17:44.869895 7fefe834e700 1 mds.0.7 replay_start -47> 2014-02-20 06:17:44.869906 7fefe834e700 7 mds.0.cache set_recovery_set -46> 2014-02-20 06:17:44.869917 7fefe834e700 1 mds.0.7 recovery set is -45> 2014-02-20 06:17:44.869919 7fefe834e700 1 mds.0.7 need osdmap epoch 15, have 14 -44> 2014-02-20 06:17:44.869921 7fefe834e700 1 mds.0.7 waiting for osdmap 15 (which blacklists prior instance) -43> 2014-02-20 06:17:44.869937 7fefe834e700 10 monclient: renew_subs -42> 2014-02-20 06:17:44.869946 7fefe834e700 10 monclient: _send_mon_message to mon.a at 172.16.79.181:6789/0 -41> 2014-02-20 06:17:44.869959 7fefe834e700 1 -- 172.16.79.181:6815/7059 --> 172.16.79.181:6789/0 -- mon_subscribe({mdsmap=18+,monmap=2+,osdmap=15}) v2 -- ?+0 0x3d338c0 con 0x3d2d580 -40> 2014-02-20 06:17:44.870848 7fefe834e700 1 -- 172.16.79.181:6815/7059 <== mon.0 172.16.79.181:6789/0 15 ==== osd_map(15..15 src has 1..15) v3 ==== 342+0+0 (3715003977 0 0) 0x3d3f480 con 0x3d2d580 -39> 2014-02-20 06:17:44.871028 7fefe834e700 2 mds.0.7 boot_start 1: opening inotable -38> 2014-02-20 06:17:44.871086 7fefe834e700 10 mds.0.inotable: load -37> 2014-02-20 06:17:44.871318 7fefe834e700 1 -- 172.16.79.181:6815/7059 --> 172.16.79.181:6805/6488 -- osd_op(mds.0.7:1 mds0_inotable [read 0~0] 1.b852b893 ack+read e15) v4 -- ?+0 0x3d15d80 con 0x3d2dc60 -36> 2014-02-20 06:17:44.871387 7fefe834e700 2 mds.0.7 boot_start 1: opening sessionmap -35> 2014-02-20 06:17:44.871395 7fefe834e700 10 mds.0.sessionmap load -34> 2014-02-20 06:17:44.871437 7fefe834e700 1 -- 172.16.79.181:6815/7059 --> 172.16.79.181:6805/6488 -- osd_op(mds.0.7:2 mds0_sessionmap [read 0~0] 1.3270c60b ack+read e15) v4 -- ?+0 0x3d15b40 con 0x3d2dc60 -33> 2014-02-20 06:17:44.871489 7fefe834e700 2 mds.0.7 boot_start 1: opening anchor table -32> 2014-02-20 06:17:44.871493 7fefe834e700 10 mds.0.anchortable: load -31> 2014-02-20 06:17:44.871521 7fefe834e700 1 -- 172.16.79.181:6815/7059 --> 172.16.79.181:6805/6488 -- osd_op(mds.0.7:3 mds_anchortable [read 0~0] 1.a977f6a7 ack+read e15) v4 -- ?+0 0x3d3fb40 con 0x3d2dc60 -30> 2014-02-20 06:17:44.871532 7fefe834e700 2 mds.0.7 boot_start 1: opening snap table -29> 2014-02-20 06:17:44.871535 7fefe834e700 10 mds.0.snaptable: load -28> 2014-02-20 06:17:44.871687 7fefe5247700 10 mds.0.7 MDS::ms_get_authorizer type=osd -27> 2014-02-20 06:17:44.871691 7fefe834e700 1 -- 172.16.79.181:6815/7059 --> 172.16.79.181:6810/6696 -- osd_op(mds.0.7:4 mds_snaptable [read 0~0] 1.d90270ad ack+read e15) v4 -- ?+0 0x3d3f900 con 0x3d2db00 -26> 2014-02-20 06:17:44.871708 7fefe834e700 2 mds.0.7 boot_start 1: opening mds log -25> 2014-02-20 06:17:44.871710 7fefe834e700 5 mds.0.log open discovering log bounds -24> 2014-02-20 06:17:44.871726 7fefe834e700 1 mds.0.journaler(ro) recover start -23> 2014-02-20 06:17:44.871730 7fefe834e700 1 mds.0.journaler(ro) read_head -22> 2014-02-20 06:17:44.871765 7fefe834e700 1 -- 172.16.79.181:6815/7059 --> 172.16.79.181:6805/6488 -- osd_op(mds.0.7:5 200.00000000 [read 0~0] 1.844f3494 ack+read e15) v4 -- ?+0 0x3d3f6c0 con 0x3d2dc60 -21> 2014-02-20 06:17:44.871858 7fefe834e700 1 -- 172.16.79.181:6815/7059 <== mon.0 172.16.79.181:6789/0 16 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (3705050423 0 0) 0x3d46380 con 0x3d2d580 -20> 2014-02-20 06:17:44.871875 7fefe834e700 10 monclient: handle_subscribe_ack sent 2014-02-20 06:17:44.869940 renew after 2014-02-20 06:20:14.869940 -19> 2014-02-20 06:17:44.872070 7fefe5146700 10 mds.0.7 MDS::ms_get_authorizer type=osd -18> 2014-02-20 06:17:44.872293 7fefe834e700 5 mds.0.7 ms_handle_connect on 172.16.79.181:6805/6488 -17> 2014-02-20 06:17:44.872501 7fefe834e700 5 mds.0.7 ms_handle_connect on 172.16.79.181:6810/6696 -16> 2014-02-20 06:17:44.873581 7fefe834e700 1 -- 172.16.79.181:6815/7059 <== osd.1 172.16.79.181:6805/6488 1 ==== osd_op_reply(1 mds0_inotable [read 0~0] v0'0 uv2 ondisk = 0) v6 ==== 180+0+34 (1740633793 0 1054821296) 0x3d44780 con 0x3d2dc60 -15> 2014-02-20 06:17:44.873668 7fefe834e700 10 mds.0.inotable: load_2 got 34 bytes -14> 2014-02-20 06:17:44.873673 7fefe834e700 10 mds.0.inotable: load_2 loaded v0 -13> 2014-02-20 06:17:44.874193 7fefe834e700 1 -- 172.16.79.181:6815/7059 <== osd.1 172.16.79.181:6805/6488 2 ==== osd_op_reply(3 mds_anchortable [read 0~0] v0'0 uv5 ondisk = 0) v6 ==== 182+0+34 (271931579 0 2191498079) 0x3d44c80 con 0x3d2dc60 -12> 2014-02-20 06:17:44.874241 7fefe834e700 10 mds.0.anchortable: load_2 got 34 bytes -11> 2014-02-20 06:17:44.874245 7fefe834e700 10 mds.0.anchortable: load_2 loaded v0 -10> 2014-02-20 06:17:44.874269 7fefe834e700 1 -- 172.16.79.181:6815/7059 <== osd.1 172.16.79.181:6805/6488 3 ==== osd_op_reply(2 mds0_sessionmap [read 0~0] v0'0 uv3 ondisk = 0) v6 ==== 182+0+22 (2821256041 0 3628653333) 0x3d44a00 con 0x3d2dc60 -9> 2014-02-20 06:17:44.874290 7fefe834e700 10 mds.0.sessionmap dump -8> 2014-02-20 06:17:44.874295 7fefe834e700 10 mds.0.sessionmap _load_finish v 0, 0 sessions, 22 bytes -7> 2014-02-20 06:17:44.874296 7fefe834e700 10 mds.0.sessionmap dump -6> 2014-02-20 06:17:44.874639 7fefe834e700 1 -- 172.16.79.181:6815/7059 <== osd.1 172.16.79.181:6805/6488 4 ==== osd_op_reply(5 200.00000000 [read 0~0] v0'0 uv1 ondisk = 0) v6 ==== 179+0+84 (2808334475 0 2960925567) 0x3d44780 con 0x3d2dc60 -5> 2014-02-20 06:17:44.874712 7fefe834e700 1 mds.0.journaler(ro) _finish_read_head loghead(trim 4194304, expire 4194304, write 4194304). probing for end of log (from 4194304)... -4> 2014-02-20 06:17:44.874720 7fefe834e700 1 mds.0.journaler(ro) probing for end of the log -3> 2014-02-20 06:17:44.874808 7fefe834e700 1 -- 172.16.79.181:6815/7059 --> 172.16.79.181:6805/6488 -- osd_op(mds.0.7:6 200.00000001 [stat] 1.6e5f474 ack+read+rwordered e15) v4 -- ?+0 0x3d3f480 con 0x3d2dc60 -2> 2014-02-20 06:17:44.875610 7fefe834e700 1 -- 172.16.79.181:6815/7059 <== osd.2 172.16.79.181:6810/6696 1 ==== osd_op_reply(4 mds_snaptable [read 0~0] v0'0 uv0 ack = -2 (No such file or directory)) v6 ==== 180+0+0 (2740985549 0 0) 0x3d44c80 con 0x3d2db00 -1> 2014-02-20 06:17:44.875689 7fefe834e700 10 mds.0.snaptable: load_2 could not read table: -2 0> 2014-02-20 06:17:44.883167 7fefe834e700 -1 mds/MDSTable.cc: In function 'void MDSTable::load_2(int, ceph::bufferlist&, Context*)' thread 7fefe834e700 time 2014-02-20 06:17:44.875695 mds/MDSTable.cc: 152: FAILED assert(r >= 0)
Log from the initial startup that was killed attached.
Files
Actions