Bug #7485

Killing MDS during 'creating' breaks subsequent startup (no snaptable)

Added by John Spray about 10 years ago. Updated about 10 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Pretty easy to reproduce: start the MDS for the first time on a fresh cluster (I'm using vstart here), ctrl-c it promptly, then try to start it up again; it hits an assertion while reading the snaptable. Sure enough, rados ls says the object isn't there.

   -50> 2014-02-20 06:17:44.869883 7fefe834e700  1 mds.0.7 handle_mds_map i am now mds.0.7
   -49> 2014-02-20 06:17:44.869886 7fefe834e700  1 mds.0.7 handle_mds_map state change up:standby --> up:replay
   -48> 2014-02-20 06:17:44.869895 7fefe834e700  1 mds.0.7 replay_start
   -47> 2014-02-20 06:17:44.869906 7fefe834e700  7 mds.0.cache set_recovery_set
   -46> 2014-02-20 06:17:44.869917 7fefe834e700  1 mds.0.7  recovery set is
   -45> 2014-02-20 06:17:44.869919 7fefe834e700  1 mds.0.7  need osdmap epoch 15, have 14
   -44> 2014-02-20 06:17:44.869921 7fefe834e700  1 mds.0.7  waiting for osdmap 15 (which blacklists prior instance)
   -43> 2014-02-20 06:17:44.869937 7fefe834e700 10 monclient: renew_subs
   -42> 2014-02-20 06:17:44.869946 7fefe834e700 10 monclient: _send_mon_message to mon.a at 172.16.79.181:6789/0
   -41> 2014-02-20 06:17:44.869959 7fefe834e700  1 -- 172.16.79.181:6815/7059 --> 172.16.79.181:6789/0 -- mon_subscribe({mdsmap=18+,monmap=2+,osdmap=15}) v2 -- ?+0 0x3d338c0 con 0x3d2d580
   -40> 2014-02-20 06:17:44.870848 7fefe834e700  1 -- 172.16.79.181:6815/7059 <== mon.0 172.16.79.181:6789/0 15 ==== osd_map(15..15 src has 1..15) v3 ==== 342+0+0 (3715003977 0 0) 0x3d3f480 con 0x3d2d580
   -39> 2014-02-20 06:17:44.871028 7fefe834e700  2 mds.0.7 boot_start 1: opening inotable
   -38> 2014-02-20 06:17:44.871086 7fefe834e700 10 mds.0.inotable: load
   -37> 2014-02-20 06:17:44.871318 7fefe834e700  1 -- 172.16.79.181:6815/7059 --> 172.16.79.181:6805/6488 -- osd_op(mds.0.7:1 mds0_inotable [read 0~0] 1.b852b893 ack+read e15) v4 -- ?+0 0x3d15d80 con 0x3d2dc60
   -36> 2014-02-20 06:17:44.871387 7fefe834e700  2 mds.0.7 boot_start 1: opening sessionmap
   -35> 2014-02-20 06:17:44.871395 7fefe834e700 10 mds.0.sessionmap load
   -34> 2014-02-20 06:17:44.871437 7fefe834e700  1 -- 172.16.79.181:6815/7059 --> 172.16.79.181:6805/6488 -- osd_op(mds.0.7:2 mds0_sessionmap [read 0~0] 1.3270c60b ack+read e15) v4 -- ?+0 0x3d15b40 con 0x3d2dc60
   -33> 2014-02-20 06:17:44.871489 7fefe834e700  2 mds.0.7 boot_start 1: opening anchor table
   -32> 2014-02-20 06:17:44.871493 7fefe834e700 10 mds.0.anchortable: load
   -31> 2014-02-20 06:17:44.871521 7fefe834e700  1 -- 172.16.79.181:6815/7059 --> 172.16.79.181:6805/6488 -- osd_op(mds.0.7:3 mds_anchortable [read 0~0] 1.a977f6a7 ack+read e15) v4 -- ?+0 0x3d3fb40 con 0x3d2dc60
   -30> 2014-02-20 06:17:44.871532 7fefe834e700  2 mds.0.7 boot_start 1: opening snap table
   -29> 2014-02-20 06:17:44.871535 7fefe834e700 10 mds.0.snaptable: load
   -28> 2014-02-20 06:17:44.871687 7fefe5247700 10 mds.0.7 MDS::ms_get_authorizer type=osd
   -27> 2014-02-20 06:17:44.871691 7fefe834e700  1 -- 172.16.79.181:6815/7059 --> 172.16.79.181:6810/6696 -- osd_op(mds.0.7:4 mds_snaptable [read 0~0] 1.d90270ad ack+read e15) v4 -- ?+0 0x3d3f900 con 0x3d2db00
   -26> 2014-02-20 06:17:44.871708 7fefe834e700  2 mds.0.7 boot_start 1: opening mds log
   -25> 2014-02-20 06:17:44.871710 7fefe834e700  5 mds.0.log open discovering log bounds
   -24> 2014-02-20 06:17:44.871726 7fefe834e700  1 mds.0.journaler(ro) recover start
   -23> 2014-02-20 06:17:44.871730 7fefe834e700  1 mds.0.journaler(ro) read_head
   -22> 2014-02-20 06:17:44.871765 7fefe834e700  1 -- 172.16.79.181:6815/7059 --> 172.16.79.181:6805/6488 -- osd_op(mds.0.7:5 200.00000000 [read 0~0] 1.844f3494 ack+read e15) v4 -- ?+0 0x3d3f6c0 con 0x3d2dc60
   -21> 2014-02-20 06:17:44.871858 7fefe834e700  1 -- 172.16.79.181:6815/7059 <== mon.0 172.16.79.181:6789/0 16 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (3705050423 0 0) 0x3d46380 con 0x3d2d580
   -20> 2014-02-20 06:17:44.871875 7fefe834e700 10 monclient: handle_subscribe_ack sent 2014-02-20 06:17:44.869940 renew after 2014-02-20 06:20:14.869940
   -19> 2014-02-20 06:17:44.872070 7fefe5146700 10 mds.0.7 MDS::ms_get_authorizer type=osd
   -18> 2014-02-20 06:17:44.872293 7fefe834e700  5 mds.0.7 ms_handle_connect on 172.16.79.181:6805/6488
   -17> 2014-02-20 06:17:44.872501 7fefe834e700  5 mds.0.7 ms_handle_connect on 172.16.79.181:6810/6696
   -16> 2014-02-20 06:17:44.873581 7fefe834e700  1 -- 172.16.79.181:6815/7059 <== osd.1 172.16.79.181:6805/6488 1 ==== osd_op_reply(1 mds0_inotable [read 0~0] v0'0 uv2 ondisk = 0) v6 ==== 180+0+34 (1740633793 0 1054821296) 0x3d44780 con 0x3d2dc60
   -15> 2014-02-20 06:17:44.873668 7fefe834e700 10 mds.0.inotable: load_2 got 34 bytes
   -14> 2014-02-20 06:17:44.873673 7fefe834e700 10 mds.0.inotable: load_2 loaded v0
   -13> 2014-02-20 06:17:44.874193 7fefe834e700  1 -- 172.16.79.181:6815/7059 <== osd.1 172.16.79.181:6805/6488 2 ==== osd_op_reply(3 mds_anchortable [read 0~0] v0'0 uv5 ondisk = 0) v6 ==== 182+0+34 (271931579 0 2191498079) 0x3d44c80 con 0x3d2dc60
   -12> 2014-02-20 06:17:44.874241 7fefe834e700 10 mds.0.anchortable: load_2 got 34 bytes
   -11> 2014-02-20 06:17:44.874245 7fefe834e700 10 mds.0.anchortable: load_2 loaded v0
   -10> 2014-02-20 06:17:44.874269 7fefe834e700  1 -- 172.16.79.181:6815/7059 <== osd.1 172.16.79.181:6805/6488 3 ==== osd_op_reply(2 mds0_sessionmap [read 0~0] v0'0 uv3 ondisk = 0) v6 ==== 182+0+22 (2821256041 0 3628653333) 0x3d44a00 con 0x3d2dc60
    -9> 2014-02-20 06:17:44.874290 7fefe834e700 10 mds.0.sessionmap dump
    -8> 2014-02-20 06:17:44.874295 7fefe834e700 10 mds.0.sessionmap _load_finish v 0, 0 sessions, 22 bytes
    -7> 2014-02-20 06:17:44.874296 7fefe834e700 10 mds.0.sessionmap dump
    -6> 2014-02-20 06:17:44.874639 7fefe834e700  1 -- 172.16.79.181:6815/7059 <== osd.1 172.16.79.181:6805/6488 4 ==== osd_op_reply(5 200.00000000 [read 0~0] v0'0 uv1 ondisk = 0) v6 ==== 179+0+84 (2808334475 0 2960925567) 0x3d44780 con 0x3d2dc60
    -5> 2014-02-20 06:17:44.874712 7fefe834e700  1 mds.0.journaler(ro) _finish_read_head loghead(trim 4194304, expire 4194304, write 4194304).  probing for end of log (from 4194304)...
    -4> 2014-02-20 06:17:44.874720 7fefe834e700  1 mds.0.journaler(ro) probing for end of the log
    -3> 2014-02-20 06:17:44.874808 7fefe834e700  1 -- 172.16.79.181:6815/7059 --> 172.16.79.181:6805/6488 -- osd_op(mds.0.7:6 200.00000001 [stat] 1.6e5f474 ack+read+rwordered e15) v4 -- ?+0 0x3d3f480 con 0x3d2dc60
    -2> 2014-02-20 06:17:44.875610 7fefe834e700  1 -- 172.16.79.181:6815/7059 <== osd.2 172.16.79.181:6810/6696 1 ==== osd_op_reply(4 mds_snaptable [read 0~0] v0'0 uv0 ack = -2 (No such file or directory)) v6 ==== 180+0+0 (2740985549 0 0) 0x3d44c80 con 0x3d2db00
    -1> 2014-02-20 06:17:44.875689 7fefe834e700 10 mds.0.snaptable: load_2 could not read table: -2
     0> 2014-02-20 06:17:44.883167 7fefe834e700 -1 mds/MDSTable.cc: In function 'void MDSTable::load_2(int, ceph::bufferlist&, Context*)' thread 7fefe834e700 time 2014-02-20 06:17:44.875695
mds/MDSTable.cc: 152: FAILED assert(r >= 0)

Log from the initial startup that was killed attached.


Files

mds.a.log (19.3 KB) — John Spray, 02/20/2014 06:23 AM
#1

Updated by John Spray about 10 years ago

MDS -1 gid 1 starts in BOOTING, sends a beacon
MON prepare_beacon records its existence and puts it into state STANDBY rank -1
MDS -1 gid 1 goes into STANDBY
MON tick notices that in_mds < max_mds and selects gid 1 to expand the cluster, puts it into state CREATING and assigns rank 0
MDS 0 gid 1 goes into CREATING, starts creation
< terminate gid 1 here >
MON notices gid 1 has gone laggy
MDS -1 gid 2 starts in BOOTING, sends a beacon
MON prepare_beacon notices gid 2 has same name as gid 1, calls fail_mds_gid(1). rank 0 is now in the 'failed' list, gid 1 is erased from pending_mdsmap.info. At this point we've lost all memory that we were ever in the CREATING state.
MON prepare_beacon continues, puts gid 2 into state STANDBY
MDS -1 gid 2 goes into STANDBY
MON tick notices that failed is not empty, picks gid 2 as a replacement for rank 0. Sets state to REPLAY.

If we had disabled enforce_unique_name, or if the new gid had been from a different MDS daemon, then we would have hit an earlier laggy check in tick(), which does correctly carry forward the CREATING state from the gid it's replacing.
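The state loss described above can be sketched as a toy model of the monitor-side bookkeeping (all names here are hypothetical simplifications; the real logic lives in MDSMonitor's prepare_beacon and tick, which this does not reproduce exactly):

```python
# Toy model of the mon-side bug: failing a gid via the name-collision
# path in prepare_beacon erases the gid's state (e.g. CREATING), so the
# replacement picked by tick() starts in REPLAY instead of CREATING.

STANDBY, CREATING, REPLAY = "standby", "creating", "replay"

class MDSMap:
    def __init__(self):
        self.info = {}       # gid -> {"name", "rank", "state"}
        self.failed = set()  # ranks awaiting a replacement

    def fail_mds_gid(self, gid):
        # The rank goes onto the 'failed' list, but the gid's info entry
        # (including its CREATING state) is erased -- nothing remembers
        # that rank 0 never finished creation.
        info = self.info.pop(gid)
        if info["rank"] is not None:
            self.failed.add(info["rank"])

    def prepare_beacon(self, gid, name):
        # enforce_unique_name: a beacon from a new gid with an existing
        # daemon's name fails the old gid outright.
        for old_gid, info in list(self.info.items()):
            if info["name"] == name and old_gid != gid:
                self.fail_mds_gid(old_gid)
        self.info[gid] = {"name": name, "rank": None, "state": STANDBY}

    def tick(self):
        # Replacement path for 'failed' ranks: the standby is put straight
        # into REPLAY, because the CREATING state was never carried forward.
        for rank in list(self.failed):
            for gid, info in self.info.items():
                if info["state"] == STANDBY:
                    info.update(rank=rank, state=REPLAY)
                    self.failed.discard(rank)
                    break

m = MDSMap()
m.prepare_beacon(1, "a")                  # gid 1 boots, goes STANDBY
m.info[1].update(rank=0, state=CREATING)  # tick expands cluster: rank 0, CREATING
# <gid 1 killed here>; a same-named daemon restarts as gid 2:
m.prepare_beacon(2, "a")                  # fail_mds_gid(1) drops CREATING
m.tick()
print(m.info[2]["state"])                 # replay -- should have been creating
```

Per the note above, the laggy-replacement path in tick() already carried the old gid's state forward correctly; the name-collision path in prepare_beacon needed to do the same.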

#3

Updated by John Spray about 10 years ago

  • Status changed from New to Fix Under Review
  • Assignee set to John Spray
#4

Updated by Greg Farnum about 10 years ago

  • Status changed from Fix Under Review to Resolved

We merged this to master in commit:9a040bfd46d141712c32aaa0fa8fc5de93336306, but I guess we missed closing out the ticket.
