Bug #18802 (open)

Jewel fuse client not connecting to new MDS after failover (was: mds/Server.cc: 6003: FAILED assert(in->filelock.can_read(-1)))

Added by Dan van der Ster about 7 years ago. Updated about 5 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags: -
Backport: -
Regression: No
Severity: 3 - minor
Reviewed: -
Affected Versions: -
ceph-qa-suite: -
Component(FS): -
Labels (FS): -
Pull request ID: -
Crash signature (v1): -
Crash signature (v2): -

Description

A user just did the following:

I issued multiple tmpwatch commands on lnxmicv37 to clean the volume; the processes involved were:

root     11771  0.0  0.0   4468   744 pts/3 D    09:14   0:00 /usr/sbin/tmpwatch -mc -X /simulation/*/SaveDir -X /simulation/*/toPreserve 15d /simulation/jkaplon
root     11772  0.0  0.0   4628   744 pts/3    D    09:14   0:00 /usr/sbin/tmpwatch -mc -X /simulation/*/SaveDir -X /simulation/*/toPreserve 15d /simulation/jmurdzek
root     11774  0.0  0.0   4468   744 pts/3    D    09:14   0:00 /usr/sbin/tmpwatch -mc -X /simulation/*/SaveDir -X /simulation/*/toPreserve 15d /simulation/wbialas

(/simulation is ceph-fuse 10.2.5).

Then immediately the MDS crashed:

   -10> 2017-02-03 09:14:58.310527 7f91ac759700  5 -- op tracker -- seq: 1321901, time: 2017-02-03 09:14:58.310527, event: done, op: client_request(client.330048536:40019 setattr mtime=2017-01-27 19:09:43.000000 atime=2017-01-27 19:09:43.000000 #1000026f2f2 2017-02-03 09:14:58.307381)
    -9> 2017-02-03 09:14:58.310602 7f91b1063700  1 -- 188.184.83.152:6800/26960 --> 188.184.80.28:0/961909603 -- client_caps(revoke ino 100001bcc61 6921525 seq 6 caps=pLsXs dirty=- wanted=- follows 0 size 0/0 ts 1/18446744073709551615 mtime 2017-02-03 09:14:58.307906) v8 -- ?+0 0x7f91e61ea400 con 0x7f91c4c07180
    -8> 2017-02-03 09:14:58.310626 7f91b1063700  5 -- op tracker -- seq: 1321905, time: 2017-02-03 09:14:58.310626, event: failed to xlock, waiting, op: client_request(client.330048536:40023 setattr mtime=2016-11-30 11:08:28.000000 atime=2016-11-30 11:08:28.000000 #100001bcc61 2017-02-03 09:14:58.310346)
    -7> 2017-02-03 09:14:58.310972 7f91b1063700  1 -- 188.184.83.152:6800/26960 <== client.330048536 188.184.80.28:0/961909603 55217 ==== client_caps(update ino 100001bcc61 6921525 seq 6 caps=pLsXs dirty=- wanted=Fx follows 0 size 0/0 ts 1/18446744073709551615 mtime 2017-02-03 09:14:58.307906) v8 ==== 216+0+0 (2241182905 0 0) 0x7f91e61ea400 con 0x7f91c4c07180
    -6> 2017-02-03 09:14:58.311017 7f91b1063700  5 -- op tracker -- seq: 1321905, time: 2017-02-03 09:14:58.311016, event: acquired locks, op: client_request(client.330048536:40023 setattr mtime=2016-11-30 11:08:28.000000 atime=2016-11-30 11:08:28.000000 #100001bcc61 2017-02-03 09:14:58.310346)
    -5> 2017-02-03 09:14:58.311052 7f91b1063700  1 -- 188.184.83.152:6800/26960 --> 188.184.80.28:0/961909603 -- client_reply(???:40023 = 0 (0) Success unsafe) v1 -- ?+0 0x7f91f7c9c680 con 0x7f91c4c07180
    -4> 2017-02-03 09:14:58.311070 7f91b1063700  5 -- op tracker -- seq: 1321905, time: 2017-02-03 09:14:58.311069, event: early_replied, op: client_request(client.330048536:40023 setattr mtime=2016-11-30 11:08:28.000000 atime=2016-11-30 11:08:28.000000 #100001bcc61 2017-02-03 09:14:58.310346)
    -3> 2017-02-03 09:14:58.311086 7f91b1063700  5 -- op tracker -- seq: 1321905, time: 2017-02-03 09:14:58.311086, event: submit entry: journal_and_reply, op: client_request(client.330048536:40023 setattr mtime=2016-11-30 11:08:28.000000 atime=2016-11-30 11:08:28.000000 #100001bcc61 2017-02-03 09:14:58.310346)
    -2> 2017-02-03 09:14:58.312202 7f91b1063700  1 -- 188.184.83.152:6800/26960 <== client.330048536 188.184.80.28:0/961909603 55218 ==== client_request(client.330048536:40024 rmdir #100001bcc60/netlist 2017-02-03 09:14:58.312026) v3 ==== 217+0+0 (3332271934 0 0) 0x7f91f7c9c680 con 0x7f91c4c07180
    -1> 2017-02-03 09:14:58.312275 7f91b1063700  5 -- op tracker -- seq: 1321906, time: 2017-02-03 09:14:58.312274, event: acquired locks, op: client_request(client.330048536:40024 rmdir #100001bcc60/netlist 2017-02-03 09:14:58.312026)
     0> 2017-02-03 09:14:58.315504 7f91b1063700 -1 mds/Server.cc: In function 'bool Server::_dir_is_nonempty(MDRequestRef&, CInode*)' thread 7f91b1063700 time 2017-02-03 09:14:58.312289
mds/Server.cc: 6003: FAILED assert(in->filelock.can_read(-1))

 ceph version 10.2.3-359-gae8e6fd (ae8e6fd21c045c6615249c7315cac7a90d22a3c5)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f91b76a2a75]
 2: (Server::_dir_is_nonempty(std::shared_ptr<MDRequestImpl>&, CInode*)+0xb0) [0x7f91b72ea650]
 3: (Server::handle_client_unlink(std::shared_ptr<MDRequestImpl>&)+0xe8c) [0x7f91b7314c4c]
 4: (Server::dispatch_client_request(std::shared_ptr<MDRequestImpl>&)+0x8c3) [0x7f91b7333cc3]
 5: (Server::handle_client_request(MClientRequest*)+0x48c) [0x7f91b733444c]
 6: (Server::dispatch(Message*)+0x3eb) [0x7f91b73388bb]
 7: (MDSRank::handle_deferrable_message(Message*)+0x82c) [0x7f91b72bb59c]
 8: (MDSRank::_dispatch(Message*, bool)+0x207) [0x7f91b72c4b47]
 9: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x7f91b72c5cd5]
 10: (MDSDaemon::ms_dispatch(Message*)+0xf3) [0x7f91b72ac413]
 11: (DispatchQueue::entry()+0x78a) [0x7f91b77a4a2a]
 12: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f91b76879bd]

We do not have a core dump. MDS log is attached.

ceph version 10.2.3-359-gae8e6fd (ae8e6fd21c045c6615249c7315cac7a90d22a3c5)


Files

ceph-mds.cephdwightmds0.log.gz (313 KB) - Dan van der Ster, 02/03/2017 09:26 AM
Actions #1

Updated by Zheng Yan about 7 years ago

Duplicate of http://tracker.ceph.com/issues/18578. The fix is pending backport.

Actions #2

Updated by Dan van der Ster about 7 years ago

Excellent! Sorry that my search of the tracker didn't find that issue.

We'll apply that backport when it's ready.

Actions #3

Updated by Dan van der Ster about 7 years ago

Actually, one more thing: following this crash, the clients did not reconnect to the standby MDS after the failover. Processes accessing cephfs were stuck, and df -h on the machine was also stuck.

After I failed the MDS back to the original, all clients reconnected and resumed normal operations.
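
(For reference, the exact command used for this failback isn't recorded in the ticket; failing the active MDS over to a standby is normally done with something like:

ceph mds fail 0

giving the rank or daemon name of the MDS to be failed.)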

In ceph.log we see:

2017-02-03 09:14:58.731928 mon.0 128.142.36.227:6790/0 9236965 : cluster [INF] mds.? 188.184.83.152:6801/6291 up:boot
2017-02-03 09:14:58.737073 mon.0 128.142.36.227:6790/0 9236967 : cluster [INF] fsmap e2994: 1/1/1 up {0=cephdwightmds1=up:replay}, 1 up:standby
2017-02-03 09:14:58.739275 mon.0 128.142.36.227:6790/0 9236968 : cluster [INF] fsmap e2995: 1/1/1 up {0=cephdwightmds1=up:replay}, 1 up:standby-replay
2017-02-03 09:14:59.781953 mon.0 128.142.36.227:6790/0 9236970 : cluster [INF] mds.0 188.184.95.158:6801/4471 up:reconnect
2017-02-03 09:14:59.782029 mon.0 128.142.36.227:6790/0 9236971 : cluster [INF] fsmap e2996: 1/1/1 up {0=cephdwightmds1=up:reconnect}, 1 up:standby-replay
2017-02-03 09:15:06.005984 mon.0 128.142.36.227:6790/0 9236996 : cluster [INF] mds.0 188.184.95.158:6801/4471 up:rejoin
2017-02-03 09:15:06.006059 mon.0 128.142.36.227:6790/0 9236997 : cluster [INF] fsmap e2997: 1/1/1 up {0=cephdwightmds1=up:rejoin}, 1 up:standby-replay
2017-02-03 09:15:06.948626 mon.0 128.142.36.227:6790/0 9236999 : cluster [INF] HEALTH_WARN; mds cluster is degraded
2017-02-03 09:15:07.012406 mon.0 128.142.36.227:6790/0 9237000 : cluster [INF] mds.0 188.184.95.158:6801/4471 up:clientreplay
2017-02-03 09:15:07.012494 mon.0 128.142.36.227:6790/0 9237001 : cluster [INF] fsmap e2998: 1/1/1 up {0=cephdwightmds1=up:clientreplay}, 1 up:standby-replay
2017-02-03 09:15:08.023846 mon.0 128.142.36.227:6790/0 9237004 : cluster [INF] mds.0 188.184.95.158:6801/4471 up:active
2017-02-03 09:15:08.023922 mon.0 128.142.36.227:6790/0 9237005 : cluster [INF] fsmap e2999: 1/1/1 up {0=cephdwightmds1=up:active}, 1 up:standby-replay

We also see lots of messages like this:

2017-02-03 09:15:07.952323 mds.0 188.184.95.158:6801/4471 105 : cluster [INF] closing stale session client.332303319 128.142.160.15:0/4281977287 after 234524.963289
2017-02-03 09:15:07.952384 mds.0 188.184.95.158:6801/4471 106 : cluster [INF] closing stale session client.332178733 128.142.194.42:0/1273573677 after 234524.963293

Clients are using 10.2.5 ceph-fuse.
Sounds similar to this? http://tracker.ceph.com/issues/18757

Actions #4

Updated by John Spray about 7 years ago

  • Subject changed from mds/Server.cc: 6003: FAILED assert(in->filelock.can_read(-1)) to Jewel fuse client not connecting to new MDS after failover (was: mds/Server.cc: 6003: FAILED assert(in->filelock.can_read(-1)))

Hmm, the clients failing to participate in failover is probably not the same as #18757, as that one was the result of clients getting evicted, whereas eviction generally doesn't happen during failover unless the clients happen to lose connectivity to the MDS at the same time.

In this instance it seems more like the clients' sessions are going stale because they for some reason didn't get the memo about the new MDS -- a "debug client = 20" client log from when the failover happens would probably give us an idea of what the client thinks it's doing.
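
For reference, a minimal sketch of how to capture such a log on a ceph-fuse client (the log file path below is an illustrative default, not taken from this cluster): add to the client's ceph.conf and remount,

[client]
    debug client = 20
    # illustrative path; any location writable by the client works
    log file = /var/log/ceph/ceph-client.$name.log

or bump it at runtime through the client's admin socket (the .asok name varies with the client id and pid):

ceph daemon /var/run/ceph/ceph-client.<id>.asok config set debug_client 20

then reproduce the failover and attach the resulting log.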

Actions #5

Updated by Dan van der Ster about 7 years ago

OK we'll try to get that next time.

Actions #6

Updated by Patrick Donnelly about 5 years ago

  • Assignee deleted (John Spray)