Bug #18802

Jewel fuse client not connecting to new MDS after failover (was: mds/Server.cc: 6003: FAILED assert(in->filelock.can_read(-1)))

Added by Dan van der Ster about 7 years ago. Updated about 5 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

A user just did:

I issued multiple tmpwatch commands on lnxmicv37 to clean the volume; the processes involved were:

root     11771  0.0  0.0   4468   744 pts/3    D    09:14   0:00 /usr/sbin/tmpwatch -mc -X /simulation/*/SaveDir -X /simulation/*/toPreserve 15d /simulation/jkaplon
root     11772  0.0  0.0   4628   744 pts/3    D    09:14   0:00 /usr/sbin/tmpwatch -mc -X /simulation/*/SaveDir -X /simulation/*/toPreserve 15d /simulation/jmurdzek
root     11774  0.0  0.0   4468   744 pts/3    D    09:14   0:00 /usr/sbin/tmpwatch -mc -X /simulation/*/SaveDir -X /simulation/*/toPreserve 15d /simulation/wbialas

(/simulation is a ceph-fuse 10.2.5 mount.)
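For anyone trying to reproduce this, here is a hypothetical sketch that only approximates the request pattern visible in the MDS log below (setattr with months-old timestamps plus rmdir, issued concurrently by several workers through one ceph-fuse mount). The paths, counts, and timestamps are invented; this is not the user's actual tmpwatch workload:

// Hypothetical reproducer sketch -- NOT the real tmpwatch invocation.
// Approximates the mkdir/setattr/rmdir traffic that three concurrent
// cleanup processes on one ceph-fuse client would generate.
#include <string>
#include <thread>
#include <vector>
#include <fcntl.h>     // AT_FDCWD
#include <sys/stat.h>  // mkdir, utimensat
#include <unistd.h>    // rmdir

static void cleanup_worker(std::string root, int id) {
    for (int i = 0; i < 1000; ++i) {
        std::string dir = root + "/w" + std::to_string(id) + "_" + std::to_string(i);
        mkdir(dir.c_str(), 0755);
        // setattr with an old mtime/atime, like the requests in the log
        struct timespec ts[2];
        ts[0].tv_sec = ts[1].tv_sec = 1480500000;  // late-2016-ish, as in the log
        ts[0].tv_nsec = ts[1].tv_nsec = 0;
        utimensat(AT_FDCWD, dir.c_str(), ts, 0);
        rmdir(dir.c_str());  // the op that hit the assert on the MDS
    }
}

int main(int argc, char **argv) {
    if (argc < 2)
        return 1;  // usage: reproducer <ceph-fuse mountpoint>
    std::vector<std::thread> workers;
    for (int id = 0; id < 3; ++id)  // three concurrent cleaners, as in the report
        workers.emplace_back(cleanup_worker, std::string(argv[1]), id);
    for (auto &t : workers)
        t.join();
}

The setattr requests carrying months-old timestamps in the log suggest the cleanup tool restores directory times after touching their contents; the sketch mimics that pattern without claiming tmpwatch works exactly this way, and the real workload interleaved these calls over shared pre-existing directories rather than freshly created private ones.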

Then immediately the MDS crashed:

   -10> 2017-02-03 09:14:58.310527 7f91ac759700  5 -- op tracker -- seq: 1321901, time: 2017-02-03 09:14:58.310527, event: done, op: client_request(client.330048536:40019 setattr mtime=2017-01-27 19:09:43.000000 atime=2017-01-27 19:09:43.000000 #1000026f2f2 2017-02-03 09:14:58.307381)
    -9> 2017-02-03 09:14:58.310602 7f91b1063700  1 -- 188.184.83.152:6800/26960 --> 188.184.80.28:0/961909603 -- client_caps(revoke ino 100001bcc61 6921525 seq 6 caps=pLsXs dirty=- wanted=- follows 0 size 0/0 ts 1/18446744073709551615 mtime 2017-02-03 09:14:58.307906) v8 -- ?+0 0x7f91e61ea400 con 0x7f91c4c07180
    -8> 2017-02-03 09:14:58.310626 7f91b1063700  5 -- op tracker -- seq: 1321905, time: 2017-02-03 09:14:58.310626, event: failed to xlock, waiting, op: client_request(client.330048536:40023 setattr mtime=2016-11-30 11:08:28.000000 atime=2016-11-30 11:08:28.000000 #100001bcc61 2017-02-03 09:14:58.310346)
    -7> 2017-02-03 09:14:58.310972 7f91b1063700  1 -- 188.184.83.152:6800/26960 <== client.330048536 188.184.80.28:0/961909603 55217 ==== client_caps(update ino 100001bcc61 6921525 seq 6 caps=pLsXs dirty=- wanted=Fx follows 0 size 0/0 ts 1/18446744073709551615 mtime 2017-02-03 09:14:58.307906) v8 ==== 216+0+0 (2241182905 0 0) 0x7f91e61ea400 con 0x7f91c4c07180
    -6> 2017-02-03 09:14:58.311017 7f91b1063700  5 -- op tracker -- seq: 1321905, time: 2017-02-03 09:14:58.311016, event: acquired locks, op: client_request(client.330048536:40023 setattr mtime=2016-11-30 11:08:28.000000 atime=2016-11-30 11:08:28.000000 #100001bcc61 2017-02-03 09:14:58.310346)
    -5> 2017-02-03 09:14:58.311052 7f91b1063700  1 -- 188.184.83.152:6800/26960 --> 188.184.80.28:0/961909603 -- client_reply(???:40023 = 0 (0) Success unsafe) v1 -- ?+0 0x7f91f7c9c680 con 0x7f91c4c07180
    -4> 2017-02-03 09:14:58.311070 7f91b1063700  5 -- op tracker -- seq: 1321905, time: 2017-02-03 09:14:58.311069, event: early_replied, op: client_request(client.330048536:40023 setattr mtime=2016-11-30 11:08:28.000000 atime=2016-11-30 11:08:28.000000 #100001bcc61 2017-02-03 09:14:58.310346)
    -3> 2017-02-03 09:14:58.311086 7f91b1063700  5 -- op tracker -- seq: 1321905, time: 2017-02-03 09:14:58.311086, event: submit entry: journal_and_reply, op: client_request(client.330048536:40023 setattr mtime=2016-11-30 11:08:28.000000 atime=2016-11-30 11:08:28.000000 #100001bcc61 2017-02-03 09:14:58.310346)
    -2> 2017-02-03 09:14:58.312202 7f91b1063700  1 -- 188.184.83.152:6800/26960 <== client.330048536 188.184.80.28:0/961909603 55218 ==== client_request(client.330048536:40024 rmdir #100001bcc60/netlist 2017-02-03 09:14:58.312026) v3 ==== 217+0+0 (3332271934 0 0) 0x7f91f7c9c680 con 0x7f91c4c07180
    -1> 2017-02-03 09:14:58.312275 7f91b1063700  5 -- op tracker -- seq: 1321906, time: 2017-02-03 09:14:58.312274, event: acquired locks, op: client_request(client.330048536:40024 rmdir #100001bcc60/netlist 2017-02-03 09:14:58.312026)
     0> 2017-02-03 09:14:58.315504 7f91b1063700 -1 mds/Server.cc: In function 'bool Server::_dir_is_nonempty(MDRequestRef&, CInode*)' thread 7f91b1063700 time 2017-02-03 09:14:58.312289
mds/Server.cc: 6003: FAILED assert(in->filelock.can_read(-1))

 ceph version 10.2.3-359-gae8e6fd (ae8e6fd21c045c6615249c7315cac7a90d22a3c5)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f91b76a2a75]
 2: (Server::_dir_is_nonempty(std::shared_ptr<MDRequestImpl>&, CInode*)+0xb0) [0x7f91b72ea650]
 3: (Server::handle_client_unlink(std::shared_ptr<MDRequestImpl>&)+0xe8c) [0x7f91b7314c4c]
 4: (Server::dispatch_client_request(std::shared_ptr<MDRequestImpl>&)+0x8c3) [0x7f91b7333cc3]
 5: (Server::handle_client_request(MClientRequest*)+0x48c) [0x7f91b733444c]
 6: (Server::dispatch(Message*)+0x3eb) [0x7f91b73388bb]
 7: (MDSRank::handle_deferrable_message(Message*)+0x82c) [0x7f91b72bb59c]
 8: (MDSRank::_dispatch(Message*, bool)+0x207) [0x7f91b72c4b47]
 9: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x7f91b72c5cd5]
 10: (MDSDaemon::ms_dispatch(Message*)+0xf3) [0x7f91b72ac413]
 11: (DispatchQueue::entry()+0x78a) [0x7f91b77a4a2a]
 12: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f91b76879bd]

We do not have a core dump. MDS log is attached.
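To make the failed invariant concrete, here is a self-contained toy model of what the assertion guards. The types are hypothetical stand-ins, not Ceph's actual CInode/SimpleLock classes: the rmdir path trusts the cached directory statistics only while the inode's filelock is in a readable state, and an operation that has moved the lock out of that state at the wrong moment produces exactly this abort:

// Toy model (hypothetical types, not Ceph's) of the invariant that failed.
#include <cassert>

enum class LockState { Sync, Lock, Mix };  // simplified subset of MDS lock states

struct FileLock {
    LockState state = LockState::Sync;
    // In the report, can_read(-1) was called; -1 stands in for "no
    // particular client". Readable only in the stable Sync state here.
    bool can_read(int /*client*/) const { return state == LockState::Sync; }
};

struct Inode {
    FileLock filelock;
    int cached_entries = 0;  // stand-in for the accounted fragstat
};

// Mirrors the shape of Server::_dir_is_nonempty(): trust the cached count
// only under a readable filelock; mid-transition it may be stale.
bool dir_is_nonempty(const Inode &in) {
    assert(in.filelock.can_read(-1));  // the assert that fired at Server.cc:6003
    return in.cached_entries > 0;
}

int main() {
    Inode dir;                             // empty directory, lock readable
    bool nonempty = dir_is_nonempty(dir);  // fine: returns false, rmdir may proceed
    (void)nonempty;

    dir.filelock.state = LockState::Lock;  // a racing op moved the lock
    dir_is_nonempty(dir);                  // aborts: FAILED assert(...can_read(-1))
    return 0;
}

That matches the log: a setattr on a nearby inode (seq 1321905) had just early-replied and been submitted to the journal immediately before the rmdir (seq 1321906) acquired its locks, which looks like the window in which the filelock was left in a non-readable state.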

ceph version 10.2.3-359-gae8e6fd (ae8e6fd21c045c6615249c7315cac7a90d22a3c5)


Files

ceph-mds.cephdwightmds0.log.gz (313 KB), Dan van der Ster, 02/03/2017 09:26 AM