Bug #18802
Jewel fuse client not connecting to new MDS after failover (was: mds/Server.cc: 6003: FAILED assert(in->filelock.can_read(-1)))
Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
A user reported:

I issued multiple tmpwatch commands on lnxmicv37 to clean the volume; the processes involved:
root 11771 0.0 0.0 4468 744 pts/3 D 09:14 0:00 /usr/sbin/tmpwatch -mc -X /simulation/*/SaveDir -X /simulation/*/toPreserve 15d /simulation/jkaplon
root 11772 0.0 0.0 4628 744 pts/3 D 09:14 0:00 /usr/sbin/tmpwatch -mc -X /simulation/*/SaveDir -X /simulation/*/toPreserve 15d /simulation/jmurdzek
root 11774 0.0 0.0 4468 744 pts/3 D 09:14 0:00 /usr/sbin/tmpwatch -mc -X /simulation/*/SaveDir -X /simulation/*/toPreserve 15d /simulation/wbialas
(/simulation is a ceph-fuse 10.2.5 mount.)
Immediately afterwards, the MDS crashed:
-10> 2017-02-03 09:14:58.310527 7f91ac759700 5 -- op tracker -- seq: 1321901, time: 2017-02-03 09:14:58.310527, event: done, op: client_request(client.330048536:40019 setattr mtime=2017-01-27 19:09:43.000000 atime=2017-01-27 19:09:43.000000 #1000026f2f2 2017-02-03 09:14:58.307381)
-9> 2017-02-03 09:14:58.310602 7f91b1063700 1 -- 188.184.83.152:6800/26960 --> 188.184.80.28:0/961909603 -- client_caps(revoke ino 100001bcc61 6921525 seq 6 caps=pLsXs dirty=- wanted=- follows 0 size 0/0 ts 1/18446744073709551615 mtime 2017-02-03 09:14:58.307906) v8 -- ?+0 0x7f91e61ea400 con 0x7f91c4c07180
-8> 2017-02-03 09:14:58.310626 7f91b1063700 5 -- op tracker -- seq: 1321905, time: 2017-02-03 09:14:58.310626, event: failed to xlock, waiting, op: client_request(client.330048536:40023 setattr mtime=2016-11-30 11:08:28.000000 atime=2016-11-30 11:08:28.000000 #100001bcc61 2017-02-03 09:14:58.310346)
-7> 2017-02-03 09:14:58.310972 7f91b1063700 1 -- 188.184.83.152:6800/26960 <== client.330048536 188.184.80.28:0/961909603 55217 ==== client_caps(update ino 100001bcc61 6921525 seq 6 caps=pLsXs dirty=- wanted=Fx follows 0 size 0/0 ts 1/18446744073709551615 mtime 2017-02-03 09:14:58.307906) v8 ==== 216+0+0 (2241182905 0 0) 0x7f91e61ea400 con 0x7f91c4c07180
-6> 2017-02-03 09:14:58.311017 7f91b1063700 5 -- op tracker -- seq: 1321905, time: 2017-02-03 09:14:58.311016, event: acquired locks, op: client_request(client.330048536:40023 setattr mtime=2016-11-30 11:08:28.000000 atime=2016-11-30 11:08:28.000000 #100001bcc61 2017-02-03 09:14:58.310346)
-5> 2017-02-03 09:14:58.311052 7f91b1063700 1 -- 188.184.83.152:6800/26960 --> 188.184.80.28:0/961909603 -- client_reply(???:40023 = 0 (0) Success unsafe) v1 -- ?+0 0x7f91f7c9c680 con 0x7f91c4c07180
-4> 2017-02-03 09:14:58.311070 7f91b1063700 5 -- op tracker -- seq: 1321905, time: 2017-02-03 09:14:58.311069, event: early_replied, op: client_request(client.330048536:40023 setattr mtime=2016-11-30 11:08:28.000000 atime=2016-11-30 11:08:28.000000 #100001bcc61 2017-02-03 09:14:58.310346)
-3> 2017-02-03 09:14:58.311086 7f91b1063700 5 -- op tracker -- seq: 1321905, time: 2017-02-03 09:14:58.311086, event: submit entry: journal_and_reply, op: client_request(client.330048536:40023 setattr mtime=2016-11-30 11:08:28.000000 atime=2016-11-30 11:08:28.000000 #100001bcc61 2017-02-03 09:14:58.310346)
-2> 2017-02-03 09:14:58.312202 7f91b1063700 1 -- 188.184.83.152:6800/26960 <== client.330048536 188.184.80.28:0/961909603 55218 ==== client_request(client.330048536:40024 rmdir #100001bcc60/netlist 2017-02-03 09:14:58.312026) v3 ==== 217+0+0 (3332271934 0 0) 0x7f91f7c9c680 con 0x7f91c4c07180
-1> 2017-02-03 09:14:58.312275 7f91b1063700 5 -- op tracker -- seq: 1321906, time: 2017-02-03 09:14:58.312274, event: acquired locks, op: client_request(client.330048536:40024 rmdir #100001bcc60/netlist 2017-02-03 09:14:58.312026)
0> 2017-02-03 09:14:58.315504 7f91b1063700 -1 mds/Server.cc: In function 'bool Server::_dir_is_nonempty(MDRequestRef&, CInode*)' thread 7f91b1063700 time 2017-02-03 09:14:58.312289
mds/Server.cc: 6003: FAILED assert(in->filelock.can_read(-1))

ceph version 10.2.3-359-gae8e6fd (ae8e6fd21c045c6615249c7315cac7a90d22a3c5)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f91b76a2a75]
2: (Server::_dir_is_nonempty(std::shared_ptr<MDRequestImpl>&, CInode*)+0xb0) [0x7f91b72ea650]
3: (Server::handle_client_unlink(std::shared_ptr<MDRequestImpl>&)+0xe8c) [0x7f91b7314c4c]
4: (Server::dispatch_client_request(std::shared_ptr<MDRequestImpl>&)+0x8c3) [0x7f91b7333cc3]
5: (Server::handle_client_request(MClientRequest*)+0x48c) [0x7f91b733444c]
6: (Server::dispatch(Message*)+0x3eb) [0x7f91b73388bb]
7: (MDSRank::handle_deferrable_message(Message*)+0x82c) [0x7f91b72bb59c]
8: (MDSRank::_dispatch(Message*, bool)+0x207) [0x7f91b72c4b47]
9: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x7f91b72c5cd5]
10: (MDSDaemon::ms_dispatch(Message*)+0xf3) [0x7f91b72ac413]
11: (DispatchQueue::entry()+0x78a) [0x7f91b77a4a2a]
12: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f91b76879bd]
We do not have a core dump. MDS log is attached.