Bug #36029
ceph-fuse assert failed when trying to do a file lock
Description
When I use the ceph-fuse client to access CephFS, one client panicked and printed the stack trace below.
The Ceph cluster version is 13.2.1 (I upgraded it from 12.2.7 to 13.2.1 three days ago); the ceph-fuse client version is 12.2.7.
I have also uploaded the ceph-fuse client log file; you can find it in the attachment.
================
2018-09-17 14:28:06.134711 7f22332d7700 -1 /build/ceph-12.2.7/src/client/Client.cc: In function 'void Client::_update_lock_state(flock*, uint64_t, ceph_lock_state_t*)' thread 7f22332d7700 time 2018-09-17 14:28:06.132653
/build/ceph-12.2.7/src/client/Client.cc: 9971: FAILED assert(r)
ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x55aedc171072]
2: (Client::_update_lock_state(flock*, unsigned long, ceph_lock_state_t*)+0x161) [0x55aedc08f141]
3: (Client::_do_filelock(Inode*, Fh*, int, int, int, flock*, unsigned long, bool)+0x307) [0x55aedc0cae57]
4: (Client::_setlk(Fh*, flock*, unsigned long, int)+0x195) [0x55aedc0cd895]
5: (Client::ll_setlk(Fh*, flock*, unsigned long, int)+0x9d) [0x55aedc0cdaed]
6: (()+0x211f7e) [0x55aedc082f7e]
7: (()+0x13e6c) [0x7f224a382e6c]
8: (()+0x15679) [0x7f224a384679]
9: (()+0x11e38) [0x7f224a380e38]
10: (()+0x76ba) [0x7f22497b06ba]
11: (clone()+0x6d) [0x7f224861841d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Updated by John Spray over 5 years ago
- Project changed from Ceph to CephFS
- Category set to Correctness/Safety
- Component(FS) ceph-fuse added
Updated by Patrick Donnelly over 5 years ago
- Assignee set to Patrick Donnelly
- Target version set to v14.0.0
- Source set to Community (user)
- Backport set to mimic,luminous
Updated by Patrick Donnelly about 5 years ago
- Assignee deleted (Patrick Donnelly)
Updated by Patrick Donnelly about 5 years ago
- Target version changed from v14.0.0 to v15.0.0
Updated by Xiaoxi Chen almost 5 years ago
We hit the bug as well. Is there any PR targeting this bug somewhere? The related code (_update_lock_state and mds/flock.cc) does not seem to have changed, even in master.
It crashed on Mimic 13.2.5:
-16> 2019-06-27 23:16:13.221 7fc024f26700 10 monclient: _send_mon_message to mon.rnoaz01cephmon01 at 10.78.142.191:6789/0
-15> 2019-06-27 23:16:13.221 7fc024f26700 1 -- 10.20.75.98:0/332706481 --> 10.78.142.191:6789/0 -- statfs(6128798 pool -1 v22411697) v2 -- 0x55669c60e200 con 0
-14> 2019-06-27 23:16:13.221 7fc02c735700 5 -- 10.20.75.98:0/332706481 >> 10.78.142.191:6789/0 conn(0x55669b053200 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=144466 cs=1 l=1). rx mon.3 seq 73845 0x5566afa8fa00 statfs_reply(6128798) v1
-13> 2019-06-27 23:16:13.221 7fc02972f700 1 -- 10.20.75.98:0/332706481 <== mon.3 10.78.142.191:6789/0 73845 ==== statfs_reply(6128798) v1 ==== 56+0+0 (2981128256 0 0) 0x5566afa8fa00 con 0x55669b053200
-12> 2019-06-27 23:16:13.221 7fc024f26700 1 -- 10.20.75.98:0/332706481 --> 10.75.108.192:6801/1778692466 -- client_request(unknown.0:46283287 getattr - #0x10002372a42 2019-06-27 23:16:13.223464 caller_uid=506, caller_gid=21{21,2400,2403,2404,2405,2411,2412,2413,2414,2415,2416,2417,2418,}) v4 -- 0x5566ae84a000 con 0
-11> 2019-06-27 23:16:13.221 7fc02d737700 5 -- 10.20.75.98:0/332706481 >> 10.75.108.192:6801/1778692466 conn(0x55669b053e00 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=974134 cs=1 l=0). rx mds.2 seq 115946 0x55669b962b00 client_reply(???:46283287 = 0 (0) Success) v1
-10> 2019-06-27 23:16:13.221 7fc02972f700 1 -- 10.20.75.98:0/332706481 <== mds.2 10.75.108.192:6801/1778692466 115946 ==== client_reply(???:46283287 = 0 (0) Success) v1 ==== 462+0+0 (4089035950 0 0) 0x55669b962b00 con 0x55669b053e00
-9> 2019-06-27 23:16:13.221 7fc021f20700 3 client.14826953 ll_setlk (fh) 0x5566b3acb040 0x400019b859c
-8> 2019-06-27 23:16:13.221 7fc021f20700 1 -- 10.20.75.98:0/332706481 --> 10.174.192.122:6801/1638065680 -- client_request(unknown.0:46283288 setfilelock rule 1, type 2, owner 10157075143593042163, pid 18111, start 0, length 0, wait 0 #0x400019b859c 2019-06-27 23:16:13.224730 caller_uid=506, caller_gid=21{21,2400,2403,2404,2405,2411,2412,2413,2414,2415,2416,2417,2418,}) v4 -- 0x5566bc45a900 con 0
-7> 2019-06-27 23:16:13.221 7fc02cf36700 5 -- 10.20.75.98:0/332706481 >> 10.174.192.122:6801/1638065680 conn(0x5566c23b6000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=2727 cs=1 l=0). rx mds.3 seq 13742835 0x5566a4ae6580 client_reply(???:46283288 = 0 (0) Success safe) v1
-6> 2019-06-27 23:16:13.221 7fc02972f700 1 -- 10.20.75.98:0/332706481 <== mds.3 10.174.192.122:6801/1638065680 13742835 ==== client_reply(???:46283288 = 0 (0) Success safe) v1 ==== 27+0+0 (1141010652 0 0) 0x5566a4ae6580 con 0x5566c23b6000
-5> 2019-06-27 23:16:13.221 7fc02d737700 1 -- 10.20.75.98:0/332706481 >> 10.75.36.38:6809/39593 conn(0x5566cfcb9800 :-1 s=STATE_OPEN pgs=300072 cs=1 l=1).read_bulk peer close file descriptor 2
-4> 2019-06-27 23:16:13.221 7fc02d737700 1 -- 10.20.75.98:0/332706481 >> 10.75.36.38:6809/39593 conn(0x5566cfcb9800 :-1 s=STATE_OPEN pgs=300072 cs=1 l=1).read_until read failed
-3> 2019-06-27 23:16:13.221 7fc02d737700 1 -- 10.20.75.98:0/332706481 >> 10.75.36.38:6809/39593 conn(0x5566cfcb9800 :-1 s=STATE_OPEN pgs=300072 cs=1 l=1).process read tag failed
-2> 2019-06-27 23:16:13.221 7fc02d737700 1 -- 10.20.75.98:0/332706481 >> 10.75.36.38:6809/39593 conn(0x5566cfcb9800 :-1 s=STATE_OPEN pgs=300072 cs=1 l=1).fault on lossy channel, failing
-1> 2019-06-27 23:16:13.221 7fc02d737700 2 -- 10.20.75.98:0/332706481 >> 10.75.36.38:6809/39593 conn(0x5566cfcb9800 :-1 s=STATE_OPEN pgs=300072 cs=1 l=1)._stop
0> 2019-06-27 23:16:13.221 7fc021f20700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.5/rpm/el7/BUILD/ceph-13.2.5/src/client/Client.cc: In function 'void Client::_update_lock_state(flock*, uint64_t, ceph_lock_state_t*)' thread 7fc021f20700 time 2019-06-27 23:16:13.225473
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.5/rpm/el7/BUILD/ceph-13.2.5/src/client/Client.cc: 10043: FAILED assert(r)
ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0xff) [0x7fc03535dfbf]
2: (()+0x26d187) [0x7fc03535e187]
3: (Client::_update_lock_state(flock*, unsigned long, ceph_lock_state_t*)+0x12e) [0x556699e5f3fe]
4: (Client::_do_filelock(Inode*, Fh*, int, int, int, flock*, unsigned long, bool)+0x348) [0x556699ea2038]
5: (Client::_setlk(Fh*, flock*, unsigned long, int)+0x54) [0x556699ea52d4]
6: (Client::ll_setlk(Fh*, flock*, unsigned long, int)+0x192) [0x556699ea56c2]
7: (()+0x52fdd) [0x556699e56fdd]
8: (()+0x153ac) [0x7fc03dd833ac]
9: (()+0x16b6b) [0x7fc03dd84b6b]
10: (()+0x13401) [0x7fc03dd81401]
11: (()+0x7dd5) [0x7fc033394dd5]
12: (clone()+0x6d) [0x7fc03226dead]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.