Bug #42365
closed
client: FAILED assert(dir->readdir_cache[dirp->cache_index] == dn)
Added by Xiaoxi Chen over 4 years ago.
Updated over 3 years ago.
Backport: octopus,nautilus
Description
2019-10-17 16:55:07.759 7f4553b80700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.5/rpm/el7/BUILD/ceph-13.2.5/src/client/Client.cc: In function 'void Client::insert_readdir_results(MetaRequest*, MetaSession*, Inode*)' thread 7f4553b80700 time 2019-10-17 16:55:07.758889
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.5/rpm/el7/BUILD/ceph-13.2.5/src/client/Client.cc: 1202: FAILED assert(dir->readdir_cache[dirp->cache_index] == dn)
ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0xff) [0x7f455f7aefbf]
2: (()+0x26d187) [0x7f455f7af187]
3: (Client::insert_readdir_results(MetaRequest*, MetaSession*, Inode*)+0x1294) [0x5587b364db24]
4: (Client::insert_trace(MetaRequest*, MetaSession*)+0xd61) [0x5587b364eff1]
5: (Client::handle_client_reply(MClientReply*)+0x146) [0x5587b364f9a6]
6: (Client::ms_dispatch(Message*)+0x2ab) [0x5587b3659f5b]
7: (DispatchQueue::entry()+0xb7a) [0x7f455f86b39a]
8: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f455f9092cd]
9: (()+0x7dd5) [0x7f455d7e5dd5]
10: (clone()+0x6d) [0x7f455c6beead]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
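For readers unfamiliar with the check that fires here: the assert compares the dentry already held at `readdir_cache[cache_index]` with the dentry from the incoming readdir reply. The following is only a simplified model of that kind of invariant, not the code in Client.cc; `Dentry`, `DirCache`, `SlotResult` and `place_in_cache` are hypothetical names made up for illustration, and the real client logic has more conditions than this.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins, NOT Ceph's Dentry/Dir types.
struct Dentry {
    std::string name;
};

struct DirCache {
    // Dentries in the order a directory listing returned them.
    std::vector<Dentry*> readdir_cache;
};

enum class SlotResult { Appended, Matched, Mismatch, OutOfRange };

// Try to record `dn` at position `cache_index`.
// - index == size: the cache grows by one entry (Appended).
// - index <  size: the slot must already hold `dn` (Matched);
//   anything else is the situation the assert in this report
//   guards against (Mismatch).
// - index >  size: there would be a gap in the cache (OutOfRange).
SlotResult place_in_cache(DirCache& dir, std::size_t cache_index, Dentry* dn) {
    if (cache_index == dir.readdir_cache.size()) {
        dir.readdir_cache.push_back(dn);
        return SlotResult::Appended;
    }
    if (cache_index < dir.readdir_cache.size()) {
        return dir.readdir_cache[cache_index] == dn ? SlotResult::Matched
                                                    : SlotResult::Mismatch;
    }
    return SlotResult::OutOfRange;
}
```

In this model, a `Mismatch` means the cached ordering and the newly received ordering disagree for the same slot. The sketch reports the condition instead of aborting, which is a choice made for illustration only.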
- Subject changed from Client: FAILED assert(dir->readdir_cache[dirp->cache_index] == dn) to client: FAILED assert(dir->readdir_cache[dirp->cache_index] == dn)
- Description updated (diff)
- Status changed from New to Need More Info
- Assignee deleted (Zheng Yan)
- Priority changed from High to Normal
- Start date deleted (10/18/2019)
- Affected Versions v13.2.5 added
- Component(FS) Client added
- Labels (FS) crash added
Can you also share any surrounding debug log messages?
-16> 2019-10-18 07:04:32.248 7fc23e902700 3 client.21378731 ll_lookup 0x10002372a42.head oracle -> 0 (1000239aa9b)
-15> 2019-10-18 07:04:32.248 7fc247113700 3 client.21378731 ll_readlink 0x1000239aa9b.head
-14> 2019-10-18 07:04:32.248 7fc247113700 3 client.21378731 ll_readlink 0x1000239aa9b.head = 23
-13> 2019-10-18 07:04:32.248 7fc23d0ff700 3 client.21378731 ll_lookup 0x10002372a42.head oracle
-12> 2019-10-18 07:04:32.248 7fc23d0ff700 3 client.21378731 may_lookup 0x564f55851080 = 0
-11> 2019-10-18 07:04:32.248 7fc23d0ff700 3 client.21378731 ll_lookup 0x10002372a42.head oracle -> 0 (1000239aa9b)
-10> 2019-10-18 07:04:32.248 7fc24390c700 3 client.21378731 ll_readlink 0x1000239aa9b.head
-9> 2019-10-18 07:04:32.248 7fc24390c700 3 client.21378731 ll_readlink 0x1000239aa9b.head = 23
-8> 2019-10-18 07:04:32.248 7fc241107700 3 client.21378731 ll_releasedir 0x564f6a862600
-7> 2019-10-18 07:04:32.260 7fc24f123700 5 -- 10.20.75.48:0/2350441790 >> 10.78.184.93:6801/4234655216 conn(0x564f55816400 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=6773 cs=1 l=0). rx mds.3 seq 2665334 0x564f6dbde580 client_reply(???:1778578 = 0 (0) Success) v1
-6> 2019-10-18 07:04:32.260 7fc24b91c700 1 -- 10.20.75.48:0/2350441790 <== mds.3 10.78.184.93:6801/4234655216 2665334 ==== client_reply(???:1778578 = 0 (0) Success) v1 ==== 116433+0+0 (3514735765 0 0) 0x564f6dbde580 con 0x564f55816400
-5> 2019-10-18 07:04:32.260 7fc24f924700 1 -- 10.20.75.48:0/2350441790 >> 10.20.77.54:6817/1946 conn(0x564f6b57ac00 :-1 s=STATE_OPEN pgs=412160 cs=1 l=1).read_bulk peer close file descriptor 2
-4> 2019-10-18 07:04:32.260 7fc24f924700 1 -- 10.20.75.48:0/2350441790 >> 10.20.77.54:6817/1946 conn(0x564f6b57ac00 :-1 s=STATE_OPEN pgs=412160 cs=1 l=1).read_until read failed
-3> 2019-10-18 07:04:32.260 7fc24f924700 1 -- 10.20.75.48:0/2350441790 >> 10.20.77.54:6817/1946 conn(0x564f6b57ac00 :-1 s=STATE_OPEN pgs=412160 cs=1 l=1).process read tag failed
-2> 2019-10-18 07:04:32.260 7fc24f924700 1 -- 10.20.75.48:0/2350441790 >> 10.20.77.54:6817/1946 conn(0x564f6b57ac00 :-1 s=STATE_OPEN pgs=412160 cs=1 l=1).fault on lossy channel, failing
-1> 2019-10-18 07:04:32.260 7fc24f924700 2 -- 10.20.75.48:0/2350441790 >> 10.20.77.54:6817/1946 conn(0x564f6b57ac00 :-1 s=STATE_OPEN pgs=412160 cs=1 l=1)._stop
0> 2019-10-18 07:04:32.264 7fc24b91c700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.5/rpm/el7/BUILD/ceph-13.2.5/src/client/Client.cc: In function 'void Client::insert_readdir_results(MetaRequest*, MetaSession*, Inode*)' thread 7fc24b91c700 time 2019-10-18 07:04:32.267202
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.5/rpm/el7/BUILD/ceph-13.2.5/src/client/Client.cc: 1202: FAILED assert(dir->readdir_cache[dirp->cache_index] == dn)
Log entries before -16 are repetitions of the [-16, -11] sequence.
- Status changed from Need More Info to New
- Assignee set to Patrick Donnelly
Hit this 3 more times on 13.2.5; I have captured a coredump.
ceph-post-file: 123801df-99cc-4c0a-a76c-9b6c8a614394
- Assignee deleted (Patrick Donnelly)
14.2.9 and 15.2.2 are also affected.
- Affected Versions v13.2.6, v13.2.7, v13.2.8, v13.2.9, v14.0.0, v14.2.0, v14.2.1, v14.2.10, v14.2.2, v14.2.3, v14.2.4, v14.2.5, v14.2.6, v14.2.7, v14.2.8, v14.2.9, v15.0.0, v15.2.1, v15.2.2 added
- Status changed from New to Fix Under Review
- Backport set to octopus,nautilus
- Pull request ID set to 36672
- Assignee set to Zheng Yan
- Target version set to v16.0.0
- Affected Versions deleted (v13.2.5, v13.2.6, v13.2.7, v13.2.8, v13.2.9, v14.0.0, v14.2.0, v14.2.1, v14.2.10, v14.2.2, v14.2.3, v14.2.4, v14.2.5, v14.2.6, v14.2.7, v14.2.8, v14.2.9, v15.0.0, v15.2.1, v15.2.2)
- Status changed from Fix Under Review to Pending Backport
- Copied to Backport #47259: nautilus: client: FAILED assert(dir->readdir_cache[dirp->cache_index] == dn) added
- Copied to Backport #47260: octopus: client: FAILED assert(dir->readdir_cache[dirp->cache_index] == dn) added
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".