Bug #42365

closed

client: FAILED assert(dir->readdir_cache[dirp->cache_index] == dn)

Added by Xiaoxi Chen over 4 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
% Done: 0%
Source: Community (dev)
Tags:
Backport: octopus,nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): Client
Labels (FS): crash
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

 2019-10-17 16:55:07.759 7f4553b80700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.5/rpm/el7/BUILD/ceph-13.2.5/src/client/Client.cc: In function 'void Client::insert_readdir_results(MetaRequest*, MetaSession*, Inode*)' thread 7f4553b80700 time 2019-10-17 16:55:07.758889
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.5/rpm/el7/BUILD/ceph-13.2.5/src/client/Client.cc: 1202: FAILED assert(dir->readdir_cache[dirp->cache_index] == dn)

 ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0xff) [0x7f455f7aefbf]
 2: (()+0x26d187) [0x7f455f7af187]
 3: (Client::insert_readdir_results(MetaRequest*, MetaSession*, Inode*)+0x1294) [0x5587b364db24]
 4: (Client::insert_trace(MetaRequest*, MetaSession*)+0xd61) [0x5587b364eff1]
 5: (Client::handle_client_reply(MClientReply*)+0x146) [0x5587b364f9a6]
 6: (Client::ms_dispatch(Message*)+0x2ab) [0x5587b3659f5b]
 7: (DispatchQueue::entry()+0xb7a) [0x7f455f86b39a]
 8: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f455f9092cd]
 9: (()+0x7dd5) [0x7f455d7e5dd5]
 10: (clone()+0x6d) [0x7f455c6beead]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
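
For readers unfamiliar with the check, the sketch below is a toy model of what the failed expression compares: the dentry already stored in a directory's readdir cache at a given index, against the dentry being inserted from a readdir reply. The names ToyDentry, ToyDir, SlotCheck, and place_at are hypothetical and are not Ceph's types. The append-at-end branch is an assumption about how such a cache is typically filled, and the model says nothing about the root cause of this bug.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; not Ceph's Dentry/Dir classes.
struct ToyDentry { std::string name; };
struct ToyDir { std::vector<ToyDentry*> readdir_cache; };

enum class SlotCheck { Appended, Matched, Mismatch };

// Place dn at slot cache_index of the toy cache.
// Assumption: a new slot is appended when the index equals the current size;
// an existing slot is only compared, never overwritten.
SlotCheck place_at(ToyDir& dir, std::size_t cache_index, ToyDentry* dn) {
    assert(dn != nullptr);
    if (cache_index == dir.readdir_cache.size()) {
        dir.readdir_cache.push_back(dn);
        return SlotCheck::Appended;
    }
    if (cache_index > dir.readdir_cache.size()) {
        throw std::out_of_range("cache_index is past the end of readdir_cache");
    }
    // Same comparison as the failed expression:
    // dir->readdir_cache[dirp->cache_index] == dn
    return dir.readdir_cache[cache_index] == dn ? SlotCheck::Matched
                                                : SlotCheck::Mismatch;
}
```

In this model, appending two dentries and then replaying index 0 with the first one yields Matched. Replaying index 0 with a different dentry yields Mismatch, which corresponds to the condition under which the assertion in this report fires.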

Files

log.tar.gz (78.3 KB), Xiaoxi Chen, 10/27/2019 03:58 PM

Related issues 2 (0 open, 2 closed)

Copied to CephFS - Backport #47259: nautilus: client: FAILED assert(dir->readdir_cache[dirp->cache_index] == dn) (Resolved, Wei-Chung Cheng)
Copied to CephFS - Backport #47260: octopus: client: FAILED assert(dir->readdir_cache[dirp->cache_index] == dn) (Resolved, Nathan Cutler)

Actions #1

Updated by Patrick Donnelly over 4 years ago

  • Subject changed from Client: FAILED assert(dir->readdir_cache[dirp->cache_index] == dn) to client: FAILED assert(dir->readdir_cache[dirp->cache_index] == dn)
  • Description updated (diff)
  • Status changed from New to Need More Info
  • Assignee deleted (Zheng Yan)
  • Priority changed from High to Normal
  • Start date deleted (10/18/2019)
  • Affected Versions v13.2.5 added
  • Component(FS) Client added
  • Labels (FS) crash added

Can you also share any surrounding debug log messages?

Actions #2

Updated by Xiaoxi Chen over 4 years ago

Actions #4

Updated by Xiaoxi Chen over 4 years ago

   -16> 2019-10-18 07:04:32.248 7fc23e902700  3 client.21378731 ll_lookup 0x10002372a42.head oracle -> 0 (1000239aa9b)
   -15> 2019-10-18 07:04:32.248 7fc247113700  3 client.21378731 ll_readlink 0x1000239aa9b.head
   -14> 2019-10-18 07:04:32.248 7fc247113700  3 client.21378731 ll_readlink 0x1000239aa9b.head = 23
   -13> 2019-10-18 07:04:32.248 7fc23d0ff700  3 client.21378731 ll_lookup 0x10002372a42.head oracle
   -12> 2019-10-18 07:04:32.248 7fc23d0ff700  3 client.21378731 may_lookup 0x564f55851080 = 0
   -11> 2019-10-18 07:04:32.248 7fc23d0ff700  3 client.21378731 ll_lookup 0x10002372a42.head oracle -> 0 (1000239aa9b)
   -10> 2019-10-18 07:04:32.248 7fc24390c700  3 client.21378731 ll_readlink 0x1000239aa9b.head
    -9> 2019-10-18 07:04:32.248 7fc24390c700  3 client.21378731 ll_readlink 0x1000239aa9b.head = 23
    -8> 2019-10-18 07:04:32.248 7fc241107700  3 client.21378731 ll_releasedir 0x564f6a862600
    -7> 2019-10-18 07:04:32.260 7fc24f123700  5 -- 10.20.75.48:0/2350441790 >> 10.78.184.93:6801/4234655216 conn(0x564f55816400 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=6773 cs=1 l=0). rx mds.3 seq 2665334 0x564f6dbde580 client_reply(???:1778578 = 0 (0) Success) v1
    -6> 2019-10-18 07:04:32.260 7fc24b91c700  1 -- 10.20.75.48:0/2350441790 <== mds.3 10.78.184.93:6801/4234655216 2665334 ==== client_reply(???:1778578 = 0 (0) Success) v1 ==== 116433+0+0 (3514735765 0 0) 0x564f6dbde580 con 0x564f55816400
    -5> 2019-10-18 07:04:32.260 7fc24f924700  1 -- 10.20.75.48:0/2350441790 >> 10.20.77.54:6817/1946 conn(0x564f6b57ac00 :-1 s=STATE_OPEN pgs=412160 cs=1 l=1).read_bulk peer close file descriptor 2
    -4> 2019-10-18 07:04:32.260 7fc24f924700  1 -- 10.20.75.48:0/2350441790 >> 10.20.77.54:6817/1946 conn(0x564f6b57ac00 :-1 s=STATE_OPEN pgs=412160 cs=1 l=1).read_until read failed
    -3> 2019-10-18 07:04:32.260 7fc24f924700  1 -- 10.20.75.48:0/2350441790 >> 10.20.77.54:6817/1946 conn(0x564f6b57ac00 :-1 s=STATE_OPEN pgs=412160 cs=1 l=1).process read tag failed
    -2> 2019-10-18 07:04:32.260 7fc24f924700  1 -- 10.20.75.48:0/2350441790 >> 10.20.77.54:6817/1946 conn(0x564f6b57ac00 :-1 s=STATE_OPEN pgs=412160 cs=1 l=1).fault on lossy channel, failing
    -1> 2019-10-18 07:04:32.260 7fc24f924700  2 -- 10.20.75.48:0/2350441790 >> 10.20.77.54:6817/1946 conn(0x564f6b57ac00 :-1 s=STATE_OPEN pgs=412160 cs=1 l=1)._stop
     0> 2019-10-18 07:04:32.264 7fc24b91c700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.5/rpm/el7/BUILD/ceph-13.2.5/src/client/Client.cc: In function 'void Client::insert_readdir_results(MetaRequest*, MetaSession*, Inode*)' thread 7fc24b91c700 time 2019-10-18 07:04:32.267202
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.5/rpm/el7/BUILD/ceph-13.2.5/src/client/Client.cc: 1202: FAILED assert(dir->readdir_cache[dirp->cache_index] == dn)

Log entries before -16 repeat the pattern of entries -16 through -11.

Actions #5

Updated by Xiaoxi Chen over 4 years ago

  • Status changed from Need More Info to New
Actions #6

Updated by Xiaoxi Chen over 4 years ago

  • Assignee set to Patrick Donnelly

Hit this 3 more times on 13.2.5; I have captured a coredump.

Actions #7

Updated by Xiaoxi Chen over 4 years ago

ceph-post-file: 123801df-99cc-4c0a-a76c-9b6c8a614394

Actions #8

Updated by Patrick Donnelly over 4 years ago

  • Assignee deleted (Patrick Donnelly)
Actions #9

Updated by Xiaoxi Chen almost 4 years ago

14.2.9 and 15.2.2 are also affected.

Actions #10

Updated by Xiaoxi Chen almost 4 years ago

  • Affected Versions v13.2.6, v13.2.7, v13.2.8, v13.2.9, v14.0.0, v14.2.0, v14.2.1, v14.2.10, v14.2.2, v14.2.3, v14.2.4, v14.2.5, v14.2.6, v14.2.7, v14.2.8, v14.2.9, v15.0.0, v15.2.1, v15.2.2 added
Actions #11

Updated by Zheng Yan over 3 years ago

  • Status changed from New to Fix Under Review
  • Backport set to octopus,nautilus
  • Pull request ID set to 36672
Actions #12

Updated by Patrick Donnelly over 3 years ago

  • Assignee set to Zheng Yan
  • Target version set to v16.0.0
  • Affected Versions deleted (v13.2.5, v13.2.6, v13.2.7, v13.2.8, v13.2.9, v14.0.0, v14.2.0, v14.2.1, v14.2.10, v14.2.2, v14.2.3, v14.2.4, v14.2.5, v14.2.6, v14.2.7, v14.2.8, v14.2.9, v15.0.0, v15.2.1, v15.2.2)
Actions #13

Updated by Patrick Donnelly over 3 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #14

Updated by Nathan Cutler over 3 years ago

  • Copied to Backport #47259: nautilus: client: FAILED assert(dir->readdir_cache[dirp->cache_index] == dn) added
Actions #15

Updated by Nathan Cutler over 3 years ago

  • Copied to Backport #47260: octopus: client: FAILED assert(dir->readdir_cache[dirp->cache_index] == dn) added
Actions #16

Updated by Nathan Cutler over 3 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
