Bug #42365

client: FAILED assert(dir->readdir_cache[dirp->cache_index] == dn)

Added by Xiaoxi Chen 11 months ago. Updated 19 days ago.

Status: Pending Backport
Priority: Normal
Assignee:
Category: -
Target version:
% Done: 0%
Source: Community (dev)
Tags:
Backport: octopus,nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): Client
Labels (FS): crash
Pull request ID:
Crash signature:

Description

 2019-10-17 16:55:07.759 7f4553b80700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.5/rpm/el7/BUILD/ceph-13.2.5/src/client/Client.cc: In function 'void Client::insert_readdir_results(MetaRequest*, MetaSession*, Inode*)' thread 7f4553b80700 time 2019-10-17 16:55:07.758889
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.5/rpm/el7/BUILD/ceph-13.2.5/src/client/Client.cc: 1202: FAILED assert(dir->readdir_cache[dirp->cache_index] == dn)

 ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0xff) [0x7f455f7aefbf]
 2: (()+0x26d187) [0x7f455f7af187]
 3: (Client::insert_readdir_results(MetaRequest*, MetaSession*, Inode*)+0x1294) [0x5587b364db24]
 4: (Client::insert_trace(MetaRequest*, MetaSession*)+0xd61) [0x5587b364eff1]
 5: (Client::handle_client_reply(MClientReply*)+0x146) [0x5587b364f9a6]
 6: (Client::ms_dispatch(Message*)+0x2ab) [0x5587b3659f5b]
 7: (DispatchQueue::entry()+0xb7a) [0x7f455f86b39a]
 8: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f455f9092cd]
 9: (()+0x7dd5) [0x7f455d7e5dd5]
 10: (clone()+0x6d) [0x7f455c6beead]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

log.tar.gz (78.3 KB) Xiaoxi Chen, 10/27/2019 03:58 PM


Related issues

Copied to fs - Backport #47259: nautilus: client: FAILED assert(dir->readdir_cache[dirp->cache_index] == dn) In Progress
Copied to fs - Backport #47260: octopus: client: FAILED assert(dir->readdir_cache[dirp->cache_index] == dn) New

History

#1 Updated by Patrick Donnelly 11 months ago

  • Subject changed from Client: FAILED assert(dir->readdir_cache[dirp->cache_index] == dn) to client: FAILED assert(dir->readdir_cache[dirp->cache_index] == dn)
  • Description updated (diff)
  • Status changed from New to Need More Info
  • Assignee deleted (Zheng Yan)
  • Priority changed from High to Normal
  • Start date deleted (10/18/2019)
  • Affected Versions v13.2.5 added
  • Component(FS) Client added
  • Labels (FS) crash added

Can you also share any surrounding debug log messages?

#2 Updated by Xiaoxi Chen 11 months ago

#4 Updated by Xiaoxi Chen 11 months ago

   -16> 2019-10-18 07:04:32.248 7fc23e902700  3 client.21378731 ll_lookup 0x10002372a42.head oracle -> 0 (1000239aa9b)
   -15> 2019-10-18 07:04:32.248 7fc247113700  3 client.21378731 ll_readlink 0x1000239aa9b.head
   -14> 2019-10-18 07:04:32.248 7fc247113700  3 client.21378731 ll_readlink 0x1000239aa9b.head = 23
   -13> 2019-10-18 07:04:32.248 7fc23d0ff700  3 client.21378731 ll_lookup 0x10002372a42.head oracle
   -12> 2019-10-18 07:04:32.248 7fc23d0ff700  3 client.21378731 may_lookup 0x564f55851080 = 0
   -11> 2019-10-18 07:04:32.248 7fc23d0ff700  3 client.21378731 ll_lookup 0x10002372a42.head oracle -> 0 (1000239aa9b)
   -10> 2019-10-18 07:04:32.248 7fc24390c700  3 client.21378731 ll_readlink 0x1000239aa9b.head
    -9> 2019-10-18 07:04:32.248 7fc24390c700  3 client.21378731 ll_readlink 0x1000239aa9b.head = 23
    -8> 2019-10-18 07:04:32.248 7fc241107700  3 client.21378731 ll_releasedir 0x564f6a862600
    -7> 2019-10-18 07:04:32.260 7fc24f123700  5 -- 10.20.75.48:0/2350441790 >> 10.78.184.93:6801/4234655216 conn(0x564f55816400 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=6773 cs=1 l=0). rx mds.3 seq 2665334 0x564f6dbde580 client_reply(???:1778578 = 0 (0) Success) v1
    -6> 2019-10-18 07:04:32.260 7fc24b91c700  1 -- 10.20.75.48:0/2350441790 <== mds.3 10.78.184.93:6801/4234655216 2665334 ==== client_reply(???:1778578 = 0 (0) Success) v1 ==== 116433+0+0 (3514735765 0 0) 0x564f6dbde580 con 0x564f55816400
    -5> 2019-10-18 07:04:32.260 7fc24f924700  1 -- 10.20.75.48:0/2350441790 >> 10.20.77.54:6817/1946 conn(0x564f6b57ac00 :-1 s=STATE_OPEN pgs=412160 cs=1 l=1).read_bulk peer close file descriptor 2
    -4> 2019-10-18 07:04:32.260 7fc24f924700  1 -- 10.20.75.48:0/2350441790 >> 10.20.77.54:6817/1946 conn(0x564f6b57ac00 :-1 s=STATE_OPEN pgs=412160 cs=1 l=1).read_until read failed
    -3> 2019-10-18 07:04:32.260 7fc24f924700  1 -- 10.20.75.48:0/2350441790 >> 10.20.77.54:6817/1946 conn(0x564f6b57ac00 :-1 s=STATE_OPEN pgs=412160 cs=1 l=1).process read tag failed
    -2> 2019-10-18 07:04:32.260 7fc24f924700  1 -- 10.20.75.48:0/2350441790 >> 10.20.77.54:6817/1946 conn(0x564f6b57ac00 :-1 s=STATE_OPEN pgs=412160 cs=1 l=1).fault on lossy channel, failing
    -1> 2019-10-18 07:04:32.260 7fc24f924700  2 -- 10.20.75.48:0/2350441790 >> 10.20.77.54:6817/1946 conn(0x564f6b57ac00 :-1 s=STATE_OPEN pgs=412160 cs=1 l=1)._stop
     0> 2019-10-18 07:04:32.264 7fc24b91c700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.5/rpm/el7/BUILD/ceph-13.2.5/src/client/Client.cc: In function 'void Client::insert_readdir_results(MetaRequest*, MetaSession*, Inode*)' thread 7fc24b91c700 time 2019-10-18 07:04:32.267202
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.5/rpm/el7/BUILD/ceph-13.2.5/src/client/Client.cc: 1202: FAILED assert(dir->readdir_cache[dirp->cache_index] == dn)

Log entries before -16 just repeat the pattern seen in entries -16 through -11.
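For anyone triaging a dump like the one above, a small script can group the recent-event lines by thread, which makes it easier to see what the crashing thread did last. The following is a minimal sketch, not part of Ceph: the line layout (offset, date, time, thread id, level, message) is inferred only from the excerpt in this comment, and the names `DUMP_LINE`, `parse_dump`, `by_thread`, and `SAMPLE` are hypothetical helpers introduced for illustration.

```python
import re
from collections import defaultdict

# Assumed layout of one recent-event line, inferred from the excerpt above:
#   <offset>> <date> <time> <thread-id> <level> <message>
DUMP_LINE = re.compile(
    r"^\s*(?P<offset>-?\d+)>\s+"
    r"(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<thread>[0-9a-f]+)\s+(?P<level>-?\d+)\s+(?P<message>.*)$"
)


def parse_dump(text):
    """Return one dict per line that matches the assumed layout; skip the rest."""
    events = []
    for line in text.splitlines():
        m = DUMP_LINE.match(line)
        if m:
            events.append({
                "offset": int(m["offset"]),
                "thread": m["thread"],
                "level": int(m["level"]),
                "message": m["message"],
            })
    return events


def by_thread(events):
    """Group parsed events by thread id, preserving their order in the dump."""
    grouped = defaultdict(list)
    for ev in events:
        grouped[ev["thread"]].append(ev)
    return dict(grouped)


# Three lines copied from the dump above (entries -9, -8 and -6).
SAMPLE = """\
    -9> 2019-10-18 07:04:32.248 7fc24390c700  3 client.21378731 ll_readlink 0x1000239aa9b.head = 23
    -8> 2019-10-18 07:04:32.248 7fc241107700  3 client.21378731 ll_releasedir 0x564f6a862600
    -6> 2019-10-18 07:04:32.260 7fc24b91c700  1 -- 10.20.75.48:0/2350441790 <== mds.3 10.78.184.93:6801/4234655216 2665334 ==== client_reply(???:1778578 = 0 (0) Success) v1 ==== 116433+0+0 (3514735765 0 0) 0x564f6dbde580 con 0x564f55816400
"""

if __name__ == "__main__":
    for thread, evs in by_thread(parse_dump(SAMPLE)).items():
        print(thread, [e["offset"] for e in evs])
```

Run against the full dump, this shows that thread 7fc24b91c700, which hit the assert at entry 0, is the same thread that received the mds.3 `client_reply` at entry -6, while `ll_releasedir` at entry -8 ran on a different thread. Whether those two events are causally related is not established by the log alone.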

#5 Updated by Xiaoxi Chen 10 months ago

  • Status changed from Need More Info to New

#6 Updated by Xiaoxi Chen 9 months ago

  • Assignee set to Patrick Donnelly

Hit this 3 more times on 13.2.5; I have captured a coredump.

#7 Updated by Xiaoxi Chen 9 months ago

ceph-post-file: 123801df-99cc-4c0a-a76c-9b6c8a614394

#8 Updated by Patrick Donnelly 8 months ago

  • Assignee deleted (Patrick Donnelly)

#9 Updated by Xiaoxi Chen 3 months ago

14.2.9 and 15.2.2 are also affected.

#10 Updated by Xiaoxi Chen 3 months ago

  • Affected Versions v13.2.6, v13.2.7, v13.2.8, v13.2.9, v14.0.0, v14.2.0, v14.2.1, v14.2.10, v14.2.2, v14.2.3, v14.2.4, v14.2.5, v14.2.6, v14.2.7, v14.2.8, v14.2.9, v15.0.0, v15.2.1, v15.2.2 added

#11 Updated by Zheng Yan about 1 month ago

  • Status changed from New to Fix Under Review
  • Backport set to octopus,nautilus
  • Pull request ID set to 36672

#12 Updated by Patrick Donnelly about 1 month ago

  • Assignee set to Zheng Yan
  • Target version set to v16.0.0
  • Affected Versions deleted (v13.2.5, v13.2.6, v13.2.7, v13.2.8, v13.2.9, v14.0.0, v14.2.0, v14.2.1, v14.2.10, v14.2.2, v14.2.3, v14.2.4, v14.2.5, v14.2.6, v14.2.7, v14.2.8, v14.2.9, v15.0.0, v15.2.1, v15.2.2)

#13 Updated by Patrick Donnelly 19 days ago

  • Status changed from Fix Under Review to Pending Backport

#14 Updated by Nathan Cutler 18 days ago

  • Copied to Backport #47259: nautilus: client: FAILED assert(dir->readdir_cache[dirp->cache_index] == dn) added

#15 Updated by Nathan Cutler 18 days ago

  • Copied to Backport #47260: octopus: client: FAILED assert(dir->readdir_cache[dirp->cache_index] == dn) added
