Project

General

Profile

Bug #22741

osdc: "FAILED assert(bh->last_write_tid > tid)" in powercycle-wip-yuri-master-1.19.18-distro-basic-smithi

Added by Yuri Weinstein 12 months ago. Updated 10 months ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
Start date:
01/20/2018
Due date:
% Done:

0%

Source:
Q/A
Tags:
Backport:
luminous,jewel
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
powercycle
Component(FS):
osdc
Labels (FS):
Pull request ID:

Description

Run: http://pulpito.ceph.com/yuriw-2018-01-19_18:23:03-powercycle-wip-yuri-master-1.19.18-distro-basic-smithi/
Jobs: '2090918', '2090943', '2090929'
Logs: http://qa-proxy.ceph.com/teuthology/yuriw-2018-01-19_18:23:03-powercycle-wip-yuri-master-1.19.18-distro-basic-smithi/2090918/teuthology.log

2018-01-20T12:47:41.254 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi082.stderr:/build/ceph-13.0.0-4965-g40d3bc0/src/osdc/ObjectCacher.cc: 1169: FAILED assert(bh->last_write_tid > tid)
2018-01-20T12:47:41.254 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi082.stderr:
2018-01-20T12:47:41.254 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi082.stderr: ceph version 13.0.0-4965-g40d3bc0 (40d3bc073f558442887c5318e247a2daeeb98ea6) mimic (dev)
2018-01-20T12:47:41.254 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi082.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0xf5) [0x7f01e8bfaf75]
2018-01-20T12:47:41.254 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi082.stderr: 2: (ObjectCacher::bh_write_commit(long, sobject_t, std::vector<std::pair<long, unsigned long>, std::allocator<std::pair<long, unsigned long> > >&, unsigned long, int)+0xd72) [0x55d0d6271092]
2018-01-20T12:47:41.255 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi082.stderr: 3: (ObjectCacher::C_WriteCommit::finish(int)+0x6a) [0x55d0d627bb9a]
2018-01-20T12:47:41.255 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi082.stderr: 4: (Context::complete(int)+0x9) [0x55d0d6226609]
2018-01-20T12:47:41.255 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi082.stderr: 5: (C_Lock::finish(int)+0x29) [0x55d0d6227399]
2018-01-20T12:47:41.255 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi082.stderr: 6: (Context::complete(int)+0x9) [0x55d0d6226609]
2018-01-20T12:47:41.255 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi082.stderr: 7: (Finisher::finisher_thread_entry()+0x31f) [0x7f01e8bf958f]
2018-01-20T12:47:41.255 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi082.stderr: 8: (()+0x76ba) [0x7f01e872e6ba]
2018-01-20T12:47:41.255 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi082.stderr: 9: (clone()+0x6d) [0x7f01e7f573dd]
2018-01-20T12:47:41.255 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi082.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Related issues

Related to RADOS - Bug #23827: osd sends op_reply out of order Resolved 04/23/2018
Copied to fs - Backport #22860: luminous: osdc: "FAILED assert(bh->last_write_tid > tid)" in powercycle-wip-yuri-master-1.19.18-distro-basic-smithi Resolved
Copied to fs - Backport #22861: jewel: osdc: "FAILED assert(bh->last_write_tid > tid)" in powercycle-wip-yuri-master-1.19.18-distro-basic-smithi Resolved

History

#1 Updated by John Spray 12 months ago

  • Project changed from Ceph to fs
  • Component(FS) osdc added

Assigned to CephFS because it's crashing in the ceph-fuse process (in the absence of a better home for ObjectCacher issues...)

#2 Updated by Patrick Donnelly 12 months ago

  • Subject changed from "FAILED assert(bh->last_write_tid > tid)" in powercycle-wip-yuri-master-1.19.18-distro-basic-smithi to osdc: "FAILED assert(bh->last_write_tid > tid)" in powercycle-wip-yuri-master-1.19.18-distro-basic-smithi
  • Assignee set to Patrick Donnelly

#3 Updated by Patrick Donnelly 12 months ago

Core: /ceph/teuthology-archive/yuriw-2018-01-19_18:23:03-powercycle-wip-yuri-master-1.19.18-distro-basic-smithi/2090918/remote/smithi082/coredump/1516452461.27619.core

On senta02, I've setup a docker image for debugging:

# sudo docker run -ti -v /ceph:/ceph:ro pdonnell/ceph-ci:master-ubuntu-xenial
# gdb -q /usr/bin/ceph-fuse
(gdb) core /ceph/teuthology-archive/yuriw-2018-01-19_18:23:03-powercycle-wip-yuri-master-1.19.18-distro-basic-smithi/2090918/remote/smithi082/coredump/1516452461.27619.core
(gdb) bt
#0  0x00007f01e8738269 in raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/pt-raise.c:35
#1  0x000055d0d626247e in reraise_fatal (signum=6) at /build/ceph-13.0.0-4965-g40d3bc0/src/global/signal_handler.cc:74
#2  handle_fatal_signal (signum=6) at /build/ceph-13.0.0-4965-g40d3bc0/src/global/signal_handler.cc:138
#3  <signal handler called>
#4  0x00007f01e7e85428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#5  0x00007f01e7e8702a in __GI_abort () at abort.c:89
#6  0x00007f01e8bfb0da in ceph::__ceph_assert_fail(char const*, char const*, int, char const*) () from /usr/lib/ceph/libceph-common.so.0
#7  0x000055d0d6271092 in ObjectCacher::bh_write_commit (this=this@entry=0x55d0d81ade00, poolid=<optimized out>, oid=..., ranges=std::vector of length 3, capacity 3 = {...}, tid=<optimized out>, 
    tid@entry=40597, r=r@entry=0) at /build/ceph-13.0.0-4965-g40d3bc0/src/osdc/ObjectCacher.cc:1169
#8  0x000055d0d627bb9a in ObjectCacher::C_WriteCommit::finish (this=0x55d0febd56b0, r=0) at /build/ceph-13.0.0-4965-g40d3bc0/src/osdc/ObjectCacher.cc:1015
#9  0x000055d0d6226609 in Context::complete (this=0x55d0febd56b0, r=<optimized out>) at /build/ceph-13.0.0-4965-g40d3bc0/src/include/Context.h:77
#10 0x000055d0d6227399 in C_Lock::finish (this=0x55d0f4e6ffa0, r=0) at /build/ceph-13.0.0-4965-g40d3bc0/src/include/Context.h:211
#11 0x000055d0d6226609 in Context::complete (this=0x55d0f4e6ffa0, r=<optimized out>) at /build/ceph-13.0.0-4965-g40d3bc0/src/include/Context.h:77
#12 0x00007f01e8bf958f in Finisher::finisher_thread_entry() () from /usr/lib/ceph/libceph-common.so.0
#13 0x00007f01e872e6ba in start_thread (arg=0x7f01e0e37700) at pthread_create.c:333
#14 0x00007f01e7f573dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:74
#15 0x0000000000000000 in ?? ()

I don't see immediately what went wrong as I'm not familiar with this code. I'll wait for feedback from Zheng when he gets back from PTO.

#4 Updated by Zheng Yan 12 months ago

diff --git a/src/osdc/ObjectCacher.cc b/src/osdc/ObjectCacher.cc
index c1664718da..3f549f9663 100644
--- a/src/osdc/ObjectCacher.cc
+++ b/src/osdc/ObjectCacher.cc
@@ -1149,7 +1149,7 @@ void ObjectCacher::bh_write_commit(int64_t poolid, sobject_t oid,
         ++p) {
       BufferHead *bh = p->second;

-      if (bh->start() > start+(loff_t)length)
+      if (bh->start() >= start+(loff_t)length)
        break;

       if (bh->start() < start &&

does this change make sense?

#5 Updated by Zheng Yan 12 months ago

  • Status changed from New to Need Review

#6 Updated by Patrick Donnelly 12 months ago

  • Status changed from Need Review to Pending Backport
  • Assignee changed from Patrick Donnelly to Zheng Yan
  • Backport set to luminous

#7 Updated by Patrick Donnelly 12 months ago

  • Backport changed from luminous to luminous,jewel

#8 Updated by Nathan Cutler 12 months ago

  • Copied to Backport #22860: luminous: osdc: "FAILED assert(bh->last_write_tid > tid)" in powercycle-wip-yuri-master-1.19.18-distro-basic-smithi added

#9 Updated by Nathan Cutler 12 months ago

  • Copied to Backport #22861: jewel: osdc: "FAILED assert(bh->last_write_tid > tid)" in powercycle-wip-yuri-master-1.19.18-distro-basic-smithi added

#10 Updated by Patrick Donnelly 10 months ago

  • Status changed from Pending Backport to Resolved

#11 Updated by Patrick Donnelly 9 months ago

  • Related to Bug #23827: osd sends op_reply out of order added

Also available in: Atom PDF