Project

General

Profile

Actions

Bug #22741

closed

osdc: "FAILED assert(bh->last_write_tid > tid)" in powercycle-wip-yuri-master-1.19.18-distro-basic-smithi

Added by Yuri Weinstein about 6 years ago. Updated about 6 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
luminous,jewel
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
powercycle
Component(FS):
osdc
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Run: http://pulpito.ceph.com/yuriw-2018-01-19_18:23:03-powercycle-wip-yuri-master-1.19.18-distro-basic-smithi/
Jobs: '2090918', '2090943', '2090929'
Logs: http://qa-proxy.ceph.com/teuthology/yuriw-2018-01-19_18:23:03-powercycle-wip-yuri-master-1.19.18-distro-basic-smithi/2090918/teuthology.log

2018-01-20T12:47:41.254 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi082.stderr:/build/ceph-13.0.0-4965-g40d3bc0/src/osdc/ObjectCacher.cc: 1169: FAILED assert(bh->last_write_tid > tid)
2018-01-20T12:47:41.254 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi082.stderr:
2018-01-20T12:47:41.254 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi082.stderr: ceph version 13.0.0-4965-g40d3bc0 (40d3bc073f558442887c5318e247a2daeeb98ea6) mimic (dev)
2018-01-20T12:47:41.254 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi082.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0xf5) [0x7f01e8bfaf75]
2018-01-20T12:47:41.254 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi082.stderr: 2: (ObjectCacher::bh_write_commit(long, sobject_t, std::vector<std::pair<long, unsigned long>, std::allocator<std::pair<long, unsigned long> > >&, unsigned long, int)+0xd72) [0x55d0d6271092]
2018-01-20T12:47:41.255 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi082.stderr: 3: (ObjectCacher::C_WriteCommit::finish(int)+0x6a) [0x55d0d627bb9a]
2018-01-20T12:47:41.255 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi082.stderr: 4: (Context::complete(int)+0x9) [0x55d0d6226609]
2018-01-20T12:47:41.255 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi082.stderr: 5: (C_Lock::finish(int)+0x29) [0x55d0d6227399]
2018-01-20T12:47:41.255 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi082.stderr: 6: (Context::complete(int)+0x9) [0x55d0d6226609]
2018-01-20T12:47:41.255 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi082.stderr: 7: (Finisher::finisher_thread_entry()+0x31f) [0x7f01e8bf958f]
2018-01-20T12:47:41.255 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi082.stderr: 8: (()+0x76ba) [0x7f01e872e6ba]
2018-01-20T12:47:41.255 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi082.stderr: 9: (clone()+0x6d) [0x7f01e7f573dd]
2018-01-20T12:47:41.255 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi082.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Related issues 3 (0 open3 closed)

Related to RADOS - Bug #23827: osd sends op_reply out of orderResolvedNeha Ojha

Actions
Copied to CephFS - Backport #22860: luminous: osdc: "FAILED assert(bh->last_write_tid > tid)" in powercycle-wip-yuri-master-1.19.18-distro-basic-smithiResolvedPrashant DActions
Copied to CephFS - Backport #22861: jewel: osdc: "FAILED assert(bh->last_write_tid > tid)" in powercycle-wip-yuri-master-1.19.18-distro-basic-smithiResolvedPrashant DActions
Actions #1

Updated by John Spray about 6 years ago

  • Project changed from Ceph to CephFS
  • Component(FS) osdc added

Assigned to CephFS because it's crashing in the ceph-fuse process (in the absence of a better home for ObjectCacher issues...)

Actions #2

Updated by Patrick Donnelly about 6 years ago

  • Subject changed from "FAILED assert(bh->last_write_tid > tid)" in powercycle-wip-yuri-master-1.19.18-distro-basic-smithi to osdc: "FAILED assert(bh->last_write_tid > tid)" in powercycle-wip-yuri-master-1.19.18-distro-basic-smithi
  • Assignee set to Patrick Donnelly
Actions #3

Updated by Patrick Donnelly about 6 years ago

Core: /ceph/teuthology-archive/yuriw-2018-01-19_18:23:03-powercycle-wip-yuri-master-1.19.18-distro-basic-smithi/2090918/remote/smithi082/coredump/1516452461.27619.core

On senta02, I've setup a docker image for debugging:

# sudo docker run -ti -v /ceph:/ceph:ro pdonnell/ceph-ci:master-ubuntu-xenial
# gdb -q /usr/bin/ceph-fuse
(gdb) core /ceph/teuthology-archive/yuriw-2018-01-19_18:23:03-powercycle-wip-yuri-master-1.19.18-distro-basic-smithi/2090918/remote/smithi082/coredump/1516452461.27619.core
(gdb) bt
#0  0x00007f01e8738269 in raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/pt-raise.c:35
#1  0x000055d0d626247e in reraise_fatal (signum=6) at /build/ceph-13.0.0-4965-g40d3bc0/src/global/signal_handler.cc:74
#2  handle_fatal_signal (signum=6) at /build/ceph-13.0.0-4965-g40d3bc0/src/global/signal_handler.cc:138
#3  <signal handler called>
#4  0x00007f01e7e85428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#5  0x00007f01e7e8702a in __GI_abort () at abort.c:89
#6  0x00007f01e8bfb0da in ceph::__ceph_assert_fail(char const*, char const*, int, char const*) () from /usr/lib/ceph/libceph-common.so.0
#7  0x000055d0d6271092 in ObjectCacher::bh_write_commit (this=this@entry=0x55d0d81ade00, poolid=<optimized out>, oid=..., ranges=std::vector of length 3, capacity 3 = {...}, tid=<optimized out>, 
    tid@entry=40597, r=r@entry=0) at /build/ceph-13.0.0-4965-g40d3bc0/src/osdc/ObjectCacher.cc:1169
#8  0x000055d0d627bb9a in ObjectCacher::C_WriteCommit::finish (this=0x55d0febd56b0, r=0) at /build/ceph-13.0.0-4965-g40d3bc0/src/osdc/ObjectCacher.cc:1015
#9  0x000055d0d6226609 in Context::complete (this=0x55d0febd56b0, r=<optimized out>) at /build/ceph-13.0.0-4965-g40d3bc0/src/include/Context.h:77
#10 0x000055d0d6227399 in C_Lock::finish (this=0x55d0f4e6ffa0, r=0) at /build/ceph-13.0.0-4965-g40d3bc0/src/include/Context.h:211
#11 0x000055d0d6226609 in Context::complete (this=0x55d0f4e6ffa0, r=<optimized out>) at /build/ceph-13.0.0-4965-g40d3bc0/src/include/Context.h:77
#12 0x00007f01e8bf958f in Finisher::finisher_thread_entry() () from /usr/lib/ceph/libceph-common.so.0
#13 0x00007f01e872e6ba in start_thread (arg=0x7f01e0e37700) at pthread_create.c:333
#14 0x00007f01e7f573dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:74
#15 0x0000000000000000 in ?? ()

I don't see immediately what went wrong as I'm not familiar with this code. I'll wait for feedback from Zheng when he gets back from PTO.

Actions #4

Updated by Zheng Yan about 6 years ago

diff --git a/src/osdc/ObjectCacher.cc b/src/osdc/ObjectCacher.cc
index c1664718da..3f549f9663 100644
--- a/src/osdc/ObjectCacher.cc
+++ b/src/osdc/ObjectCacher.cc
@@ -1149,7 +1149,7 @@ void ObjectCacher::bh_write_commit(int64_t poolid, sobject_t oid,
         ++p) {
       BufferHead *bh = p->second;

-      if (bh->start() > start+(loff_t)length)
+      if (bh->start() >= start+(loff_t)length)
        break;

       if (bh->start() < start &&

does this change make sense?

Actions #5

Updated by Zheng Yan about 6 years ago

  • Status changed from New to Fix Under Review
Actions #6

Updated by Patrick Donnelly about 6 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Assignee changed from Patrick Donnelly to Zheng Yan
  • Backport set to luminous
Actions #7

Updated by Patrick Donnelly about 6 years ago

  • Backport changed from luminous to luminous,jewel
Actions #8

Updated by Nathan Cutler about 6 years ago

  • Copied to Backport #22860: luminous: osdc: "FAILED assert(bh->last_write_tid > tid)" in powercycle-wip-yuri-master-1.19.18-distro-basic-smithi added
Actions #9

Updated by Nathan Cutler about 6 years ago

  • Copied to Backport #22861: jewel: osdc: "FAILED assert(bh->last_write_tid > tid)" in powercycle-wip-yuri-master-1.19.18-distro-basic-smithi added
Actions #10

Updated by Patrick Donnelly about 6 years ago

  • Status changed from Pending Backport to Resolved
Actions #11

Updated by Patrick Donnelly almost 6 years ago

  • Related to Bug #23827: osd sends op_reply out of order added
Actions

Also available in: Atom PDF