Bug #22741
closedosdc: "FAILED assert(bh->last_write_tid > tid)" in powercycle-wip-yuri-master-1.19.18-distro-basic-smithi
0%
Description
Run: http://pulpito.ceph.com/yuriw-2018-01-19_18:23:03-powercycle-wip-yuri-master-1.19.18-distro-basic-smithi/
Jobs: '2090918', '2090943', '2090929'
Logs: http://qa-proxy.ceph.com/teuthology/yuriw-2018-01-19_18:23:03-powercycle-wip-yuri-master-1.19.18-distro-basic-smithi/2090918/teuthology.log
2018-01-20T12:47:41.254 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi082.stderr:/build/ceph-13.0.0-4965-g40d3bc0/src/osdc/ObjectCacher.cc: 1169: FAILED assert(bh->last_write_tid > tid) 2018-01-20T12:47:41.254 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi082.stderr: 2018-01-20T12:47:41.254 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi082.stderr: ceph version 13.0.0-4965-g40d3bc0 (40d3bc073f558442887c5318e247a2daeeb98ea6) mimic (dev) 2018-01-20T12:47:41.254 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi082.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0xf5) [0x7f01e8bfaf75] 2018-01-20T12:47:41.254 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi082.stderr: 2: (ObjectCacher::bh_write_commit(long, sobject_t, std::vector<std::pair<long, unsigned long>, std::allocator<std::pair<long, unsigned long> > >&, unsigned long, int)+0xd72) [0x55d0d6271092] 2018-01-20T12:47:41.255 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi082.stderr: 3: (ObjectCacher::C_WriteCommit::finish(int)+0x6a) [0x55d0d627bb9a] 2018-01-20T12:47:41.255 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi082.stderr: 4: (Context::complete(int)+0x9) [0x55d0d6226609] 2018-01-20T12:47:41.255 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi082.stderr: 5: (C_Lock::finish(int)+0x29) [0x55d0d6227399] 2018-01-20T12:47:41.255 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi082.stderr: 6: (Context::complete(int)+0x9) [0x55d0d6226609] 2018-01-20T12:47:41.255 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi082.stderr: 7: (Finisher::finisher_thread_entry()+0x31f) [0x7f01e8bf958f] 2018-01-20T12:47:41.255 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi082.stderr: 8: (()+0x76ba) [0x7f01e872e6ba] 2018-01-20T12:47:41.255 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi082.stderr: 9: (clone()+0x6d) [0x7f01e7f573dd] 2018-01-20T12:47:41.255 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi082.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Updated by John Spray over 6 years ago
- Project changed from Ceph to CephFS
- Component(FS) osdc added
Assigned to CephFS because it's crashing in the ceph-fuse process (in the absence of a better home for ObjectCacher issues...)
Updated by Patrick Donnelly over 6 years ago
- Subject changed from "FAILED assert(bh->last_write_tid > tid)" in powercycle-wip-yuri-master-1.19.18-distro-basic-smithi to osdc: "FAILED assert(bh->last_write_tid > tid)" in powercycle-wip-yuri-master-1.19.18-distro-basic-smithi
- Assignee set to Patrick Donnelly
Updated by Patrick Donnelly over 6 years ago
Core: /ceph/teuthology-archive/yuriw-2018-01-19_18:23:03-powercycle-wip-yuri-master-1.19.18-distro-basic-smithi/2090918/remote/smithi082/coredump/1516452461.27619.core
On senta02, I've setup a docker image for debugging:
# sudo docker run -ti -v /ceph:/ceph:ro pdonnell/ceph-ci:master-ubuntu-xenial # gdb -q /usr/bin/ceph-fuse (gdb) core /ceph/teuthology-archive/yuriw-2018-01-19_18:23:03-powercycle-wip-yuri-master-1.19.18-distro-basic-smithi/2090918/remote/smithi082/coredump/1516452461.27619.core (gdb) bt #0 0x00007f01e8738269 in raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/pt-raise.c:35 #1 0x000055d0d626247e in reraise_fatal (signum=6) at /build/ceph-13.0.0-4965-g40d3bc0/src/global/signal_handler.cc:74 #2 handle_fatal_signal (signum=6) at /build/ceph-13.0.0-4965-g40d3bc0/src/global/signal_handler.cc:138 #3 <signal handler called> #4 0x00007f01e7e85428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54 #5 0x00007f01e7e8702a in __GI_abort () at abort.c:89 #6 0x00007f01e8bfb0da in ceph::__ceph_assert_fail(char const*, char const*, int, char const*) () from /usr/lib/ceph/libceph-common.so.0 #7 0x000055d0d6271092 in ObjectCacher::bh_write_commit (this=this@entry=0x55d0d81ade00, poolid=<optimized out>, oid=..., ranges=std::vector of length 3, capacity 3 = {...}, tid=<optimized out>, tid@entry=40597, r=r@entry=0) at /build/ceph-13.0.0-4965-g40d3bc0/src/osdc/ObjectCacher.cc:1169 #8 0x000055d0d627bb9a in ObjectCacher::C_WriteCommit::finish (this=0x55d0febd56b0, r=0) at /build/ceph-13.0.0-4965-g40d3bc0/src/osdc/ObjectCacher.cc:1015 #9 0x000055d0d6226609 in Context::complete (this=0x55d0febd56b0, r=<optimized out>) at /build/ceph-13.0.0-4965-g40d3bc0/src/include/Context.h:77 #10 0x000055d0d6227399 in C_Lock::finish (this=0x55d0f4e6ffa0, r=0) at /build/ceph-13.0.0-4965-g40d3bc0/src/include/Context.h:211 #11 0x000055d0d6226609 in Context::complete (this=0x55d0f4e6ffa0, r=<optimized out>) at /build/ceph-13.0.0-4965-g40d3bc0/src/include/Context.h:77 #12 0x00007f01e8bf958f in Finisher::finisher_thread_entry() () from /usr/lib/ceph/libceph-common.so.0 #13 0x00007f01e872e6ba in start_thread (arg=0x7f01e0e37700) at pthread_create.c:333 #14 0x00007f01e7f573dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:74 #15 0x0000000000000000 in ?? ()
I don't see immediately what went wrong as I'm not familiar with this code. I'll wait for feedback from Zheng when he gets back from PTO.
Updated by Zheng Yan over 6 years ago
diff --git a/src/osdc/ObjectCacher.cc b/src/osdc/ObjectCacher.cc index c1664718da..3f549f9663 100644 --- a/src/osdc/ObjectCacher.cc +++ b/src/osdc/ObjectCacher.cc @@ -1149,7 +1149,7 @@ void ObjectCacher::bh_write_commit(int64_t poolid, sobject_t oid, ++p) { BufferHead *bh = p->second; - if (bh->start() > start+(loff_t)length) + if (bh->start() >= start+(loff_t)length) break; if (bh->start() < start &&
does this change make sense?
Updated by Zheng Yan over 6 years ago
- Status changed from New to Fix Under Review
Updated by Patrick Donnelly over 6 years ago
- Status changed from Fix Under Review to Pending Backport
- Assignee changed from Patrick Donnelly to Zheng Yan
- Backport set to luminous
Updated by Patrick Donnelly over 6 years ago
- Backport changed from luminous to luminous,jewel
Updated by Nathan Cutler about 6 years ago
- Copied to Backport #22860: luminous: osdc: "FAILED assert(bh->last_write_tid > tid)" in powercycle-wip-yuri-master-1.19.18-distro-basic-smithi added
Updated by Nathan Cutler about 6 years ago
- Copied to Backport #22861: jewel: osdc: "FAILED assert(bh->last_write_tid > tid)" in powercycle-wip-yuri-master-1.19.18-distro-basic-smithi added
Updated by Patrick Donnelly about 6 years ago
- Status changed from Pending Backport to Resolved
Updated by Patrick Donnelly about 6 years ago
- Related to Bug #23827: osd sends op_reply out of order added