Bug #42828

closed

rbd journal err assert(ictx->journal != __null) when release exclusive_lock

Added by liuzhong chen over 4 years ago. Updated over 4 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Target version:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: rbd
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Ceph version: 12.2.12

Reproduction:
1. Create an RBD image with the journaling feature enabled.
2. Test the image with two fio clients, using the following fio command:

fio --name test --rw=randwrite --bs=4k --runtime=3600 --ioengine=rbd --clientname=admin --pool=poolclz --rbdname=rbdclz --iodepth=128 --numjobs=1 --direct=1 --group_reporting --time_based=1  --eta-newline 1 --fsync=1

3. Wait for a while; a core dump occurs with the following assertion failure:
/root/rpmbuild/BUILD/ceph-12.2.12-1/src/librbd/io/AioCompletion.cc: 86: FAILED assert(ictx->journal != __null)

Backtrace from the core dump:
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/libexec/qemu-kvm -name guest=instance-00000082,debug-threads=on -S -object'.
Program terminated with signal 6, Aborted.
#0  0x00007f42a6f261f7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install qemu-kvm-2.9.0-16.el7.centos.es_4.5.1.2.x86_64
(gdb) bt
#0  0x00007f42a6f261f7 in raise () from /lib64/libc.so.6
#1  0x00007f42a6f278e8 in abort () from /lib64/libc.so.6
#2  0x00007f429c02b744 in ceph::__ceph_assert_fail(char const*, char const*, int, char const*) () from /usr/lib64/ceph/libceph-common.so.0
#3  0x00007f42ac35c4d3 in librbd::io::AioCompletion::complete() () from /lib64/librbd.so.1
#4  0x00007f42ac35d14f in librbd::io::AioCompletion::complete_request(long) () from /lib64/librbd.so.1
#5  0x00007f42ac35e511 in librbd::io::(anonymous namespace)::C_FlushJournalCommit<librbd::ImageCtx>::finish(int) () from /lib64/librbd.so.1
#6  0x00007f42ac29c829 in Context::complete(int) () from /lib64/librbd.so.1
#7  0x00007f42ac2aaac4 in ContextWQ::process(Context*) () from /lib64/librbd.so.1
#8  0x00007f429c03385e in ThreadPool::worker(ThreadPool::WorkThread*) () from /usr/lib64/ceph/libceph-common.so.0
#9  0x00007f429c034780 in ThreadPool::WorkThread::entry() () from /usr/lib64/ceph/libceph-common.so.0
#10 0x00007f42a72bbe25 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f42a6fe934d in clone () from /lib64/libc.so.6 

The reason is that the exclusive_lock pre-release step does not wait for all in-flight AIO to complete. When an AIO flush completes just after the journal has been closed, assert(ictx->journal != __null) fails.
I wonder whether the exclusive_lock pre-release step needs to wait for all AIO to complete, or whether something else needs to be considered.
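
To make the question concrete, here is a minimal sketch of the "wait for all AIO before closing the journal" idea. It is not librbd code and not necessarily how the issue was resolved: Image, Journal, start_flush, complete_flush, pre_release_lock and simulate_release are hypothetical names invented for this illustration, and the in-flight counter stands in for whatever tracking librbd actually uses.

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the image journal; not a librbd type.
struct Journal {};

// Hypothetical stand-in for an image context that tracks in-flight flushes.
class Image {
public:
  Image() : journal_(std::make_unique<Journal>()) {}

  // Register an in-flight flush. Refused once lock release has started.
  bool start_flush() {
    std::lock_guard<std::mutex> l(mutex_);
    if (releasing_) return false;
    ++in_flight_;
    return true;
  }

  // Completion path: the same kind of check that failed in the report.
  void complete_flush() {
    std::lock_guard<std::mutex> l(mutex_);
    assert(journal_ != nullptr);
    ++completed_;
    if (--in_flight_ == 0) cond_.notify_all();
  }

  // Pre-release: block new flushes, drain in-flight ones, then close the journal.
  void pre_release_lock() {
    std::unique_lock<std::mutex> l(mutex_);
    releasing_ = true;
    cond_.wait(l, [this] { return in_flight_ == 0; });
    journal_.reset();
  }

  bool journal_open() const {
    std::lock_guard<std::mutex> l(mutex_);
    return journal_ != nullptr;
  }

  int completed() const {
    std::lock_guard<std::mutex> l(mutex_);
    return completed_;
  }

private:
  mutable std::mutex mutex_;
  std::condition_variable cond_;
  std::unique_ptr<Journal> journal_;
  int in_flight_ = 0;
  int completed_ = 0;
  bool releasing_ = false;
};

// Start n flushes on worker threads, release the lock concurrently, and
// return the number of flushes that completed (or -1 if the journal is
// unexpectedly still open afterwards).
int simulate_release(int n) {
  Image image;
  std::vector<std::thread> workers;
  for (int i = 0; i < n; ++i) {
    if (image.start_flush()) {
      workers.emplace_back([&image] { image.complete_flush(); });
    }
  }
  image.pre_release_lock();  // returns only after every flush has completed
  for (auto& t : workers) t.join();
  if (image.journal_open() || image.start_flush()) return -1;
  return image.completed();
}
```

Calling simulate_release(8) runs eight flush completions against a concurrent pre-release. Because the journal is reset only after the in-flight count drops to zero, the assert in complete_flush cannot observe a closed journal. Removing the cond_.wait call reintroduces the race described above.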


Files

rbd.log.bak (629 KB) liuzhong chen, 11/19/2019 05:29 AM