Bug #49819: [pwl ssd] assert in retire_entries() during QEMU xfstest workload - rbd - Ceph

Bug #49819

closed

[pwl ssd] assert in retire_entries() during QEMU xfstest workload

Added by Jason Dillaman about 3 years ago. Updated over 2 years ago.

Status: Closed

Priority: High

Backport: pacific

Severity: 3 - minor

Description

2021-03-15T15:02:43.611 INFO:tasks.qemu.client.0.smithi082.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-1920-g529f3adb/rpm/el8/BUILD/ceph-17.0.0-1920-g529f3adb/src/librbd/cache/pwl/ssd/WriteLog.cc: In function 'bool librbd::cache::pwl::ssd::WriteLog<ImageCtxT>::retire_entries(long unsigned int) [with ImageCtxT = librbd::ImageCtx]' thread 7f7e2d7fa700 time 2021-03-15T15:02:43.608811+0000
2021-03-15T15:02:43.612 INFO:tasks.qemu.client.0.smithi082.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-1920-g529f3adb/rpm/el8/BUILD/ceph-17.0.0-1920-g529f3adb/src/librbd/cache/pwl/ssd/WriteLog.cc: 618: FAILED ceph_assert((*it)->log_entry_index == (control_block_pos + data_length + MIN_WRITE_ALLOC_SSD_SIZE) % this->m_log_pool_config_size + DATA_RING_BUFFER_OFFSET)
2021-03-15T15:02:43.613 INFO:tasks.qemu.client.0.smithi082.stderr: ceph version 17.0.0-1920-g529f3adb (529f3adb0ed98a7b55ea4e9483365c0b27e408d1) quincy (dev)
2021-03-15T15:02:43.613 INFO:tasks.qemu.client.0.smithi082.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7f7fcb5a3238]
2021-03-15T15:02:43.613 INFO:tasks.qemu.client.0.smithi082.stderr: 2: /usr/lib64/ceph/libceph-common.so.2(+0x276452) [0x7f7fcb5a3452]
2021-03-15T15:02:43.614 INFO:tasks.qemu.client.0.smithi082.stderr: 3: (librbd::cache::pwl::ssd::WriteLog<librbd::ImageCtx>::retire_entries(unsigned long)+0x13d5) [0x7f7fb840d075]
2021-03-15T15:02:43.614 INFO:tasks.qemu.client.0.smithi082.stderr: 4: (librbd::cache::pwl::ssd::WriteLog<librbd::ImageCtx>::process_work()+0x31a) [0x7f7fb84090fa]
2021-03-15T15:02:43.615 INFO:tasks.qemu.client.0.smithi082.stderr: 5: (LambdaContext<librbd::cache::pwl::AbstractWriteLog<librbd::ImageCtx>::wake_up()::{lambda(int)#3}>::finish(int)+0x12) [0x7f7fb83c0982]
2021-03-15T15:02:43.615 INFO:tasks.qemu.client.0.smithi082.stderr: 6: (ThreadPool::PointerWQ<Context>::_void_process(void*, ThreadPool::TPHandle&)+0x148) [0x7f7fb83c1448]
2021-03-15T15:02:43.616 INFO:tasks.qemu.client.0.smithi082.stderr: 7: (ThreadPool::worker(ThreadPool::WorkThread*)+0xe9a) [0x7f7fcb6a241a]
2021-03-15T15:02:43.616 INFO:tasks.qemu.client.0.smithi082.stderr: 8: (ThreadPool::WorkThread::entry()+0x15) [0x7f7fcb6a2c85]
2021-03-15T15:02:43.617 INFO:tasks.qemu.client.0.smithi082.stderr: 9: (Thread::_entry_func(void*)+0xd) [0x7f7fcb68a3ad]
2021-03-15T15:02:43.617 INFO:tasks.qemu.client.0.smithi082.stderr: 10: /lib64/libpthread.so.0(+0x814a) [0x7f7fd997714a]
2021-03-15T15:02:43.617 INFO:tasks.qemu.client.0.smithi082.stderr: 11: clone()
2021-03-15T15:04:08.463 INFO:tasks.qemu.client.0.smithi082.stderr:daemon-helper: command crashed with signal 6
2021-03-15T15:04:08.495 DEBUG:teuthology.orchestra.run:got remote process result: 1

http://qa-proxy.ceph.com/teuthology/jdillaman-2021-03-15_10:35:18-rbd-wip-jd-testing-distro-basic-smithi/5967329/teuthology.log
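The failed assertion compares an entry's log_entry_index against a position computed with modular arithmetic over the log pool. As an aid to reading it, here is a minimal Python sketch of the right-hand side of that expression, copied term by term from the assert message. The constant values and the pool size used below are assumptions for illustration only and are not taken from this report; the helper names are hypothetical.

```python
# Illustrative values only (assumptions); check the librbd pwl sources for
# the real constants used by the build under test.
MIN_WRITE_ALLOC_SSD_SIZE = 4096
DATA_RING_BUFFER_OFFSET = 8192


def expected_entry_index(control_block_pos, data_length, pool_size):
    """Right-hand side of the failed ceph_assert.

    '%' binds tighter than '+', so the sum is wrapped by the pool size
    first and the ring buffer offset is added afterwards.
    """
    wrapped = (control_block_pos + data_length + MIN_WRITE_ALLOC_SSD_SIZE) % pool_size
    return wrapped + DATA_RING_BUFFER_OFFSET


def retire_invariant_holds(log_entry_index, control_block_pos, data_length, pool_size):
    """True when the entry index matches the position the assert expects."""
    return log_entry_index == expected_entry_index(
        control_block_pos, data_length, pool_size)


if __name__ == "__main__":
    pool_size = 1 << 20  # hypothetical 1 MiB value for m_log_pool_config_size
    print(expected_entry_index(0, 8192, pool_size))
    # Position near the end of the pool: the sum wraps around before the
    # offset is added.
    print(expected_entry_index(pool_size - 4096, 4096, pool_size))
```

Any mismatch between the computed position and the stored index makes the check return False, which corresponds to the condition under which the ceph_assert in retire_entries() aborts.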

Updated by Jason Dillaman about 3 years ago

Subject changed from [pwl ssd] trash during QEMU xfstest workload to [pwl ssd] crash during QEMU xfstest workload

Updated by Jason Dillaman about 3 years ago

fio is also crashing under the SSD cache: http://qa-proxy.ceph.com/teuthology/jdillaman-2021-03-17_18:18:23-rbd-wip-jd-testing-distro-basic-smithi/5974560/teuthology.log

Updated by Ilya Dryomov almost 3 years ago

Subject changed from [pwl ssd] crash during QEMU xfstest workload to [pwl ssd] assert in retire_entries() during QEMU xfstest workload

The fio segfault (it appears to be in bufferlist::splice(), but I'm not sure yet) is tracked in #50832.

Updated by Ilya Dryomov over 2 years ago

Related to Bug #52081: rbd persistent SSD cache crash at retire_entries added

Updated by Ilya Dryomov over 2 years ago

Related to deleted (Bug #52081: rbd persistent SSD cache crash at retire_entries)

Updated by CONGMIN YIN over 2 years ago

Maybe the bug has been solved? The segfault in bufferlist::splice() may be https://tracker.ceph.com/issues/51419, which is also resolved. This problem has not appeared in tests on master.

Updated by Deepika Upadhyay over 2 years ago

Status changed from New to Closed

Not observed as of now; feel free to reopen if it shows up again.
