Bug #57147
openqa: test_full_fsync (tasks.cephfs.test_full.TestClusterFull) failure
% Done:
0%
Regression:
No
Severity:
3 - minor
Description
Teuthology run: https://pulpito.ceph.com/yuriw-2022-08-11_16:57:01-fs-wip-yuri3-testing-2022-08-11-0809-pacific-distro-default-smithi/6968267
The MDS did not become healthy and the test timed out.
2022-08-11T23:44:41.536 INFO:tasks.cephfs_test_runner:======================================================================
2022-08-11T23:44:41.536 INFO:tasks.cephfs_test_runner:ERROR: test_full_fsync (tasks.cephfs.test_full.TestClusterFull)
2022-08-11T23:44:41.536 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2022-08-11T23:44:41.537 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2022-08-11T23:44:41.537 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph-c_eb4319a2b19ca3fba01742173e97dd5b50b2f291/qa/tasks/cephfs/test_full.py", line 395, in setUp
2022-08-11T23:44:41.537 INFO:tasks.cephfs_test_runner:    super(TestClusterFull, self).setUp()
2022-08-11T23:44:41.537 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph-c_eb4319a2b19ca3fba01742173e97dd5b50b2f291/qa/tasks/cephfs/test_full.py", line 32, in setUp
2022-08-11T23:44:41.538 INFO:tasks.cephfs_test_runner:    CephFSTestCase.setUp(self)
2022-08-11T23:44:41.538 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph-c_eb4319a2b19ca3fba01742173e97dd5b50b2f291/qa/tasks/cephfs/cephfs_test_case.py", line 169, in setUp
2022-08-11T23:44:41.538 INFO:tasks.cephfs_test_runner:    self.fs.wait_for_daemons()
2022-08-11T23:44:41.539 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph-c_eb4319a2b19ca3fba01742173e97dd5b50b2f291/qa/tasks/cephfs/filesystem.py", line 1108, in wait_for_daemons
2022-08-11T23:44:41.539 INFO:tasks.cephfs_test_runner:    raise RuntimeError("Timed out waiting for MDS daemons to become healthy")
2022-08-11T23:44:41.539 INFO:tasks.cephfs_test_runner:RuntimeError: Timed out waiting for MDS daemons to become healthy
2022-08-11T23:44:41.539 INFO:tasks.cephfs_test_runner:
2022-08-11T23:44:41.540 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
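The failure is raised by the poll-until-healthy loop in `qa/tasks/cephfs/filesystem.py` (`wait_for_daemons()`), which gives up after a fixed timeout. A minimal sketch of that pattern, with hypothetical names (`get_mds_states`, the injectable `clock`/`sleep`), not the actual QA-suite code:

```python
import time

def wait_for_daemons(get_mds_states, timeout=300, interval=1,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll until every MDS reports a healthy state, or raise on timeout.

    get_mds_states: callable returning the current MDS state strings,
    e.g. ["up:active"].  All names here are hypothetical; the real loop
    lives in qa/tasks/cephfs/filesystem.py.
    """
    healthy = {"up:active", "up:standby", "up:standby-replay"}
    deadline = clock() + timeout
    while clock() < deadline:
        states = get_mds_states()
        if states and all(s in healthy for s in states):
            return states
        sleep(interval)
    # An MDS stuck in e.g. up:creating never satisfies the check above,
    # so the loop eventually raises -- matching the traceback in this report.
    raise RuntimeError("Timed out waiting for MDS daemons to become healthy")
```

With an MDS pinned in `up:creating` (as suspected below, because its backing OSD crashed), no amount of polling helps and the setup fails with the RuntimeError seen in the log.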
I think the OSD that backed the MDS crashed, causing the MDS to get stuck in the up:creating state.
ceph version 16.2.10-668-geb4319a2 (eb4319a2b19ca3fba01742173e97dd5b50b2f291) pacific (stable)
 1: /lib64/libpthread.so.0(+0x12b20) [0x7f19659eeb20]
 2: gsignal()
 3: abort()
 4: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x1b6) [0x556d542ce711]
 5: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x198) [0x556d5477b758]
 6: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a8) [0x556d5477d8f8]
 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x52) [0x556d545ad242]
 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x5de) [0x556d545509fe]
 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x309) [0x556d543d7b39]
 10: (ceph::osd::scheduler::PGRecoveryMsg::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x68) [0x556d54637328]
 11: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xc28) [0x556d543f51b8]
 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4) [0x556d54a74a64]
 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x556d54a77944]
 14: /lib64/libpthread.so.0(+0x814a) [0x7f19659e414a]
 15: clone()
The crash log can be found on teuthology at `/a/yuriw-2022-08-11_16:57:01-fs-wip-yuri3-testing-2022-08-11-0809-pacific-distro-default-smithi/6968267/remote/smithi163/crash/posted/2022-08-11T23:56:04.325332Z_e45de76b-08f9-4145-bc3c-5dd9acb3942d`