Bug #61831

open

qa: test_mirroring_init_failure_with_recovery failure

Added by Rishabh Dave 10 months ago. Updated 3 months ago.

Status: New
Priority: Normal
Category: Correctness/Safety
Target version:
% Done: 0%
Source:
Tags:
Backport: reef,quincy,pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This failure was first reported here - https://tracker.ceph.com/issues/50224#note-13.

Seeing this failure again - https://pulpito.ceph.com/rishabh-2023-06-23_17:37:30-fs-wip-rishabh-improvements-authmon-distro-default-smithi/7313862

2023-06-23T20:20:27.098 INFO:tasks.cephfs_test_runner:======================================================================
2023-06-23T20:20:27.099 INFO:tasks.cephfs_test_runner:ERROR: test_mirroring_init_failure_with_recovery (tasks.cephfs.test_mirroring.TestMirroring)
2023-06-23T20:20:27.099 INFO:tasks.cephfs_test_runner:Test if the mirror daemon can recover from a init failure
2023-06-23T20:20:27.099 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2023-06-23T20:20:27.099 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2023-06-23T20:20:27.100 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_8627af3e0adcb765a3c249fcc209cba9f4873e1b/qa/tasks/cephfs/test_mirroring.py", line 742, in test_mirroring_init_failure_with_recovery
2023-06-23T20:20:27.100 INFO:tasks.cephfs_test_runner:    while proceed():
2023-06-23T20:20:27.100 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_teuthology_076bbebc42a14f7d568aaa78eabb0038327bcb23/teuthology/contextutil.py", line 134, in __call__
2023-06-23T20:20:27.100 INFO:tasks.cephfs_test_runner:    raise MaxWhileTries(error_msg)
2023-06-23T20:20:27.101 INFO:tasks.cephfs_test_runner:teuthology.exceptions.MaxWhileTries: 'wait for failed state' reached maximum tries (21) after waiting for 100 seconds
2023-06-23T20:20:27.101 INFO:tasks.cephfs_test_runner:
2023-06-23T18:30:29.202 INFO:tasks.ceph.mgr.x.smithi111.stderr:2023-06-23T18:30:29.197+0000 7f1ac2c6d700 -1 client.0 error registering admin socket command: (17) File exists
2023-06-23T18:30:29.202 INFO:tasks.ceph.mgr.x.smithi111.stderr:2023-06-23T18:30:29.197+0000 7f1ac2c6d700 -1 client.0 error registering admin socket command: (17) File exists
2023-06-23T18:30:29.202 INFO:tasks.ceph.mgr.x.smithi111.stderr:2023-06-23T18:30:29.197+0000 7f1ac2c6d700 -1 client.0 error registering admin socket command: (17) File exists
2023-06-23T18:30:29.202 INFO:tasks.ceph.mgr.x.smithi111.stderr:2023-06-23T18:30:29.197+0000 7f1ac2c6d700 -1 client.0 error registering admin socket command: (17) File exists
2023-06-23T18:30:29.202 INFO:tasks.ceph.mgr.x.smithi111.stderr:2023-06-23T18:30:29.197+0000 7f1ac2c6d700 -1 client.0 error registering admin socket command: (17) File exists
2023-06-23T18:30:29.202 INFO:tasks.ceph.mgr.x.smithi111.stderr:2023-06-23T18:30:29.197+0000 7f1ac6474700 -1 client.0 error registering admin socket command: (17) File exists
2023-06-23T18:30:29.203 INFO:tasks.ceph.mgr.x.smithi111.stderr:2023-06-23T18:30:29.197+0000 7f1ac6474700 -1 client.0 error registering admin socket command: (17) File exists
2023-06-23T18:30:29.203 INFO:tasks.ceph.mgr.x.smithi111.stderr:2023-06-23T18:30:29.197+0000 7f1ac6474700 -1 client.0 error registering admin socket command: (17) File exists
2023-06-23T18:30:29.203 INFO:tasks.ceph.mgr.x.smithi111.stderr:2023-06-23T18:30:29.197+0000 7f1ac6474700 -1 client.0 error registering admin socket command: (17) File exists
2023-06-23T18:30:29.203 INFO:tasks.ceph.mgr.x.smithi111.stderr:2023-06-23T18:30:29.197+0000 7f1ac6474700 -1 client.0 error registering admin socket command: (17) File exists
2023-06-23T18:30:29.214 INFO:tasks.ceph.mgr.x.smithi111.stderr:2023-06-23T18:30:29.209+0000 7f1ac6474700 -1 client.0 error registering admin socket command: (17) File exists
2023-06-23T18:30:29.214 INFO:tasks.ceph.mgr.x.smithi111.stderr:2023-06-23T18:30:29.210+0000 7f1ac6474700 -1 client.0 error registering admin socket command: (17) File exists
2023-06-23T18:30:29.214 INFO:tasks.ceph.mgr.x.smithi111.stderr:2023-06-23T18:30:29.210+0000 7f1ac6474700 -1 client.0 error registering admin socket command: (17) File exists
2023-06-23T18:30:29.215 INFO:tasks.ceph.mgr.x.smithi111.stderr:2023-06-23T18:30:29.210+0000 7f1ac6474700 -1 client.0 error registering admin socket command: (17) File exists
2023-06-23T18:30:29.215 INFO:tasks.ceph.mgr.x.smithi111.stderr:2023-06-23T18:30:29.210+0000 7f1ac2c6d700 -1 client.0 error registering admin socket command: (17) File exists
2023-06-23T18:30:29.215 INFO:tasks.ceph.mgr.x.smithi111.stderr:2023-06-23T18:30:29.210+0000 7f1ac6474700 -1 client.0 error registering admin socket command: (17) File exists
2023-06-23T18:30:29.215 INFO:tasks.ceph.mgr.x.smithi111.stderr:2023-06-23T18:30:29.210+0000 7f1ac2c6d700 -1 client.0 error registering admin socket command: (17) File exists
2023-06-23T18:30:29.215 INFO:tasks.ceph.mgr.x.smithi111.stderr:2023-06-23T18:30:29.210+0000 7f1ac2c6d700 -1 client.0 error registering admin socket command: (17) File exists
2023-06-23T18:30:29.215 INFO:tasks.ceph.mgr.x.smithi111.stderr:2023-06-23T18:30:29.210+0000 7f1ac2c6d700 -1 client.0 error registering admin socket command: (17) File exists
2023-06-23T18:30:29.215 INFO:tasks.ceph.mgr.x.smithi111.stderr:2023-06-23T18:30:29.210+0000 7f1ac2c6d700 -1 client.0 error registering admin socket command: (17) File exists
2023-06-23T18:30:31.138 INFO:teuthology.orchestra.run.smithi111.stdout:{}
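
For readers unfamiliar with the traceback above: the test polls in a `while proceed():` loop, and teuthology raises `MaxWhileTries` once the attempts are exhausted. The following is a minimal sketch of that general bounded-polling pattern, not teuthology's actual implementation. The names `wait_for` and `MaxTriesExceeded` are hypothetical, and the 5-second sleep with 20 tries is only an assumption chosen because it multiplies out to the 100 seconds in the error message.

```python
import time


class MaxTriesExceeded(Exception):
    """Hypothetical stand-in for teuthology.exceptions.MaxWhileTries."""


def wait_for(predicate, action, sleep=5.0, tries=20, sleeper=time.sleep):
    """Poll predicate() until it returns True or the attempts run out.

    Returns the 1-based attempt number on which predicate() succeeded.
    `sleeper` is injectable so the loop can be exercised without real delays.
    """
    for attempt in range(1, tries + 1):
        if predicate():
            return attempt
        sleeper(sleep)
    # Every attempt failed: report the action label and the total wait time.
    raise MaxTriesExceeded(
        f"'{action}' did not succeed after {tries} tries "
        f"({sleep * tries:g} seconds)"
    )
```

In this sketch the predicate would be something like "the mirror status reports the failed state". If the status query itself keeps failing, the predicate never becomes true and the loop ends with the exception, which would match the symptom reported here.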

Related issues 1 (0 open, 1 closed)

Related to CephFS - Bug #62936: Test failure: test_mirroring_init_failure_with_recovery (tasks.cephfs.test_mirroring.TestMirroring) - Duplicate

Actions #1

Updated by Rishabh Dave 10 months ago

  • Description updated (diff)
Actions #2

Updated by Venky Shankar 10 months ago

  • Category set to Correctness/Safety
  • Assignee set to Kotresh Hiremath Ravishankar
  • Target version set to v19.0.0
  • Backport set to reef,quincy,pacific
Actions #3

Updated by Venky Shankar 10 months ago

Kotresh said that he saw no active MDSs. Please RCA, Kotresh.

Actions #4

Updated by Kotresh Hiremath Ravishankar 10 months ago

It looks like the MDSs were down:

<pre>
2023-06-23T20:20:22.560 DEBUG:teuthology.orchestra.run.smithi111:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph fs fail backup_fs
2023-06-23T20:20:23.062 INFO:tasks.ceph.mon.a.smithi111.stderr:2023-06-23T20:20:23.059+0000 7f1b5fe6f700 -1 log_channel(cluster) log [ERR] : Health check failed: 1 filesystem is offline (MDS_ALL_DOWN)
2023-06-23T20:20:23.064 INFO:tasks.ceph.mds.d.smithi111.stderr:  -687> 2023-06-23T20:18:19.059+0000 7f0cc71da700 -1 mds.pinger is_rank_lagging: rank=0 was never sent ping request.
2023-06-23T20:20:23.067 INFO:teuthology.orchestra.run.smithi111.stderr:backup_fs marked not joinable; MDS cannot join the cluster. All MDS ranks marked failed.
</pre>
Actions #5

Updated by Kotresh Hiremath Ravishankar 10 months ago

I think the MDS went down as part of cleanup. However, the mirror status query is failing, as the log below shows. This needs further debugging.

2023-06-23T20:18:31.588 INFO:teuthology.orchestra.run:Running command with timeout 30
2023-06-23T20:18:31.589 DEBUG:teuthology.orchestra.run.smithi111:mirror status for fs: cephfs> ceph --admin-daemon /var/run/ceph/cephfs-mirror.asok fs mirror status cephfs@56
2023-06-23T20:18:31.692 INFO:teuthology.orchestra.run.smithi111.stderr:admin_socket: exception getting command descriptions: [Errno 111] Connection refused
2023-06-23T20:18:31.693 DEBUG:teuthology.orchestra.run:got remote process result: 22
2023-06-23T20:18:31.694 WARNING:tasks.cephfs.test_mirroring:mirror daemon command with label "mirror status for fs: cephfs" failed: Command failed (mirror status for fs: cephfs) on smithi111 with status 22: 'ceph --admin-daemon /var/run/ceph/cephfs-mirror.asok fs mirror status cephfs@56'
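
The `[Errno 111] Connection refused` in the log above is `ECONNREFUSED` on Linux. For a Unix domain socket this usually means the socket file exists but no process is accepting connections on it. When debugging, it can help to tell that case apart from a socket file that is missing entirely. The following is a minimal sketch of such a probe; `probe_unix_socket` is a hypothetical helper, not part of Ceph or teuthology, and it only checks connectivity without speaking the admin socket protocol.

```python
import errno
import os
import socket


def probe_unix_socket(path, timeout=1.0):
    """Classify a Unix domain socket path.

    Returns "missing" if the path does not exist, "refused" if it exists
    but nothing accepts connections (ECONNREFUSED), or "listening" if a
    connection is accepted. Any other OS error is re-raised.
    """
    if not os.path.exists(path):
        return "missing"
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
    except OSError as exc:
        if exc.errno == errno.ECONNREFUSED:
            return "refused"
        raise
    finally:
        sock.close()
    return "listening"
```

Running it against `/var/run/ceph/cephfs-mirror.asok` on the test node would be one way to check whether the mirror daemon had stopped listening while its socket file was left behind. That is only a possible explanation for the log above, not a confirmed cause.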

Actions #6

Updated by Venky Shankar 8 months ago

  • Related to Bug #62936: Test failure: test_mirroring_init_failure_with_recovery (tasks.cephfs.test_mirroring.TestMirroring) added