Project

General

Profile

Actions

Bug #57064

open

qa: test_add_ancestor_and_child_directory failure

Added by Kotresh Hiremath Ravishankar over 1 year ago. Updated 7 months ago.

Status:
Need More Info
Priority:
Low
Category:
Correctness/Safety
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
pacific,quincy
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
mgr/mirroring
Labels (FS):
qa, qa-failure
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Seen in recent quincy run https://pulpito.ceph.com/yuriw-2022-08-04_11:54:20-fs-wip-yuri8-testing-2022-08-03-1028-quincy-distro-default-smithi/6957891/


2022-08-04T12:33:31.314 INFO:tasks.cephfs_test_runner:======================================================================
2022-08-04T12:33:31.315 INFO:tasks.cephfs_test_runner:ERROR: test_add_ancestor_and_child_directory (tasks.cephfs.test_mirroring.TestMirroring)
2022-08-04T12:33:31.315 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2022-08-04T12:33:31.316 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2022-08-04T12:33:31.316 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph-c_7b44bb0aa67129dc73e5f574788b0693d5771e8b/qa/tasks/cephfs/test_mirroring.py", line 400, in test_add_ancestor_and_child_directory
2022-08-04T12:33:31.316 INFO:tasks.cephfs_test_runner:    self.enable_mirroring(self.primary_fs_name, self.primary_fs_id)
2022-08-04T12:33:31.317 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph-c_7b44bb0aa67129dc73e5f574788b0693d5771e8b/qa/tasks/cephfs/test_mirroring.py", line 47, in enable_mirroring
2022-08-04T12:33:31.317 INFO:tasks.cephfs_test_runner:    'fs', 'mirror', 'status', f'{fs_name}@{fs_id}')
2022-08-04T12:33:31.317 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph-c_7b44bb0aa67129dc73e5f574788b0693d5771e8b/qa/tasks/cephfs/test_mirroring.py", line 227, in mirror_daemon_command
2022-08-04T12:33:31.318 INFO:tasks.cephfs_test_runner:    check_status=True, label=cmd_label)
2022-08-04T12:33:31.318 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_git_teuthology_9e4bf63f00ab3f82a26635c2779f4c3f1b73fb53/teuthology/orchestra/remote.py", line 510, in run
2022-08-04T12:33:31.318 INFO:tasks.cephfs_test_runner:    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
2022-08-04T12:33:31.319 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_git_teuthology_9e4bf63f00ab3f82a26635c2779f4c3f1b73fb53/teuthology/orchestra/run.py", line 455, in run
2022-08-04T12:33:31.319 INFO:tasks.cephfs_test_runner:    r.wait()
2022-08-04T12:33:31.319 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_git_teuthology_9e4bf63f00ab3f82a26635c2779f4c3f1b73fb53/teuthology/orchestra/run.py", line 161, in wait
2022-08-04T12:33:31.320 INFO:tasks.cephfs_test_runner:    self._raise_for_status()
2022-08-04T12:33:31.320 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_git_teuthology_9e4bf63f00ab3f82a26635c2779f4c3f1b73fb53/teuthology/orchestra/run.py", line 183, in _raise_for_status
2022-08-04T12:33:31.320 INFO:tasks.cephfs_test_runner:    node=self.hostname, label=self.label
2022-08-04T12:33:31.321 INFO:tasks.cephfs_test_runner:teuthology.exceptions.CommandFailedError: Command failed (mirror status for fs: cephfs) on smithi037 with status 22: 'ceph --admin-daemon /var/run/ceph/cephfs-mirror.asok fs mirror status cephfs@2'

Actions #1

Updated by Kotresh Hiremath Ravishankar over 1 year ago

  • Description updated (diff)
Actions #2

Updated by Venky Shankar over 1 year ago

  • Category set to Correctness/Safety
  • Assignee set to Dhairya Parmar
  • Target version set to v18.0.0
  • Backport set to pacific,quincy
Actions #3

Updated by Dhairya Parmar about 1 year ago

I tried digging into this failure, while looking at teuthology log, I see

client.0 error registering admin socket command: (17) File exists

I checked the MDS logs and could see hooks getting register successfully

register_command XXX hook

but the admin_socker::register_command is sending errno 17, there should've been lines like this but I found none:

register_command XXX cmddesc YYY hook EEXIST

Checked OSD logs and found:

2022-08-04T12:32:59.559+0000 7efdf4d5b700 10 osd.2 pg_epoch: 34 pg[2.0( v 11'97 (0'0,11'97] local-lis/les=10/11 n=2 ec=10/10 lis/c=10/10 les/c/f=11/11/0 sis=10) [2,1] r=0 lpr=10 crt=11'97 lcod 11'96 mlcod 11'96 active+clean] do_osd_ops error: (17) File exists

looks like it's hitting:

if (obs.exists && !oi.is_whiteout() &&
    (op.flags & CEPH_OSD_OP_FLAG_EXCL)) {
        result = -EEXIST; /* this is an exclusive create */
}

at https://github.com/ceph/ceph/blob/main/src/osd/PrimaryLogPG.cc#L6863-L6865

and then suddenly the asok command fails saying 'No such file or directory'

2022-08-04T12:33:11.570 DEBUG:teuthology.orchestra.run.smithi037:mirror status for fs: cephfs> ceph --admin-daemon /var/run/ceph/cephfs-mirror.asok fs mirror status cephfs@2
2022-08-04T12:33:11.685 DEBUG:teuthology.orchestra.run:got remote process result: 22
2022-08-04T12:33:11.686 INFO:teuthology.orchestra.run.smithi037.stderr:admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
2022-08-04T12:33:11.686 WARNING:tasks.cephfs.test_mirroring:mirror daemon command with label "mirror status for fs: cephfs" failed: Command failed (mirror status for fs: cephfs) on smithi037 with status 22: 'ceph --admin-daemon /var/run/ceph/cephfs-mirror.asok fs mirror status cephfs@2'

Again checked OSD logs here and it says:

93616:2022-08-04T12:32:48.487+0000 7efdf3d59700 10 osd.2 pg_epoch: 32 pg[7.1a( empty local-lis/les=30/31 n=0 ec=30/30 lis/c=30/30 les/c/f=31/31/0 sis=30) [2,0] r=0 lpr=30 crt=0'0 mlcod 0'0 active+clean] do_osd_ops error: (2) No such file or directory

somehow the object nomore exists, strange.

Haven't seen this issue anytime recently, if anyone finds/experiences this, do link the info here.

Actions #4

Updated by Venky Shankar about 1 year ago

Dhairya Parmar wrote:

I tried digging into this failure, while looking at teuthology log, I see
[...]

I checked the MDS logs and could see hooks getting register successfully
[...]

but the admin_socker::register_command is sending errno 17, there should've been lines like this but I found none:
[...]

Checked OSD logs and found:
[...]

looks like it's hitting:

[...]

at https://github.com/ceph/ceph/blob/main/src/osd/PrimaryLogPG.cc#L6863-L6865

and then suddenly the asok command fails saying 'No such file or directory'
[...]

Again checked OSD logs here and it says:
[...]

somehow the object nomore exists, strange.

The mirror daemon is most likely not running. Could you check the mirror daemon logs and see if you spot something?

Actions #5

Updated by Dhairya Parmar about 1 year ago

  • Status changed from New to Need More Info
  • Priority changed from Normal to Low

Looking at the logs, mirror daemon is missing and thus the command failed

2022-08-04T12:33:11.570 DEBUG:teuthology.orchestra.run.smithi037:mirror status for fs: cephfs> ceph --admin-daemon /var/run/ceph/cephfs-mirror.asok fs mirror status cephfs@2
2022-08-04T12:33:11.685 DEBUG:teuthology.orchestra.run:got remote process result: 22

Mirror logs might tell something but unfortunately most of the logs are now gone, however a week before they were available and I did take backup of them but I could not find mirror log(s) in them, indicating that the mirror daemon didn't ran. This could be due to valgrind that makes everything really slow, but that surely doesn't much explain the truth, therefore, due to lack of logs there isn't much that I can do, I'll be looking for any relevant failures or if anyone finds it, do share it here.

Actions #6

Updated by Patrick Donnelly 7 months ago

  • Target version deleted (v18.0.0)
Actions

Also available in: Atom PDF