Bug #57064
openqa: test_add_ancestor_and_child_directory failure
0%
Description
Seen in recent quincy run https://pulpito.ceph.com/yuriw-2022-08-04_11:54:20-fs-wip-yuri8-testing-2022-08-03-1028-quincy-distro-default-smithi/6957891/
2022-08-04T12:33:31.314 INFO:tasks.cephfs_test_runner:====================================================================== 2022-08-04T12:33:31.315 INFO:tasks.cephfs_test_runner:ERROR: test_add_ancestor_and_child_directory (tasks.cephfs.test_mirroring.TestMirroring) 2022-08-04T12:33:31.315 INFO:tasks.cephfs_test_runner:---------------------------------------------------------------------- 2022-08-04T12:33:31.316 INFO:tasks.cephfs_test_runner:Traceback (most recent call last): 2022-08-04T12:33:31.316 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/github.com_ceph_ceph-c_7b44bb0aa67129dc73e5f574788b0693d5771e8b/qa/tasks/cephfs/test_mirroring.py", line 400, in test_add_ancestor_and_child_directory 2022-08-04T12:33:31.316 INFO:tasks.cephfs_test_runner: self.enable_mirroring(self.primary_fs_name, self.primary_fs_id) 2022-08-04T12:33:31.317 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/github.com_ceph_ceph-c_7b44bb0aa67129dc73e5f574788b0693d5771e8b/qa/tasks/cephfs/test_mirroring.py", line 47, in enable_mirroring 2022-08-04T12:33:31.317 INFO:tasks.cephfs_test_runner: 'fs', 'mirror', 'status', f'{fs_name}@{fs_id}') 2022-08-04T12:33:31.317 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/github.com_ceph_ceph-c_7b44bb0aa67129dc73e5f574788b0693d5771e8b/qa/tasks/cephfs/test_mirroring.py", line 227, in mirror_daemon_command 2022-08-04T12:33:31.318 INFO:tasks.cephfs_test_runner: check_status=True, label=cmd_label) 2022-08-04T12:33:31.318 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_git_teuthology_9e4bf63f00ab3f82a26635c2779f4c3f1b73fb53/teuthology/orchestra/remote.py", line 510, in run 2022-08-04T12:33:31.318 INFO:tasks.cephfs_test_runner: r = self._runner(client=self.ssh, name=self.shortname, **kwargs) 2022-08-04T12:33:31.319 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_git_teuthology_9e4bf63f00ab3f82a26635c2779f4c3f1b73fb53/teuthology/orchestra/run.py", line 455, in run 2022-08-04T12:33:31.319 INFO:tasks.cephfs_test_runner: r.wait() 2022-08-04T12:33:31.319 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_git_teuthology_9e4bf63f00ab3f82a26635c2779f4c3f1b73fb53/teuthology/orchestra/run.py", line 161, in wait 2022-08-04T12:33:31.320 INFO:tasks.cephfs_test_runner: self._raise_for_status() 2022-08-04T12:33:31.320 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_git_teuthology_9e4bf63f00ab3f82a26635c2779f4c3f1b73fb53/teuthology/orchestra/run.py", line 183, in _raise_for_status 2022-08-04T12:33:31.320 INFO:tasks.cephfs_test_runner: node=self.hostname, label=self.label 2022-08-04T12:33:31.321 INFO:tasks.cephfs_test_runner:teuthology.exceptions.CommandFailedError: Command failed (mirror status for fs: cephfs) on smithi037 with status 22: 'ceph --admin-daemon /var/run/ceph/cephfs-mirror.asok fs mirror status cephfs@2'
Updated by Kotresh Hiremath Ravishankar over 1 year ago
- Description updated (diff)
Updated by Venky Shankar over 1 year ago
- Category set to Correctness/Safety
- Assignee set to Dhairya Parmar
- Target version set to v18.0.0
- Backport set to pacific,quincy
Updated by Dhairya Parmar about 1 year ago
I tried digging into this failure, while looking at teuthology log, I see
client.0 error registering admin socket command: (17) File exists
I checked the MDS logs and could see hooks getting register successfully
register_command XXX hook
but the admin_socker::register_command is sending errno 17, there should've been lines like this but I found none:
register_command XXX cmddesc YYY hook EEXIST
Checked OSD logs and found:
2022-08-04T12:32:59.559+0000 7efdf4d5b700 10 osd.2 pg_epoch: 34 pg[2.0( v 11'97 (0'0,11'97] local-lis/les=10/11 n=2 ec=10/10 lis/c=10/10 les/c/f=11/11/0 sis=10) [2,1] r=0 lpr=10 crt=11'97 lcod 11'96 mlcod 11'96 active+clean] do_osd_ops error: (17) File exists
looks like it's hitting:
if (obs.exists && !oi.is_whiteout() &&
(op.flags & CEPH_OSD_OP_FLAG_EXCL)) {
result = -EEXIST; /* this is an exclusive create */
}
at https://github.com/ceph/ceph/blob/main/src/osd/PrimaryLogPG.cc#L6863-L6865
and then suddenly the asok command fails saying 'No such file or directory'
2022-08-04T12:33:11.570 DEBUG:teuthology.orchestra.run.smithi037:mirror status for fs: cephfs> ceph --admin-daemon /var/run/ceph/cephfs-mirror.asok fs mirror status cephfs@2
2022-08-04T12:33:11.685 DEBUG:teuthology.orchestra.run:got remote process result: 22
2022-08-04T12:33:11.686 INFO:teuthology.orchestra.run.smithi037.stderr:admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
2022-08-04T12:33:11.686 WARNING:tasks.cephfs.test_mirroring:mirror daemon command with label "mirror status for fs: cephfs" failed: Command failed (mirror status for fs: cephfs) on smithi037 with status 22: 'ceph --admin-daemon /var/run/ceph/cephfs-mirror.asok fs mirror status cephfs@2'
Again checked OSD logs here and it says:
93616:2022-08-04T12:32:48.487+0000 7efdf3d59700 10 osd.2 pg_epoch: 32 pg[7.1a( empty local-lis/les=30/31 n=0 ec=30/30 lis/c=30/30 les/c/f=31/31/0 sis=30) [2,0] r=0 lpr=30 crt=0'0 mlcod 0'0 active+clean] do_osd_ops error: (2) No such file or directory
somehow the object nomore exists, strange.
Haven't seen this issue anytime recently, if anyone finds/experiences this, do link the info here.
Updated by Venky Shankar about 1 year ago
Dhairya Parmar wrote:
I tried digging into this failure, while looking at teuthology log, I see
[...]I checked the MDS logs and could see hooks getting register successfully
[...]but the admin_socker::register_command is sending errno 17, there should've been lines like this but I found none:
[...]Checked OSD logs and found:
[...]looks like it's hitting:
[...]
at https://github.com/ceph/ceph/blob/main/src/osd/PrimaryLogPG.cc#L6863-L6865
and then suddenly the asok command fails saying 'No such file or directory'
[...]Again checked OSD logs here and it says:
[...]somehow the object nomore exists, strange.
The mirror daemon is most likely not running. Could you check the mirror daemon logs and see if you spot something?
Updated by Dhairya Parmar about 1 year ago
- Status changed from New to Need More Info
- Priority changed from Normal to Low
Looking at the logs, mirror daemon is missing and thus the command failed
2022-08-04T12:33:11.570 DEBUG:teuthology.orchestra.run.smithi037:mirror status for fs: cephfs> ceph --admin-daemon /var/run/ceph/cephfs-mirror.asok fs mirror status cephfs@2
2022-08-04T12:33:11.685 DEBUG:teuthology.orchestra.run:got remote process result: 22
Mirror logs might tell something but unfortunately most of the logs are now gone, however a week before they were available and I did take backup of them but I could not find mirror log(s) in them, indicating that the mirror daemon didn't ran. This could be due to valgrind that makes everything really slow, but that surely doesn't much explain the truth, therefore, due to lack of logs there isn't much that I can do, I'll be looking for any relevant failures or if anyone finds it, do share it here.