Bug #64534

open

qa: test_cephfs_mirror_cancel_sync fails in a 100 jobs run of fs:mirror suite

Added by Jos Collin 2 months ago. Updated about 1 month ago.

Status: New
Priority: Normal
Assignee:
Category: Testing
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

test_cephfs_mirror_cancel_sync fails in a 100-job run of the fs:mirror suite:
https://pulpito.ceph.com/jcollin-2024-02-21_01:01:12-fs:mirror-wip-jcollin-testing_20Feb2024_2-distro-default-smithi/

test_cephfs_mirror_cancel_sync succeeds when run alone 100 times:
https://pulpito.ceph.com/jcollin-2024-02-21_11:40:01-fs:mirror-wip-jcollin-testing_21Feb2024_2-distro-default-smithi/

This probably happens because a previously executed test hung during cleanup. The test executed just before it here is test_cephfs_mirror_cancel_mirroring_and_readd.

Actions #1

Updated by Jos Collin about 2 months ago

  • Project changed from Ceph to CephFS
  • Category changed from qa to Testing
Actions #2

Updated by Milind Changire about 2 months ago

  • Assignee set to Jos Collin
Actions #3

Updated by Venky Shankar about 1 month ago

Do you have details on why the test hung?

Actions #4

Updated by Jos Collin about 1 month ago

Venky Shankar wrote:

Do you have details on why the test hung?

In the previously executed test, test_cephfs_mirror_cancel_mirroring_and_readd, the failure below occurs after the last 'counter dump'. The command then appears to succeed on the next attempt.

 55117 2024-02-21T01:42:12.466 DEBUG:teuthology.orchestra.run.smithi007:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph fs snapshot mirror disable cephfs
 55118 2024-02-21T01:42:12.907 INFO:teuthology.orchestra.run.smithi007.stdout:{}
 55119 2024-02-21T01:42:22.930 INFO:teuthology.orchestra.run:Running command with timeout 30
 55120 2024-02-21T01:42:22.930 DEBUG:teuthology.orchestra.run.smithi007:mirror status for fs: cephfs> ceph --admin-daemon /var/run/ceph/cephfs-mirror.asok fs mirror status cephfs@14
 55121 2024-02-21T01:42:23.046 INFO:teuthology.orchestra.run.smithi007.stderr:no valid command found; 10 closest matches:
 55122 2024-02-21T01:42:23.046 INFO:teuthology.orchestra.run.smithi007.stderr:0
 55123 2024-02-21T01:42:23.046 INFO:teuthology.orchestra.run.smithi007.stderr:1
 55124 2024-02-21T01:42:23.046 INFO:teuthology.orchestra.run.smithi007.stderr:2
 55125 2024-02-21T01:42:23.046 INFO:teuthology.orchestra.run.smithi007.stderr:abort
 55126 2024-02-21T01:42:23.046 INFO:teuthology.orchestra.run.smithi007.stderr:assert
 55127 2024-02-21T01:42:23.046 INFO:teuthology.orchestra.run.smithi007.stderr:config diff
 55128 2024-02-21T01:42:23.047 INFO:teuthology.orchestra.run.smithi007.stderr:config diff get <var>
 55129 2024-02-21T01:42:23.047 INFO:teuthology.orchestra.run.smithi007.stderr:config get <var>
 55130 2024-02-21T01:42:23.047 INFO:teuthology.orchestra.run.smithi007.stderr:config help [<var>]
 55131 2024-02-21T01:42:23.047 INFO:teuthology.orchestra.run.smithi007.stderr:config set <var> <val>...
 55132 2024-02-21T01:42:23.047 INFO:teuthology.orchestra.run.smithi007.stderr:admin_socket: invalid command
 55133 2024-02-21T01:42:23.048 DEBUG:teuthology.orchestra.run:got remote process result: 22
 55134 2024-02-21T01:42:23.048 WARNING:tasks.cephfs.test_mirroring:mirror daemon command with label "mirror status for fs: cephfs" failed: Command failed (mirror status for fs: cephfs) on smithi007 with status 22: 'ceph --admin-daemon /var/run/ceph/cephfs-mirror.asok fs mirror status cephfs@14'
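
Exit status 22 is EINVAL on Linux, and together with "no valid command found" it suggests that the 'fs mirror status cephfs@14' admin socket command was not registered on the cephfs-mirror daemon at that moment. If the next attempt succeeds, a bounded retry that tolerates only EINVAL is one way to separate a transient deregistration from a real failure. This is a hypothetical sketch: retry_on_einval and asok_command are illustrative names, not functions from tasks.cephfs.test_mirroring, and the attempt count and delay are arbitrary.

```python
import errno
import subprocess
import time
from typing import Callable, List


def asok_command(asok_path: str, *args: str) -> List[str]:
    """Build a 'ceph --admin-daemon <socket> ...' command line (hypothetical helper)."""
    return ["ceph", "--admin-daemon", asok_path, *args]


def retry_on_einval(run: Callable[[], subprocess.CompletedProcess],
                    attempts: int = 5,
                    delay: float = 1.0) -> subprocess.CompletedProcess:
    """Call run() until it exits 0, retrying only on exit status EINVAL (22).

    Any other non-zero exit status is raised immediately, so genuine
    failures are not hidden by the retry loop.
    """
    result = None
    for attempt in range(1, attempts + 1):
        result = run()
        if result.returncode == 0:
            return result
        if result.returncode != errno.EINVAL:
            raise subprocess.CalledProcessError(
                result.returncode, result.args, result.stdout, result.stderr)
        if attempt < attempts:
            time.sleep(delay)
    raise RuntimeError(
        f"still EINVAL after {attempts} attempts: {(result.stderr or '').strip()}")


# Example (not executed here):
#   cmd = asok_command("/var/run/ceph/cephfs-mirror.asok",
#                      "fs", "mirror", "status", "cephfs@14")
#   retry_on_einval(lambda: subprocess.run(cmd, capture_output=True, text=True))
```

Retrying only on EINVAL keeps the check strict: a missing socket or a crashed daemon still fails on the first attempt.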

After the self.disable_mirroring call, the test starts a series of 'osd dump' and 'fs dump' commands, accompanied by the errors below. These do not look normal, and the test is probably hanging in this activity.

 55894 2024-02-21T01:42:24.477 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:24.476+0000 7f1da70d9180 -1 mgr[py] Module iostat has missing NOTIFY_TYPES member
 55895 2024-02-21T01:42:24.641 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:24.636+0000 7f1da70d9180 -1 mgr[py] Module telemetry has missing NOTIFY_TYPES member
 55896 2024-02-21T01:42:24.740 INFO:teuthology.orchestra.run.smithi007.stdout:pg_num: 1
 55897 2024-02-21T01:42:24.758 DEBUG:teuthology.orchestra.run.smithi007:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph osd pool get cephfs_metadata pg_num
 55898 2024-02-21T01:42:25.047 INFO:teuthology.orchestra.run.smithi007.stdout:pg_num: 32
 55899 2024-02-21T01:42:25.066 DEBUG:teuthology.orchestra.run.smithi007:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph osd pool get cephfs_data pg_num
 55900 2024-02-21T01:42:25.086 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:25.084+0000 7f1da70d9180 -1 mgr[py] Module osd_support has missing NOTIFY_TYPES member
 55901 2024-02-21T01:42:25.140 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:25.136+0000 7f1da70d9180 -1 mgr[py] Module selftest has missing NOTIFY_TYPES member
 55902 2024-02-21T01:42:25.195 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:25.192+0000 7f1da70d9180 -1 mgr[py] Module progress has missing NOTIFY_TYPES member
 55903 2024-02-21T01:42:25.275 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:25.272+0000 7f1da70d9180 -1 mgr[py] Module crash has missing NOTIFY_TYPES member
 55904 2024-02-21T01:42:25.327 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:25.323+0000 7f1da70d9180 -1 mgr[py] Module telegraf has missing NOTIFY_TYPES member
 55905 2024-02-21T01:42:25.358 INFO:teuthology.orchestra.run.smithi007.stdout:pg_num: 64
 55906 2024-02-21T01:42:25.376 DEBUG:teuthology.orchestra.run.smithi007:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph osd pool get backup_fs_metadata pg_num
 55907 2024-02-21T01:42:25.407 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:25.403+0000 7f1da70d9180 -1 mgr[py] Module pg_autoscaler has missing NOTIFY_TYPES member
 55908 2024-02-21T01:42:25.459 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:25.455+0000 7f1da70d9180 -1 mgr[py] Module zabbix has missing NOTIFY_TYPES member
 55909 2024-02-21T01:42:25.608 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:25.603+0000 7f1da70d9180 -1 mgr[py] Module nfs has missing NOTIFY_TYPES member
 55910 2024-02-21T01:42:25.661 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:25.659+0000 7f1da70d9180 -1 mgr[py] Module balancer has missing NOTIFY_TYPES member
 55949 2024-02-21T01:42:26.940 DEBUG:teuthology.orchestra.run.smithi007:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph osd pool get .mgr pg_num
 55950 2024-02-21T01:42:26.947 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:26.943+0000 7f1da70d9180 -1 mgr[py] Module status has missing NOTIFY_TYPES member
 55951 2024-02-21T01:42:27.104 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.099+0000 7f1da70d9180 -1 mgr[py] Module volumes has missing NOTIFY_TYPES member
 55952 2024-02-21T01:42:27.182 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.179+0000 7f1da70d9180 -1 mgr[py] Module osd_perf_query has missing NOTIFY_TYPES member
 55953 2024-02-21T01:42:27.230 INFO:teuthology.orchestra.run.smithi007.stdout:pg_num: 1
 55954 2024-02-21T01:42:27.249 DEBUG:teuthology.orchestra.run.smithi007:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph osd pool get cephfs_metadata pg_num
 55955 2024-02-21T01:42:27.483 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.479+0000 7f1da70d9180 -1 mgr[py] Module prometheus has missing NOTIFY_TYPES member
 55956 2024-02-21T01:42:27.540 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.535+0000 7f1da70d9180 -1 mgr[py] Module snap_schedule has missing NOTIFY_TYPES member
 55957 2024-02-21T01:42:27.590 INFO:teuthology.orchestra.run.smithi007.stdout:pg_num: 32
 55958 2024-02-21T01:42:27.608 DEBUG:teuthology.orchestra.run.smithi007:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph osd pool get cephfs_data pg_num
 55959 2024-02-21T01:42:27.640 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.635+0000 7f1d325f1640 -1 client.0 error registering admin socket command: (17) File exists
 55960 2024-02-21T01:42:27.640 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.635+0000 7f1d325f1640 -1 client.0 error registering admin socket command: (17) File exists
 55961 2024-02-21T01:42:27.640 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.635+0000 7f1d325f1640 -1 client.0 error registering admin socket command: (17) File exists
 55962 2024-02-21T01:42:27.640 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.635+0000 7f1d325f1640 -1 client.0 error registering admin socket command: (17) File exists
 55963 2024-02-21T01:42:27.640 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.635+0000 7f1d325f1640 -1 client.0 error registering admin socket command: (17) File exists
 55964 2024-02-21T01:42:27.640 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.635+0000 7f1d2f5eb640 -1 client.0 error registering admin socket command: (17) File exists
 55965 2024-02-21T01:42:27.640 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.635+0000 7f1d2f5eb640 -1 client.0 error registering admin socket command: (17) File exists
 55966 2024-02-21T01:42:27.640 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.635+0000 7f1d2f5eb640 -1 client.0 error registering admin socket command: (17) File exists
 55967 2024-02-21T01:42:27.640 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.635+0000 7f1d2f5eb640 -1 client.0 error registering admin socket command: (17) File exists
 55968 2024-02-21T01:42:27.640 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.635+0000 7f1d2f5eb640 -1 client.0 error registering admin socket command: (17) File exists
 55969 2024-02-21T01:42:27.658 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.655+0000 7f1d2f5eb640 -1 client.0 error registering admin socket command: (17) File exists
 55970 2024-02-21T01:42:27.658 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.655+0000 7f1d2f5eb640 -1 client.0 error registering admin socket command: (17) File exists
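
To check whether these errors keep repeating for the rest of the hung test, they can be tallied across the whole teuthology.log. Error 17 is EEXIST on Linux, so each "(17) File exists" line means a client in the mgr tried to register an admin socket command that was already registered. The sketch below assumes the line layout shown in the excerpt; summarize_mgr_errors is a hypothetical helper, not part of teuthology.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Patterns assume the mgr stderr layout shown in the excerpt above.
ASOK_EEXIST_RE = re.compile(
    r"\+0000 (?P<thread>[0-9a-f]+) -1 client\.\d+ "
    r"error registering admin socket command: \(17\) File exists")
NOTIFY_TYPES_RE = re.compile(
    r"mgr\[py\] Module (?P<module>\S+) has missing NOTIFY_TYPES member")


def summarize_mgr_errors(lines: Iterable[str]) -> Tuple[Counter, List[str]]:
    """Count EEXIST admin-socket errors per thread id, and list the mgr
    modules that reported a missing NOTIFY_TYPES member (first-seen order)."""
    eexist_by_thread: Counter = Counter()
    modules: List[str] = []
    for line in lines:
        m = ASOK_EEXIST_RE.search(line)
        if m:
            eexist_by_thread[m.group("thread")] += 1
            continue
        m = NOTIFY_TYPES_RE.search(line)
        if m and m.group("module") not in modules:
            modules.append(m.group("module"))
    return eexist_by_thread, modules


# Example (not executed here):
#   with open("teuthology.log") as f:
#       by_thread, modules = summarize_mgr_errors(f)
```

On the excerpt above this would report 5 EEXIST lines for thread 7f1d325f1640 and 7 for 7f1d2f5eb640; a count that keeps growing over the whole log would support the theory that the cleanup is looping.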

The 'osd blocklist ls' run during the same series (log lines 55919 to 55934) shows:

 55919 2024-02-21T01:42:26.064 DEBUG:teuthology.orchestra.run.smithi007:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph osd blocklist ls
 55920 2024-02-21T01:42:26.153 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:26.151+0000 7f1da70d9180 -1 mgr[py] Module rbd_support has missing NOTIFY_TYPES member
 55921 2024-02-21T01:42:26.219 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:26.215+0000 7f1da70d9180 -1 mgr[py] Module alerts has missing NOTIFY_TYPES member
 55922 2024-02-21T01:42:26.359 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:26.355+0000 7f1da70d9180 -1 mgr[py] Module test_orchestrator has missing NOTIFY_TYPES member
 55923 2024-02-21T01:42:26.364 INFO:teuthology.orchestra.run.smithi007.stdout:172.21.15.7:0/1325339273 2024-02-21T02:42:15.181435+0000
 55924 2024-02-21T01:42:26.364 INFO:teuthology.orchestra.run.smithi007.stdout:172.21.15.7:0/574585810 2024-02-22T01:37:57.704079+0000
 55925 2024-02-21T01:42:26.365 INFO:teuthology.orchestra.run.smithi007.stdout:172.21.15.7:0/3959480242 2024-02-22T01:37:57.704079+0000
 55926 2024-02-21T01:42:26.365 INFO:teuthology.orchestra.run.smithi007.stdout:172.21.15.7:0/1985957039 2024-02-22T01:37:57.704079+0000
 55927 2024-02-21T01:42:26.365 INFO:teuthology.orchestra.run.smithi007.stdout:172.21.15.7:0/2806878632 2024-02-22T01:37:57.704079+0000
 55928 2024-02-21T01:42:26.365 INFO:teuthology.orchestra.run.smithi007.stdout:172.21.15.7:0/840312218 2024-02-22T01:37:57.704079+0000
 55929 2024-02-21T01:42:26.365 INFO:teuthology.orchestra.run.smithi007.stdout:172.21.15.7:0/1395945597 2024-02-22T01:37:57.704079+0000
 55930 2024-02-21T01:42:26.365 INFO:teuthology.orchestra.run.smithi007.stdout:172.21.15.7:0/2862425469 2024-02-22T01:37:57.704079+0000
 55931 2024-02-21T01:42:26.365 INFO:teuthology.orchestra.run.smithi007.stdout:172.21.15.7:0/4005849246 2024-02-22T01:37:57.704079+0000
 55932 2024-02-21T01:42:26.365 INFO:teuthology.orchestra.run.smithi007.stdout:172.21.15.7:6825/841721431 2024-02-22T01:37:57.704079+0000
 55933 2024-02-21T01:42:26.365 INFO:teuthology.orchestra.run.smithi007.stdout:172.21.15.7:6824/841721431 2024-02-22T01:37:57.704079+0000
 55934 2024-02-21T01:42:26.365 INFO:teuthology.orchestra.run.smithi007.stderr:listed 11 entries
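
The listing falls into two groups: one client address expiring at 2024-02-21T02:42:15, roughly an hour after the listing, and ten addresses (eight clients plus two addresses with nonce 841721431 on ports 6824 and 6825) that share the identical expiry 2024-02-22T01:37:57.704079. An expiry shared to the microsecond suggests those ten were blocklisted in a single operation, though that is only an inference from this output. The hypothetical helpers group_by_expiry and describe below make that grouping explicit for raw 'ceph osd blocklist ls' output; they assume the '<addr> <expiry>' line format shown above.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List

# Expiry format as printed above, e.g. 2024-02-22T01:37:57.704079+0000.
# strptime's %z accepts the "+0000" offset and returns an aware datetime.
EXPIRY_FMT = "%Y-%m-%dT%H:%M:%S.%f%z"


def group_by_expiry(lines: Iterable[str]) -> Dict[datetime, List[str]]:
    """Map each expiry time to the blocklisted addresses that share it.

    Lines that are not '<addr> <expiry>' pairs (such as the
    'listed N entries' summary) are skipped.
    """
    groups: Dict[datetime, List[str]] = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue
        addr, expiry = parts
        try:
            when = datetime.strptime(expiry, EXPIRY_FMT)
        except ValueError:
            continue
        groups[when].append(addr)
    return dict(groups)


def describe(groups: Dict[datetime, List[str]], listed_at: datetime) -> List[str]:
    """One summary line per expiry group, earliest first, with the time
    remaining relative to when the listing was taken."""
    return [f"{when.isoformat()}: {len(addrs)} address(es), "
            f"expires {when - listed_at} after listing"
            for when, addrs in sorted(groups.items())]
```

listed_at must also be timezone-aware (parsed with the same format), otherwise the subtraction raises TypeError.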
