Bug #64534
openqa: test_cephfs_mirror_cancel_sync fails in a 100 jobs run of fs:mirror suite
0%
Description
test_cephfs_mirror_cancel_sync fails in a 100 jobs run of fs:mirror suite
https://pulpito.ceph.com/jcollin-2024-02-21_01:01:12-fs:mirror-wip-jcollin-testing_20Feb2024_2-distro-default-smithi/
test_cephfs_mirror_cancel_sync succeed when running alone 100 times.
https://pulpito.ceph.com/jcollin-2024-02-21_11:40:01-fs:mirror-wip-jcollin-testing_21Feb2024_2-distro-default-smithi/
This happens probably because one of the previous test was hung in a clean up activity. The previous test executed here is test_cephfs_mirror_cancel_mirroring_and_readd.
Updated by Jos Collin about 2 months ago
- Project changed from Ceph to CephFS
- Category changed from qa to Testing
Updated by Venky Shankar about 2 months ago
Jos Collin wrote:
test_cephfs_mirror_cancel_sync fails in a 100 jobs run of fs:mirror suite
https://pulpito.ceph.com/jcollin-2024-02-21_01:01:12-fs:mirror-wip-jcollin-testing_20Feb2024_2-distro-default-smithi/test_cephfs_mirror_cancel_sync succeed when running alone 100 times.
https://pulpito.ceph.com/jcollin-2024-02-21_11:40:01-fs:mirror-wip-jcollin-testing_21Feb2024_2-distro-default-smithi/This happens probably because one of the previous test was hung in a clean up activity. The previous test executed here is test_cephfs_mirror_cancel_mirroring_and_readd.
Do you have details as in why test test was hung?
Updated by Jos Collin about 1 month ago
Venky Shankar wrote:
Jos Collin wrote:
test_cephfs_mirror_cancel_sync fails in a 100 jobs run of fs:mirror suite
https://pulpito.ceph.com/jcollin-2024-02-21_01:01:12-fs:mirror-wip-jcollin-testing_20Feb2024_2-distro-default-smithi/test_cephfs_mirror_cancel_sync succeed when running alone 100 times.
https://pulpito.ceph.com/jcollin-2024-02-21_11:40:01-fs:mirror-wip-jcollin-testing_21Feb2024_2-distro-default-smithi/This happens probably because one of the previous test was hung in a clean up activity. The previous test executed here is test_cephfs_mirror_cancel_mirroring_and_readd.
Do you have details as in why test test was hung?
In the previously executed test: test_cephfs_mirror_cancel_mirroring_and_readd, below failure occurs during after the last 'counter dump'. Then it seems success in the next trial of the command.
55117 2024-02-21T01:42:12.466 DEBUG:teuthology.orchestra.run.smithi007:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/arc\ hive/coverage timeout 120 ceph --cluster ceph fs snapshot mirror disable cephfs 55118 2024-02-21T01:42:12.907 INFO:teuthology.orchestra.run.smithi007.stdout:{} 55119 2024-02-21T01:42:22.930 INFO:teuthology.orchestra.run:Running command with timeout 30 55120 2024-02-21T01:42:22.930 DEBUG:teuthology.orchestra.run.smithi007:mirror status for fs: cephfs> ceph --admin-daemon /var/run/ce\ ph/cephfs-mirror.asok fs mirror status cephfs@14 55121 2024-02-21T01:42:23.046 INFO:teuthology.orchestra.run.smithi007.stderr:no valid command found; 10 closest matches: 55122 2024-02-21T01:42:23.046 INFO:teuthology.orchestra.run.smithi007.stderr:0 55123 2024-02-21T01:42:23.046 INFO:teuthology.orchestra.run.smithi007.stderr:1 55124 2024-02-21T01:42:23.046 INFO:teuthology.orchestra.run.smithi007.stderr:2 55125 2024-02-21T01:42:23.046 INFO:teuthology.orchestra.run.smithi007.stderr:abort 55126 2024-02-21T01:42:23.046 INFO:teuthology.orchestra.run.smithi007.stderr:assert 55127 2024-02-21T01:42:23.046 INFO:teuthology.orchestra.run.smithi007.stderr:config diff 55128 2024-02-21T01:42:23.047 INFO:teuthology.orchestra.run.smithi007.stderr:config diff get <var> 55129 2024-02-21T01:42:23.047 INFO:teuthology.orchestra.run.smithi007.stderr:config get <var> 55130 2024-02-21T01:42:23.047 INFO:teuthology.orchestra.run.smithi007.stderr:config help [<var>] 55131 2024-02-21T01:42:23.047 INFO:teuthology.orchestra.run.smithi007.stderr:config set <var> <val>... 55132 2024-02-21T01:42:23.047 INFO:teuthology.orchestra.run.smithi007.stderr:admin_socket: invalid command 55133 2024-02-21T01:42:23.048 DEBUG:teuthology.orchestra.run:got remote process result: 22 55134 2024-02-21T01:42:23.048 WARNING:tasks.cephfs.test_mirroring:mirror daemon command with label "mirror status for fs: cephfs" fa\ iled: Command failed (mirror status for fs: cephfs) on smithi007 with status 22: 'ceph --admin-daemon /var/run/ceph/cephfs-mir\ ror.asok fs mirror status cephfs@14'
Then after self.disable_mirroring call, the test starts a series of 'osd dump's and 'fs dump's with the following error, which doesn't seems normal and the test should be hanging in this activity.
55894 2024-02-21T01:42:24.477 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:24.476+0000 7f1da70d9180 -1 mgr[py] Module ios\ tat has missing NOTIFY_TYPES member 55895 2024-02-21T01:42:24.641 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:24.636+0000 7f1da70d9180 -1 mgr[py] Module tel\ emetry has missing NOTIFY_TYPES member 55896 2024-02-21T01:42:24.740 INFO:teuthology.orchestra.run.smithi007.stdout:pg_num: 1 55897 2024-02-21T01:42:24.758 DEBUG:teuthology.orchestra.run.smithi007:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/arc\ hive/coverage timeout 120 ceph --cluster ceph osd pool get cephfs_metadata pg_num 55898 2024-02-21T01:42:25.047 INFO:teuthology.orchestra.run.smithi007.stdout:pg_num: 32 55899 2024-02-21T01:42:25.066 DEBUG:teuthology.orchestra.run.smithi007:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/arc\ hive/coverage timeout 120 ceph --cluster ceph osd pool get cephfs_data pg_num 55900 2024-02-21T01:42:25.086 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:25.084+0000 7f1da70d9180 -1 mgr[py] Module osd\ _support has missing NOTIFY_TYPES member 55901 2024-02-21T01:42:25.140 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:25.136+0000 7f1da70d9180 -1 mgr[py] Module sel\ ftest has missing NOTIFY_TYPES member 55902 2024-02-21T01:42:25.195 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:25.192+0000 7f1da70d9180 -1 mgr[py] Module pro\ gress has missing NOTIFY_TYPES member 55903 2024-02-21T01:42:25.275 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:25.272+0000 7f1da70d9180 -1 mgr[py] Module cra\ sh has missing NOTIFY_TYPES member 55904 2024-02-21T01:42:25.327 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:25.323+0000 7f1da70d9180 -1 mgr[py] Module tel\ egraf has missing NOTIFY_TYPES member 55905 2024-02-21T01:42:25.358 INFO:teuthology.orchestra.run.smithi007.stdout:pg_num: 64 55906 2024-02-21T01:42:25.376 DEBUG:teuthology.orchestra.run.smithi007:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/arc\ hive/coverage timeout 120 ceph --cluster ceph osd pool get backup_fs_metadata pg_num 55907 2024-02-21T01:42:25.407 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:25.403+0000 7f1da70d9180 -1 mgr[py] Module pg_\ autoscaler has missing NOTIFY_TYPES member 55908 2024-02-21T01:42:25.459 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:25.455+0000 7f1da70d9180 -1 mgr[py] Module zab\ bix has missing NOTIFY_TYPES member 55909 2024-02-21T01:42:25.608 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:25.603+0000 7f1da70d9180 -1 mgr[py] Module nfs\ has missing NOTIFY_TYPES member 55910 2024-02-21T01:42:25.661 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:25.659+0000 7f1da70d9180 -1 mgr[py] Module bal\ ancer has missing NOTIFY_TYPES member
55949 2024-02-21T01:42:26.940 DEBUG:teuthology.orchestra.run.smithi007:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/arc\ hive/coverage timeout 120 ceph --cluster ceph osd pool get .mgr pg_num 55950 2024-02-21T01:42:26.947 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:26.943+0000 7f1da70d9180 -1 mgr[py] Module sta\ tus has missing NOTIFY_TYPES member 55951 2024-02-21T01:42:27.104 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.099+0000 7f1da70d9180 -1 mgr[py] Module vol\ umes has missing NOTIFY_TYPES member 55952 2024-02-21T01:42:27.182 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.179+0000 7f1da70d9180 -1 mgr[py] Module osd\ _perf_query has missing NOTIFY_TYPES member 55953 2024-02-21T01:42:27.230 INFO:teuthology.orchestra.run.smithi007.stdout:pg_num: 1 55954 2024-02-21T01:42:27.249 DEBUG:teuthology.orchestra.run.smithi007:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/arc\ hive/coverage timeout 120 ceph --cluster ceph osd pool get cephfs_metadata pg_num 55955 2024-02-21T01:42:27.483 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.479+0000 7f1da70d9180 -1 mgr[py] Module pro\ metheus has missing NOTIFY_TYPES member 55956 2024-02-21T01:42:27.540 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.535+0000 7f1da70d9180 -1 mgr[py] Module sna\ p_schedule has missing NOTIFY_TYPES member 55957 2024-02-21T01:42:27.590 INFO:teuthology.orchestra.run.smithi007.stdout:pg_num: 32 55958 2024-02-21T01:42:27.608 DEBUG:teuthology.orchestra.run.smithi007:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/arc\ hive/coverage timeout 120 ceph --cluster ceph osd pool get cephfs_data pg_num 55959 2024-02-21T01:42:27.640 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.635+0000 7f1d325f1640 -1 client.0 error reg\ istering admin socket command: (17) File exists 55960 2024-02-21T01:42:27.640 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.635+0000 7f1d325f1640 -1 client.0 error reg\ istering admin socket command: (17) File exists 55961 2024-02-21T01:42:27.640 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.635+0000 7f1d325f1640 -1 client.0 error reg\ istering admin socket command: (17) File exists 55962 2024-02-21T01:42:27.640 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.635+0000 7f1d325f1640 -1 client.0 error reg\ istering admin socket command: (17) File exists 55963 2024-02-21T01:42:27.640 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.635+0000 7f1d325f1640 -1 client.0 error reg\ istering admin socket command: (17) File exists 55964 2024-02-21T01:42:27.640 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.635+0000 7f1d2f5eb640 -1 client.0 error reg\ istering admin socket command: (17) File exists 55965 2024-02-21T01:42:27.640 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.635+0000 7f1d2f5eb640 -1 client.0 error reg\ istering admin socket command: (17) File exists 55966 2024-02-21T01:42:27.640 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.635+0000 7f1d2f5eb640 -1 client.0 error reg\ istering admin socket command: (17) File exists 55967 2024-02-21T01:42:27.640 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.635+0000 7f1d2f5eb640 -1 client.0 error reg\ istering admin socket command: (17) File exists 55968 2024-02-21T01:42:27.640 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.635+0000 7f1d2f5eb640 -1 client.0 error reg\ istering admin socket command: (17) File exists 55969 2024-02-21T01:42:27.658 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.655+0000 7f1d2f5eb640 -1 client.0 error reg\ istering admin socket command: (17) File exists 55970 2024-02-21T01:42:27.658 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.655+0000 7f1d2f5eb640 -1 client.0 error reg\ istering admin socket command: (17) File exists
blocklist ls shows:
55919 2024-02-21T01:42:26.064 DEBUG:teuthology.orchestra.run.smithi007:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/arc\ hive/coverage timeout 120 ceph --cluster ceph osd blocklist ls 55920 2024-02-21T01:42:26.153 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:26.151+0000 7f1da70d9180 -1 mgr[py] Module rbd\ _support has missing NOTIFY_TYPES member 55921 2024-02-21T01:42:26.219 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:26.215+0000 7f1da70d9180 -1 mgr[py] Module ale\ rts has missing NOTIFY_TYPES member 55922 2024-02-21T01:42:26.359 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:26.355+0000 7f1da70d9180 -1 mgr[py] Module tes\ t_orchestrator has missing NOTIFY_TYPES member 55923 2024-02-21T01:42:26.364 INFO:teuthology.orchestra.run.smithi007.stdout:172.21.15.7:0/1325339273 2024-02-21T02:42:15.181435+000\ 0 55924 2024-02-21T01:42:26.364 INFO:teuthology.orchestra.run.smithi007.stdout:172.21.15.7:0/574585810 2024-02-22T01:37:57.704079+0000 55925 2024-02-21T01:42:26.365 INFO:teuthology.orchestra.run.smithi007.stdout:172.21.15.7:0/3959480242 2024-02-22T01:37:57.704079+000\ 0 55926 2024-02-21T01:42:26.365 INFO:teuthology.orchestra.run.smithi007.stdout:172.21.15.7:0/1985957039 2024-02-22T01:37:57.704079+000\ 0 55927 2024-02-21T01:42:26.365 INFO:teuthology.orchestra.run.smithi007.stdout:172.21.15.7:0/2806878632 2024-02-22T01:37:57.704079+000\ 0 55928 2024-02-21T01:42:26.365 INFO:teuthology.orchestra.run.smithi007.stdout:172.21.15.7:0/840312218 2024-02-22T01:37:57.704079+0000 55929 2024-02-21T01:42:26.365 INFO:teuthology.orchestra.run.smithi007.stdout:172.21.15.7:0/1395945597 2024-02-22T01:37:57.704079+000\ 0 55930 2024-02-21T01:42:26.365 INFO:teuthology.orchestra.run.smithi007.stdout:172.21.15.7:0/2862425469 2024-02-22T01:37:57.704079+000\ 0 55931 2024-02-21T01:42:26.365 INFO:teuthology.orchestra.run.smithi007.stdout:172.21.15.7:0/4005849246 2024-02-22T01:37:57.704079+000\ 0 55932 2024-02-21T01:42:26.365 INFO:teuthology.orchestra.run.smithi007.stdout:172.21.15.7:6825/841721431 2024-02-22T01:37:57.704079+0\ 000 55933 2024-02-21T01:42:26.365 INFO:teuthology.orchestra.run.smithi007.stdout:172.21.15.7:6824/841721431 2024-02-22T01:37:57.704079+0\ 000 55934 2024-02-21T01:42:26.365 INFO:teuthology.orchestra.run.smithi007.stderr:listed 11 entries