Bug #62245
qa/workunits/libcephfs/test.sh failed: [ FAILED ] LibCephFS.DelegTimeout
Description
/teuthology/vshankar-2023-07-25_11:29:34-fs-wip-vshankar-testing-20230725.043804-testing-default-smithi/7350738
2023-07-27T19:46:25.988 INFO:tasks.workunit.client.0.smithi005.stdout:[ RUN ] LibCephFS.DelegTimeout
2023-07-27T19:46:26.062 INFO:tasks.workunit.client.0.smithi005.stdout:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.0.0-5128-ga5d48e8c/rpm/el8/BUILD/ceph-18.0.0-5128-ga5d48e8c/src/test/libcephfs/deleg.cc:331: Failure
2023-07-27T19:46:26.063 INFO:tasks.workunit.client.0.smithi005.stdout:Expected equality of these values:
2023-07-27T19:46:26.063 INFO:tasks.workunit.client.0.smithi005.stdout:  ceph_ll_getattr(cmount, root, &stx, 0, 0, perms)
2023-07-27T19:46:26.063 INFO:tasks.workunit.client.0.smithi005.stdout:    Which is: 0
2023-07-27T19:46:26.063 INFO:tasks.workunit.client.0.smithi005.stdout:  -107
2023-07-27T19:46:26.063 INFO:tasks.workunit.client.0.smithi005.stdout:[ FAILED ] LibCephFS.DelegTimeout (75 ms)
...
2023-07-27T19:56:05.458 INFO:teuthology.run:Summary data:
description: fs/libcephfs/{begin/{0-install 1-ceph 2-logrotate} clusters/1-mds-1-client-coloc conf/{client mds mon osd} distro/{rhel_8} objectstore/bluestore-bitmap overrides/{ignorelist_health ignorelist_wrongly_marked_down no_client_pidfile} tasks/libcephfs/{frag test}}
duration: 1900.9459915161133
failure_reason: 'Command failed (workunit test libcephfs/test.sh) on smithi005 with status 1: ''mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=a5d48e8c688c2543997a98b8acbae2ce9635578c TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/libcephfs/test.sh'''
flavor: default
mon.a-kernel-sha1: bbae19e0824be3daf9282aaf5330af10cdf81109
mon.b-kernel-sha1: bbae19e0824be3daf9282aaf5330af10cdf81109
owner: scheduled_vshankar@teuthology
sentry_event: https://sentry.ceph.com/organizations/ceph/?query=9ece941c976c4976b4267936221dea06
status: fail
success: false
2023-07-27T19:56:05.459 DEBUG:teuthology.report:Pushing job info to https://paddles.front.sepia.ceph.com/
2023-07-27T19:56:05.547 INFO:teuthology.run:FAIL
Updated by Xiubo Li 9 months ago
The DelegTimeout test case itself is buggy:
299 TEST(LibCephFS, DelegTimeout) {
300   struct ceph_mount_info *cmount;
301   ASSERT_EQ(ceph_create(&cmount, NULL), 0);
302   ASSERT_EQ(ceph_conf_read_file(cmount, NULL), 0);
303   ASSERT_EQ(0, ceph_conf_parse_env(cmount, NULL));
304   ASSERT_EQ(ceph_mount(cmount, "/"), 0);
305   // tweak timeout to run quickly, since we don't plan to return it anyway
306   ASSERT_EQ(ceph_set_deleg_timeout(cmount, 2), 0);
307
308   Inode *root, *file;
309   ASSERT_EQ(ceph_ll_lookup_root(cmount, &root), 0);
310
311   char filename[32];
312   sprintf(filename, "delegtimeo%x", getpid());
313
314   Fh *fh;
315   struct ceph_statx stx;
316   UserPerm *perms = ceph_mount_perms(cmount);
317
318   ASSERT_EQ(ceph_ll_create(cmount, root, filename, 0666,
319             O_RDWR|O_CREAT|O_EXCL, &file, &fh, &stx, 0, 0, perms), 0);
320
321   /* Reopen read-only */
322   ASSERT_EQ(ceph_ll_close(cmount, fh), 0);
323   ASSERT_EQ(ceph_ll_open(cmount, file, O_RDONLY, &fh, perms), 0);
324
325   std::atomic_bool recalled(false);
326   ASSERT_EQ(ceph_ll_delegation_wait(cmount, fh, CEPH_DELEGATION_RD, dummy_deleg_cb, &recalled), 0);
327   std::atomic_bool opened(false);
328   std::thread breaker1(open_breaker_func, nullptr, filename, O_RDWR, &opened);
329   breaker1.join();
330   ASSERT_EQ(recalled.load(), true);
331   ASSERT_EQ(ceph_ll_getattr(cmount, root, &stx, 0, 0, perms), -CEPHFS_ENOTCONN);
332   ceph_release(cmount);
333 }
Because the MDS Locker allows multiple clients to open the file in rw mode, and the lock state will be:
[LOCK_MIX] = { 0, false, LOCK_MIX, 0, 0, REQ, ANY, 0, 0, 0, CEPH_CAP_GRD|CEPH_CAP_GWR|CEPH_CAP_GLAZYIO,0,0,CEPH_CAP_GRD },
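For reference, each value in that entry fills a field of sm_state_t from src/mds/locks.h positionally. The annotation below is a best-effort sketch (field names reconstructed from the source tree, so treat them as approximate); the point is that the per-state caps field grants CEPH_CAP_GRD|CEPH_CAP_GWR to any client, so several clients can hold read and write caps at the same time in LOCK_MIX.

/* Best-effort annotation of sm_state_t (src/mds/locks.h), filled
 * positionally by the LOCK_MIX entry above; names are approximate. */
struct sm_state_t {
  int next;                /* 0 -> this is a stable state           */
  bool loner;              /* false -> no loner client in LOCK_MIX  */
  int replica_state;       /* LOCK_MIX                              */
  char can_read;           /* 0                                     */
  char can_read_projected; /* 0                                     */
  char can_rdlock;         /* REQ                                   */
  char can_wrlock;         /* ANY                                   */
  char can_force_wrlock;   /* 0                                     */
  char can_lease;          /* 0                                     */
  char can_xlock;          /* 0                                     */
  int caps;                /* CEPH_CAP_GRD|CEPH_CAP_GWR|CEPH_CAP_GLAZYIO:
                              rd and wr caps allowed for any client */
  int loner_caps;          /* 0                                     */
  int xlocker_caps;        /* 0                                     */
  int replica_caps;        /* CEPH_CAP_GRD                          */
};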
So when the second mounter, in the child open_breaker_func thread, opens the file in rw mode, it won't break the delegation held by the client in the main thread, and the child thread can finish very quickly, well within the 2 seconds set by line 306 (ASSERT_EQ(ceph_set_deleg_timeout(cmount, 2), 0);) in the main thread.
That means that when the main thread reaches line 331, the delegation hasn't timed out yet. Only when the delegation does time out will the client unmount the mounter directly, and only after that will accessing the mounter return -CEPHFS_ENOTCONN.
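In other words, the assertion at line 331 races against the delegation timer. A minimal sketch of one way to make it deterministic (not the actual upstream fix; the 10-second deadline and 100 ms retry interval are arbitrary values chosen for illustration) is to poll until the client has actually torn the mount down:

/* Sketch only: retry ceph_ll_getattr() until the delegation timeout
 * has fired and the client has unmounted itself, instead of asserting
 * -CEPHFS_ENOTCONN on the first call. Assumes <chrono> and <thread>
 * are available (deleg.cc already uses std::thread). */
int rc;
auto deadline = std::chrono::steady_clock::now() + std::chrono::seconds(10);
while (true) {
  rc = ceph_ll_getattr(cmount, root, &stx, 0, 0, perms);
  if (rc == -CEPHFS_ENOTCONN || std::chrono::steady_clock::now() >= deadline)
    break;  // either the expected error arrived or we gave up waiting
  std::this_thread::sleep_for(std::chrono::milliseconds(100));
}
ASSERT_EQ(rc, -CEPHFS_ENOTCONN);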
Updated by Milind Changire about 1 month ago
2024-03-22T10:37:17.172 INFO:tasks.workunit.client.0.smithi077.stdout:[ FAILED ] TestClient.AlternateNameSymlink
2024-03-22T10:37:17.172 INFO:tasks.workunit.client.0.smithi077.stdout:[ FAILED ] TestClient.AlternateNameLink
2024-03-22T10:42:16.714 INFO:tasks.workunit.client.0.smithi089.stdout:[ FAILED ] LibCephFS.LayoutVerifyDefaultLayout
2024-03-22T10:42:16.714 INFO:tasks.workunit.client.0.smithi089.stdout:[ FAILED ] LibCephFS.LayoutSetAndVerifyNewAndInheritedLayout
2024-03-22T10:42:16.714 INFO:tasks.workunit.client.0.smithi089.stdout:[ FAILED ] LibCephFS.GetAndSetDirDistribution
2024-03-22T10:42:16.714 INFO:tasks.workunit.client.0.smithi089.stdout:[ FAILED ] LibCephFS.GetAndSetDirRandom
Updated by Xiubo Li about 1 month ago
Milind Changire wrote:
Hi Milind,
This should be a different issue; these tests did not fail from [ FAILED ] LibCephFS.DelegTimeout (75 ms).