Quincy » History » Revision 38

Revision 37 (Venky Shankar, 09/09/2022 09:57 AM) → Revision 38/90 (Venky Shankar, 09/29/2022 09:16 AM)

h1. Quincy 

 h2. On-call Schedule 

 * Feb: Patrick 
 * Mar: Jeff 
 * Apr: Jos Collin 
 * May: Ramana 
 * Jun: Xiubo 
 * Jul: Rishabh 
 * Aug: Kotresh 
 * Sep: Venky 
 * Oct: Milind 

 h2. 2022 Sep 29 

 http://pulpito.front.sepia.ceph.com/?branch=wip-yuri6-testing-2022-09-23-1008-quincy 

 * https://tracker.ceph.com/issues/57205 
     Test failure: test_subvolume_group_ls_filter_internal_directories (tasks.cephfs.test_volumes.TestSubvolumeGroups) 
 * https://tracker.ceph.com/issues/57446 
     qa: test_subvolume_snapshot_info_if_orphan_clone fails 
 * https://tracker.ceph.com/issues/50224 
     Test failure: test_mirroring_init_failure_with_recovery (tasks.cephfs.test_mirroring.TestMirroring) 
 * https://tracker.ceph.com/issues/57280 
     qa: tasks/kernel_cfuse_workunits_untarbuild_blogbench fails - Failed to fetch package version from shaman 
 * https://tracker.ceph.com/issues/50223 
     cluster [WRN] client.xxxx isn't responding to mclientcaps(revoke) 

 h2. 2022 Sep 09 

 http://pulpito.front.sepia.ceph.com/yuriw-2022-09-08_18:29:21-fs-wip-yuri6-testing-2022-09-08-0859-quincy-distro-default-smithi/ 

 * http://tracker.ceph.com/issues/52624 
     cluster [WRN] Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)" in cluster log 
 * https://tracker.ceph.com/issues/51282 
     cluster [WRN] Health check failed: Degraded data redundancy: 1 pg degraded (PG_DEGRADED)" in cluster log 
 * https://tracker.ceph.com/issues/50223 
     cluster [WRN] client.xxxx isn't responding to mclientcaps(revoke) 
 * https://tracker.ceph.com/issues/57205 
     Test failure: test_subvolume_group_ls_filter_internal_directories (tasks.cephfs.test_volumes.TestSubvolumeGroups) 
 * https://tracker.ceph.com/issues/57446 
     qa: test_subvolume_snapshot_info_if_orphan_clone fails 
 * https://tracker.ceph.com/issues/51964 
     Test failure: test_cephfs_mirror_restart_sync_on_blocklist (tasks.cephfs.test_mirroring.TestMirroring) 
 * https://tracker.ceph.com/issues/57280 
     Failed to fetch package version from https://shaman.ceph.com/api/search/?status=ready&project=kernel&flavor=default&distros=ubuntu%2F22.04%2Fx86_64&ref=testing  
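The shaman lookup behind this failure is a plain HTTP query; as a rough illustration (not teuthology's actual code), the query string quoted above can be rebuilt with the Python standard library:

```python
# Hypothetical sketch: rebuild the shaman search URL quoted above.
# Parameter names and values are taken verbatim from the failing query;
# the helper name is made up for illustration.
from urllib.parse import urlencode

def shaman_search_url(distro="ubuntu/22.04/x86_64", ref="testing"):
    params = {
        "status": "ready",
        "project": "kernel",
        "flavor": "default",
        "distros": distro,  # urlencode percent-encodes the '/' separators
        "ref": ref,
    }
    return "https://shaman.ceph.com/api/search/?" + urlencode(params)

print(shaman_search_url())
```

Fetching that URL by hand (e.g. with curl) is a quick way to tell whether shaman simply has no ready build for the requested distro/ref, which is what this tracker covers.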

 h2. 2022 Sep 02 

 https://pulpito.ceph.com/yuriw-2022-09-01_18:27:02-fs-wip-yuri11-testing-2022-09-01-0804-quincy-distro-default-smithi/ 

 and 

 https://pulpito.ceph.com/?branch=wip-lflores-testing-2-2022-08-26-2240-quincy 

 * https://tracker.ceph.com/issues/57280 
     Failed to fetch package version from https://shaman.ceph.com/api/search/?status=ready&project=kernel&flavor=default&distros=ubuntu%2F22.04%2Fx86_64&ref=testing  
 * https://tracker.ceph.com/issues/50223 
     cluster [WRN] client.xxxx isn't responding to mclientcaps(revoke) 
 * http://tracker.ceph.com/issues/52624 
   cluster [WRN] Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)" in cluster log 
 * https://tracker.ceph.com/issues/48773 
     error during scrub thrashing: Command failed on smithi085 with status 1: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph tell mds.1:0 scrub status' 
 * https://tracker.ceph.com/issues/54462 
   Command failed (workunit test fs/snaps/snaptest-git-ceph.sh) on smithi055 with status 128 

 h2. 2022 Aug 31 

 https://pulpito.ceph.com/?branch=wip-yuri-testing-2022-08-23-1120-quincy 

 * https://tracker.ceph.com/issues/51964 
     Test failure: test_cephfs_mirror_restart_sync_on_blocklist (tasks.cephfs.test_mirroring.TestMirroring) 
 * https://tracker.ceph.com/issues/57280 
     Failed to fetch package version from https://shaman.ceph.com/api/search/?status=ready&project=kernel&flavor=default&distros=ubuntu%2F22.04%2Fx86_64&ref=testing  
 * http://tracker.ceph.com/issues/52624 
     cluster [WRN] Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)" in cluster log 
 * https://tracker.ceph.com/issues/48773 
     error during scrub thrashing: Command failed on smithi085 with status 1: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph tell mds.1:0 scrub status' 
 * https://tracker.ceph.com/issues/50223 
     cluster [WRN] client.xxxx isn't responding to mclientcaps(revoke) 

 h2. 2022 Aug 17 

 https://pulpito.ceph.com/yuriw-2022-08-17_18:46:04-fs-wip-yuri7-testing-2022-08-17-0943-quincy-distro-default-smithi/ 

 The following errors were unrelated to the tests themselves and were fixed in the rerun: 

 * Command failed on smithi161 with status 127: "sudo /home/ubuntu/cephtest/cephadm --image docker.io/ceph/ceph:v16.2.4 shell -c ... -- bash -c 'ceph fs dump'" 
 * Failed to fetch package version from https://shaman.ceph.com/api/search/?status=ready&project=kernel&flavor=default&distros=ubuntu%2F22.04%2Fx86_64&ref=testing  
 * reached maximum tries (90) after waiting for 540 seconds - DEBUG:teuthology.misc:7 of 8 OSDs are up 
 * https://tracker.ceph.com/issues/56697 - qa: fs/snaps fails for fuse - Command failed (workunit test fs/snaps/snaptest-multiple-capsnaps.sh) on smithi150 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp ..." 
 * SSH connection to smithi077 was lost: 'sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0'  

 Rerun: https://pulpito.ceph.com/yuriw-2022-08-18_15:08:53-fs-wip-yuri7-testing-2022-08-17-0943-quincy-distro-default-smithi/ 

 * http://tracker.ceph.com/issues/52624 
   cluster [WRN] Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)" in cluster log  
 * https://tracker.ceph.com/issues/51282 
   cluster [WRN] Health check failed: Degraded data redundancy: 1 pg degraded (PG_DEGRADED)" in cluster log 

 h2. 2022 Aug 10 

 http://pulpito.front.sepia.ceph.com/yuriw-2022-08-11_02:21:28-fs-wip-yuri-testing-2022-08-10-1103-quincy-distro-default-smithi/ 
 * Most of the failures passed in the re-run. Please check the re-run failures below. 
   - tasks/{1-thrash/mon 2-workunit/fs/snaps - reached maximum tries (90) after waiting for 540 seconds - DEBUG:teuthology.misc:7 of 8 OSDs are up   
   - tasks/{1-thrash/osd 2-workunit/suites/iozone - reached maximum tries (90) after waiting for 540 seconds - DEBUG:teuthology.misc:7 of 8 OSDs are up 
   - tasks/metrics - cluster [WRN] Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)" in cluster log 
   - tasks/scrub - No module named 'tasks.cephfs.fuse_mount'  
   - tasks/{0-check-counter workunit/suites/iozone} wsync/{no}} - No module named 'tasks.fs'  
   - tasks/snap-schedule - cluster [WRN] Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)" in cluster log 
   - tasks/volumes/{overrides test/clone}} - No module named 'tasks.ceph' 
   - tasks/snapshots - CommandFailedError: Command failed on smithi035 with status 100: 'sudo DEBIAN_FRONTEND=noninteractive apt-get -y --force-yes - INFO:teuthology.orchestra.run.smithi035.stderr:E: Version '17.2.3-414-ge5c30ac2-1focal' for 'python-ceph' was not found - INFO:teuthology.orchestra.run.smithi035.stderr:E: Unable to locate package libcephfs1 
   - tasks/{0-octopus 1-client 2-upgrade 3-compat_client/no}} - No module named 'tasks.ceph' 
   - tasks/{1-thrash/osd 2-workunit/suites/pjd}} - No module named 'tasks.ceph' 
   - tasks/cfuse_workunit_suites_fsstress traceless/50pc} - No module named 'tasks' 
   - tasks/{0-octopus 1-upgrade}} - No module named 'tasks' 
   - tasks/{1-thrash/osd 2-workunit/fs/snaps}} - cluster [WRN] client.4520 isn't responding to mclientcaps(revoke), 
   - tasks/{1-thrash/mds 2-workunit/cfuse_workunit_snaptests}} - reached maximum tries (90) after waiting for 540 seconds - teuthology.misc:7 of 8 OSDs are up 

 Re-run1: http://pulpito.front.sepia.ceph.com/yuriw-2022-08-11_14:24:26-fs-wip-yuri-testing-2022-08-10-1103-quincy-distro-default-smithi/ 

 * tasks/{1-thrash/mon 2-workunit/fs/snaps - reached maximum tries (90) after waiting for 540 seconds  
   DEBUG:teuthology.misc:7 of 8 OSDs are up  
 * http://tracker.ceph.com/issues/52624 
   cluster [WRN] Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)" in cluster log  
 * https://tracker.ceph.com/issues/50223 
   cluster [WRN] client.xxxx isn't responding to mclientcaps(revoke) 
 * tasks/{1-thrash/mds 2-workunit/cfuse_workunit_snaptests}} - reached maximum tries (90) after waiting for 540 seconds  	
   DEBUG:teuthology.misc:7 of 8 OSDs are up 

 Re-run2: http://pulpito.front.sepia.ceph.com/yuriw-2022-08-16_14:46:15-fs-wip-yuri-testing-2022-08-10-1103-quincy-distro-default-smithi/ 

 * http://tracker.ceph.com/issues/52624 
   cluster [WRN] Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)" in cluster log  

 h2. 2022 Aug 03 

 https://pulpito.ceph.com/yuriw-2022-08-04_11:54:20-fs-wip-yuri8-testing-2022-08-03-1028-quincy-distro-default-smithi/ 
 Re-run: https://pulpito.ceph.com/yuriw-2022-08-09_15:36:21-fs-wip-yuri8-testing-2022-08-03-1028-quincy-distro-default-smithi 

 * No module named 'tasks' - Fixed in re-run 

 * https://tracker.ceph.com/issues/51282 
   cluster [WRN] Health check failed: Degraded data redundancy: 1 pg degraded (PG_DEGRADED)" in cluster log 
 
 * https://tracker.ceph.com/issues/57064 
   qa: test_add_ancestor_and_child_directory failure 
  
 * http://tracker.ceph.com/issues/52624 
   cluster [WRN] Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)" in cluster log 

 * https://tracker.ceph.com/issues/50223 
   cluster [WRN] client.xxxx isn't responding to mclientcaps(revoke) 

 h2. 2022 Jul 22 

 https://pulpito.ceph.com/yuriw-2022-07-11_13:37:40-fs-wip-yuri5-testing-2022-07-06-1020-quincy-distro-default-smithi/ 
 re-run: https://pulpito.ceph.com/yuriw-2022-07-12_13:37:44-fs-wip-yuri5-testing-2022-07-06-1020-quincy-distro-default-smithi/ 
 Most failures weren't seen in the re-run. 

 * http://tracker.ceph.com/issues/52624 
   Health check failed: Reduced data availability 
 * https://tracker.ceph.com/issues/50223 
   client.xxxx isn't responding to mclientcaps(revoke) 
 * https://tracker.ceph.com/issues/54462 
   Command failed (workunit test fs/snaps/snaptest-git-ceph.sh) on smithi055 with status 128 

 h2. 2022 Jul 13 

 https://pulpito.ceph.com/yuriw-2022-07-08_17:05:01-fs-wip-yuri2-testing-2022-07-08-0453-quincy-distro-default-smithi/ 

 * http://tracker.ceph.com/issues/52624  
     cluster [WRN] Health check failed: Reduced data availability: 2 pgs peering (PG_AVAILABILITY)" in cluster log 
 * https://tracker.ceph.com/issues/51964 
   Test failure: test_cephfs_mirror_restart_sync_on_blocklist (tasks.cephfs.test_mirroring.TestMirroring) 
 * https://tracker.ceph.com/issues/48773 
   error during scrub thrashing: Command failed on smithi085 with status 1: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph tell mds.1:0 scrub status'  

 h2. 2022 Jun 08 

 http://pulpito.front.sepia.ceph.com/yuriw-2022-06-07_22:29:43-fs-wip-yuri3-testing-2022-06-07-0722-quincy-distro-default-smithi/ 

 * http://tracker.ceph.com/issues/52624 
     cluster [WRN] Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)" in cluster log 


 h2. 2022 Jun 07 

 http://pulpito.front.sepia.ceph.com/yuriw-2022-06-02_20:32:25-fs-wip-yuri5-testing-2022-06-02-0825-quincy-distro-default-smithi/ 

 * http://tracker.ceph.com/issues/52624 
     cluster [WRN] Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)" in cluster log 

 h2. 2022 Jun 03 

 https://pulpito.ceph.com/?branch=wip-yuri-testing-2022-06-02-0810-quincy 

 * http://tracker.ceph.com/issues/52624 
     cluster [WRN] Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)" in cluster log 
 * https://tracker.ceph.com/issues/50223 
     qa: "client.4737 isn't responding to mclientcaps(revoke)" 
 * https://tracker.ceph.com/issues/54462 
     Command failed (workunit test fs/snaps/snaptest-git-ceph.sh) on smithi055 with status 128 

 h2. 2022 May 31 

 http://pulpito.front.sepia.ceph.com/yuriw-2022-05-27_21:58:39-fs-wip-yuri2-testing-2022-05-27-1033-quincy-distro-default-smithi/ 

 * http://tracker.ceph.com/issues/52624 
     cluster [WRN] Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)" in cluster log 

 h2. 2022 May 26 

 https://pulpito.ceph.com/?branch=wip-yuri-testing-2022-05-10-1027-quincy 

 * http://tracker.ceph.com/issues/52624 
     cluster [WRN] Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)" in cluster log 
 * https://tracker.ceph.com/issues/50223 
     qa: "client.4737 isn't responding to mclientcaps(revoke)" 
 * https://tracker.ceph.com/issues/54462 
     Command failed (workunit test fs/snaps/snaptest-git-ceph.sh) on smithi055 with status 128 

 h2. 2022 May 10 

 http://pulpito.front.sepia.ceph.com/?branch=wip-yuri-testing-2022-05-05-0838-quincy 

 * http://tracker.ceph.com/issues/52624 
     cluster [WRN] Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)" in cluster log 
 * https://tracker.ceph.com/issues/50223 
     qa: "client.4737 isn't responding to mclientcaps(revoke)" 

 h2. 2022 April 29 

 https://pulpito.ceph.com/?branch=wip-yuri3-testing-2022-04-22-0534-quincy 

 * http://tracker.ceph.com/issues/52624 
     cluster [WRN] Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)" in cluster log 
 * https://tracker.ceph.com/issues/50223 
     qa: "client.4737 isn't responding to mclientcaps(revoke)" 
 * https://tracker.ceph.com/issues/54462 
     Command failed (workunit test fs/snaps/snaptest-git-ceph.sh) on smithi055 with status 128 

 h2. 2022 April 13 

 http://pulpito.front.sepia.ceph.com/?branch=wip-yuri3-testing-2022-04-11-0746-quincy 

 * http://tracker.ceph.com/issues/52624 
     cluster [WRN] Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)" in cluster log 
 * https://tracker.ceph.com/issues/50223 
     qa: "client.4737 isn't responding to mclientcaps(revoke)" 
 * https://tracker.ceph.com/issues/52438 
    qa: ffsb timeout 

 h2. 2022 March 31 

 http://pulpito.front.sepia.ceph.com/yuriw-2022-03-29_20:09:22-fs-wip-yuri-testing-2022-03-29-0741-quincy-distro-default-smithi/ 
 http://pulpito.front.sepia.ceph.com/yuriw-2022-03-30_14:35:58-fs-wip-yuri-testing-2022-03-29-0741-quincy-distro-default-smithi/ 


 * http://tracker.ceph.com/issues/52624 
     cluster [WRN] Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)" in cluster log 
 * https://tracker.ceph.com/issues/54460 
     snaptest-multiple-capsnaps.sh test failure 
 * https://tracker.ceph.com/issues/50223 
     qa: "client.4737 isn't responding to mclientcaps(revoke)" 
 * http://tracker.ceph.com/issues/54606 
    check-counter task runs till max job timeout 

 A handful of jobs failed due to: 
 <pre> 
 Command failed on smithi055 with status 1: 'sudo /home/ubuntu/cephtest/cephadm --image quay.ceph.io/ceph-ci/ceph:c5bb4e7d582f118c1093d94fbfedfb197eaa03b4 -v bootstrap --fsid 44e07f86-b03b-11ec-8c35-001a4aab830c --config /home/ubuntu/cephtest/seed.ceph.conf --output-config /etc/ceph/ceph.conf --output-keyring /etc/ceph/ceph.client.admin.keyring --output-pub-ssh-key /home/ubuntu/cephtest/ceph.pub --mon-id a --mgr-id x --orphan-initial-daemons --skip-monitoring-stack --mon-ip 172.21.15.55 --skip-admin-label && sudo chmod +r /etc/ceph/ceph.client.admin.keyring' 
 </pre> 

 h2. 2022 March 17 

 http://pulpito.front.sepia.ceph.com/yuriw-2022-03-14_18:57:01-fs-wip-yuri2-testing-2022-03-14-0946-quincy-distro-default-smithi/ 

 * http://tracker.ceph.com/issues/52624 
    cluster [WRN] Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)" in cluster log 
 * http://tracker.ceph.com/issues/54461 
    ffsb.sh test failure 
 * http://tracker.ceph.com/issues/54606 
    check-counter task runs till max job timeout 

 A couple of jobs died with: 

 <pre> 
     2022-03-15T05:15:22.447 ERROR:paramiko.transport:Socket exception: No route to host (113) 
     2022-03-15T05:15:22.452 DEBUG:teuthology.orchestra.run:got remote process result: None 
     2022-03-15T05:15:22.453 INFO:tasks.workunit:Stopping ['suites/fsstress.sh'] on client.0... 
 </pre> 

 h2. 2022 March 1 

 * https://tracker.ceph.com/issues/51282 (maybe?) 
    cluster [WRN] Health check failed: Degraded data redundancy: 2/4 objects degraded (50.000%), 1 pg degraded (PG_DEGRADED)" in cluster log 
 * https://tracker.ceph.com/issues/52624 
    cluster [WRN] Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)" in cluster log 
 * https://tracker.ceph.com/issues/54460 
    Command failed (workunit test fs/snaps/snaptest-multiple-capsnaps.sh) on smithi152 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=465157b30605a0c958df893de628c923386baa8e TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/fs/snaps/snaptest-multiple-capsnaps.sh' 
 * https://tracker.ceph.com/issues/50223 
 cluster [WRN] client.14480 isn't responding to mclientcaps(revoke), ino 0x1000000f3fd pending pAsLsXsFsc issued pAsLsXsFscb, sent 304.933510 seconds ago" in cluster log 
 * https://tracker.ceph.com/issues/54461 
   Command failed (workunit test suites/ffsb.sh) on smithi124 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=465157b30605a0c958df893de628c923386baa8e TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/suites/ffsb.sh' 
 * https://tracker.ceph.com/issues/54462 
   Command failed (workunit test fs/snaps/snaptest-git-ceph.sh) on smithi055 with status 128: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=465157b30605a0c958df893de628c923386baa8e TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/fs/snaps/snaptest-git-ceph.sh'