Version 88 (Patrick Donnelly, 09/27/2022 02:26 AM) → Version 89/108 (Patrick Donnelly, 09/27/2022 12:57 PM)

h1. MAIN

h3. 2022 Sep 26

https://pulpito.ceph.com/?branch=wip-pdonnell-testing-20220923.171109

* https://tracker.ceph.com/issues/55804
qa failure: pjd link tests failed
* https://tracker.ceph.com/issues/57676
qa: error during scrub thrashing: rank damage found: {'backtrace'}
* https://tracker.ceph.com/issues/52624
qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
* https://tracker.ceph.com/issues/57580
Test failure: test_newops_getvxattr (tasks.cephfs.test_newops.TestNewOps)
* https://tracker.ceph.com/issues/48773
qa: scrub does not complete
* https://tracker.ceph.com/issues/57299
qa: test_dump_loads fails with JSONDecodeError
* https://tracker.ceph.com/issues/57280
qa: tasks/kernel_cfuse_workunits_untarbuild_blogbench fails - Failed to fetch package version from shaman
* https://tracker.ceph.com/issues/57205
Test failure: test_subvolume_group_ls_filter_internal_directories (tasks.cephfs.test_volumes.TestSubvolumeGroups)
* https://tracker.ceph.com/issues/57656
[testing] dbench: write failed on handle 10009 (Resource temporarily unavailable)
* https://tracker.ceph.com/issues/57677
qa: "1 MDSs behind on trimming (MDS_TRIM)"
* https://tracker.ceph.com/issues/57206
libcephfs/test.sh: ceph_test_libcephfs_reclaim
* https://tracker.ceph.com/issues/57446
qa: test_subvolume_snapshot_info_if_orphan_clone fails
* https://tracker.ceph.com/issues/57655 [Exists in main as well]
qa: fs:mixed-clients kernel_untar_build failure
* https://tracker.ceph.com/issues/57682
client: ERROR: test_reconnect_after_blocklisted


h3. 2022 Sep 22

https://pulpito.ceph.com/?branch=wip-pdonnell-testing-20220920.234701

* https://tracker.ceph.com/issues/57299
qa: test_dump_loads fails with JSONDecodeError
* https://tracker.ceph.com/issues/57205
Test failure: test_subvolume_group_ls_filter_internal_directories (tasks.cephfs.test_volumes.TestSubvolumeGroups)
* https://tracker.ceph.com/issues/52624
qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
* https://tracker.ceph.com/issues/57580
Test failure: test_newops_getvxattr (tasks.cephfs.test_newops.TestNewOps)
* https://tracker.ceph.com/issues/57280
qa: tasks/kernel_cfuse_workunits_untarbuild_blogbench fails - Failed to fetch package version from shaman
* https://tracker.ceph.com/issues/48773
qa: scrub does not complete
* https://tracker.ceph.com/issues/56446
Test failure: test_client_cache_size (tasks.cephfs.test_client_limits.TestClientLimits)
* https://tracker.ceph.com/issues/57206
libcephfs/test.sh: ceph_test_libcephfs_reclaim
* https://tracker.ceph.com/issues/51267
CommandFailedError: Command failed (workunit test fs/snaps/snaptest-multiple-capsnaps.sh) on smithi096 with status 1:...

NEW:

* https://tracker.ceph.com/issues/57656
[testing] dbench: write failed on handle 10009 (Resource temporarily unavailable)
* https://tracker.ceph.com/issues/57655 [Exists in main as well]
qa: fs:mixed-clients kernel_untar_build failure
* https://tracker.ceph.com/issues/57657
mds: scrub locates mismatch between child accounted_rstats and self rstats

Segfault probably caused by: https://github.com/ceph/ceph/pull/47795#issuecomment-1255724799

h3. 2022 Sep 16

https://pulpito.ceph.com/?branch=wip-vshankar-testing1-20220905-132828

* https://tracker.ceph.com/issues/57446
qa: test_subvolume_snapshot_info_if_orphan_clone fails
* https://tracker.ceph.com/issues/57299
qa: test_dump_loads fails with JSONDecodeError
* https://tracker.ceph.com/issues/50223
client.xxxx isn't responding to mclientcaps(revoke)
* https://tracker.ceph.com/issues/52624
qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
* https://tracker.ceph.com/issues/57205
Test failure: test_subvolume_group_ls_filter_internal_directories (tasks.cephfs.test_volumes.TestSubvolumeGroups)
* https://tracker.ceph.com/issues/57280
qa: tasks/kernel_cfuse_workunits_untarbuild_blogbench fails - Failed to fetch package version from shaman
* https://tracker.ceph.com/issues/51282
pybind/mgr/mgr_util: .mgr pool may be created to early causing spurious PG_DEGRADED warnings
* https://tracker.ceph.com/issues/48203
qa: quota failure
* https://tracker.ceph.com/issues/36593
qa: quota failure caused by clients stepping on each other
* https://tracker.ceph.com/issues/57580
Test failure: test_newops_getvxattr (tasks.cephfs.test_newops.TestNewOps)

h3. 2022 Aug 26

http://pulpito.front.sepia.ceph.com/rishabh-2022-08-22_17:49:59-fs-wip-rishabh-testing-2022Aug19-testing-default-smithi/
http://pulpito.front.sepia.ceph.com/rishabh-2022-08-24_11:56:51-fs-wip-rishabh-testing-2022Aug19-testing-default-smithi/

* https://tracker.ceph.com/issues/57206
libcephfs/test.sh: ceph_test_libcephfs_reclaim
* https://tracker.ceph.com/issues/56632
Test failure: test_subvolume_snapshot_clone_quota_exceeded (tasks.cephfs.test_volumes.TestSubvolumeSnapshotClones)
* https://tracker.ceph.com/issues/56446
Test failure: test_client_cache_size (tasks.cephfs.test_client_limits.TestClientLimits)
* https://tracker.ceph.com/issues/51964
qa: test_cephfs_mirror_restart_sync_on_blocklist failure
* https://tracker.ceph.com/issues/53859
qa: Test failure: test_pool_perm (tasks.cephfs.test_pool_perm.TestPoolPerm)

* https://tracker.ceph.com/issues/54460
Command failed (workunit test fs/snaps/snaptest-multiple-capsnaps.sh) on smithixxx with status 1
* https://tracker.ceph.com/issues/54462
Command failed (workunit test fs/snaps/snaptest-git-ceph.sh) on smithi055 with status 128
* https://tracker.ceph.com/issues/36593
Command failed (workunit test fs/quota/quota.sh) on smithixxx with status 1

* https://tracker.ceph.com/issues/52624
qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
* https://tracker.ceph.com/issues/55804
Command failed (workunit test suites/pjd.sh)
* https://tracker.ceph.com/issues/50223
client.xxxx isn't responding to mclientcaps(revoke)

h3. 2022 Aug 22

https://pulpito.ceph.com/vshankar-2022-08-12_09:34:24-fs-wip-vshankar-testing1-20220812-072441-testing-default-smithi/
https://pulpito.ceph.com/vshankar-2022-08-18_04:30:42-fs-wip-vshankar-testing1-20220818-082047-testing-default-smithi/ (drop problematic PR and re-run)

* https://tracker.ceph.com/issues/52624
qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
* https://tracker.ceph.com/issues/56446
Test failure: test_client_cache_size (tasks.cephfs.test_client_limits.TestClientLimits)
* https://tracker.ceph.com/issues/55804
Command failed (workunit test suites/pjd.sh)
* https://tracker.ceph.com/issues/51278
mds: "FAILED ceph_assert(!segments.empty())"
* https://tracker.ceph.com/issues/54460
Command failed (workunit test fs/snaps/snaptest-multiple-capsnaps.sh) on smithixxx with status 1
* https://tracker.ceph.com/issues/57205
Test failure: test_subvolume_group_ls_filter_internal_directories (tasks.cephfs.test_volumes.TestSubvolumeGroups)
* https://tracker.ceph.com/issues/57206
ceph_test_libcephfs_reclaim crashes during test
* https://tracker.ceph.com/issues/53859
Test failure: test_pool_perm (tasks.cephfs.test_pool_perm.TestPoolPerm)
* https://tracker.ceph.com/issues/50223
client.xxxx isn't responding to mclientcaps(revoke)

h3. 2022 Aug 12

https://pulpito.ceph.com/vshankar-2022-08-10_04:06:00-fs-wip-vshankar-testing-20220805-190751-testing-default-smithi/
https://pulpito.ceph.com/vshankar-2022-08-11_12:16:58-fs-wip-vshankar-testing-20220811-145809-testing-default-smithi/ (drop problematic PR and re-run)

* https://tracker.ceph.com/issues/52624
qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
* https://tracker.ceph.com/issues/56446
Test failure: test_client_cache_size (tasks.cephfs.test_client_limits.TestClientLimits)
* https://tracker.ceph.com/issues/51964
qa: test_cephfs_mirror_restart_sync_on_blocklist failure
* https://tracker.ceph.com/issues/55804
Command failed (workunit test suites/pjd.sh)
* https://tracker.ceph.com/issues/50223
client.xxxx isn't responding to mclientcaps(revoke)
* https://tracker.ceph.com/issues/50821
qa: untar_snap_rm failure during mds thrashing
* https://tracker.ceph.com/issues/54460
Command failed (workunit test fs/snaps/snaptest-multiple-capsnaps.sh) on smithixxx with status 1

h3. 2022 Aug 04

https://pulpito.ceph.com/?branch=wip-vshankar-testing1-20220804-123835 (only mgr/volumes, mgr/stats)

Unrelated teuthology failure on RHEL

h3. 2022 Jul 25

http://pulpito.front.sepia.ceph.com/rishabh-2022-07-22_11:34:20-fs-wip-rishabh-testing-2022Jul22-1400-testing-default-smithi/

1st re-run: http://pulpito.front.sepia.ceph.com/rishabh-2022-07-24_03:51:19-fs-wip-rishabh-testing-2022Jul22-1400-testing-default-smithi
2nd re-run: http://pulpito.front.sepia.ceph.com/rishabh-2022-07-24_08:53:36-fs-wip-rishabh-testing-2022Jul22-1400-testing-default-smithi/
3rd re-run: http://pulpito.front.sepia.ceph.com/rishabh-2022-07-24_08:53:36-fs-wip-rishabh-testing-2022Jul22-1400-testing-default-smithi/
4th (final) re-run: http://pulpito.front.sepia.ceph.com/rishabh-2022-07-28_03:59:01-fs-wip-rishabh-testing-2022Jul28-0143-testing-default-smithi/

* https://tracker.ceph.com/issues/55804
Command failed (workunit test suites/pjd.sh)
* https://tracker.ceph.com/issues/50223
client.xxxx isn't responding to mclientcaps(revoke)

* https://tracker.ceph.com/issues/54460
Command failed (workunit test fs/snaps/snaptest-multiple-capsnaps.sh) on smithixxx with status 1
* https://tracker.ceph.com/issues/36593
Command failed (workunit test fs/quota/quota.sh) on smithixxx with status 1
* https://tracker.ceph.com/issues/54462
Command failed (workunit test fs/snaps/snaptest-git-ceph.sh) on smithi055 with status 128

h3. 2022 July 22

https://pulpito.ceph.com/?branch=wip-pdonnell-testing-20220721.235756

MDS_HEALTH_DUMMY error in log fixed by follow-up commit.
Transient SELinux ping failure.

* https://tracker.ceph.com/issues/56694
qa: avoid blocking forever on hung umount
* https://tracker.ceph.com/issues/56695
[RHEL stock] pjd test failures
* https://tracker.ceph.com/issues/56696
admin keyring disappears during qa run
* https://tracker.ceph.com/issues/56697
qa: fs/snaps fails for fuse
* https://tracker.ceph.com/issues/50222
osd: 5.2s0 deep-scrub : stat mismatch
* https://tracker.ceph.com/issues/56698
client: FAILED ceph_assert(_size == 0)
* https://tracker.ceph.com/issues/50223
qa: "client.4737 isn't responding to mclientcaps(revoke)"

h3. 2022 Jul 15

http://pulpito.front.sepia.ceph.com/rishabh-2022-07-08_23:53:34-fs-wip-rishabh-testing-2022Jul08-1820-testing-default-smithi/

re-run: http://pulpito.front.sepia.ceph.com/rishabh-2022-07-15_06:42:04-fs-wip-rishabh-testing-2022Jul08-1820-testing-default-smithi/

* https://tracker.ceph.com/issues/53859
Test failure: test_pool_perm (tasks.cephfs.test_pool_perm.TestPoolPerm)
* https://tracker.ceph.com/issues/55804
Command failed (workunit test suites/pjd.sh)
* https://tracker.ceph.com/issues/50223
client.xxxx isn't responding to mclientcaps(revoke)
* https://tracker.ceph.com/issues/50222
osd: deep-scrub : stat mismatch

* https://tracker.ceph.com/issues/56632
Test failure: test_subvolume_snapshot_clone_quota_exceeded (tasks.cephfs.test_volumes.TestSubvolumeSnapshotClones)
* https://tracker.ceph.com/issues/56634
workunit test fs/snaps/snaptest-intodir.sh
* https://tracker.ceph.com/issues/56644
Test failure: test_rapid_creation (tasks.cephfs.test_fragment.TestFragmentation)

h3. 2022 July 05

http://pulpito.front.sepia.ceph.com/rishabh-2022-07-02_14:14:52-fs-wip-rishabh-testing-20220702-1631-testing-default-smithi/

On the 1st re-run some jobs passed: http://pulpito.front.sepia.ceph.com/rishabh-2022-07-03_15:10:28-fs-wip-rishabh-testing-20220702-1631-distro-default-smithi/

On the 2nd re-run only a few jobs failed:
http://pulpito.front.sepia.ceph.com/rishabh-2022-07-06_05:24:29-fs-wip-rishabh-testing-20220705-2132-distro-default-smithi/

* https://tracker.ceph.com/issues/56446
Test failure: test_client_cache_size (tasks.cephfs.test_client_limits.TestClientLimits)
* https://tracker.ceph.com/issues/55804
Command failed (workunit test suites/pjd.sh) on smithi047 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/

* https://tracker.ceph.com/issues/56445
Command failed on smithi080 with status 123: "find /home/ubuntu/cephtest/archive/syslog -name '*.log' -print0 | sudo xargs -0 --no-run-if-empty -- gzip --"
* https://tracker.ceph.com/issues/51267
Command failed (workunit test fs/snaps/snaptest-multiple-capsnaps.sh) on smithi098 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1
* https://tracker.ceph.com/issues/50224
Test failure: test_mirroring_init_failure_with_recovery (tasks.cephfs.test_mirroring.TestMirroring)

h3. 2022 July 04

https://pulpito.ceph.com/vshankar-2022-06-29_09:19:00-fs-wip-vshankar-testing-20220627-100931-testing-default-smithi/
(rhel runs were borked due to: https://lists.ceph.io/hyperkitty/list/dev@ceph.io/thread/JSZQFUKVLDND4W33PXDGCABPHNSPT6SS/, tests ran with --filter-out=rhel)

* https://tracker.ceph.com/issues/56445
Command failed on smithi162 with status 123: "find /home/ubuntu/cephtest/archive/syslog -name '*.log' -print0 | sudo xargs -0 --no-run-if-empty -- gzip --"
* https://tracker.ceph.com/issues/56446
Test failure: test_client_cache_size (tasks.cephfs.test_client_limits.TestClientLimits)
* https://tracker.ceph.com/issues/51964
qa: test_cephfs_mirror_restart_sync_on_blocklist failure
* https://tracker.ceph.com/issues/52624
qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"

h3. 2022 June 20

https://pulpito.ceph.com/vshankar-2022-06-15_04:03:39-fs-wip-vshankar-testing1-20220615-072516-testing-default-smithi/
https://pulpito.ceph.com/vshankar-2022-06-19_08:22:46-fs-wip-vshankar-testing1-20220619-102531-testing-default-smithi/

* https://tracker.ceph.com/issues/52624
qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
* https://tracker.ceph.com/issues/55804
qa failure: pjd link tests failed
* https://tracker.ceph.com/issues/54108
qa: iogen workunit: "The following counters failed to be set on mds daemons: {'mds.exported', 'mds.imported'}"
* https://tracker.ceph.com/issues/55332
Failure in snaptest-git-ceph.sh (it's an async unlink/create bug)

h3. 2022 June 13

https://pulpito.ceph.com/pdonnell-2022-06-12_05:08:12-fs:workload-wip-pdonnell-testing-20220612.004943-distro-default-smithi/

* https://tracker.ceph.com/issues/56024
cephadm: removes ceph.conf during qa run causing command failure
* https://tracker.ceph.com/issues/48773
qa: scrub does not complete
* https://tracker.ceph.com/issues/56012
mds: src/mds/MDLog.cc: 283: FAILED ceph_assert(!mds->is_any_replay())

h3. 2022 Jun 13

https://pulpito.ceph.com/vshankar-2022-06-07_00:25:50-fs-wip-vshankar-testing-20220606-223254-testing-default-smithi/
https://pulpito.ceph.com/vshankar-2022-06-10_01:04:46-fs-wip-vshankar-testing-20220609-175550-testing-default-smithi/

* https://tracker.ceph.com/issues/52624
qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
* https://tracker.ceph.com/issues/51964
qa: test_cephfs_mirror_restart_sync_on_blocklist failure
* https://tracker.ceph.com/issues/53859
qa: Test failure: test_pool_perm (tasks.cephfs.test_pool_perm.TestPoolPerm)
* https://tracker.ceph.com/issues/55804
qa failure: pjd link tests failed
* https://tracker.ceph.com/issues/56003
client: src/include/xlist.h: 81: FAILED ceph_assert(_size == 0)
* https://tracker.ceph.com/issues/56011
fs/thrash: snaptest-snap-rm-cmp.sh fails in mds5sum comparison
* https://tracker.ceph.com/issues/56012
mds: src/mds/MDLog.cc: 283: FAILED ceph_assert(!mds->is_any_replay())

h3. 2022 Jun 07

https://pulpito.ceph.com/vshankar-2022-06-06_21:25:41-fs-wip-vshankar-testing1-20220606-230129-testing-default-smithi/
https://pulpito.ceph.com/vshankar-2022-06-07_10:53:31-fs-wip-vshankar-testing1-20220607-104134-testing-default-smithi/ (rerun after dropping a problematic PR)

* https://tracker.ceph.com/issues/52624
qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
* https://tracker.ceph.com/issues/50223
qa: "client.4737 isn't responding to mclientcaps(revoke)"
* https://tracker.ceph.com/issues/50224
qa: test_mirroring_init_failure_with_recovery failure

h3. 2022 May 12

https://pulpito.ceph.com/?branch=wip-vshankar-testing-20220509-125847
https://pulpito.ceph.com/vshankar-2022-05-13_17:09:16-fs-wip-vshankar-testing-20220513-120051-testing-default-smithi/ (drop prs + rerun)

* https://tracker.ceph.com/issues/52624
qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
* https://tracker.ceph.com/issues/50223
qa: "client.4737 isn't responding to mclientcaps(revoke)"
* https://tracker.ceph.com/issues/55332
Failure in snaptest-git-ceph.sh
* https://tracker.ceph.com/issues/53859
qa: Test failure: test_pool_perm (tasks.cephfs.test_pool_perm.TestPoolPerm)
* https://tracker.ceph.com/issues/55538
Test failure: test_flush (tasks.cephfs.test_readahead.TestReadahead)
* https://tracker.ceph.com/issues/55258
lots of "heartbeat_check: no reply from X.X.X.X" in OSD logs (crops up again, though very infrequently)

h3. 2022 May 04

https://pulpito.ceph.com/vshankar-2022-05-01_13:18:44-fs-wip-vshankar-testing1-20220428-204527-testing-default-smithi/
https://pulpito.ceph.com/vshankar-2022-05-02_16:58:59-fs-wip-vshankar-testing1-20220502-201957-testing-default-smithi/ (after dropping PRs)

* https://tracker.ceph.com/issues/52624
qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
* https://tracker.ceph.com/issues/50223
qa: "client.4737 isn't responding to mclientcaps(revoke)"
* https://tracker.ceph.com/issues/55332
Failure in snaptest-git-ceph.sh
* https://tracker.ceph.com/issues/53859
qa: Test failure: test_pool_perm (tasks.cephfs.test_pool_perm.TestPoolPerm)
* https://tracker.ceph.com/issues/55516
qa: fs suite tests failing with "json.decoder.JSONDecodeError: Extra data: line 2 column 82 (char 82)"
* https://tracker.ceph.com/issues/55537
mds: crash during fs:upgrade test
* https://tracker.ceph.com/issues/55538
Test failure: test_flush (tasks.cephfs.test_readahead.TestReadahead)

h3. 2022 Apr 25

https://pulpito.ceph.com/?branch=wip-vshankar-testing-20220420-113951 (owner vshankar)

* https://tracker.ceph.com/issues/52624
qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
* https://tracker.ceph.com/issues/50223
qa: "client.4737 isn't responding to mclientcaps(revoke)"
* https://tracker.ceph.com/issues/55258
lots of "heartbeat_check: no reply from X.X.X.X" in OSD logs
* https://tracker.ceph.com/issues/55377
kclient: mds revoke Fwb caps stuck after the kclient tries writebcak once

h3. 2022 Apr 14

https://pulpito.ceph.com/?branch=wip-vshankar-testing1-20220411-144044

* https://tracker.ceph.com/issues/52624
qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
* https://tracker.ceph.com/issues/50223
qa: "client.4737 isn't responding to mclientcaps(revoke)"
* https://tracker.ceph.com/issues/52438
qa: ffsb timeout
* https://tracker.ceph.com/issues/55170
mds: crash during rejoin (CDir::fetch_keys)
* https://tracker.ceph.com/issues/55331
pjd failure
* https://tracker.ceph.com/issues/48773
qa: scrub does not complete
* https://tracker.ceph.com/issues/55332
Failure in snaptest-git-ceph.sh
* https://tracker.ceph.com/issues/55258
lots of "heartbeat_check: no reply from X.X.X.X" in OSD logs

h3. 2022 Apr 11

https://pulpito.ceph.com/?branch=wip-vshankar-testing-55110-20220408-203242

* https://tracker.ceph.com/issues/48773
qa: scrub does not complete
* https://tracker.ceph.com/issues/52624
qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
* https://tracker.ceph.com/issues/52438
qa: ffsb timeout
* https://tracker.ceph.com/issues/48680
mds: scrubbing stuck "scrub active (0 inodes in the stack)"
* https://tracker.ceph.com/issues/55236
qa: fs/snaps tests fails with "hit max job timeout"
* https://tracker.ceph.com/issues/54108
qa: iogen workunit: "The following counters failed to be set on mds daemons: {'mds.exported', 'mds.imported'}"
* https://tracker.ceph.com/issues/54971
Test failure: test_perf_stats_stale_metrics (tasks.cephfs.test_mds_metrics.TestMDSMetrics)
* https://tracker.ceph.com/issues/50223
qa: "client.4737 isn't responding to mclientcaps(revoke)"
* https://tracker.ceph.com/issues/55258
lots of "heartbeat_check: no reply from X.X.X.X" in OSD logs

h3. 2022 Mar 21

https://pulpito.ceph.com/vshankar-2022-03-20_02:16:37-fs-wip-vshankar-testing-20220319-163539-testing-default-smithi/

Run didn't go well, lots of failures. Debugging by dropping PRs and running against the master branch; only merging unrelated PRs that pass tests.

h3. 2022 Mar 08

https://pulpito.ceph.com/vshankar-2022-02-28_04:32:15-fs-wip-vshankar-testing-20220226-211550-testing-default-smithi/

rerun with
- (drop) https://github.com/ceph/ceph/pull/44679
- (drop) https://github.com/ceph/ceph/pull/44958
https://pulpito.ceph.com/vshankar-2022-03-06_14:47:51-fs-wip-vshankar-testing-20220304-132102-testing-default-smithi/

* https://tracker.ceph.com/issues/54419 (new)
`ceph orch upgrade start` seems to never reach completion
* https://tracker.ceph.com/issues/51964
qa: test_cephfs_mirror_restart_sync_on_blocklist failure
* https://tracker.ceph.com/issues/52624
qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
* https://tracker.ceph.com/issues/50223
qa: "client.4737 isn't responding to mclientcaps(revoke)"
* https://tracker.ceph.com/issues/52438
qa: ffsb timeout
* https://tracker.ceph.com/issues/50821
qa: untar_snap_rm failure during mds thrashing

h3. 2022 Feb 09

https://pulpito.ceph.com/vshankar-2022-02-05_17:27:49-fs-wip-vshankar-testing-20220201-113815-testing-default-smithi/

rerun with
- (drop) https://github.com/ceph/ceph/pull/37938
- (drop) https://github.com/ceph/ceph/pull/44335
- (drop) https://github.com/ceph/ceph/pull/44491
- (drop) https://github.com/ceph/ceph/pull/44501
https://pulpito.ceph.com/vshankar-2022-02-08_14:27:29-fs-wip-vshankar-testing-20220208-181241-testing-default-smithi/

* https://tracker.ceph.com/issues/51964
qa: test_cephfs_mirror_restart_sync_on_blocklist failure
* https://tracker.ceph.com/issues/54066
test_subvolume_no_upgrade_v1_sanity fails with `AssertionError: 1000 != 0`
* https://tracker.ceph.com/issues/48773
qa: scrub does not complete
* https://tracker.ceph.com/issues/52624
qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
* https://tracker.ceph.com/issues/50223
qa: "client.4737 isn't responding to mclientcaps(revoke)"
* https://tracker.ceph.com/issues/52438
qa: ffsb timeout

h3. 2022 Feb 01

https://pulpito.ceph.com/?branch=wip-pdonnell-testing-20220127.171526

* https://tracker.ceph.com/issues/54107
kclient: hang during umount
* https://tracker.ceph.com/issues/54106
kclient: hang during workunit cleanup
* https://tracker.ceph.com/issues/54108
qa: iogen workunit: "The following counters failed to be set on mds daemons: {'mds.exported', 'mds.imported'}"
* https://tracker.ceph.com/issues/48773
qa: scrub does not complete
* https://tracker.ceph.com/issues/52438
qa: ffsb timeout

h3. 2022 Jan 13

https://pulpito.ceph.com/vshankar-2022-01-06_13:18:41-fs-wip-vshankar-testing-20220106-145819-testing-default-smithi/

rerun with:
- (add) https://github.com/ceph/ceph/pull/44570
- (drop) https://github.com/ceph/ceph/pull/43184
https://pulpito.ceph.com/vshankar-2022-01-13_04:42:40-fs-wip-vshankar-testing-20220106-145819-testing-default-smithi/

* https://tracker.ceph.com/issues/50223
qa: "client.4737 isn't responding to mclientcaps(revoke)"
* https://tracker.ceph.com/issues/51282
pybind/mgr/mgr_util: .mgr pool may be created to early causing spurious PG_DEGRADED warnings
* https://tracker.ceph.com/issues/48773
qa: scrub does not complete
* https://tracker.ceph.com/issues/52624
qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
* https://tracker.ceph.com/issues/53859
qa: Test failure: test_pool_perm (tasks.cephfs.test_pool_perm.TestPoolPerm)

h3. 2022 Jan 03

https://pulpito.ceph.com/vshankar-2021-12-22_07:37:44-fs-wip-vshankar-testing-20211216-114012-testing-default-smithi/
https://pulpito.ceph.com/vshankar-2022-01-03_12:27:45-fs-wip-vshankar-testing-20220103-142738-testing-default-smithi/ (rerun)

* https://tracker.ceph.com/issues/50223
qa: "client.4737 isn't responding to mclientcaps(revoke)"
* https://tracker.ceph.com/issues/51964
qa: test_cephfs_mirror_restart_sync_on_blocklist failure
* https://tracker.ceph.com/issues/51267
CommandFailedError: Command failed (workunit test fs/snaps/snaptest-multiple-capsnaps.sh) on smithi096 with status 1:...
* https://tracker.ceph.com/issues/51282
pybind/mgr/mgr_util: .mgr pool may be created to early causing spurious PG_DEGRADED warnings
* https://tracker.ceph.com/issues/50821
qa: untar_snap_rm failure during mds thrashing
* https://tracker.ceph.com/issues/51278
mds: "FAILED ceph_assert(!segments.empty())"
* https://tracker.ceph.com/issues/52279
cephadm tests fail due to: error adding seccomp filter rule for syscall bdflush: requested action matches default action of filter

h3. 2021 Dec 22

https://pulpito.ceph.com/?branch=wip-pdonnell-testing-20211222.014316

* https://tracker.ceph.com/issues/52624
qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
* https://tracker.ceph.com/issues/50223
qa: "client.4737 isn't responding to mclientcaps(revoke)"
* https://tracker.ceph.com/issues/52279
cephadm tests fail due to: error adding seccomp filter rule for syscall bdflush: requested action matches default action of filter
* https://tracker.ceph.com/issues/50224
qa: test_mirroring_init_failure_with_recovery failure
* https://tracker.ceph.com/issues/48773
qa: scrub does not complete

h3. 2021 Nov 30

https://pulpito.ceph.com/vshankar-2021-11-24_07:14:27-fs-wip-vshankar-testing-20211124-094330-testing-default-smithi/
https://pulpito.ceph.com/vshankar-2021-11-30_06:23:32-fs-wip-vshankar-testing-20211124-094330-distro-default-smithi/ (rerun w/ QA fixes)

* https://tracker.ceph.com/issues/53436
mds, mon: mds beacon messages get dropped? (mds never reaches up:active state)
* https://tracker.ceph.com/issues/51964
qa: test_cephfs_mirror_restart_sync_on_blocklist failure
* https://tracker.ceph.com/issues/48812
qa: test_scrub_pause_and_resume_with_abort failure
* https://tracker.ceph.com/issues/51076
"wait_for_recovery: failed before timeout expired" during thrashosd test with EC backend.
* https://tracker.ceph.com/issues/50223
qa: "client.4737 isn't responding to mclientcaps(revoke)"
* https://tracker.ceph.com/issues/52624
qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
* https://tracker.ceph.com/issues/50250
mds: "log [WRN] : Scrub error on inode 0x10000004506 (/client.0/tmp/clients/client3/~dmtmp/COREL) see mds.a log and `damage ls` output for details" ("freshly-calculated rstats don't match existing ones")

h3. 2021 November 9

https://pulpito.ceph.com/?branch=wip-pdonnell-testing-20211109.180315

* https://tracker.ceph.com/issues/53214
qa: "dd: error reading '/sys/kernel/debug/ceph/2a934501-6731-4052-a836-f42229a869be.client4874/metrics': Is a directory"
* https://tracker.ceph.com/issues/48773
qa: scrub does not complete
* https://tracker.ceph.com/issues/50223
qa: "client.4737 isn't responding to mclientcaps(revoke)"
* https://tracker.ceph.com/issues/51282
pybind/mgr/mgr_util: .mgr pool may be created to early causing spurious PG_DEGRADED warnings
* https://tracker.ceph.com/issues/52624
qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
* https://tracker.ceph.com/issues/53216
qa: "RuntimeError: value of attributes should be either str or None. client_id"
* https://tracker.ceph.com/issues/50250
mds: "log [WRN] : Scrub error on inode 0x10000004506 (/client.0/tmp/clients/client3/~dmtmp/COREL) see mds.a log and `damage ls` output for details" ("freshly-calculated rstats don't match existing ones")

h3. 2021 November 03

https://pulpito.ceph.com/?branch=wip-pdonnell-testing-20211103.023355

* https://tracker.ceph.com/issues/51964
qa: test_cephfs_mirror_restart_sync_on_blocklist failure
* https://tracker.ceph.com/issues/51282
pybind/mgr/mgr_util: .mgr pool may be created to early causing spurious PG_DEGRADED warnings
* https://tracker.ceph.com/issues/52436
fs/ceph: "corrupt mdsmap"
* https://tracker.ceph.com/issues/53074
pybind/mgr/cephadm: upgrade sequence does not continue if no MDS are active
* https://tracker.ceph.com/issues/53150
pybind/mgr/cephadm/upgrade: tolerate MDS failures during upgrade straddling v16.2.5
* https://tracker.ceph.com/issues/53155
MDSMonitor: assertion during upgrade to v16.2.5+

h3. 2021 October 26

https://pulpito.ceph.com/?branch=wip-pdonnell-testing-20211025.000447

* https://tracker.ceph.com/issues/53074
pybind/mgr/cephadm: upgrade sequence does not continue if no MDS are active
* https://tracker.ceph.com/issues/52997
testing: hang ing umount
* https://tracker.ceph.com/issues/50824
qa: snaptest-git-ceph bus error
* https://tracker.ceph.com/issues/52436
fs/ceph: "corrupt mdsmap"
* https://tracker.ceph.com/issues/48773
qa: scrub does not complete
* https://tracker.ceph.com/issues/53082
ceph-fuse: segmenetation fault in Client::handle_mds_map
* https://tracker.ceph.com/issues/50223
qa: "client.4737 isn't responding to mclientcaps(revoke)"
* https://tracker.ceph.com/issues/52624
qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
* https://tracker.ceph.com/issues/50224
qa: test_mirroring_init_failure_with_recovery failure
* https://tracker.ceph.com/issues/50821
qa: untar_snap_rm failure during mds thrashing
* https://tracker.ceph.com/issues/50250
mds: "log [WRN] : Scrub error on inode 0x10000004506 (/client.0/tmp/clients/client3/~dmtmp/COREL) see mds.a log and `damage ls` output for details" ("freshly-calculated rstats don't match existing ones")

h3. 2021 October 19

https://pulpito.ceph.com/?branch=wip-pdonnell-testing-20211019.013028

* https://tracker.ceph.com/issues/52995
qa: test_standby_count_wanted failure
* https://tracker.ceph.com/issues/52948
osd: fails to come up: "teuthology.misc:7 of 8 OSDs are up"
* https://tracker.ceph.com/issues/52996
qa: test_perf_counters via test_openfiletable
* https://tracker.ceph.com/issues/48772
qa: pjd: not ok 9, 44, 80
* https://tracker.ceph.com/issues/52997
testing: hang ing umount
* https://tracker.ceph.com/issues/50250
mds: "log [WRN] : Scrub error on inode 0x10000004506 (/client.0/tmp/clients/client3/~dmtmp/COREL) see mds.a log and `damage ls` output for details" ("freshly-calculated rstats don't match existing ones")
* https://tracker.ceph.com/issues/52624
qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
* https://tracker.ceph.com/issues/50223
qa: "client.4737 isn't responding to mclientcaps(revoke)"
* https://tracker.ceph.com/issues/50821
qa: untar_snap_rm failure during mds thrashing
* https://tracker.ceph.com/issues/48773
qa: scrub does not complete

h3. 2021 October 12

https://pulpito.ceph.com/?branch=wip-pdonnell-testing-20211012.192211

Some failures caused by teuthology bug: https://tracker.ceph.com/issues/52944

New test caused failure: https://github.com/ceph/ceph/pull/43297#discussion_r729883167

* https://tracker.ceph.com/issues/51282
pybind/mgr/mgr_util: .mgr pool may be created to early causing spurious PG_DEGRADED warnings
* https://tracker.ceph.com/issues/52948
osd: fails to come up: "teuthology.misc:7 of 8 OSDs are up"
* https://tracker.ceph.com/issues/48773
qa: scrub does not complete
* https://tracker.ceph.com/issues/50224
qa: test_mirroring_init_failure_with_recovery failure
* https://tracker.ceph.com/issues/52949
RuntimeError: The following counters failed to be set on mds daemons: {'mds.dir_split'}

h3. 2021 October 02

https://pulpito.ceph.com/?branch=wip-pdonnell-testing-20211002.163337

Some failures caused by cephadm upgrade test. Fixed in follow-up qa commit.

test_simple failures caused by PR in this set.

A few reruns because of QA infra noise.

* https://tracker.ceph.com/issues/52822
qa: failed pacific install on fs:upgrade
* https://tracker.ceph.com/issues/52624
qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
* https://tracker.ceph.com/issues/50223
qa: "client.4737 isn't responding to mclientcaps(revoke)"
* https://tracker.ceph.com/issues/48773
qa: scrub does not complete

h3. 2021 September 20

https://pulpito.ceph.com/?branch=wip-pdonnell-testing-20210917.174826

* https://tracker.ceph.com/issues/52677
qa: test_simple failure
* https://tracker.ceph.com/issues/51279
kclient hangs on umount (testing branch)
* https://tracker.ceph.com/issues/50223
qa: "client.4737 isn't responding to mclientcaps(revoke)"
* https://tracker.ceph.com/issues/50250
mds: "log [WRN] : Scrub error on inode 0x10000004506 (/client.0/tmp/clients/client3/~dmtmp/COREL) see mds.a log and `damage ls` output for details" ("freshly-calculated rstats don't match existing ones")
* https://tracker.ceph.com/issues/52624
qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
* https://tracker.ceph.com/issues/52438
qa: ffsb timeout

h3. 2021 September 10

https://pulpito.ceph.com/?branch=wip-pdonnell-testing-20210910.181451

* https://tracker.ceph.com/issues/50223
qa: "client.4737 isn't responding to mclientcaps(revoke)"
* https://tracker.ceph.com/issues/50250
mds: "log [WRN] : Scrub error on inode 0x10000004506 (/client.0/tmp/clients/client3/~dmtmp/COREL) see mds.a log and `damage ls` output for details" ("freshly-calculated rstats don't match existing ones")
* https://tracker.ceph.com/issues/52624
qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
* https://tracker.ceph.com/issues/52625
qa: test_kill_mdstable (tasks.cephfs.test_snapshots.TestSnapshots)
* https://tracker.ceph.com/issues/52439
qa: acls does not compile on centos stream
* https://tracker.ceph.com/issues/50821
qa: untar_snap_rm failure during mds thrashing
* https://tracker.ceph.com/issues/48773
qa: scrub does not complete
* https://tracker.ceph.com/issues/52626
mds: ScrubStack.cc: 831: FAILED ceph_assert(diri)
* https://tracker.ceph.com/issues/51279
kclient hangs on umount (testing branch)

h3. 2021 August 27

Several jobs died because of device failures.

https://pulpito.ceph.com/?branch=wip-pdonnell-testing-20210827.024746

* https://tracker.ceph.com/issues/52430
mds: fast async create client mount breaks racy test
* https://tracker.ceph.com/issues/52436
fs/ceph: "corrupt mdsmap"
* https://tracker.ceph.com/issues/52437
mds: InoTable::replay_release_ids abort via test_inotable_sync
* https://tracker.ceph.com/issues/51282
pybind/mgr/mgr_util: .mgr pool may be created too early causing spurious PG_DEGRADED warnings
* https://tracker.ceph.com/issues/52438
qa: ffsb timeout
* https://tracker.ceph.com/issues/52439
qa: acls does not compile on centos stream

h3. 2021 July 30

https://pulpito.ceph.com/?branch=wip-pdonnell-testing-20210729.214022

* https://tracker.ceph.com/issues/50250
mds: "log [WRN] : Scrub error on inode 0x10000004506 (/client.0/tmp/clients/client3/~dmtmp/COREL) see mds.a log and `damage ls` output for details" ("freshly-calculated rstats don't match existing ones")
* https://tracker.ceph.com/issues/51282
pybind/mgr/mgr_util: .mgr pool may be created too early causing spurious PG_DEGRADED warnings
* https://tracker.ceph.com/issues/48773
qa: scrub does not complete
* https://tracker.ceph.com/issues/51975
pybind/mgr/stats: KeyError

h3. 2021 July 28

https://pulpito.ceph.com/pdonnell-2021-07-28_00:39:45-fs-wip-pdonnell-testing-20210727.213757-distro-basic-smithi/

with qa fix: https://pulpito.ceph.com/pdonnell-2021-07-28_16:20:28-fs-wip-pdonnell-testing-20210728.141004-distro-basic-smithi/

* https://tracker.ceph.com/issues/51905
qa: "error reading sessionmap 'mds1_sessionmap'"
* https://tracker.ceph.com/issues/48773
qa: scrub does not complete
* https://tracker.ceph.com/issues/50250
mds: "log [WRN] : Scrub error on inode 0x10000004506 (/client.0/tmp/clients/client3/~dmtmp/COREL) see mds.a log and `damage ls` output for details" ("freshly-calculated rstats don't match existing ones")
* https://tracker.ceph.com/issues/51267
CommandFailedError: Command failed (workunit test fs/snaps/snaptest-multiple-capsnaps.sh) on smithi096 with status 1:...
* https://tracker.ceph.com/issues/51279
kclient hangs on umount (testing branch)

h3. 2021 July 16

https://pulpito.ceph.com/pdonnell-2021-07-16_05:50:11-fs-wip-pdonnell-testing-20210716.022804-distro-basic-smithi/

* https://tracker.ceph.com/issues/48773
qa: scrub does not complete
* https://tracker.ceph.com/issues/48772
qa: pjd: not ok 9, 44, 80
* https://tracker.ceph.com/issues/45434
qa: test_full_fsync (tasks.cephfs.test_full.TestClusterFull) failed
* https://tracker.ceph.com/issues/51279
kclient hangs on umount (testing branch)
* https://tracker.ceph.com/issues/50824
qa: snaptest-git-ceph bus error

h3. 2021 July 04

https://pulpito.ceph.com/?branch=wip-pdonnell-testing-20210703.052904

* https://tracker.ceph.com/issues/48773
qa: scrub does not complete
* https://tracker.ceph.com/issues/39150
mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
* https://tracker.ceph.com/issues/45434
qa: test_full_fsync (tasks.cephfs.test_full.TestClusterFull) failed
* https://tracker.ceph.com/issues/51282
pybind/mgr/mgr_util: .mgr pool may be created too early causing spurious PG_DEGRADED warnings
* https://tracker.ceph.com/issues/48771
qa: iogen: workload fails to cause balancing
* https://tracker.ceph.com/issues/51279
kclient hangs on umount (testing branch)
* https://tracker.ceph.com/issues/50250
mds: "log [WRN] : Scrub error on inode 0x10000004506 (/client.0/tmp/clients/client3/~dmtmp/COREL) see mds.a log and `damage ls` output for details" ("freshly-calculated rstats don't match existing ones")

h3. 2021 July 01

https://pulpito.ceph.com/?branch=wip-pdonnell-testing-20210701.192056

* https://tracker.ceph.com/issues/51197
qa: [WRN] Scrub error on inode 0x10000001520 (/client.0/tmp/t/linux-5.4/Documentation/driver-api) see mds.f log and `damage ls` output for details
* https://tracker.ceph.com/issues/50866
osd: stat mismatch on objects
* https://tracker.ceph.com/issues/48773
qa: scrub does not complete

h3. 2021 June 26

https://pulpito.ceph.com/pdonnell-2021-06-26_00:57:00-fs-wip-pdonnell-testing-20210625.225421-distro-basic-smithi/

* https://tracker.ceph.com/issues/51183
qa: FileNotFoundError: [Errno 2] No such file or directory: '/sys/kernel/debug/ceph/3fab6bea-f243-47a4-a956-8c03a62b61b5.client4721/mds_sessions'
* https://tracker.ceph.com/issues/51410
kclient: fails to finish reconnect during MDS thrashing (testing branch)
* https://tracker.ceph.com/issues/48773
qa: scrub does not complete
* https://tracker.ceph.com/issues/51282
pybind/mgr/mgr_util: .mgr pool may be created too early causing spurious PG_DEGRADED warnings
* https://tracker.ceph.com/issues/51169
qa: ubuntu 20.04 sys protections prevent multiuser file access in /tmp
* https://tracker.ceph.com/issues/48772
qa: pjd: not ok 9, 44, 80

h3. 2021 June 21

https://pulpito.ceph.com/pdonnell-2021-06-22_00:27:21-fs-wip-pdonnell-testing-20210621.231646-distro-basic-smithi/

One failure caused by PR: https://github.com/ceph/ceph/pull/41935#issuecomment-866472599

* https://tracker.ceph.com/issues/51282
pybind/mgr/mgr_util: .mgr pool may be created too early causing spurious PG_DEGRADED warnings
* https://tracker.ceph.com/issues/51183
qa: FileNotFoundError: [Errno 2] No such file or directory: '/sys/kernel/debug/ceph/3fab6bea-f243-47a4-a956-8c03a62b61b5.client4721/mds_sessions'
* https://tracker.ceph.com/issues/48773
qa: scrub does not complete
* https://tracker.ceph.com/issues/48771
qa: iogen: workload fails to cause balancing
* https://tracker.ceph.com/issues/51169
qa: ubuntu 20.04 sys protections prevent multiuser file access in /tmp
* https://tracker.ceph.com/issues/50495
libcephfs: shutdown race fails with status 141
* https://tracker.ceph.com/issues/45434
qa: test_full_fsync (tasks.cephfs.test_full.TestClusterFull) failed
* https://tracker.ceph.com/issues/50824
qa: snaptest-git-ceph bus error
* https://tracker.ceph.com/issues/50223
qa: "client.4737 isn't responding to mclientcaps(revoke)"

h3. 2021 June 16

https://pulpito.ceph.com/pdonnell-2021-06-16_21:26:55-fs-wip-pdonnell-testing-20210616.191804-distro-basic-smithi/

MDS abort class of failures caused by PR: https://github.com/ceph/ceph/pull/41667

* https://tracker.ceph.com/issues/45434
qa: test_full_fsync (tasks.cephfs.test_full.TestClusterFull) failed
* https://tracker.ceph.com/issues/51169
qa: ubuntu 20.04 sys protections prevent multiuser file access in /tmp
* https://tracker.ceph.com/issues/43216
MDSMonitor: removes MDS coming out of quorum election
* https://tracker.ceph.com/issues/51278
mds: "FAILED ceph_assert(!segments.empty())"
* https://tracker.ceph.com/issues/51279
kclient hangs on umount (testing branch)
* https://tracker.ceph.com/issues/51280
mds: "FAILED ceph_assert(r == 0 || r == -2)"
* https://tracker.ceph.com/issues/51183
qa: FileNotFoundError: [Errno 2] No such file or directory: '/sys/kernel/debug/ceph/3fab6bea-f243-47a4-a956-8c03a62b61b5.client4721/mds_sessions'
* https://tracker.ceph.com/issues/51281
qa: snaptest-snap-rm-cmp.sh: "echo 'FAIL: bad match, /tmp/a 4637e766853d1ad16a7b17079e2c6f03 != real c3883760b18d50e8d78819c54d579b00'"
* https://tracker.ceph.com/issues/48773
qa: scrub does not complete
* https://tracker.ceph.com/issues/51076
"wait_for_recovery: failed before timeout expired" during thrashosd test with EC backend.
* https://tracker.ceph.com/issues/51228
qa: rmdir: failed to remove 'a/.snap/*': No such file or directory
* https://tracker.ceph.com/issues/51282
pybind/mgr/mgr_util: .mgr pool may be created too early causing spurious PG_DEGRADED warnings

h3. 2021 June 14

https://pulpito.ceph.com/pdonnell-2021-06-14_20:53:05-fs-wip-pdonnell-testing-20210614.173325-distro-basic-smithi/

Some Ubuntu 20.04 upgrade fallout. In particular, upgrade tests are failing due to missing packages for 18.04 Pacific.

* https://tracker.ceph.com/issues/51169
qa: ubuntu 20.04 sys protections prevent multiuser file access in /tmp
* https://tracker.ceph.com/issues/51228
qa: rmdir: failed to remove 'a/.snap/*': No such file or directory
* https://tracker.ceph.com/issues/48773
qa: scrub does not complete
* https://tracker.ceph.com/issues/51183
qa: FileNotFoundError: [Errno 2] No such file or directory: '/sys/kernel/debug/ceph/3fab6bea-f243-47a4-a956-8c03a62b61b5.client4721/mds_sessions'
* https://tracker.ceph.com/issues/45434
qa: test_full_fsync (tasks.cephfs.test_full.TestClusterFull) failed
* https://tracker.ceph.com/issues/51182
pybind/mgr/snap_schedule: Invalid command: Unexpected argument 'fs=cephfs'
* https://tracker.ceph.com/issues/51229
qa: test_multi_snap_schedule list difference failure
* https://tracker.ceph.com/issues/50821
qa: untar_snap_rm failure during mds thrashing

h3. 2021 June 13

https://pulpito.ceph.com/pdonnell-2021-06-12_02:45:35-fs-wip-pdonnell-testing-20210612.002809-distro-basic-smithi/

Some Ubuntu 20.04 upgrade fallout. In particular, upgrade tests are failing due to missing packages for 18.04 Pacific.

* https://tracker.ceph.com/issues/51169
qa: ubuntu 20.04 sys protections prevent multiuser file access in /tmp
* https://tracker.ceph.com/issues/48773
qa: scrub does not complete
* https://tracker.ceph.com/issues/51182
pybind/mgr/snap_schedule: Invalid command: Unexpected argument 'fs=cephfs'
* https://tracker.ceph.com/issues/51183
qa: FileNotFoundError: [Errno 2] No such file or directory: '/sys/kernel/debug/ceph/3fab6bea-f243-47a4-a956-8c03a62b61b5.client4721/mds_sessions'
* https://tracker.ceph.com/issues/51197
qa: [WRN] Scrub error on inode 0x10000001520 (/client.0/tmp/t/linux-5.4/Documentation/driver-api) see mds.f log and `damage ls` output for details
* https://tracker.ceph.com/issues/45434
qa: test_full_fsync (tasks.cephfs.test_full.TestClusterFull) failed

h3. 2021 June 11

https://pulpito.ceph.com/pdonnell-2021-06-11_18:02:10-fs-wip-pdonnell-testing-20210611.162716-distro-basic-smithi/

Some Ubuntu 20.04 upgrade fallout. In particular, upgrade tests are failing due to missing packages for 18.04 Pacific.

* https://tracker.ceph.com/issues/51169
qa: ubuntu 20.04 sys protections prevent multiuser file access in /tmp
* https://tracker.ceph.com/issues/45434
qa: test_full_fsync (tasks.cephfs.test_full.TestClusterFull) failed
* https://tracker.ceph.com/issues/48771
qa: iogen: workload fails to cause balancing
* https://tracker.ceph.com/issues/43216
MDSMonitor: removes MDS coming out of quorum election
* https://tracker.ceph.com/issues/51182
pybind/mgr/snap_schedule: Invalid command: Unexpected argument 'fs=cephfs'
* https://tracker.ceph.com/issues/50223
qa: "client.4737 isn't responding to mclientcaps(revoke)"
* https://tracker.ceph.com/issues/48773
qa: scrub does not complete
* https://tracker.ceph.com/issues/51183
qa: FileNotFoundError: [Errno 2] No such file or directory: '/sys/kernel/debug/ceph/3fab6bea-f243-47a4-a956-8c03a62b61b5.client4721/mds_sessions'
* https://tracker.ceph.com/issues/51184
qa: fs:bugs does not specify distro

h3. 2021 June 03

https://pulpito.ceph.com/pdonnell-2021-06-03_03:40:33-fs-wip-pdonnell-testing-20210603.020013-distro-basic-smithi/

* https://tracker.ceph.com/issues/45434
qa: test_full_fsync (tasks.cephfs.test_full.TestClusterFull) failed
* https://tracker.ceph.com/issues/50016
qa: test_damage: "RuntimeError: 2 mutations had unexpected outcomes"
* https://tracker.ceph.com/issues/50821
qa: untar_snap_rm failure during mds thrashing
* https://tracker.ceph.com/issues/50622 (regression)
msg: active_connections regression
* https://tracker.ceph.com/issues/49845#note-2 (regression)
qa: failed umount in test_volumes
* https://tracker.ceph.com/issues/48773
qa: scrub does not complete
* https://tracker.ceph.com/issues/43216
MDSMonitor: removes MDS coming out of quorum election

h3. 2021 May 18

https://pulpito.ceph.com/?branch=wip-pdonnell-testing-20210518.214114

A regression in the testing kernel caused some failures. Ilya fixed those and the
rerun looked better. There was some odd new noise in the rerun relating to packaging
and "No module named 'tasks.ceph'".

* https://tracker.ceph.com/issues/50824
qa: snaptest-git-ceph bus error
* https://tracker.ceph.com/issues/50622 (regression)
msg: active_connections regression
* https://tracker.ceph.com/issues/49845#note-2 (regression)
qa: failed umount in test_volumes
* https://tracker.ceph.com/issues/48203 (stock kernel update required)
qa: quota failure

h3. 2021 May 18

https://pulpito.ceph.com/?branch=wip-pdonnell-testing-20210518.025642

* https://tracker.ceph.com/issues/50821
qa: untar_snap_rm failure during mds thrashing
* https://tracker.ceph.com/issues/48773
qa: scrub does not complete
* https://tracker.ceph.com/issues/45591
mgr: FAILED ceph_assert(daemon != nullptr)
* https://tracker.ceph.com/issues/50866
osd: stat mismatch on objects
* https://tracker.ceph.com/issues/50016
qa: test_damage: "RuntimeError: 2 mutations had unexpected outcomes"
* https://tracker.ceph.com/issues/50867
qa: fs:mirror: reduced data availability
* https://tracker.ceph.com/issues/50622 (regression)
msg: active_connections regression
* https://tracker.ceph.com/issues/50223
qa: "client.4737 isn't responding to mclientcaps(revoke)"
* https://tracker.ceph.com/issues/50868
qa: "kern.log.gz already exists; not overwritten"
* https://tracker.ceph.com/issues/50870
qa: test_full: "rm: cannot remove 'large_file_a': Permission denied"

h3. 2021 May 11

https://pulpito.ceph.com/?branch=wip-pdonnell-testing-20210511.232042

* one class of failures caused by PR
* https://tracker.ceph.com/issues/48812
qa: test_scrub_pause_and_resume_with_abort failure
* https://tracker.ceph.com/issues/50390
mds: monclient: wait_auth_rotating timed out after 30
* https://tracker.ceph.com/issues/48773
qa: scrub does not complete
* https://tracker.ceph.com/issues/50821
qa: untar_snap_rm failure during mds thrashing
* https://tracker.ceph.com/issues/50224
qa: test_mirroring_init_failure_with_recovery failure
* https://tracker.ceph.com/issues/50622 (regression)
msg: active_connections regression
* https://tracker.ceph.com/issues/50825
qa: snaptest-git-ceph hang during mon thrashing v2
* https://tracker.ceph.com/issues/50823
qa: RuntimeError: timeout waiting for cluster to stabilize

h3. 2021 May 14

https://pulpito.ceph.com/pdonnell-2021-05-14_21:45:42-fs-master-distro-basic-smithi/

* https://tracker.ceph.com/issues/48812
qa: test_scrub_pause_and_resume_with_abort failure
* https://tracker.ceph.com/issues/50821
qa: untar_snap_rm failure during mds thrashing
* https://tracker.ceph.com/issues/50622 (regression)
msg: active_connections regression
* https://tracker.ceph.com/issues/50822
qa: testing kernel patch for client metrics causes mds abort
* https://tracker.ceph.com/issues/48773
qa: scrub does not complete
* https://tracker.ceph.com/issues/50823
qa: RuntimeError: timeout waiting for cluster to stabilize
* https://tracker.ceph.com/issues/50824
qa: snaptest-git-ceph bus error
* https://tracker.ceph.com/issues/50825
qa: snaptest-git-ceph hang during mon thrashing v2
* https://tracker.ceph.com/issues/50826
kceph: stock RHEL kernel hangs on snaptests with mon|osd thrashers

h3. 2021 May 01

https://pulpito.ceph.com/pdonnell-2021-05-01_09:07:09-fs-wip-pdonnell-testing-20210501.040415-distro-basic-smithi/

* https://tracker.ceph.com/issues/45434
qa: test_full_fsync (tasks.cephfs.test_full.TestClusterFull) failed
* https://tracker.ceph.com/issues/50281
qa: untar_snap_rm timeout
* https://tracker.ceph.com/issues/48203 (stock kernel update required)
qa: quota failure
* https://tracker.ceph.com/issues/48773
qa: scrub does not complete
* https://tracker.ceph.com/issues/50390
mds: monclient: wait_auth_rotating timed out after 30
* https://tracker.ceph.com/issues/50250
mds: "log [WRN] : Scrub error on inode 0x10000004506 (/client.0/tmp/clients/client3/~dmtmp/COREL) see mds.a log and `damage ls` output for details"
* https://tracker.ceph.com/issues/50622 (regression)
msg: active_connections regression
* https://tracker.ceph.com/issues/45591
mgr: FAILED ceph_assert(daemon != nullptr)
* https://tracker.ceph.com/issues/50221
qa: snaptest-git-ceph failure in git diff
* https://tracker.ceph.com/issues/50016
qa: test_damage: "RuntimeError: 2 mutations had unexpected outcomes"

h3. 2021 Apr 15

https://pulpito.ceph.com/pdonnell-2021-04-15_01:35:57-fs-wip-pdonnell-testing-20210414.230315-distro-basic-smithi/

* https://tracker.ceph.com/issues/50281
qa: untar_snap_rm timeout
* https://tracker.ceph.com/issues/50220
qa: dbench workload timeout
* https://tracker.ceph.com/issues/50246
mds: failure replaying journal (EMetaBlob)
* https://tracker.ceph.com/issues/50250
mds: "log [WRN] : Scrub error on inode 0x10000004506 (/client.0/tmp/clients/client3/~dmtmp/COREL) see mds.a log and `damage ls` output for details"
* https://tracker.ceph.com/issues/50016
qa: test_damage: "RuntimeError: 2 mutations had unexpected outcomes"
* https://tracker.ceph.com/issues/50222
osd: 5.2s0 deep-scrub : stat mismatch
* https://tracker.ceph.com/issues/45434
qa: test_full_fsync (tasks.cephfs.test_full.TestClusterFull) failed
* https://tracker.ceph.com/issues/49845
qa: failed umount in test_volumes
* https://tracker.ceph.com/issues/37808
osd: osdmap cache weak_refs assert during shutdown
* https://tracker.ceph.com/issues/50387
client: fs/snaps failure
* https://tracker.ceph.com/issues/50389
mds: "cluster [ERR] Error recovering journal 0x203: (2) No such file or directory" in cluster log
* https://tracker.ceph.com/issues/50216
qa: "ls: cannot access 'lost+found': No such file or directory"
* https://tracker.ceph.com/issues/50390
mds: monclient: wait_auth_rotating timed out after 30

h3. 2021 Apr 08

https://pulpito.ceph.com/pdonnell-2021-04-08_22:42:24-fs-wip-pdonnell-testing-20210408.192301-distro-basic-smithi/

* https://tracker.ceph.com/issues/45434
qa: test_full_fsync (tasks.cephfs.test_full.TestClusterFull) failed
* https://tracker.ceph.com/issues/50016
qa: test_damage: "RuntimeError: 2 mutations had unexpected outcomes"
* https://tracker.ceph.com/issues/48773
qa: scrub does not complete
* https://tracker.ceph.com/issues/50279
qa: "Replacing daemon mds.b as rank 0 with standby daemon mds.c"
* https://tracker.ceph.com/issues/50246
mds: failure replaying journal (EMetaBlob)
* https://tracker.ceph.com/issues/48365
qa: ffsb build failure on CentOS 8.2
* https://tracker.ceph.com/issues/50216
qa: "ls: cannot access 'lost+found': No such file or directory"
* https://tracker.ceph.com/issues/50223
qa: "client.4737 isn't responding to mclientcaps(revoke)"
* https://tracker.ceph.com/issues/50280
cephadm: RuntimeError: uid/gid not found
* https://tracker.ceph.com/issues/50281
qa: untar_snap_rm timeout

h3. 2021 Apr 08

https://pulpito.ceph.com/pdonnell-2021-04-08_04:31:36-fs-wip-pdonnell-testing-20210408.024225-distro-basic-smithi/
https://pulpito.ceph.com/?branch=wip-pdonnell-testing-20210408.142238 (with logic inversion / QA fix)

* https://tracker.ceph.com/issues/50246
mds: failure replaying journal (EMetaBlob)
* https://tracker.ceph.com/issues/50250
mds: "log [WRN] : Scrub error on inode 0x10000004506 (/client.0/tmp/clients/client3/~dmtmp/COREL) see mds.a log and `damage ls` output for details"

h3. 2021 Apr 07

https://pulpito.ceph.com/pdonnell-2021-04-07_02:12:41-fs-wip-pdonnell-testing-20210406.213012-distro-basic-smithi/

* https://tracker.ceph.com/issues/50215
qa: "log [ERR] : error reading sessionmap 'mds2_sessionmap'"
* https://tracker.ceph.com/issues/49466
qa: "Command failed on gibba030 with status 1: 'set -ex\nsudo dd of=/tmp/tmp.ZEeZBasJer'"
* https://tracker.ceph.com/issues/50216
qa: "ls: cannot access 'lost+found': No such file or directory"
* https://tracker.ceph.com/issues/48773
qa: scrub does not complete
* https://tracker.ceph.com/issues/49845
qa: failed umount in test_volumes
* https://tracker.ceph.com/issues/50220
qa: dbench workload timeout
* https://tracker.ceph.com/issues/50221
qa: snaptest-git-ceph failure in git diff
* https://tracker.ceph.com/issues/50222
osd: 5.2s0 deep-scrub : stat mismatch
* https://tracker.ceph.com/issues/50223
qa: "client.4737 isn't responding to mclientcaps(revoke)"
* https://tracker.ceph.com/issues/50224
qa: test_mirroring_init_failure_with_recovery failure

h3. 2021 Apr 01

https://pulpito.ceph.com/pdonnell-2021-04-01_00:45:34-fs-wip-pdonnell-testing-20210331.222326-distro-basic-smithi/

* https://tracker.ceph.com/issues/48772
qa: pjd: not ok 9, 44, 80
* https://tracker.ceph.com/issues/50177
osd: "stalled aio... buggy kernel or bad device?"
* https://tracker.ceph.com/issues/48771
qa: iogen: workload fails to cause balancing
* https://tracker.ceph.com/issues/49845
qa: failed umount in test_volumes
* https://tracker.ceph.com/issues/48773
qa: scrub does not complete
* https://tracker.ceph.com/issues/48805
mds: "cluster [WRN] Scrub error on inode 0x1000000039d (/client.0/tmp/blogbench-1.0/src/blogtest_in) see mds.a log and `damage ls` output for details"
* https://tracker.ceph.com/issues/50178
qa: "TypeError: run() got an unexpected keyword argument 'shell'"
* https://tracker.ceph.com/issues/45434
qa: test_full_fsync (tasks.cephfs.test_full.TestClusterFull) failed

h3. 2021 Mar 24

https://pulpito.ceph.com/pdonnell-2021-03-24_23:26:35-fs-wip-pdonnell-testing-20210324.190252-distro-basic-smithi/

* https://tracker.ceph.com/issues/49500
qa: "Assertion `cb_done' failed."
* https://tracker.ceph.com/issues/50019
qa: mount failure with cephadm "probably no MDS server is up?"
* https://tracker.ceph.com/issues/50020
qa: "RADOS object not found (Failed to operate read op for oid cephfs_mirror)"
* https://tracker.ceph.com/issues/48773
qa: scrub does not complete
* https://tracker.ceph.com/issues/45434
qa: test_full_fsync (tasks.cephfs.test_full.TestClusterFull) failed
* https://tracker.ceph.com/issues/48805
mds: "cluster [WRN] Scrub error on inode 0x1000000039d (/client.0/tmp/blogbench-1.0/src/blogtest_in) see mds.a log and `damage ls` output for details"
* https://tracker.ceph.com/issues/48772
qa: pjd: not ok 9, 44, 80
* https://tracker.ceph.com/issues/50021
qa: snaptest-git-ceph failure during mon thrashing
* https://tracker.ceph.com/issues/48771
qa: iogen: workload fails to cause balancing
* https://tracker.ceph.com/issues/50016
qa: test_damage: "RuntimeError: 2 mutations had unexpected outcomes"
* https://tracker.ceph.com/issues/49466
qa: "Command failed on gibba030 with status 1: 'set -ex\nsudo dd of=/tmp/tmp.ZEeZBasJer'"

h3. 2021 Mar 18

https://pulpito.ceph.com/pdonnell-2021-03-18_13:46:31-fs-wip-pdonnell-testing-20210318.024145-distro-basic-smithi/

* https://tracker.ceph.com/issues/49466
qa: "Command failed on gibba030 with status 1: 'set -ex\nsudo dd of=/tmp/tmp.ZEeZBasJer'"
* https://tracker.ceph.com/issues/48773
qa: scrub does not complete
* https://tracker.ceph.com/issues/48805
mds: "cluster [WRN] Scrub error on inode 0x1000000039d (/client.0/tmp/blogbench-1.0/src/blogtest_in) see mds.a log and `damage ls` output for details"
* https://tracker.ceph.com/issues/45434
qa: test_full_fsync (tasks.cephfs.test_full.TestClusterFull) failed
* https://tracker.ceph.com/issues/49845
qa: failed umount in test_volumes
* https://tracker.ceph.com/issues/49605
mgr: drops command on the floor
* https://tracker.ceph.com/issues/48203 (stock kernel update required)
qa: quota failure
* https://tracker.ceph.com/issues/49928
client: items pinned in cache preventing unmount x2

h3. 2021 Mar 15

https://pulpito.ceph.com/pdonnell-2021-03-15_22:16:56-fs-wip-pdonnell-testing-20210315.182203-distro-basic-smithi/

* https://tracker.ceph.com/issues/49842
qa: stuck pkg install
* https://tracker.ceph.com/issues/49466
qa: "Command failed on gibba030 with status 1: 'set -ex\nsudo dd of=/tmp/tmp.ZEeZBasJer'"
* https://tracker.ceph.com/issues/49822
test: test_mirroring_command_idempotency (tasks.cephfs.test_admin.TestMirroringCommands) failure
* https://tracker.ceph.com/issues/49240
terminate called after throwing an instance of 'std::bad_alloc'
* https://tracker.ceph.com/issues/48773
qa: scrub does not complete
* https://tracker.ceph.com/issues/45434
qa: test_full_fsync (tasks.cephfs.test_full.TestClusterFull) failed
* https://tracker.ceph.com/issues/49500
qa: "Assertion `cb_done' failed."
* https://tracker.ceph.com/issues/49843
qa: fs/snaps/snaptest-upchildrealms.sh failure
* https://tracker.ceph.com/issues/49845
qa: failed umount in test_volumes
* https://tracker.ceph.com/issues/48805
mds: "cluster [WRN] Scrub error on inode 0x1000000039d (/client.0/tmp/blogbench-1.0/src/blogtest_in) see mds.a log and `damage ls` output for details"
* https://tracker.ceph.com/issues/49605
mgr: drops command on the floor

and failure caused by PR: https://github.com/ceph/ceph/pull/39969

h3. 2021 Mar 09

https://pulpito.ceph.com/pdonnell-2021-03-09_03:27:39-fs-wip-pdonnell-testing-20210308.214827-distro-basic-smithi/

* https://tracker.ceph.com/issues/49500
qa: "Assertion `cb_done' failed."
* https://tracker.ceph.com/issues/48805
mds: "cluster [WRN] Scrub error on inode 0x1000000039d (/client.0/tmp/blogbench-1.0/src/blogtest_in) see mds.a log and `damage ls` output for details"
* https://tracker.ceph.com/issues/48773
qa: scrub does not complete
* https://tracker.ceph.com/issues/45434
qa: test_full_fsync (tasks.cephfs.test_full.TestClusterFull) failed
* https://tracker.ceph.com/issues/49240
terminate called after throwing an instance of 'std::bad_alloc'
* https://tracker.ceph.com/issues/49466
qa: "Command failed on gibba030 with status 1: 'set -ex\nsudo dd of=/tmp/tmp.ZEeZBasJer'"
* https://tracker.ceph.com/issues/49684
qa: fs:cephadm mount does not wait for mds to be created
* https://tracker.ceph.com/issues/48771
qa: iogen: workload fails to cause balancing