Ceph : Issues
https://tracker.ceph.com/ (2024-03-29T10:45:28Z)
Ceph - Bug #65226 (Fix Under Review): qa: add a test - peer status show "failed" status for makin...
https://tracker.ceph.com/issues/65226 (2024-03-29T10:45:28Z, Jos Collin)

CephFS - Backport #65223 (New): squid: cephfs-mirror: use snapdiff api for efficient tree traversal
https://tracker.ceph.com/issues/65223 (2024-03-29T08:23:14Z, Backport Bot)

CephFS - Backport #65222 (New): reef: cephfs-mirror: use snapdiff api for efficient tree traversal
https://tracker.ceph.com/issues/65222 (2024-03-29T08:23:06Z, Backport Bot)

CephFS - Bug #65115 (Fix Under Review): cephfs_mirror: failed test test_cephfs_mirror_cancel_mirr...
https://tracker.ceph.com/issues/65115 (2024-03-25T14:19:52Z, Jos Collin)
<p>The ceph_client.mirror logs show "Bad file descriptor", which failed due to [1].<br />This is different from the issue mentioned in [1], as it shows up in both passed and failed tests ([2], [3]).</p>
<p>[1] <a class="external" href="https://tracker.ceph.com/issues/64711">https://tracker.ceph.com/issues/64711</a><br />[2] <a class="external" href="https://pulpito.ceph.com/jcollin-2024-03-22_09:24:18-fs:mirror-wip-jcollin-testing-22032024-distro-default-smithi/">https://pulpito.ceph.com/jcollin-2024-03-22_09:24:18-fs:mirror-wip-jcollin-testing-22032024-distro-default-smithi/</a><br />[3] <a class="external" href="https://pulpito.ceph.com/jcollin-2024-03-25_04:53:04-fs:mirror-wip-jcollin-testing-22032024-distro-default-smithi/">https://pulpito.ceph.com/jcollin-2024-03-25_04:53:04-fs:mirror-wip-jcollin-testing-22032024-distro-default-smithi/</a></p>

CephFS - Bug #65073 (Fix Under Review): pybind/mgr/stats/fs: log exceptions to cluster log
https://tracker.ceph.com/issues/65073 (2024-03-22T13:12:18Z, Patrick Donnelly <pdonnell@redhat.com>)
<p>There are exceptions raised in the module which do not fail tests:</p>
<pre>
2024-03-20T21:38:38.702 INFO:tasks.ceph.mgr.x.smithi007.stderr:Exception in thread Thread-3:
2024-03-20T21:38:38.702 INFO:tasks.ceph.mgr.x.smithi007.stderr:Traceback (most recent call last):
2024-03-20T21:38:38.702 INFO:tasks.ceph.mgr.x.smithi007.stderr: File "/usr/lib64/python3.9/threading.py", line 980, in _bootstrap_inner
2024-03-20T21:38:38.704 DEBUG:teuthology.orchestra.run.smithi007:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph osd pool get cephfs_data pg_num
2024-03-20T21:38:38.712 INFO:tasks.ceph.mgr.x.smithi007.stderr: self.run()
2024-03-20T21:38:38.712 INFO:tasks.ceph.mgr.x.smithi007.stderr: File "/usr/lib64/python3.9/threading.py", line 1306, in run
2024-03-20T21:38:38.712 INFO:tasks.ceph.mgr.x.smithi007.stderr: self.function(*self.args, **self.kwargs)
2024-03-20T21:38:38.712 INFO:tasks.ceph.mgr.x.smithi007.stderr: File "/usr/share/ceph/mgr/stats/fs/perf_stats.py", line 222, in re_register_queries
2024-03-20T21:38:38.713 INFO:tasks.ceph.mgr.x.smithi007.stderr: if self.mx_last_updated >= ua_last_updated:
2024-03-20T21:38:38.713 INFO:tasks.ceph.mgr.x.smithi007.stderr:AttributeError: 'FSPerfStats' object has no attribute 'mx_last_updated'
</pre>
<p>From: /teuthology/pdonnell-2024-03-20_18:16:52-fs-wip-batrick-testing-20240320.145742-distro-default-smithi/7613026/teuthology.log</p>
<p>Exceptions should not appear in the mgr log at all. They pollute the log, making it difficult to grep for actual errors.</p>
<p>If this is a genuine error, log it to the clog so that the test fails. Otherwise, handle it quietly.</p>
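<p>A minimal sketch of that distinction, assuming the cluster_log()/ClusterLogPrio helpers that manager modules get from MgrModule (treated here as assumptions) and attribute names borrowed from the traceback above; illustrative only, not the actual fix:</p>
<pre>
# Illustrative sketch only -- not the actual mgr/stats fix.
# Assumes the MgrModule cluster_log()/ClusterLogPrio API available to manager modules.
import logging

log = logging.getLogger(__name__)

class FSPerfStats:
    def __init__(self, module):
        self.module = module          # the owning MgrModule instance
        self.mx_last_updated = 0      # initialize so the comparison below cannot raise

    def re_register_queries(self, ua_last_updated):
        try:
            if self.mx_last_updated >= ua_last_updated:
                return
            # ... re-register perf queries ...
        except Exception as e:
            # Genuine error: surface it in the cluster log so the QA run fails,
            # instead of letting the exception escape into the mgr's stderr.
            self.module.cluster_log('cluster',
                                    self.module.ClusterLogPrio.ERROR,
                                    f'mgr/stats: exception in re_register_queries: {e}')
            # A known/benign condition would instead be handled quietly:
            # log.debug('re_register_queries raced with shutdown', exc_info=True)
</pre>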
CephFS - Bug #64752 (New): cephfs-mirror: valgrind report leaks
https://tracker.ceph.com/issues/64752 (2024-03-06T16:06:45Z, Venky Shankar <vshankar@redhat.com>)

<p>/a/yuriw-2024-03-01_20:51:20-fs-squid-distro-default-smithi/7578146</p>
<pre>
Description: fs/valgrind/{begin/{0-install 1-ceph 2-logrotate} centos_latest debug mirror/{cephfs-mirror/one-per-cluster clients/mirror cluster/1-node mount/fuse overrides/ignorelist_health tasks/mirror}}
</pre>

CephFS - Bug #64751 (Fix Under Review): cephfs-mirror coredumped when acquiring pthread mutex
https://tracker.ceph.com/issues/64751 (2024-03-06T15:33:00Z, Venky Shankar <vshankar@redhat.com>)
<p>/a/yuriw-2024-03-01_20:51:20-fs-squid-distro-default-smithi/7578112</p>
<p>Log: ./remote/smithi134/log/ceph-client.mirror.43943.log.gz</p>
<pre>
-9> 2024-03-02T01:14:01.311+0000 7fdcf1493640 10 cephfs::mirror::Utils connect: connected to cluster=ceph using client=client.mirror
-8> 2024-03-02T01:14:01.313+0000 7fdcf1493640 20 cephfs::mirror::Utils mount: filesystem={fscid=58, fs_name=cephfs}
-7> 2024-03-02T01:14:01.320+0000 7fdcf1493640 10 cephfs::mirror::Utils mount: mounted filesystem={fscid=58, fs_name=cephfs}
-6> 2024-03-02T01:14:01.320+0000 7fdcf1493640 10 cephfs::mirror::FSMirror init: rados addrs=172.21.15.134:0/2248701012
-5> 2024-03-02T01:14:01.320+0000 7fdcf1493640 20 cephfs::mirror::FSMirror init_instance_watcher
-4> 2024-03-02T01:14:01.320+0000 7fdcf1493640 20 cephfs::mirror::InstanceWatcher init
-3> 2024-03-02T01:14:01.320+0000 7fdcf1493640 20 cephfs::mirror::InstanceWatcher create_instance
-2> 2024-03-02T01:14:01.320+0000 7fdcf1493640 20 cephfs::mirror::Mirror handle_enable_mirroring: filesystem={fscid=56, fs_name=cephfs}, peers=, r=-2
-1> 2024-03-02T01:14:01.320+0000 7fdcf3c98640 -1 asok(0x5574773da000) AdminSocket: error writing response length (32) Broken pipe
0> 2024-03-02T01:14:01.320+0000 7fdcf0c92640 -1 *** Caught signal (Segmentation fault) **
in thread 7fdcf0c92640 thread_name:safe_timer
ceph version 19.0.0-1578-g4c76c50a (4c76c50a73f63ba48ccdf0adccce03b00d1d80c7) squid (dev)
1: /lib64/libc.so.6(+0x54db0) [0x7fdcf5c54db0]
2: pthread_mutex_lock()
3: (cephfs::mirror::Mirror::update_fs_mirrors()+0x9ae) [0x5574760d877e]
4: cephfs-mirror(+0x29f8d) [0x5574760c7f8d]
5: (CommonSafeTimer<std::mutex>::timer_thread()+0x11e) [0x7fdcf666627e]
6: /usr/lib64/ceph/libceph-common.so.2(+0x26a5b1) [0x7fdcf666a5b1]
7: /lib64/libc.so.6(+0x9f802) [0x7fdcf5c9f802]
8: /lib64/libc.so.6(+0x3f450) [0x7fdcf5c3f450]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
</pre>

CephFS - Bug #64711 (Fix Under Review): Test failure: test_cephfs_mirror_cancel_mirroring_and_rea...
https://tracker.ceph.com/issues/64711 (2024-03-05T09:35:01Z, Venky Shankar <vshankar@redhat.com>)
<p>/a/vshankar-2024-03-04_08:26:39-fs-wip-vshankar-testing-20240304.042522-testing-default-smithi/7580933</p>
<p>Probably a racy check; see the sketch after the traceback below.</p>
<pre>
2024-03-04T10:56:28.494 INFO:tasks.cephfs_test_runner:======================================================================
2024-03-04T10:56:28.494 INFO:tasks.cephfs_test_runner:FAIL: test_cephfs_mirror_cancel_mirroring_and_readd (tasks.cephfs.test_mirroring.TestMirroring)
2024-03-04T10:56:28.494 INFO:tasks.cephfs_test_runner:Test adding a directory path for synchronization post removal of already added directory paths
2024-03-04T10:56:28.494 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2024-03-04T10:56:28.494 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2024-03-04T10:56:28.495 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_ceph-c_717ce3cce8d0166e4e577faa8004e6f9cb4128c0/qa/tasks/cephfs/test_mirroring.py", line 1276, in test_cephfs_mirror_cancel_mirroring_and_readd
2024-03-04T10:56:28.495 INFO:tasks.cephfs_test_runner: self.check_peer_snap_in_progress(self.primary_fs_name, self.primary_fs_id,
2024-03-04T10:56:28.495 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_ceph-c_717ce3cce8d0166e4e577faa8004e6f9cb4128c0/qa/tasks/cephfs/test_mirroring.py", line 231, in check_peer_snap_in_progress
2024-03-04T10:56:28.495 INFO:tasks.cephfs_test_runner: self.assertTrue('syncing' == res[dir_name]['state'])
2024-03-04T10:56:28.495 INFO:tasks.cephfs_test_runner:AssertionError: False is not true
</pre>
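<p>The assertion samples the peer state exactly once, so a fast sync may already have moved the directory past "syncing" by the time the check runs. A minimal sketch of a more tolerant check, assuming a helper that returns the parsed peer status map (the helper and state names here are illustrative, not the actual test harness API):</p>
<pre>
import time

def wait_for_dir_state(get_peer_status, dir_name, accepted_states,
                       timeout=30, interval=2):
    """Poll the mirror peer status until dir_name reaches one of the accepted
    states, instead of asserting on a single racy sample.

    get_peer_status: callable returning the peer status dict, e.g. the parsed
    output of a peer status query (hypothetical helper).
    """
    deadline = time.time() + timeout
    last = None
    while time.time() < deadline:
        res = get_peer_status()
        last = res.get(dir_name, {}).get('state')
        if last in accepted_states:
            return last
        time.sleep(interval)
    raise AssertionError(f'{dir_name}: state {last!r} did not reach one of '
                         f'{accepted_states} within {timeout}s')

# In the test, accept that the sync may already have completed, e.g.:
# wait_for_dir_state(lambda: self.get_peer_status(...), dir_name,
#                    accepted_states=('syncing', 'idle'))
</pre>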
CephFS - Bug #64534 (New): qa: test_cephfs_mirror_cancel_sync fails in a 100 jobs run of fs:mirro...
https://tracker.ceph.com/issues/64534 (2024-02-22T08:29:39Z, Jos Collin)

<p>test_cephfs_mirror_cancel_sync fails in a 100-job run of the fs:mirror suite:<br /><a class="external" href="https://pulpito.ceph.com/jcollin-2024-02-21_01:01:12-fs:mirror-wip-jcollin-testing_20Feb2024_2-distro-default-smithi/">https://pulpito.ceph.com/jcollin-2024-02-21_01:01:12-fs:mirror-wip-jcollin-testing_20Feb2024_2-distro-default-smithi/</a></p>
<p>test_cephfs_mirror_cancel_sync succeeds when run alone 100 times:<br /><a class="external" href="https://pulpito.ceph.com/jcollin-2024-02-21_11:40:01-fs:mirror-wip-jcollin-testing_21Feb2024_2-distro-default-smithi/">https://pulpito.ceph.com/jcollin-2024-02-21_11:40:01-fs:mirror-wip-jcollin-testing_21Feb2024_2-distro-default-smithi/</a></p>
<p>This probably happens because one of the previous tests hung during cleanup. The previous test executed here is test_cephfs_mirror_cancel_mirroring_and_readd.</p>

CephFS - Bug #64486 (Pending Backport): qa: enhance labeled perf counters test for cephfs-mirror
https://tracker.ceph.com/issues/64486 (2024-02-19T09:33:20Z, Venky Shankar <vshankar@redhat.com>)
<p>In particular, verify peer metric counters.</p>

CephFS - Documentation #64483 (In Progress): doc: document labelled perf metrics for mds/cephfs-m...
https://tracker.ceph.com/issues/64483 (2024-02-19T07:56:14Z, Venky Shankar <vshankar@redhat.com>)

CephFS - Bug #63089 (New): qa: tasks/mirror times out
https://tracker.ceph.com/issues/63089 (2023-10-04T07:06:31Z, Venky Shankar <vshankar@redhat.com>)
<p>/a/vshankar-2023-09-28_07:23:59-fs-wip-vshankar-testing-20230926.081818-testing-default-smithi/7405363</p>
<pre>
2023-09-28T11:15:33.524 DEBUG:teuthology.orchestra.run.smithi105:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph fs mirror enable cephfs
2023-09-28T11:15:33.549 INFO:tasks.ceph.mgr.x.smithi105.stderr:2023-09-28T11:15:33.549+0000 7f1d69c56040 -1 mgr[py] Module zabbix has missing NOTIFY_TYPES member
2023-09-28T11:15:33.604 INFO:tasks.ceph.mgr.x.smithi105.stderr:2023-09-28T11:15:33.605+0000 7f1d69c56040 -1 mgr[py] Module balancer has missing NOTIFY_TYPES member
2023-09-28T11:15:33.657 INFO:tasks.ceph.mgr.x.smithi105.stderr:2023-09-28T11:15:33.657+0000 7f1d69c56040 -1 mgr[py] Module influx has missing NOTIFY_TYPES member
2023-09-28T11:15:33.721 INFO:tasks.ceph.mgr.x.smithi105.stderr:2023-09-28T11:15:33.721+0000 7f1d69c56040 -1 mgr[py] Module alerts has missing NOTIFY_TYPES member
2023-09-28T11:15:33.794 INFO:tasks.ceph.mgr.x.smithi105.stderr:2023-09-28T11:15:33.794+0000 7f1d69c56040 -1 mgr[py] Module iostat has missing NOTIFY_TYPES member
2023-09-28T11:15:33.935 INFO:tasks.ceph.mgr.x.smithi105.stderr:2023-09-28T11:15:33.935+0000 7f1d69c56040 -1 mgr[py] Module rgw has missing NOTIFY_TYPES member
2023-09-28T11:15:34.002 INFO:tasks.ceph.mgr.x.smithi105.stderr:2023-09-28T11:15:34.002+0000 7f1d69c56040 -1 mgr[py] Module rbd_support has missing NOTIFY_TYPES member
2023-09-28T11:15:34.056 INFO:tasks.ceph.mgr.x.smithi105.stderr:2023-09-28T11:15:34.056+0000 7f1d69c56040 -1 mgr[py] Module progress has missing NOTIFY_TYPES member
2023-09-28T11:15:34.118 INFO:tasks.ceph.mgr.x.smithi105.stderr:2023-09-28T11:15:34.118+0000 7f1d69c56040 -1 mgr[py] Module pg_autoscaler has missing NOTIFY_TYPES member
2023-09-28T11:15:34.172 INFO:tasks.ceph.mgr.x.smithi105.stderr:2023-09-28T11:15:34.172+0000 7f1d69c56040 -1 mgr[py] Module devicehealth has missing NOTIFY_TYPES member
2023-09-28T11:15:34.534 INFO:teuthology.orchestra.run:Running command with timeout 30
2023-09-28T11:15:34.534 DEBUG:teuthology.orchestra.run.smithi105:mirror status for fs: cephfs> ceph --admin-daemon /var/run/ceph/cephfs-mirror.asok fs mirror status cephfs@56
2023-09-28T11:15:34.572 INFO:tasks.ceph.mgr.x.smithi105.stderr:2023-09-28T11:15:34.572+0000 7f1d69c56040 -1 mgr[py] Module rook has missing NOTIFY_TYPES member
2023-09-28T11:15:34.726 INFO:teuthology.orchestra.run.smithi105.stderr:no valid command found; 1 closest matches:
2023-09-28T11:15:34.726 INFO:teuthology.orchestra.run.smithi105.stderr:fs mirror status cephfs@54
2023-09-28T11:15:34.726 INFO:teuthology.orchestra.run.smithi105.stderr:admin_socket: invalid command
2023-09-28T11:15:34.729 DEBUG:teuthology.orchestra.run:got remote process result: 22
2023-09-28T11:15:34.730 WARNING:tasks.cephfs.test_mirroring:mirror daemon command with label "mirror status for fs: cephfs" failed: Command failed (mirror status for fs: cephfs) on smithi105 with status 22: 'ceph --admin-daemon /var/run/ceph/cephfs-mirror.asok fs mirror status cephfs@56'
</pre>

CephFS - Bug #62925 (Fix Under Review): cephfs-journal-tool: Add preventive measures in the tool ...
https://tracker.ceph.com/issues/62925 (2023-09-21T17:37:14Z, Prashant D)
<p>The cephfs-journal-tool should only be used by experts with knowledge of CephFS internals. Although the <a class="external" href="https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/#recovery-from-missing-metadata-objects">https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/#recovery-from-missing-metadata-objects</a> doc clearly warns against using cephfs-journal-tool to reset the journal without the CephFS team's advice, some users still run the tool without much thought, which can result in an MDS crash as observed in <a class="external" href="https://tracker.ceph.com/issues/58878">https://tracker.ceph.com/issues/58878</a>.</p>
<pre>
sh-4.4$ cephfs-journal-tool --rank ocs-storagecluster-cephfilesystem:0 event recover_dentries summary
Events by type:
RESETJOURNAL: 1
Errors: 0
sh-4.4$ cephfs-journal-tool --rank ocs-storagecluster-cephfilesystem:0 journal reset
old journal was 8388608~48
new journal start will be 12582912 (4194256 bytes past old end)
writing journal head
writing EResetJournal entry
done
</pre>
<p>We should print a warning message and prompt whether to continue when this tool is run to reset the journal (see the sketch below). Also, cephfs-journal-tool should not be run while CephFS is online, or there should at least be a clear warning when the user attempts to run it against a live CephFS, especially for the "event recover_dentries summary" command, which writes any inodes/dentries recoverable from the journal back to the RADOS store.</p>
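<p>A minimal, language-agnostic illustration of the proposed guard (the actual tool is C++; the function, flag, and message wording below are hypothetical): refuse destructive journal operations unless the user explicitly confirms, and warn when the filesystem appears to be online:</p>
<pre>
import sys

def confirm_destructive(action, fs_online, yes_i_really_mean_it=False):
    """Prompt before a destructive cephfs-journal-tool action such as
    'journal reset'. Hypothetical sketch of the proposed behaviour, not the
    actual tool code."""
    print(f'WARNING: "{action}" permanently rewrites journal metadata and '
          'should only be used on the advice of the CephFS team.')
    if fs_online:
        print('WARNING: the filesystem appears to be online; running this '
              'against a live CephFS can corrupt metadata or crash the MDS.')
    if yes_i_really_mean_it:
        return True
    answer = input('Continue? [y/N] ').strip().lower()
    if answer != 'y':
        print('Aborted.')
        sys.exit(1)
    return True

# e.g. confirm_destructive('journal reset', fs_online=True)
</pre>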
CephFS - Bug #62221 (In Progress): Test failure: test_add_ancestor_and_child_directory (tasks.cep...
https://tracker.ceph.com/issues/62221 (2023-07-28T11:19:29Z, Venky Shankar <vshankar@redhat.com>)

<p>/a/yuriw-2023-07-26_14:34:38-fs-reef-release-distro-default-smithi/7353194</p>
<pre>
2023-07-26T23:01:19.423 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_teuthology_407880c6d3fb77318fff01c863715090f9c2de69/teuthology/orchestra/run.py", line 161, in wait
2023-07-26T23:01:19.424 INFO:tasks.cephfs_test_runner: self._raise_for_status()
2023-07-26T23:01:19.424 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_teuthology_407880c6d3fb77318fff01c863715090f9c2de69/teuthology/orchestra/run.py", line 181, in _raise_for_status
2023-07-26T23:01:19.424 INFO:tasks.cephfs_test_runner: raise CommandFailedError(
2023-07-26T23:01:19.424 INFO:tasks.cephfs_test_runner:teuthology.exceptions.CommandFailedError: Command failed (mirror status for fs: cephfs) on smithi053 with status 22: 'ceph --admin-daemon /var/run/ceph/cephfs-mirror.asok fs mirror status cephfs@2'
</pre>
<p>The test runs the mirror daemon under valgrind. It seems the daemon did not start at all. The valgrind report is at: ./remote/smithi053/log/valgrind/cephfs-mirror-client.mirror.log.gz</p>

CephFS - Feature #61334 (Pending Backport): cephfs-mirror: use snapdiff api for efficient tree tr...
https://tracker.ceph.com/issues/61334 (2023-05-22T10:40:44Z, Venky Shankar <vshankar@redhat.com>)
<p>With <a class="external" href="https://github.com/ceph/ceph/pull/43546">https://github.com/ceph/ceph/pull/43546</a> merged, cephfs-mirror can make use of the snapdiff api (via readdir_snapdiff) to efficiently traverse the directory tree between two snapshots.</p>
<p>This should hugely improve performance when only a handful of files have changed between two consecutive snapshots.</p>
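<p>A conceptual sketch of why this helps: instead of walking the entire tree of the newer snapshot, the mirror daemon only has to visit the entries that the snapdiff listing reports as changed. The snapdiff_entries callable below is a hypothetical stand-in for the real readdir_snapdiff-based listing, not the libcephfs API:</p>
<pre>
# Conceptual contrast between full-tree traversal and snapdiff-driven traversal.
# All callables are passed in; snapdiff_entries is a hypothetical stand-in for
# the real readdir_snapdiff-based listing.

def sync_full_walk(list_dir, is_dir, snap_root, copy_entry):
    """Pre-snapdiff approach: visit every entry of the newer snapshot.
    Cost is proportional to the size of the whole tree."""
    stack = [snap_root]
    while stack:
        d = stack.pop()
        for entry in list_dir(d):
            copy_entry(entry)
            if is_dir(entry):
                stack.append(entry)

def sync_with_snapdiff(snapdiff_entries, old_snap, new_snap, copy_entry):
    """Snapdiff approach: visit only the entries that differ between the two
    snapshots. A handful of changed files means a handful of operations,
    regardless of how large the tree is."""
    for entry in snapdiff_entries(old_snap, new_snap):
        copy_entry(entry)
</pre>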