Ceph : Issues
https://tracker.ceph.com/
2024-03-27T22:36:30Z
mgr - Bug #65189 (New): Telemetry pacific-x upgrade test pauses when upgrading to squid
https://tracker.ceph.com/issues/65189
2024-03-27T22:36:30Z
Laura Flores
<p>/a/teuthology-2024-03-22_02:08:13-upgrade-squid-distro-default-smithi/7615987</p>
<pre>
2024-03-22T06:49:01.882 INFO:journalctl@ceph.mgr.y.smithi023.stdout:Mar 22 06:49:01 smithi023 ceph-e86c638a-e816-11ee-95cd-87774f69a715-mgr-y[39704]: debug 2024-03-22T06:49:01.516+0000 7fb5985ac700 -1 log_channel(cephadm) log [ERR] : Upgrade: Paused due to UPGRADE_BAD_TARGET_VERSION: Upgrade: cannot upgrade/downgrade to 19.0.0-1667-gdb0330b1
</pre>
<p>The solution would be to update the test so that it upgrades from reef instead of pacific. The existing quincy-x upgrade test already works because it follows the "n-2" upgrade rule (sketched below).</p>
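<p>For reference, a minimal sketch of the "n-2" rule that cephadm enforces (the release-to-major mapping is hardcoded here for illustration and is not taken from ceph code), showing why quincy to squid passes while pacific to squid is paused:</p>
<pre>
# Minimal sketch of the "n-2" upgrade rule; the release list below is illustrative.
RELEASE_MAJORS = {"pacific": 16, "quincy": 17, "reef": 18, "squid": 19}

def upgrade_allowed(from_release: str, to_release: str) -> bool:
    """Allow upgrades that jump at most two major releases forward."""
    gap = RELEASE_MAJORS[to_release] - RELEASE_MAJORS[from_release]
    return 0 <= gap <= 2

assert upgrade_allowed("quincy", "squid")        # 17 -> 19: within n-2
assert not upgrade_allowed("pacific", "squid")   # 16 -> 19: triggers UPGRADE_BAD_TARGET_VERSION
</pre>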
Orchestrator - Bug #65187 (New): upgrade/quincy-x/stress-split: upgrade test fails to install qui...
https://tracker.ceph.com/issues/65187
2024-03-27T22:18:47Z
Laura Flores
<pre><code class="text syntaxhl"><span class="CodeRay">2024-03-22T06:52:10.566 DEBUG:teuthology.packaging:Querying https://shaman.ceph.com/api/search?status=ready&project=ceph&flavor=default&distros=ubuntu%2F22.04%2Fx86_64&ref=quincy
2024-03-22T06:52:10.571 INFO:teuthology.orchestra.run.smithi031.stdout:uid [ unknown] Ceph automated package build (Ceph automated package build) <sage@newdream.net>
2024-03-22T06:52:10.571 INFO:teuthology.orchestra.run.smithi031.stdout:uid [ unknown] Ceph.com (release key) <security@ceph.com>
2024-03-22T06:52:10.572 INFO:teuthology.task.install.deb:Installing packages: ceph, cephadm, ceph-mds, ceph-mgr, ceph-common, ceph-fuse, ceph-test, radosgw, python3-rados, python3-rgw, python3-cephfs, python3-rbd, libcephfs2, libcephfs-dev, librados2, librbd1, rbd-fuse on remote deb x86_64
2024-03-22T06:52:10.572 WARNING:teuthology.packaging:More than one of ref, tag, branch, or sha1 supplied; using branch
2024-03-22T06:52:10.572 INFO:teuthology.packaging:ref: None
2024-03-22T06:52:10.572 INFO:teuthology.packaging:tag: None
2024-03-22T06:52:10.572 INFO:teuthology.packaging:branch: quincy
2024-03-22T06:52:10.572 INFO:teuthology.packaging:sha1: db0330b1e4e2470d52b750e251e55a522b4f7d69
2024-03-22T06:52:10.572 DEBUG:teuthology.packaging:Querying https://shaman.ceph.com/api/search?status=ready&project=ceph&flavor=default&distros=ubuntu%2F22.04%2Fx86_64&ref=quincy
2024-03-22T06:52:10.709 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_teuthology_e691533f9cbb33d85b2187bba20d7102f098636d/teuthology/contextutil.py", line 30, in nested
vars.append(enter())
File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__
return next(self.gen)
File "/home/teuthworker/src/git.ceph.com_teuthology_e691533f9cbb33d85b2187bba20d7102f098636d/teuthology/task/install/__init__.py", line 218, in install
install_packages(ctx, package_list, config)
File "/home/teuthworker/src/git.ceph.com_teuthology_e691533f9cbb33d85b2187bba20d7102f098636d/teuthology/task/install/__init__.py", line 81, in install_packages
p.spawn(
File "/home/teuthworker/src/git.ceph.com_teuthology_e691533f9cbb33d85b2187bba20d7102f098636d/teuthology/parallel.py", line 84, in __exit__
for result in self:
File "/home/teuthworker/src/git.ceph.com_teuthology_e691533f9cbb33d85b2187bba20d7102f098636d/teuthology/parallel.py", line 98, in __next__
resurrect_traceback(result)
File "/home/teuthworker/src/git.ceph.com_teuthology_e691533f9cbb33d85b2187bba20d7102f098636d/teuthology/parallel.py", line 30, in resurrect_traceback
raise exc.exc_info[1]
File "/home/teuthworker/src/git.ceph.com_teuthology_e691533f9cbb33d85b2187bba20d7102f098636d/teuthology/parallel.py", line 23, in capture_traceback
return func(*args, **kwargs)
File "/home/teuthworker/src/git.ceph.com_teuthology_e691533f9cbb33d85b2187bba20d7102f098636d/teuthology/task/install/deb.py", line 79, in _update_package_list_and_install
log.info('Pulling from %s', builder.base_url)
File "/home/teuthworker/src/git.ceph.com_teuthology_e691533f9cbb33d85b2187bba20d7102f098636d/teuthology/packaging.py", line 554, in base_url
return self._get_base_url()
File "/home/teuthworker/src/git.ceph.com_teuthology_e691533f9cbb33d85b2187bba20d7102f098636d/teuthology/packaging.py", line 856, in _get_base_url
self.assert_result()
File "/home/teuthworker/src/git.ceph.com_teuthology_e691533f9cbb33d85b2187bba20d7102f098636d/teuthology/packaging.py", line 937, in assert_result
raise VersionNotFoundError(self._result.url)
teuthology.exceptions.VersionNotFoundError: Failed to fetch package version from https://shaman.ceph.com/api/search/?status=ready&project=ceph&flavor=default&distros=ubuntu%2F22.04%2Fx86_64&ref=quincy
2024-03-22T06:52:10.711 ERROR:teuthology.run_tasks:Saw exception from tasks.
</pre>
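<p>A standalone reproduction sketch (not teuthology code; it assumes the shaman search endpoint returns a JSON list of build records) that issues the same query the installer logged above and reports whether any ready build exists:</p>
<pre>
# Standalone reproduction of the shaman query from the log above. Assumption:
# /api/search returns a JSON list of build records; teuthology raises
# VersionNotFoundError when that list is empty.
import requests

params = {
    "status": "ready",
    "project": "ceph",
    "flavor": "default",
    "distros": "ubuntu/22.04/x86_64",
    "ref": "quincy",
}
resp = requests.get("https://shaman.ceph.com/api/search", params=params, timeout=30)
resp.raise_for_status()
builds = resp.json()
if not builds:
    print("no ready quincy builds for ubuntu/22.04/x86_64 -> VersionNotFoundError")
else:
    print(f"{len(builds)} build(s) found; first sha1: {builds[0].get('sha1')}")
</pre>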
RADOS - Bug #65186 (New): OSDs unreachable in upgrade test
https://tracker.ceph.com/issues/65186
2024-03-27T20:28:19Z
Laura Flores
<p>/a/teuthology-2024-03-22_02:08:13-upgrade-squid-distro-default-smithi/7616011/remote/smithi087/log/a8e8c570-e819-11ee-95cd-87774f69a715</p>
<pre>
2024-03-22T07:19:18.215315+0000 mon.a (mon.0) 10 : cluster 0 Standby manager daemon x restarted
2024-03-22T07:19:18.215450+0000 mon.a (mon.0) 11 : cluster 0 Standby manager daemon x started
2024-03-22T07:19:18.215315+0000 mon.a (mon.0) 10 : cluster 0 Standby manager daemon x restarted
2024-03-22T07:19:18.215450+0000 mon.a (mon.0) 11 : cluster 0 Standby manager daemon x started
2024-03-22T07:19:18.277027+0000 mon.a (mon.0) 12 : cluster 0 mgrmap e33: y(active, since 63s), standbys: x
2024-03-22T07:19:18.414028+0000 mon.a (mon.0) 13 : cluster 1 Active manager daemon y restarted
2024-03-22T07:19:18.414630+0000 mon.a (mon.0) 14 : cluster 4 Health check failed: 8 osds(s) are not reachable (OSD_UNREACHABLE)
2024-03-22T07:19:18.414953+0000 mon.a (mon.0) 15 : cluster 1 Activating manager daemon y
2024-03-22T07:19:18.427127+0000 mon.a (mon.0) 16 : cluster 0 osdmap e81: 8 total, 8 up, 8 in
2024-03-22T07:19:18.277027+0000 mon.a (mon.0) 12 : cluster 0 mgrmap e33: y(active, since 63s), standbys: x
2024-03-22T07:19:18.427673+0000 mon.a (mon.0) 17 : cluster 0 mgrmap e34: y(active, starting, since 0.0129348s), standbys: x
2024-03-22T07:19:18.414028+0000 mon.a (mon.0) 13 : cluster 1 Active manager daemon y restarted
2024-03-22T07:19:18.433869+0000 osd.4 (osd.4) 3 : cluster 3 failed to encode map e81 with expected crc
2024-03-22T07:19:18.435418+0000 osd.2 (osd.2) 3 : cluster 3 failed to encode map e81 with expected crc
2024-03-22T07:19:18.414630+0000 mon.a (mon.0) 14 : cluster 4 Health check failed: 8 osds(s) are not reachable (OSD_UNREACHABLE)
2024-03-22T07:19:18.443967+0000 osd.4 (osd.4) 4 : cluster 3 failed to encode map e81 with expected crc
</pre>
<p>Likely connected to <a class="external" href="https://tracker.ceph.com/issues/63389">https://tracker.ceph.com/issues/63389</a>.</p>
RADOS - Bug #65185 (New): OSD_SCRUB_ERROR, inconsistent pg in upgrade tests
https://tracker.ceph.com/issues/65185
2024-03-27T20:21:29Z
Laura Flores
<p>/a/teuthology-2024-03-22_02:08:13-upgrade-squid-distro-default-smithi/7616025/remote/smithi098/log/b1f19696-e81a-11ee-95cd-87774f69a715/ceph.log.gz</p>
<pre>
2024-03-22T09:20:00.000187+0000 mon.a (mon.0) 7863 : cluster 4 [ERR] OSD_SCRUB_ERRORS: 1 scrub errors
2024-03-22T09:20:00.000194+0000 mon.a (mon.0) 7864 : cluster 4 [ERR] PG_DAMAGED: Possible data damage: 1 pg inconsistent
2024-03-22T09:19:59.897409+0000 mon.a (mon.0) 7860 : cluster 0 osdmap e3595: 8 total, 8 up, 8 in
2024-03-22T09:20:00.000202+0000 mon.a (mon.0) 7865 : cluster 4 pg 103.14 is active+clean+inconsistent, acting [5,1,2]
2024-03-22T09:20:00.000151+0000 mon.a (mon.0) 7861 : cluster 4 Health detail: HEALTH_ERR noscrub flag(s) set; 1 scrub errors; Possible data damage: 1 pg inconsistent
</pre>
<p>More in this run: <a class="external" href="https://pulpito.ceph.com/teuthology-2024-03-22_02:08:13-upgrade-squid-distro-default-smithi/">https://pulpito.ceph.com/teuthology-2024-03-22_02:08:13-upgrade-squid-distro-default-smithi/</a></p>
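<p>A possible triage sketch (assumes admin access to the affected cluster; the commands are standard Ceph tooling, nothing specific to this run) for inspecting the inconsistent PG reported above before deciding on a repair:</p>
<pre>
# Triage sketch: list the inconsistent objects of the PG from the health detail
# above. The repair step is left commented out on purpose; run it only after
# reviewing the output.
import json
import subprocess

PGID = "103.14"  # from "pg 103.14 is active+clean+inconsistent" above

out = subprocess.run(
    ["rados", "list-inconsistent-obj", PGID, "--format=json"],
    check=True, capture_output=True, text=True,
).stdout
print(json.dumps(json.loads(out), indent=2))

# subprocess.run(["ceph", "pg", "repair", PGID], check=True)
</pre>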
teuthology - Bug #65181 (New): Scrape log not properly collected
https://tracker.ceph.com/issues/65181
2024-03-27T15:04:49Z
Laura Flores
<p>For the following run, the scrape log was almost empty despite the run having many failures.</p>
<p>/a/yuriw-2024-03-26_14:32:05-rados-wip-yuri8-testing-2024-03-25-1419-distro-default-smithi/scrape.log</p>
<pre>
Found 304 jobs
Missing teuthology log /home/teuthworker/archive/yuriw-2024-03-26_14:32:05-rados-wip-yuri8-testing-2024-03-25-1419-distro-default-smithi/7623511/teuthology.log
</pre>
rgw - Bug #65179 (New): rgw incorrectly uses `Range` header in `X-Amz-Cache`
https://tracker.ceph.com/issues/65179
2024-03-27T14:25:25Z
Taha Jahangir
<p>As described in the "RGW Data caching and CDN" documentation (<a class="external" href="https://docs.ceph.com/en/latest/radosgw/rgw-cache/">https://docs.ceph.com/en/latest/radosgw/rgw-cache/</a>, merged in <a class="external" href="https://github.com/ceph/ceph/pull/33646">https://github.com/ceph/ceph/pull/33646</a>), the <code>X-Amz-Cache</code> header is meant to carry the original headers/signature for verification, while the response should be generated from the new <code>Range</code> header. In practice this does not work: the response is generated from the <code>Range</code> embedded in <code>X-Amz-Cache</code> instead. A sample request/response:</p>
<pre><code class="text syntaxhl"><span class="CodeRay">GET /temp/testfile HTTP/1.1
Host: myrgw.domain.com
x-amz-date: 20240327T134301Z
Authorization: AWS4-HMAC-SHA256 Credential=R612WE7A53PNXNZB4SUW/20240327/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-cache;x-amz-co....
Connection: close
x-amz-cache: .HOST.myrgw.domain.com.RANGE.bytes=10-20.X-AMZ-CONTENT-SHA256.e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855.X-AMZ-DATE.20240327....
Range: bytes=0-5242879
X-Request-ID: ba471c53e0256f05585a26ba988968fc
X-Real-IP: 10.76.74.227
X-Forwarded-For: 10.76.74.227
X-Forwarded-Host: myrgw.domain.com
X-Forwarded-Port: 443
X-Forwarded-Proto: https
X-Forwarded-Scheme: https
X-Scheme: https
Accept-Encoding: identity
User-Agent: Boto3/1.34.23 md/Botocore#1.34.23 ua/2.0 os/linux#6.6.22-1-lts md/arch#x86_64 lang/python#3.9.16 md/pyimpl#CPython Botocore/1.34.23 Resource
X-Amz-Content-SHA256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
amz-sdk-invocation-id: d0b8677c-bcd2-4e65-8bf5-7499e7706396
amz-sdk-request: attempt=1
HTTP/1.1 206 Partial Content
Content-Length: 11
Content-Range: bytes 10-20/1371344
Accept-Ranges: bytes
Last-Modified: Tue, 11 Apr 2023 11:31:05 GMT
x-rgw-object-type: Normal
ETag: "444340706c6ec4d192b59d3f9a453525"
x-amz-meta-mtime: 1663901421.014806364
x-amz-request-id: tx00000a2d3b3ace625e991-0066042265-6b3e37f-myrgw
Content-Type: application/octet-stream
Date: Wed, 27 Mar 2024 13:43:01 GMT
Connection: close
........>..
</pre>
<p>Tested with Ceph v16.2.15, but the bug should exist in all versions.</p>
<p>The RGW log (with level=0) and the request/response are attached.</p>
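<p>A reproduction sketch (the endpoint, credentials and the exact <code>X-Amz-Cache</code> value below are illustrative; the authoritative format is in the rgw-cache docs, and the value in the capture above is truncated). It signs <code>x-amz-cache</code> into the request, as in the capture, and then checks which range RGW actually serves:</p>
<pre>
# Reproduction sketch with boto3; endpoint, credentials and the X-Amz-Cache
# value are illustrative, not taken verbatim from the (truncated) capture.
import boto3
from botocore.config import Config

EMPTY_SHA256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

# Layout mirrors the sample request above; the exact grammar is defined by the
# rgw-cache documentation.
cache_value = (".HOST.myrgw.domain.com"
               ".RANGE.bytes=10-20"            # the client's original range
               ".X-AMZ-CONTENT-SHA256." + EMPTY_SHA256 +
               ".X-AMZ-DATE.20240327T134301Z")

s3 = boto3.client(
    "s3",
    endpoint_url="https://myrgw.domain.com",    # illustrative endpoint
    aws_access_key_id="ACCESS_KEY",             # illustrative credentials
    aws_secret_access_key="SECRET_KEY",
    config=Config(s3={"addressing_style": "path"}),
)

def add_cache_header(request, **kwargs):
    # Added before signing so x-amz-cache ends up in SignedHeaders, as in the capture.
    request.headers["x-amz-cache"] = cache_value

s3.meta.events.register("before-sign.s3.GetObject", add_cache_header)

resp = s3.get_object(Bucket="temp", Key="testfile", Range="bytes=0-5242879")
# Expected per the docs: a Content-Range covering the new Range (bytes=0-5242879).
# Observed in the report: "bytes 10-20/...", i.e. the range taken from X-Amz-Cache.
print(resp["ResponseMetadata"]["HTTPHeaders"].get("content-range"))
</pre>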
rgw - Bug #65177 (New): reef: Syscall param write(buf) points to uninitialised byte(s)
https://tracker.ceph.com/issues/65177
2024-03-27T13:42:52Z
Casey Bodley
cbodley@redhat.com
<p>saw on several jobs in <a class="external" href="https://pulpito.ceph.com/cbodley-2024-03-26_12:30:03-rgw-wip-63856-reef-distro-default-smithi/">https://pulpito.ceph.com/cbodley-2024-03-26_12:30:03-rgw-wip-63856-reef-distro-default-smithi/</a></p>
<p><a class="external" href="https://qa-proxy.ceph.com/teuthology/cbodley-2024-03-26_12:30:03-rgw-wip-63856-reef-distro-default-smithi/7623215/teuthology.log">https://qa-proxy.ceph.com/teuthology/cbodley-2024-03-26_12:30:03-rgw-wip-63856-reef-distro-default-smithi/7623215/teuthology.log</a></p>
<p><a class="external" href="https://qa-proxy.ceph.com/teuthology/cbodley-2024-03-26_12:30:03-rgw-wip-63856-reef-distro-default-smithi/7623215/remote/smithi060/log/valgrind/ceph.client.0.log.gz">https://qa-proxy.ceph.com/teuthology/cbodley-2024-03-26_12:30:03-rgw-wip-63856-reef-distro-default-smithi/7623215/remote/smithi060/log/valgrind/ceph.client.0.log.gz</a></p>
<pre><code class="xml syntaxhl"><span class="CodeRay"><span class="tag"><error></span>
<span class="tag"><unique></span>0x0<span class="tag"></unique></span>
<span class="tag"><tid></span>1<span class="tag"></tid></span>
<span class="tag"><kind></span>SyscallParam<span class="tag"></kind></span>
<span class="tag"><what></span>Syscall param write(buf) points to uninitialised byte(s)<span class="tag"></what></span>
<span class="tag"><stack></span>
<span class="tag"><frame></span>
<span class="tag"><ip></span>0x78D9E5D<span class="tag"></ip></span>
<span class="tag"><obj></span>/usr/lib64/libc.so.6<span class="tag"></obj></span>
<span class="tag"><fn></span>syscall<span class="tag"></fn></span>
<span class="tag"></frame></span>
<span class="tag"><frame></span>
<span class="tag"><ip></span>0x9962941<span class="tag"></ip></span>
<span class="tag"><obj></span>/usr/lib64/libunwind.so.8.0.1<span class="tag"></obj></span>
<span class="tag"></frame></span>
<span class="tag"><frame></span>
<span class="tag"><ip></span>0x9962A57<span class="tag"></ip></span>
<span class="tag"><obj></span>/usr/lib64/libunwind.so.8.0.1<span class="tag"></obj></span>
<span class="tag"></frame></span>
<span class="tag"><frame></span>
<span class="tag"><ip></span>0x9967179<span class="tag"></ip></span>
<span class="tag"><obj></span>/usr/lib64/libunwind.so.8.0.1<span class="tag"></obj></span>
<span class="tag"></frame></span>
<span class="tag"><frame></span>
<span class="tag"><ip></span>0x99681A1<span class="tag"></ip></span>
<span class="tag"><obj></span>/usr/lib64/libunwind.so.8.0.1<span class="tag"></obj></span>
<span class="tag"><fn></span>_ULx86_64_step<span class="tag"></fn></span>
<span class="tag"></frame></span>
<span class="tag"><frame></span>
<span class="tag"><ip></span>0x6F5871A<span class="tag"></ip></span>
<span class="tag"><obj></span>/usr/lib64/libtcmalloc.so.4.5.9<span class="tag"></obj></span>
<span class="tag"></frame></span>
<span class="tag"><frame></span>
<span class="tag"><ip></span>0x6F57C6F<span class="tag"></ip></span>
<span class="tag"><obj></span>/usr/lib64/libtcmalloc.so.4.5.9<span class="tag"></obj></span>
<span class="tag"></frame></span>
<span class="tag"><frame></span>
<span class="tag"><ip></span>0x6F3E371<span class="tag"></ip></span>
<span class="tag"><obj></span>/usr/lib64/libtcmalloc.so.4.5.9<span class="tag"></obj></span>
<span class="tag"></frame></span>
<span class="tag"><frame></span>
<span class="tag"><ip></span>0x6F3D9E6<span class="tag"></ip></span>
<span class="tag"><obj></span>/usr/lib64/libtcmalloc.so.4.5.9<span class="tag"></obj></span>
<span class="tag"></frame></span>
<span class="tag"><frame></span>
<span class="tag"><ip></span>0x400A1AD<span class="tag"></ip></span>
<span class="tag"><obj></span>/usr/lib64/ld-linux-x86-64.so.2<span class="tag"></obj></span>
<span class="tag"><fn></span>call_init<span class="tag"></fn></span>
<span class="tag"><dir></span>/usr/src/debug/glibc-2.34-82.el9.x86_64/elf<span class="tag"></dir></span>
<span class="tag"><file></span>dl-init.c<span class="tag"></file></span>
<span class="tag"><line></span>70<span class="tag"></line></span>
<span class="tag"></frame></span>
<span class="tag"><frame></span>
<span class="tag"><ip></span>0x400A1AD<span class="tag"></ip></span>
<span class="tag"><obj></span>/usr/lib64/ld-linux-x86-64.so.2<span class="tag"></obj></span>
<span class="tag"><fn></span>call_init<span class="tag"></fn></span>
<span class="tag"><dir></span>/usr/src/debug/glibc-2.34-82.el9.x86_64/elf<span class="tag"></dir></span>
<span class="tag"><file></span>dl-init.c<span class="tag"></file></span>
<span class="tag"><line></span>26<span class="tag"></line></span>
<span class="tag"></frame></span>
<span class="tag"><frame></span>
<span class="tag"><ip></span>0x400A29B<span class="tag"></ip></span>
<span class="tag"><obj></span>/usr/lib64/ld-linux-x86-64.so.2<span class="tag"></obj></span>
<span class="tag"><fn></span>_dl_init<span class="tag"></fn></span>
<span class="tag"><dir></span>/usr/src/debug/glibc-2.34-82.el9.x86_64/elf<span class="tag"></dir></span>
<span class="tag"><file></span>dl-init.c<span class="tag"></file></span>
<span class="tag"><line></span>117<span class="tag"></line></span>
<span class="tag"></frame></span>
<span class="tag"><frame></span>
<span class="tag"><ip></span>0x4020E79<span class="tag"></ip></span>
<span class="tag"><obj></span>/usr/lib64/ld-linux-x86-64.so.2<span class="tag"></obj></span>
<span class="tag"></frame></span>
<span class="tag"><frame></span>
<span class="tag"><ip></span>0xD<span class="tag"></ip></span>
<span class="tag"></frame></span>
<span class="tag"><frame></span>
<span class="tag"><ip></span>0x1FFF000A16<span class="tag"></ip></span>
<span class="tag"></frame></span>
<span class="tag"><frame></span>
<span class="tag"><ip></span>0x1FFF000A1E<span class="tag"></ip></span>
<span class="tag"></frame></span>
<span class="tag"><frame></span>
<span class="tag"><ip></span>0x1FFF000A2E<span class="tag"></ip></span>
<span class="tag"></frame></span>
<span class="tag"><frame></span>
<span class="tag"><ip></span>0x1FFF000A7B<span class="tag"></ip></span>
<span class="tag"></frame></span>
<span class="tag"><frame></span>
<span class="tag"><ip></span>0x1FFF000A7E<span class="tag"></ip></span>
<span class="tag"></frame></span>
<span class="tag"><frame></span>
<span class="tag"><ip></span>0x1FFF000A87<span class="tag"></ip></span>
<span class="tag"></frame></span>
<span class="tag"><frame></span>
<span class="tag"><ip></span>0x1FFF000A91<span class="tag"></ip></span>
<span class="tag"></frame></span>
<span class="tag"><frame></span>
<span class="tag"><ip></span>0x1FFF000A96<span class="tag"></ip></span>
<span class="tag"></frame></span>
<span class="tag"><frame></span>
<span class="tag"><ip></span>0x1FFF000A99<span class="tag"></ip></span>
<span class="tag"></frame></span>
<span class="tag"><frame></span>
<span class="tag"><ip></span>0x1FFF000AB9<span class="tag"></ip></span>
<span class="tag"></frame></span>
<span class="tag"><frame></span>
<span class="tag"><ip></span>0x1FFF000AC4<span class="tag"></ip></span>
<span class="tag"></frame></span>
<span class="tag"><frame></span>
<span class="tag"><ip></span>0x1FFF000AE8<span class="tag"></ip></span>
<span class="tag"></frame></span>
<span class="tag"><frame></span>
<span class="tag"><ip></span>0x1FFF000B02<span class="tag"></ip></span>
<span class="tag"></frame></span>
<span class="tag"><frame></span>
<span class="tag"><ip></span>0x1FFF000B36<span class="tag"></ip></span>
<span class="tag"></frame></span>
<span class="tag"></stack></span>
<span class="tag"><auxwhat></span>Address 0x1fff000000 is on thread 1's stack<span class="tag"></auxwhat></span>
<span class="tag"></error></span>
</span></code></pre>
devops - Bug #65175 (New): ccache is always miss in confusa14
https://tracker.ceph.com/issues/65175
2024-03-27T08:56:46Z
Rixin Luo
<p>From: <a class="external" href="https://jenkins.ceph.com/job/ceph-pull-requests-arm64/54223/consoleFull">https://jenkins.ceph.com/job/ceph-pull-requests-arm64/54223/consoleFull</a></p>
<p>ccache -sz shows:</p>
<pre>
Summary:
  Hits: 0 / 0
    Direct: 0 / 0
    Preprocessed: 0 / 0
  Misses: 0
    Direct: 0
    Preprocessed: 0
Primary storage:
  Hits: 0 / 0
  Misses: 0
  Cache size (GB): 0.00 / 100.00 (0.00 %)
Use the -v/--verbose option for more details.
Statistics zeroed
</pre>
CephFS - Bug #65171 (New): Provide metrics support for the Replication Start/End Notifications
https://tracker.ceph.com/issues/65171
2024-03-27T05:04:30Z
Jos Collin
<p>BZ: <a class="external" href="https://bugzilla.redhat.com/show_bug.cgi?id=2270946">https://bugzilla.redhat.com/show_bug.cgi?id=2270946</a></p>
<p>At present, only metrics counters for the average/sum of the time taken to sync all snapshots are provided.</p>
<p>Metrics should be provided (or monitoring logic enabled) to generate the following alerts:</p>
<ul>
<li>Metrics reflecting the alerts for start time of data replication.</li>
<li>Metrics reflecting the alerts for end time of data replication.</li>
<li>Metrics reflecting the alerts for the time taken for the data replication (single snapshot replication) to complete.</li>
</ul>
<p>Also, include the metrics for start/end when a data replication restarts due to failures.</p>
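<p>An illustrative sketch of the monitoring side (the admin socket path and the <code>last_synced_start</code>/<code>last_synced_end</code> counter names are hypothetical; counters of this kind are exactly what this request asks to expose), polling the mirror daemon's perf counters and deriving start/end/duration alerts:</p>
<pre>
# Illustrative only: how monitoring logic could turn start/end counters into
# alerts. The socket path and per-peer counter names used here are hypothetical.
import json
import subprocess
import time

ASOK = "/var/run/ceph/cephfs-mirror.asok"   # hypothetical admin socket path

def perf_dump() -> dict:
    out = subprocess.run(["ceph", "--admin-daemon", ASOK, "perf", "dump"],
                         check=True, capture_output=True, text=True).stdout
    return json.loads(out)

def check_peer(peer: dict, max_sync_secs: float = 3600.0) -> None:
    start = peer.get("last_synced_start", 0.0)   # hypothetical counter
    end = peer.get("last_synced_end", 0.0)       # hypothetical counter
    if start and not end:
        print(f"sync in progress for {time.time() - start:.0f}s")
    elif start and end and (end - start) > max_sync_secs:
        print(f"ALERT: last snapshot sync took {end - start:.0f}s")
</pre>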
teuthology - Bug #65162 (New): cephadmunit daemon's start method fails
https://tracker.ceph.com/issues/65162
2024-03-26T20:31:48Z
Vallari Agrawal
<p>Calling <code>daemon.start()</code> on a CephadmUnit daemon gives the following error:</p>
<pre><code class="text syntaxhl"><span class="CodeRay">
2024-03-25T09:25:34.295 ERROR:tasks.nvmeof.[nvmeof.thrasher]:exception:
Traceback (most recent call last):
File "/home/teuthworker/src/github.com_vallariag_ceph_fb51d73931124de2f09457eaa7f1219e2aeb2c0d/qa/tasks/nvmeof.py", line 206, in _run
self.do_thrash()
File "/home/teuthworker/src/github.com_vallariag_ceph_fb51d73931124de2f09457eaa7f1219e2aeb2c0d/qa/tasks/nvmeof.py", line 257, in do_thrash
daemon.start()
File "/home/teuthworker/src/git.ceph.com_teuthology_e691533f9cbb33d85b2187bba20d7102f098636d/teuthology/orchestra/daemon/cephadmunit.py", line 139, in start
self.remote.run(self.start_cmd)
TypeError: run() takes 1 positional argument but 2 were given
</span></code></pre>
<p>Seen in <a class="external" href="https://pulpito.ceph.com/vallariag-2024-03-25_08:42:26-rbd:nvmeof-ceph-nvmeof-mon-distro-default-smithi/">https://pulpito.ceph.com/vallariag-2024-03-25_08:42:26-rbd:nvmeof-ceph-nvmeof-mon-distro-default-smithi/</a></p>
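<p>A minimal illustration of the failure mode (toy class, not teuthology code): <code>Remote.run()</code> accepts only keyword arguments, so passing the start command positionally produces exactly this <code>TypeError</code>, while passing it as a keyword (e.g. <code>args=...</code>) does not. Whether that is the right fix for <code>cephadmunit.py</code> is for the maintainers to decide.</p>
<pre>
# Toy reproduction of the TypeError above; Remote here stands in for
# teuthology.orchestra.remote.Remote, whose run() is keyword-only.
class Remote:
    def run(self, **kwargs):
        print("running:", kwargs.get("args"))

remote = Remote()
start_cmd = "sudo systemctl start ceph-<fsid>@<daemon>.service"  # hypothetical command

try:
    remote.run(start_cmd)          # what cephadmunit.py line 139 effectively does
except TypeError as e:
    print(e)                       # run() takes 1 positional argument but 2 were given

remote.run(args=start_cmd)         # passing the command as a keyword works
</pre>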
Ceph - Bug #65158 (New): common/perf_counters_cache.cc: unused variable warnings
https://tracker.ceph.com/issues/65158
2024-03-26T16:49:45Z
Casey Bodley
cbodley@redhat.com
<p>When compiled in release mode, <code>assert()</code> macros are compiled out. This leads to unused-variable warnings:</p>
<pre>
ceph/src/common/perf_counters_cache.cc: In member function ‘void ceph::perf_counters::PerfCountersCache::check_key(const std::string&)’:
ceph/src/common/perf_counters_cache.cc:16:13: warning: variable ‘key_label’ set but not used [-Wunused-but-set-variable]
16 | for (auto key_label : key_labels) {
| ^~~~~~~~~
ceph/src/common/perf_counters_cache.cc:7:20: warning: variable ‘key_name’ set but not used [-Wunused-but-set-variable]
7 | std::string_view key_name = ceph::perf_counters::key_name(key);
| ^~~~~~~~
</pre>
CephFS - Bug #65157 (New): cephfs-mirror: set layout.pool_name xattr of destination subvol correctly
https://tracker.ceph.com/issues/65157
2024-03-26T15:27:11Z
Milind Changire
<ul>
<li>mirroring doesn't handle subvols correctly</li>
<li>the layout xattr (i.e. <code>layout.pool_name</code>) is missing on the destination subvol dir</li>
</ul>
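<p>A verification sketch (the mount points and subvolume paths are hypothetical) comparing the layout pool of the source and destination subvolume directories via the <code>ceph.dir.layout.pool</code> virtual xattr:</p>
<pre>
# Verification sketch: compare the CephFS layout pool xattr on the source and
# mirrored destination subvolume directories. Paths below are hypothetical.
import os

SRC = "/mnt/src-fs/volumes/_nogroup/subvol1"   # hypothetical source subvol path
DST = "/mnt/dst-fs/volumes/_nogroup/subvol1"   # hypothetical destination path

def layout_pool(path: str) -> str:
    try:
        return os.getxattr(path, "ceph.dir.layout.pool").decode()
    except OSError as e:
        return f"<missing: {e.strerror}>"

src_pool, dst_pool = layout_pool(SRC), layout_pool(DST)
print(f"source pool:      {src_pool}")
print(f"destination pool: {dst_pool}")
if src_pool != dst_pool:
    print("layout.pool_name was not propagated to the destination subvol")
</pre>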
teuthology - Bug #65152 (New): teuthology-kill deleting a queued run gives traceback
https://tracker.ceph.com/issues/65152
2024-03-26T13:07:50Z
Vallari Agrawal
<p>I get the following traceback when I try to kill a run whose jobs are all still queued and not yet started:</p>
<pre><code class="text syntaxhl"><span class="CodeRay">$ teuthology-kill -o scheduled_vallariag@teuthology -m smithi -r vallariag-2024-03-26_12:00:31-rbd:nvmeof-ceph-nvmeof-mon-distro-default-smithi
2024-03-26 12:56:00,234.234 INFO:teuthology.kill:Checking Beanstalk Queue...
2024-03-26 12:56:05,665.665 INFO:teuthology.kill:Deleting job from queue. ID: 7623180 Name: vallariag-2024-03-26_12:00:31-rbd:nvmeof-ceph-nvmeof-mon-distro-default-smithi Desc: None
2024-03-26 12:56:05,678.678 INFO:teuthology.kill:Deleting job from queue. ID: 7623181 Name: vallariag-2024-03-26_12:00:31-rbd:nvmeof-ceph-nvmeof-mon-distro-default-smithi Desc: rbd:nvmeof/{base/install centos_latest conf/{disable-pool-app} workloads/nvmeof_thrash}
2024-03-26 12:56:05,681.681 INFO:teuthology.kill:Deleting job from queue. ID: 7623182 Name: vallariag-2024-03-26_12:00:31-rbd:nvmeof-ceph-nvmeof-mon-distro-default-smithi Desc: None
2024-03-26 12:56:57,215.215 INFO:teuthology.kill:Deleting jobs from paddles: ['7623181']
2024-03-26 12:56:57,340.340 INFO:teuthology.kill:No teuthology processes running
2024-03-26 12:56:58,623.623 INFO:teuthology.kill:No locked machines. Not nuking anything
Traceback (most recent call last):
File "/cephfs/home/vallariag/src/teuthology/virtualenv/bin/teuthology-kill", line 8, in <module>
sys.exit(main())
File "/cephfs/home/vallariag/src/teuthology/scripts/kill.py", line 44, in main
teuthology.kill.main(args)
File "/cephfs/home/vallariag/src/teuthology/teuthology/kill.py", line 40, in main
kill_run(run_name, archive_base, owner, machine_type,
File "/cephfs/home/vallariag/src/teuthology/teuthology/kill.py", line 78, in kill_run
report.try_mark_run_dead(run_name)
File "/cephfs/home/vallariag/src/teuthology/teuthology/report.py", line 579, in try_mark_run_dead
jobs = reporter.get_jobs(run_name, fields=['status'])
File "/cephfs/home/vallariag/src/teuthology/teuthology/report.py", line 374, in get_jobs
response.raise_for_status()
File "/home/vallariag/src/teuthology/virtualenv/lib/python3.8/site-packages/requests/models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://paddles.front.sepia.ceph.com/runs/vallariag-2024-03-26_12:00:31-rbd:nvmeof-ceph-nvmeof-mon-distro-default-smithi/jobs/?fields=status,job_id
</pre>
<p>The jobs do get dequeued, but the traceback could be avoided (see the sketch below).</p>
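<p>A sketch of the kind of guard that would avoid the traceback (a hypothetical helper, not the actual teuthology patch): if paddles has never seen the run because every job was still queued, the jobs lookup 404s, and that can be treated as "run unknown, nothing to mark dead" instead of propagating.</p>
<pre>
# Sketch only: try_mark_run_dead could wrap its get_jobs() call like this
# hypothetical helper, treating a paddles 404 as "run never reported".
import requests

def get_run_jobs_or_none(reporter, run_name: str):
    """Fetch the run's jobs from paddles, treating a 404 as 'run unknown'."""
    try:
        return reporter.get_jobs(run_name, fields=['status'])
    except requests.exceptions.HTTPError as e:
        if e.response is not None and e.response.status_code == 404:
            return None   # all jobs were only queued; nothing to mark dead
        raise
</pre>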
CephFS - Bug #65147 (New): quincy: Test failure: test_non_existent_cluster (tasks.cephfs.test_nfs...
https://tracker.ceph.com/issues/65147
2024-03-26T09:41:11Z
Venky Shankar
vshankar@redhat.com
<p><a class="external" href="https://pulpito.ceph.com/vshankar-2024-03-14_16:52:41-fs-wip-vshankar-testing1-quincy-2024-03-14-0655-quincy-testing-default-smithi/7600857">https://pulpito.ceph.com/vshankar-2024-03-14_16:52:41-fs-wip-vshankar-testing1-quincy-2024-03-14-0655-quincy-testing-default-smithi/7600857</a></p>
<pre>
2024-03-17T14:00:47.746 INFO:teuthology.orchestra.run.smithi094.stdout:}
2024-03-17T14:00:47.749 DEBUG:teuthology.orchestra.run.smithi094:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph nfs cluster ls
2024-03-17T14:00:48.172 DEBUG:teuthology.orchestra.run.smithi094:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph nfs cluster info foo
2024-03-17T14:00:48.507 INFO:journalctl@ceph.mon.a.smithi094.stdout:Mar 17 14:00:48 smithi094 bash[17670]: cluster 2024-03-17T14:00:46.534517+0000 mgr.x (mgr.14992) 266 : cluster [DBG] pgmap v144: 33 pgs: 33 active+clean; 577 KiB data, 895 MiB used, 267 GiB / 268 GiB avail
2024-03-17T14:00:48.507 INFO:journalctl@ceph.mon.a.smithi094.stdout:Mar 17 14:00:48 smithi094 bash[17670]: audit 2024-03-17T14:00:47.075816+0000 mon.a (mon.0) 1925 : audit [INF] from='client.? 172.21.15.94:0/3942049144' entity='client.admin' cmd='[{"prefix": "log", "logtext": ["Starting test tasks.cephfs.test_nfs.TestNFS.test_non_existent_cluster"]}]': finished
2024-03-17T14:00:48.507 INFO:journalctl@ceph.mon.a.smithi094.stdout:Mar 17 14:00:48 smithi094 bash[17670]: audit 2024-03-17T14:00:47.509467+0000 mon.a (mon.0) 1926 : audit [DBG] from='client.? 172.21.15.94:0/3427461798' entity='client.admin' cmd=[{"prefix": "mgr module ls", "format": "json-pretty"}]: dispatch
2024-03-17T14:00:48.567 INFO:teuthology.orchestra.run.smithi094.stderr:Error ENOENT: cluster does not exist
2024-03-17T14:00:48.577 DEBUG:teuthology.orchestra.run:got remote process result: 2
2024-03-17T14:00:48.578 DEBUG:teuthology.orchestra.run.smithi094:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/ar
</pre>
CephFS - Bug #65136 (New): QA failure: test_fscrypt_dummy_encryption_with_quick_group
https://tracker.ceph.com/issues/65136
2024-03-26T06:23:14Z
Rishabh Dave
<p><a class="external" href="https://pulpito.ceph.com/yuriw-2024-03-14_15:28:28-fs-wip-yuri4-testing-2024-03-13-0733-reef-distro-default-smithi/7600356">https://pulpito.ceph.com/yuriw-2024-03-14_15:28:28-fs-wip-yuri4-testing-2024-03-13-0733-reef-distro-default-smithi/7600356</a></p>
<pre>
2024-03-14T21:26:21.391 INFO:tasks.cephfs_test_runner:======================================================================
2024-03-14T21:26:21.391 INFO:tasks.cephfs_test_runner:FAIL: test_fscrypt_dummy_encryption_with_quick_group (tasks.cephfs.test_fscrypt.TestFscrypt)
2024-03-14T21:26:21.391 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2024-03-14T21:26:21.391 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2024-03-14T21:26:21.391 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/github.com_ceph_ceph-c_234d532354068e06d01621fd032c3b663cead394/qa/tasks/cephfs/test_fscrypt.py", line 76, in test_fscrypt_dummy_encryption_with_quick_group
2024-03-14T21:26:21.391 INFO:tasks.cephfs_test_runner: self.assertEqual(proc.returncode, 0)
2024-03-14T21:26:21.391 INFO:tasks.cephfs_test_runner:AssertionError: 1 != 0
2024-03-14T21:26:21.391 INFO:tasks.cephfs_test_runner:
2024-03-14T21:26:21.391 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2024-03-14T21:26:21.391 INFO:tasks.cephfs_test_runner:Ran 1 test in 4466.457s
</pre>
<p>On the surface, this job failure looks similar to <a class="external" href="https://tracker.ceph.com/issues/59684">https://tracker.ceph.com/issues/59684</a>, because the same test case from the same suite failed with the same traceback. However, the function names present in the dmesg log of <a class="external" href="https://tracker.ceph.com/issues/59684">https://tracker.ceph.com/issues/59684</a> ("ceph_con_v1_try_read", for example) do not appear anywhere in this job's directory: <code>grep -rni ceph_con_v1_try_read /a/yuriw-2024-03-14_15:28:28-fs-wip-yuri4-testing-2024-03-13-0733-reef-distro-default-smithi/7600356</code> returned nothing. I therefore suspected this is a different failure, and talking with Xiubo confirmed that it looks like a different issue.</p>