Ceph : Issues (Redmine) - https://tracker.ceph.com/ - 2024-03-28T19:36:30Z

Ceph QA - QA Run #65215 (QA Testing): wip-batrick-testing-20240328.192822 - https://tracker.ceph.com/issues/65215 - 2024-03-28T19:36:30Z - Patrick Donnelly (pdonnell@redhat.com)

CephFS - Backport #65214 (In Progress): squid: mds: quiesce_inode op waiting on remote auth pins ... - https://tracker.ceph.com/issues/65214 - 2024-03-28T18:45:10Z - Backport Bot
<p><a class="external" href="https://github.com/ceph/ceph/pull/56564">https://github.com/ceph/ceph/pull/56564</a></p> rgw - Bug #65212 (Fix Under Review): pubsub: validate Name in CreateTopic apihttps://tracker.ceph.com/issues/652122024-03-28T16:24:24ZCasey Bodleycbodley@redhat.com
<p>Prevent topic names that would confuse things like ARN parsing and RADOS object namespacing.</p>
<p>From <a class="external" href="https://docs.aws.amazon.com/sns/latest/api/API_CreateTopic.html#API_CreateTopic_RequestParameters">https://docs.aws.amazon.com/sns/latest/api/API_CreateTopic.html#API_CreateTopic_RequestParameters</a>:</p>
<pre>
Name
The name of the topic you want to create.
Constraints: Topic names must be made up of only uppercase and lowercase ASCII letters, numbers, underscores, and hyphens, and must be between 1 and 256 characters long.
For a FIFO (first-in-first-out) topic, the name must end with the .fifo suffix.
Type: String
Required: Yes
</pre>

Dashboard - Backport #65211 (New): reef: mgr/dashboard: Mark placement targets as non-required - https://tracker.ceph.com/issues/65211 - 2024-03-28T16:05:55Z - Backport Bot
Dashboard - Backport #65210 (New): squid: mgr/dashboard: Mark placement targets as non-required - https://tracker.ceph.com/issues/65210 - 2024-03-28T16:05:48Z - Backport Bot
Dashboard - Backport #65209 (New): reef: mgr/dashboard: Align security fieldset and tag fieldset ... - https://tracker.ceph.com/issues/65209 - 2024-03-28T16:05:40Z - Backport Bot
Dashboard - Backport #65208 (New): squid: mgr/dashboard: Align security fieldset and tag fieldset... - https://tracker.ceph.com/issues/65208 - 2024-03-28T16:05:32Z - Backport Bot

CephFS - Bug #65182 (Pending Backport): mds: quiesce_inode op waiting on remote auth pins is not ... - https://tracker.ceph.com/issues/65182 - 2024-03-27T16:01:30Z - Patrick Donnelly (pdonnell@redhat.com)
<pre>
{
"description": "internal op quiesce_path:mds.1:1048 fp=#0x1/volumes/_nogroup/sv_new_1_def_11/0d61d4d2-d869-46f0-93a0-d9b9e74401c2",
"initiated_at": "2024-03-26T10:06:14.974850+0000",
"age": 101818.022728012,
"duration": 101818.025116246,
"continuous": true,
"type_data": {
"result": -2147483648,
"flag_point": "cleaned up request",
"reqid": {
"entity": {
"type": "mds",
"num": 1
},
"tid": 1048
},
"op_type": "internal_op",
"internal_op": 5384,
"op_name": "quiesce_path",
"events": [
{
"time": "2024-03-26T10:06:14.974850+0000",
"event": "initiated"
},
{
"time": "2024-03-26T10:06:14.974850+0000",
"event": "throttled"
},
{
"time": "2024-03-26T10:06:14.974850+0000",
"event": "header_read"
},
{
"time": "2024-03-26T10:06:14.974850+0000",
"event": "all_read"
},
{
"time": "2024-03-26T10:06:14.974850+0000",
"event": "dispatched"
},
{
"time": "2024-03-26T10:06:14.974869+0000",
"event": "acquired locks"
},
{
"time": "2024-03-26T10:06:14.974879+0000",
"event": "acquired locks"
},
{
"time": "2024-03-26T10:06:14.974888+0000",
"event": "acquired locks"
},
{
"time": "2024-03-26T10:06:14.974898+0000",
"event": "acquired locks"
},
{
"time": "2024-03-26T10:06:21.501232+0000",
"event": "killing request"
},
{
"time": "2024-03-26T10:06:21.501253+0000",
"event": "cleaned up request"
}
],
"locks": []
}
},
...
{
"description": "internal op quiesce_inode:mds.1:1049 fp=#0x100008e255a fp2=#0x100008e255a",
"initiated_at": "2024-03-26T10:06:14.974908+0000",
"age": 101818.022670109,
"duration": 101818.02511086701,
"continuous": true,
"type_data": {
"result": -2147483648,
"flag_point": "quiesce complete for non-auth inode",
"reqid": {
"entity": {
"type": "mds",
"num": 1
},
"tid": 1049
},
"op_type": "internal_op",
"internal_op": 5385,
"op_name": "quiesce_inode",
"events": [
{
"time": "2024-03-26T10:06:14.974908+0000",
"event": "initiated"
},
{
"time": "2024-03-26T10:06:14.974908+0000",
"event": "throttled"
},
{
"time": "2024-03-26T10:06:14.974908+0000",
"event": "header_read"
},
{
"time": "2024-03-26T10:06:14.974908+0000",
"event": "all_read"
},
{
"time": "2024-03-26T10:06:14.974908+0000",
"event": "dispatched"
},
{
"time": "2024-03-26T10:06:14.974977+0000",
"event": "requesting remote authpins"
},
{
"time": "2024-03-26T10:06:21.615411+0000",
"event": "acquired locks"
},
{
"time": "2024-03-26T10:06:21.615458+0000",
"event": "quiesce complete for non-auth inode"
}
],
"locks": [
{
"object": {
"is_auth": false,
"auth_state": {
"replicas": {}
},
"replica_state": {
"authority": [
0,
-2
],
"replica_nonce": 1
},
"auth_pins": 0,
"is_frozen": false,
"is_freezing": false,
"pins": {
"request": 1,
"lock": 1
},
"nref": 2
},
"object_string": "[inode 0x100008e255a [...2ae,head] /volumes/_nogroup/sv_new_1_def_11/0d61d4d2-d869-46f0-93a0-d9b9e74401c2/ rep@0.1 v1696 snaprealm=0x55b78d09f440 f(v0 m2024-03-26T10:05:13.326074+0000 10=2+8) n(v56 rc2024-03-26T10:17:04.624239+0000 b2670077140 31541=28967+2574)/n(v0 rc2024-03-26T09:40:15.892764+0000 b1027604480 138=3+135) (inest mix) (iquiesce lock x=1 by request(mds.1:1049 nref=3)) | request=1 lock=1 0x55b78d1b4580]",
"lock": {
"gather_set": [],
"state": "lock",
"type": "iquiesce",
"is_leased": false,
"num_rdlocks": 0,
"num_wrlocks": 0,
"num_xlocks": 1,
"xlock_by": {
"reqid": {
"entity": {
"type": "mds",
"num": 1
},
"tid": 1049
}
}
},
"flags": 4,
"wrlock_target": -1
}
]
}
},
</pre>
<p>This is an op dump from a QE test cluster. The quiesce_path op was killed and, shortly afterwards, the quiesce_inode op received the remote authpins, allowing it to proceed. However, MDCache::request_kill does not actually kill a request that is waiting on remote authpins, so it is allowed to proceed with its quiesce.</p>
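<p>A small diagnostic sketch for spotting ops still waiting on remote authpins (the state that request_kill does not cover), assuming JSON shaped like the dump above; the field names come from the dump, but the exact command used to obtain it is an assumption:</p>
<pre><code>import json
import subprocess

# Dump in-flight ops from the MDS (adjust the command to however the dump
# above was produced; "ceph tell mds.1 ops" is an assumption here).
raw = subprocess.run(
    ["ceph", "tell", "mds.1", "ops"],
    capture_output=True, text=True, check=True,
).stdout
ops = json.loads(raw).get("ops", [])

# Flag quiesce_inode ops whose most recent event is still the remote
# authpin request, i.e. the state described in this bug.
for op in ops:
    td = op.get("type_data", {})
    if td.get("op_name") != "quiesce_inode":
        continue
    events = td.get("events", [])
    if events and events[-1]["event"] == "requesting remote authpins":
        print("stuck:", op["description"], "age:", op["age"])
</code></pre>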
Ceph QA - QA Run #65159 (QA Testing): wip-yuri4-testing-2024-03-26-1132-reef - https://tracker.ceph.com/issues/65159 - 2024-03-26T18:35:40Z - Yuri Weinstein (yweinste@redhat.com)

<p>--- done. these PRs were included:<br /><a class="external" href="https://github.com/ceph/ceph/pull/54258">https://github.com/ceph/ceph/pull/54258</a> - reef: os/bluestore: add bluestore fragmentation micros to prometheus<br /><a class="external" href="https://github.com/ceph/ceph/pull/55548">https://github.com/ceph/ceph/pull/55548</a> - reef: mon: fix health store size growing infinitely<br /><a class="external" href="https://github.com/ceph/ceph/pull/55697">https://github.com/ceph/ceph/pull/55697</a> - reef: osd: Report health error if OSD public address is not within subnet<br /><a class="external" href="https://github.com/ceph/ceph/pull/55777">https://github.com/ceph/ceph/pull/55777</a> - reef: os/bluestore: fix free space update after bdev-expand in NCB mode<br /><a class="external" href="https://github.com/ceph/ceph/pull/55867">https://github.com/ceph/ceph/pull/55867</a> - reef: mon/OSDMonitor: fix get_min_last_epoch_clean()<br /><a class="external" href="https://github.com/ceph/ceph/pull/56140">https://github.com/ceph/ceph/pull/56140</a> - reef: rgw/notification: Kafka persistent notifications not retried and removed even when the broker is down<br /><a class="external" href="https://github.com/ceph/ceph/pull/56197">https://github.com/ceph/ceph/pull/56197</a> - reef: os/kv_test: Fix estimate functions<br /><a class="external" href="https://github.com/ceph/ceph/pull/56347">https://github.com/ceph/ceph/pull/56347</a> - reef: rgw: Add missing empty checks to the split string in is_string_in_set().</p>

Orchestrator - Bug #65122 (New): cephadmin returns "1" on successful host-maintenance enter/exit ... - https://tracker.ceph.com/issues/65122 - 2024-03-25T18:36:46Z - Dan Brown
<p>cephadm returns "1" on successful host-maintenance enter/exit; it should return "0", as is conventional.</p>
<pre>
~$ sudo cephadm host-maintenance enter --fsid XXXX-XXXXXX-XXXX-XXXXX
Inferring config /var/lib/ceph/XXXX-XXXXXX-XXXX-XXXXX/config/ceph.conf
Requested to place host into maintenance
success - systemd target ceph-XXXX-XXXXXX-XXXX-XXXXX.target disabled
~$ echo $?
1
</pre>
<pre>
~$ sudo cephadm host-maintenance exit --fsid XXXX-XXXXXX-XXXX-XXXXX
Inferring config /var/lib/ceph/XXXX-XXXXXX-XXXX-XXXXX/config/ceph.conf
Requested to exit maintenance state
success - systemd target ceph-XXXX-XXXXXX-XXXX-XXXXX.target enabled and started
~$ echo $?
1
</pre>
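<p>Why the exit status matters: anything scripting cephadm will treat the successful maintenance change as a failure. An illustrative check (the fsid is a placeholder):</p>
<pre><code>import subprocess

# Illustrative only: automation typically gates on the exit status, so with
# the current behavior this raises even though maintenance was entered
# successfully. The fsid below is a placeholder.
result = subprocess.run(
    ["sudo", "cephadm", "host-maintenance", "enter",
     "--fsid", "XXXX-XXXXXX-XXXX-XXXXX"],
    capture_output=True, text=True,
)
if result.returncode != 0:
    raise RuntimeError(f"host-maintenance enter reported rc={result.returncode}")
</code></pre>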
<a name="Align-security-fieldset-with-the-rest-of-the-bucket-form"></a>
<h3 >Align security fieldset with the rest of the bucket form<a href="#Align-security-fieldset-with-the-rest-of-the-bucket-form" class="wiki-anchor">¶</a></h3>
<p><em>The security fieldset and tag fieldset need to be updated as per recent changes in the bucket form.</em></p>

CephFS - Bug #65018 (Fix Under Review): PG_DEGRADED warnings during cluster creation via cephadm:... - https://tracker.ceph.com/issues/65018 - 2024-03-21T00:27:43Z - Patrick Donnelly (pdonnell@redhat.com)
<pre>
2024-03-20T19:01:35.938 DEBUG:teuthology.orchestra.run.smithi043:> sudo /home/ubuntu/cephtest/cephadm --image quay-quay-quay.apps.os.sepia.ceph.com/ceph-ci/ceph:360516069d9393362c4cc6eb9371680fe16d66ab shell --fsid b40d606c-e6ea-11ee-95c9-87774f69a715 -- ceph osd last-stat-seq osd.1
2024-03-20T19:01:36.042 INFO:journalctl@ceph.mon.a.smithi043.stdout:Mar 20 19:01:35 smithi043 ceph-mon[31664]: osdmap e88: 12 total, 12 up, 12 in
2024-03-20T19:01:36.250 INFO:journalctl@ceph.mon.b.smithi118.stdout:Mar 20 19:01:35 smithi118 ceph-mon[36322]: osdmap e88: 12 total, 12 up, 12 in
2024-03-20T19:01:36.261 INFO:journalctl@ceph.mon.c.smithi151.stdout:Mar 20 19:01:35 smithi151 ceph-mon[36452]: osdmap e88: 12 total, 12 up, 12 in
2024-03-20T19:01:36.479 INFO:teuthology.orchestra.run.smithi043.stdout:223338299439
2024-03-20T19:01:36.479 DEBUG:teuthology.orchestra.run.smithi043:> sudo /home/ubuntu/cephtest/cephadm --image quay-quay-quay.apps.os.sepia.ceph.com/ceph-ci/ceph:360516069d9393362c4cc6eb9371680fe16d66ab shell --fsid b40d606c-e6ea-11ee-95c9-87774f69a715 -- ceph osd last-stat-seq osd.7
2024-03-20T19:01:36.513 INFO:teuthology.orchestra.run.smithi043.stderr:Inferring config /var/lib/ceph/b40d606c-e6ea-11ee-95c9-87774f69a715/mon.a/config
2024-03-20T19:01:37.010 INFO:journalctl@ceph.mon.a.smithi043.stdout:Mar 20 19:01:36 smithi043 ceph-mon[31664]: pgmap v349: 97 pgs: 1 activating+degraded, 2 activating, 1 peering, 93 active+clean; 639 KiB data, 360 MiB used, 1.0 TiB / 1.0 TiB avail; 3.1 KiB/s rd, 5 op/s; 2/192 objects degraded (1.042%)
2024-03-20T19:01:37.011 INFO:journalctl@ceph.mon.a.smithi043.stdout:Mar 20 19:01:36 smithi043 ceph-mon[31664]: Health check failed: Degraded data redundancy: 2/192 objects degraded (1.042%), 1 pg degraded (PG_DEGRADED)
</pre>
<p><a class="external" href="https://pulpito.ceph.com/pdonnell-2024-03-20_18:16:52-fs-wip-batrick-testing-20240320.145742-distro-default-smithi/7612919/">https://pulpito.ceph.com/pdonnell-2024-03-20_18:16:52-fs-wip-batrick-testing-20240320.145742-distro-default-smithi/7612919/</a></p>
<p>Many other jobs also fail in a similar fashion.</p>

rgw - Backport #64792 (In Progress): reef: Notification kafka: Persistent messages are removed ev... - https://tracker.ceph.com/issues/64792 - 2024-03-07T15:31:10Z - Backport Bot
<p><a class="external" href="https://github.com/ceph/ceph/pull/56140">https://github.com/ceph/ceph/pull/56140</a></p> Dashboard - Cleanup #64708 (Pending Backport): mgr/dashboard: Mark placement targets as non-requiredhttps://tracker.ceph.com/issues/647082024-03-05T08:06:50ZAfreen Misbah
<a name="What-are-Placement-Targets-"></a>
<h3 >What are Placement Targets ?<a href="#What-are-Placement-Targets-" class="wiki-anchor">¶</a></h3>
<p>Placement targets control which pools are associated with a particular bucket. A bucket's placement target is selected on creation and cannot be modified.</p>
<p>Where are they set? Where can they be edited?</p>
<p><em>Zonegroup and zones</em><br />Placement target details are present in the zonegroup and zone. By default, a default placement target is created and used.</p>
<p><em>Users</em><br />RGW users can also set a default placement for themselves, which then puts a location constraint on their buckets so that those buckets use that placement target only.</p>
<p><em>LocationConstraint</em><br />Passing a particular placement target in LocationConstraint overrides the above two and uses the passed target for the bucket.</p>
<p>The Ceph dashboard UI provides an easy way to set this. That said, if users do not set it, the selection falls back to the following (a small sketch of this fallback order follows the list):</p>
<p>1. The user's default placement.</p>
<p>2. If that is not found, the zonegroup's default placement, which is always present.</p>
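<p>A minimal sketch of that fallback order (hypothetical helper and parameter names; an illustration of the selection described above, not RGW's actual code):</p>
<pre><code>def resolve_placement_target(location_constraint, user_default, zonegroup_default):
    """Pick the placement target for a new bucket (illustrative only)."""
    if location_constraint:      # explicit LocationConstraint wins
        return location_constraint
    if user_default:             # then the user's default placement
        return user_default
    return zonegroup_default     # the zonegroup default always exists

# Example: nothing passed in the request and no user default configured
print(resolve_placement_target(None, None, "default-placement"))
# -> default-placement
</code></pre>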
<a name="Working-of-placement-targets"></a>
<h3 >Working of placement targets<a href="#Working-of-placement-targets" class="wiki-anchor">¶</a></h3>
<p>1. Create a bucket without specifying any placement target and choose a user that has no default placement set.</p>
<pre><code>This will create a bucket with the default placement target present on the zonegroup.</code></pre>
<p>2. Create a bucket without specifying any placement target and choose a user that has a default placement set.</p>
<pre><code>This will create a bucket with the default placement target present in the user's config.</code></pre>
<p>3. Create a bucket specifying a placement target and choose a user that has a default placement set.</p>
<pre><code>This will create a bucket with the placement target selected in the request, overriding what is present on the user and the zonegroup.</code></pre>
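<p>For case 3, a minimal boto3 sketch (endpoint, credentials, bucket, and placement names are placeholders; the "zonegroup:placement-target" form of LocationConstraint is how RGW is documented to accept a placement override, but verify against your deployment):</p>
<pre><code>import boto3

# Placeholder endpoint and credentials for an RGW S3 endpoint.
s3 = boto3.client(
    "s3",
    endpoint_url="http://rgw.example.com:8000",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Request a specific placement target for this bucket only; this overrides
# both the user's default placement and the zonegroup default.
s3.create_bucket(
    Bucket="my-bucket",
    CreateBucketConfiguration={"LocationConstraint": "default:fast-placement"},
)
</code></pre>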
<a name="For-UI"></a>
<h3 >For UI<a href="#For-UI" class="wiki-anchor">¶</a></h3>
<p>- It makes sense to mark it as optional rather than required, as per the above findings.</p>
<p>- Also, since this setting has various defaults, we can move it to the advanced section.</p>

CephFS - Bug #51282 (Fix Under Review): pybind/mgr/mgr_util: .mgr pool may be created too early c... - https://tracker.ceph.com/issues/51282 - 2021-06-19T02:43:31Z - Patrick Donnelly (pdonnell@redhat.com)
<pre>
2021-06-16T22:22:43.040+0000 7f6e8e779700 20 mon.a@0(leader).mgrstat health checks:
{
    "PG_DEGRADED": {
        "severity": "HEALTH_WARN",
        "summary": {
            "message": "Degraded data redundancy: 2/4 objects degraded (50.000%), 1 pg degraded",
            "count": 1
        },
        "detail": [
            {
                "message": "pg 1.0 is active+undersized+degraded, acting [7]"
            }
        ]
    }
}
</pre>
<p>From: /ceph/teuthology-archive/pdonnell-2021-06-16_21:26:55-fs-wip-pdonnell-testing-20210616.191804-distro-basic-smithi/6175605/remote/smithi120/log/ceph-mon.a.log.gz</p>
<p>and a few other tests:</p>
<pre>
Failure: "2021-06-16T22:22:43.881363+0000 mon.a (mon.0) 143 : cluster [WRN] Health check failed: Degraded data redundancy: 2/4 objects degraded (50.000%), 1 pg degraded (PG_DEGRADED)" in cluster log
7 jobs: ['6175605', '6175619', '6175556', '6175591', '6175671', '6175600', '6175639']
suites intersection: ['conf/{client', 'mds', 'mon', 'osd}', 'overrides/{frag_enable', 'whitelist_health', 'whitelist_wrongly_marked_down}']
suites union: ['clusters/1a11s-mds-1c-client-3node', 'clusters/1a3s-mds-1c-client', 'conf/{client', 'distro/{centos_8}', 'distro/{rhel_8}', 'distro/{ubuntu_latest}', 'fs/snaps/{begin', 'fs/workload/{begin', 'k-testing}', 'mds', 'mon', 'mount/fuse', 'mount/kclient/{mount', 'ms-die-on-skipped}}', 'objectstore-ec/bluestore-comp', 'objectstore-ec/bluestore-comp-ec-root', 'omap_limit/10', 'omap_limit/10000', 'osd-asserts', 'osd}', 'overrides/{distro/testing/{flavor/centos_latest', 'overrides/{distro/testing/{flavor/ubuntu_latest', 'overrides/{frag_enable', 'ranks/1', 'ranks/3', 'ranks/5', 'scrub/no', 'scrub/yes', 'session_timeout', 'standby-replay', 'tasks/workunit/snaps}', 'tasks/{0-check-counter', 'whitelist_health', 'whitelist_wrongly_marked_down}', 'workunit/fs/misc}', 'workunit/fs/test_o_trunc}', 'workunit/suites/ffsb}', 'workunit/suites/fsstress}', 'workunit/suites/iogen}', 'workunit/suites/iozone}', 'wsync/{no}}', 'wsync/{yes}}']
</pre>
<p>I'm thinking this check is not quite right: <a class="external" href="https://github.com/ceph/ceph/blob/05d7f883a04d230cf17b40a1c7e8044d402c6a30/src/pybind/mgr/mgr_module.py#L972-L981">https://github.com/ceph/ceph/blob/05d7f883a04d230cf17b40a1c7e8044d402c6a30/src/pybind/mgr/mgr_module.py#L972-L981</a></p>
<p>(I lifted that code from the devicehealth module.)</p>
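<p>A minimal sketch of the kind of guard that could help, assuming a ceph-mgr module context (the helper, its threshold, and the use of get_osdmap() here are illustrative assumptions, not the actual mgr_module.py code):</p>
<pre><code>def ready_to_create_pool(mgr_module, min_osds=3):
    """Return True once enough OSDs are up and in, so that the new pool's
    PGs are not created undersized/degraded (illustrative threshold)."""
    osdmap = mgr_module.get_osdmap().dump()
    up_in = [o for o in osdmap.get("osds", []) if o.get("up") and o.get("in")]
    return len(up_in) >= min_osds
</code></pre>
<p>The idea being that pool creation, and hence the PG_DEGRADED warning window, is deferred until the cluster can actually place the PGs.</p>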