Ceph : Issues
https://tracker.ceph.com/
2024-03-28T16:24:24Z
Ceph
Redmine
rgw - Bug #65212 (New): pubsub: validate Name in CreateTopic api
https://tracker.ceph.com/issues/65212
2024-03-28T16:24:24Z
Casey Bodley
cbodley@redhat.com
<p>Prevent topic names that would confuse things like ARN parsing and rados object namespacing.</p>
<p>from <a class="external" href="https://docs.aws.amazon.com/sns/latest/api/API_CreateTopic.html#API_CreateTopic_RequestParameters">https://docs.aws.amazon.com/sns/latest/api/API_CreateTopic.html#API_CreateTopic_RequestParameters</a></p>
<pre>
Name
The name of the topic you want to create.
Constraints: Topic names must be made up of only uppercase and lowercase ASCII letters, numbers, underscores, and hyphens, and must be between 1 and 256 characters long.
For a FIFO (first-in-first-out) topic, the name must end with the .fifo suffix.
Type: String
Required: Yes
</pre>
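The SNS constraint quoted above amounts to a single character-class check; a minimal validation sketch in Python (the regex and function name are illustrative, not the rgw implementation, and the FIFO `.fifo` suffix rule is not handled here):

```python
import re

# SNS topic-name constraint: 1-256 characters, drawn only from
# uppercase/lowercase ASCII letters, digits, underscores, and hyphens.
TOPIC_NAME_RE = re.compile(r'^[A-Za-z0-9_-]{1,256}$')

def is_valid_topic_name(name: str) -> bool:
    """Return True if `name` satisfies the SNS CreateTopic Name constraint."""
    return bool(TOPIC_NAME_RE.match(name))
```

A name like `my-topic_1` passes, while an empty string, a name containing `:` (which would collide with ARN field separators), or a 257-character name is rejected.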
Dashboard - Backport #65211 (New): reef: mgr/dashboard: Mark placement targets as non-required
https://tracker.ceph.com/issues/65211
2024-03-28T16:05:55Z
Backport Bot
Dashboard - Backport #65210 (New): squid: mgr/dashboard: Mark placement targets as non-required
https://tracker.ceph.com/issues/65210
2024-03-28T16:05:48Z
Backport Bot
Dashboard - Backport #65209 (New): reef: mgr/dashboard: Align security fieldset and tag fieldset ...
https://tracker.ceph.com/issues/65209
2024-03-28T16:05:40Z
Backport Bot
Dashboard - Backport #65208 (New): squid: mgr/dashboard: Align security fieldset and tag fieldset...
https://tracker.ceph.com/issues/65208
2024-03-28T16:05:32Z
Backport Bot
Dashboard - Cleanup #65207 (New): mgr/dashboard: Move features to advanced section in create imag...
https://tracker.ceph.com/issues/65207
2024-03-28T15:43:36Z
Afreen Misbah
<a name="Move-features-to-advanced-section-in-create-image-form"></a>
<h3>Move features to advanced section in create image form</h3>
<p>A followup from the comment <a class="external" href="https://github.com/ceph/ceph/pull/56514#issuecomment-2022426715">https://github.com/ceph/ceph/pull/56514#issuecomment-2022426715</a></p>
Dashboard - Backport #65206 (In Progress): quincy: mgr/dashboard: Clicking on Ceph logo do not ta...
https://tracker.ceph.com/issues/65206
2024-03-28T15:25:47Z
Backport Bot
<p><a class="external" href="https://github.com/ceph/ceph/pull/56558">https://github.com/ceph/ceph/pull/56558</a></p>
Dashboard - Backport #65205 (In Progress): reef: mgr/dashboard: Clicking on Ceph logo do not take...
https://tracker.ceph.com/issues/65205
2024-03-28T15:25:39Z
Backport Bot
<p><a class="external" href="https://github.com/ceph/ceph/pull/56557">https://github.com/ceph/ceph/pull/56557</a></p>
Dashboard - Backport #65204 (In Progress): squid: mgr/dashboard: Clicking on Ceph logo do not tak...
https://tracker.ceph.com/issues/65204
2024-03-28T15:25:32Z
Backport Bot
<p><a class="external" href="https://github.com/ceph/ceph/pull/56556">https://github.com/ceph/ceph/pull/56556</a></p>
crimson - Bug #65203 (New): ReplicatedRecoveryBackend::recalc_subsets(ObjectRecoveryInfo&, crimso...
https://tracker.ceph.com/issues/65203
2024-03-28T15:00:23Z
Matan Breizman
<p>osd.3: <a class="external" href="https://pulpito.ceph.com/matan-2024-03-27_13:02:57-crimson-rados-main-distro-crimson-smithi/7626294">https://pulpito.ceph.com/matan-2024-03-27_13:02:57-crimson-rados-main-distro-crimson-smithi/7626294</a></p>
<p>After adding OSD restarts to the thrash tests: <a class="external" href="https://github.com/ceph/ceph/pull/56511">https://github.com/ceph/ceph/pull/56511</a></p>
<pre><code class="text syntaxhl"><span class="CodeRay">DEBUG 2024-03-27 13:26:06,805 [shard 0:main] osd - background_recovery_sub(id=362, detail=MOSDPGPush(3.d 26/25 {PushOp(3:bd1211d5:::smithi05531420-40:1, version: 18'16, data_included: [655473~716476,2099033~332100], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(3:bd1211d5:::smithi05531420-40:1@0'0, size: 2655473, copy_subset: [(0, 2655473)], clone_subset: {}, snapset: 1=[]:{1: [1]}, object_exist: false), after_progress: ObjectRecoveryProgress(!first, data_recovered_to: 2431133, data_complete: false, omap_recovered_to: , omap_complete: true, error: false), before_progress: ObjectRecoveryProgress(first, data_recovered_to: 0, data_complete: false, omap_recovered_to: , omap_complete: false, error: false))})): starting start_pg_operation
DEBUG 2024-03-27 13:26:06,805 [shard 0:main] osd - background_recovery_sub(id=362, detail=MOSDPGPush(3.d 26/25 {PushOp(3:bd1211d5:::smithi05531420-40:1, version: 18'16, data_included: [655473~716476,2099033~332100], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(3:bd1211d5:::smithi05531420-40:1@0'0, size: 2655473, copy_subset: [(0, 2655473)], clone_subset: {}, snapset: 1=[]:{1: [1]}, object_exist: false), after_progress: ObjectRecoveryProgress(!first, data_recovered_to: 2431133, data_complete: false, omap_recovered_to: , omap_complete: true, error: false), before_progress: ObjectRecoveryProgress(first, data_recovered_to: 0, data_complete: false, omap_recovered_to: , omap_complete: false, error: false))})): start_pg_operation in await_active stage
DEBUG 2024-03-27 13:26:06,805 [shard 0:main] osd - background_recovery_sub(id=362, detail=MOSDPGPush(3.d 26/25 {PushOp(3:bd1211d5:::smithi05531420-40:1, version: 18'16, data_included: [655473~716476,2099033~332100], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(3:bd1211d5:::smithi05531420-40:1@0'0, size: 2655473, copy_subset: [(0, 2655473)], clone_subset: {}, snapset: 1=[]:{1: [1]}, object_exist: false), after_progress: ObjectRecoveryProgress(!first, data_recovered_to: 2431133, data_complete: false, omap_recovered_to: , omap_complete: true, error: false), before_progress: ObjectRecoveryProgress(first, data_recovered_to: 0, data_complete: false, omap_recovered_to: , omap_complete: false, error: false))})): start_pg_operation active, entering await_map
DEBUG 2024-03-27 13:26:06,805 [shard 0:main] osd - background_recovery_sub(id=362, detail=MOSDPGPush(3.d 26/25 {PushOp(3:bd1211d5:::smithi05531420-40:1, version: 18'16, data_included: [655473~716476,2099033~332100], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(3:bd1211d5:::smithi05531420-40:1@0'0, size: 2655473, copy_subset: [(0, 2655473)], clone_subset: {}, snapset: 1=[]:{1: [1]}, object_exist: false), after_progress: ObjectRecoveryProgress(!first, data_recovered_to: 2431133, data_complete: false, omap_recovered_to: , omap_complete: true, error: false), before_progress: ObjectRecoveryProgress(first, data_recovered_to: 0, data_complete: false, omap_recovered_to: , omap_complete: false, error: false))})): start_pg_operation await_map stage
DEBUG 2024-03-27 13:26:06,806 [shard 0:main] osd - background_recovery_sub(id=362, detail=MOSDPGPush(3.d 26/25 {PushOp(3:bd1211d5:::smithi05531420-40:1, version: 18'16, data_included: [655473~716476,2099033~332100], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(3:bd1211d5:::smithi05531420-40:1@0'0, size: 2655473, copy_subset: [(0, 2655473)], clone_subset: {}, snapset: 1=[]:{1: [1]}, object_exist: false), after_progress: ObjectRecoveryProgress(!first, data_recovered_to: 2431133, data_complete: false, omap_recovered_to: , omap_complete: true, error: false), before_progress: ObjectRecoveryProgress(first, data_recovered_to: 0, data_complete: false, omap_recovered_to: , omap_complete: false, error: false))})): got map 26, entering get_pg_mapping
DEBUG 2024-03-27 13:26:06,806 [shard 0:main] osd - background_recovery_sub(id=362, detail=MOSDPGPush(3.d 26/25 {PushOp(3:bd1211d5:::smithi05531420-40:1, version: 18'16, data_included: [655473~716476,2099033~332100], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(3:bd1211d5:::smithi05531420-40:1@0'0, size: 2655473, copy_subset: [(0, 2655473)], clone_subset: {}, snapset: 1=[]:{1: [1]}, object_exist: false), after_progress: ObjectRecoveryProgress(!first, data_recovered_to: 2431133, data_complete: false, omap_recovered_to: , omap_complete: true, error: false), before_progress: ObjectRecoveryProgress(first, data_recovered_to: 0, data_complete: false, omap_recovered_to: , omap_complete: false, error: false))})): can_create=false, target-core=2
DEBUG 2024-03-27 13:26:06,806 [shard 0:main] osd - background_recovery_sub(id=362, detail=MOSDPGPush(3.d 26/25 {PushOp(3:bd1211d5:::smithi05531420-40:1, version: 18'16, data_included: [655473~716476,2099033~332100], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(3:bd1211d5:::smithi05531420-40:1@0'0, size: 2655473, copy_subset: [(0, 2655473)], clone_subset: {}, snapset: 1=[]:{1: [1]}, object_exist: false), after_progress: ObjectRecoveryProgress(!first, data_recovered_to: 2431133, data_complete: false, omap_recovered_to: , omap_complete: true, error: false), before_progress: ObjectRecoveryProgress(first, data_recovered_to: 0, data_complete: false, omap_recovered_to: , omap_complete: false, error: false))})): send 37 to the remote pg core 2
DEBUG 2024-03-27 13:26:06,806 [shard 2:main] osd - background_recovery_sub(id=362, detail=MOSDPGPush(3.d 26/25 {PushOp(3:bd1211d5:::smithi05531420-40:1, version: 18'16, data_included: [655473~716476,2099033~332100], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(3:bd1211d5:::smithi05531420-40:1@0'0, size: 2655473, copy_subset: [(0, 2655473)], clone_subset: {}, snapset: 1=[]:{1: [1]}, object_exist: false), after_progress: ObjectRecoveryProgress(!first, data_recovered_to: 2431133, data_complete: false, omap_recovered_to: , omap_complete: true, error: false), before_progress: ObjectRecoveryProgress(first, data_recovered_to: 0, data_complete: false, omap_recovered_to: , omap_complete: false, error: false))})): entering create_or_wait_pg
DEBUG 2024-03-27 13:26:06,806 [shard 2:main] osd - background_recovery_sub(id=362, detail=MOSDPGPush(3.d 26/25 {PushOp(3:bd1211d5:::smithi05531420-40:1, version: 18'16, data_included: [655473~716476,2099033~332100], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(3:bd1211d5:::smithi05531420-40:1@0'0, size: 2655473, copy_subset: [(0, 2655473)], clone_subset: {}, snapset: 1=[]:{1: [1]}, object_exist: false), after_progress: ObjectRecoveryProgress(!first, data_recovered_to: 2431133, data_complete: false, omap_recovered_to: , omap_complete: true, error: false), before_progress: ObjectRecoveryProgress(first, data_recovered_to: 0, data_complete: false, omap_recovered_to: , omap_complete: false, error: false))})): have_pg
DEBUG 2024-03-27 13:26:06,806 [shard 2:main] osd - 0x603000429b00 RecoverySubRequest::with_pg: RecoverySubRequest::with_pg: background_recovery_sub(id=362, detail=MOSDPGPush(3.d 26/25 {PushOp(3:bd1211d5:::smithi05531420-40:1, version: 18'16, data_included: [655473~716476,2099033~332100], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(3:bd1211d5:::smithi05531420-40:1@0'0, size: 2655473, copy_subset: [(0, 2655473)], clone_subset: {}, snapset: 1=[]:{1: [1]}, object_exist: false), after_progress: ObjectRecoveryProgress(!first, data_recovered_to: 2431133, data_complete: false, omap_recovered_to: , omap_complete: true, error: false), before_progress: ObjectRecoveryProgress(first, data_recovered_to: 0, data_complete: false, omap_recovered_to: , omap_complete: false, error: false))}))
DEBUG 2024-03-27 13:26:06,806 [shard 2:main] osd - handle_pull_response: MOSDPGPush(3.d 26/25 {PushOp(3:bd1211d5:::smithi05531420-40:1, version: 18'16, data_included: [655473~716476,2099033~332100], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(3:bd1211d5:::smithi05531420-40:1@0'0, size: 2655473, copy_subset: [(0, 2655473)], clone_subset: {}, snapset: 1=[]:{1: [1]}, object_exist: false), after_progress: ObjectRecoveryProgress(!first, data_recovered_to: 2431133, data_complete: false, omap_recovered_to: , omap_complete: true, error: false), before_progress: ObjectRecoveryProgress(first, data_recovered_to: 0, data_complete: false, omap_recovered_to: , omap_complete: false, error: false))}) v4
DEBUG 2024-03-27 13:26:06,806 [shard 2:main] osd - handle_pull_response ObjectRecoveryInfo(3:bd1211d5:::smithi05531420-40:1@0'0, size: 2655473, copy_subset: [(0, 2655473)], clone_subset: {}, snapset: 1=[]:{1: [1]}, object_exist: false) ObjectRecoveryProgress(!first, data_recovered_to: 2431133, data_complete: false, omap_recovered_to: , omap_complete: true, error: false) data.size() is 1048576 data_included: [(655473, 716476), (2099033, 332100)]
DEBUG 2024-03-27 13:26:06,807 [shard 2:main] osd - pg_epoch 26 pg[3.d( v 26'20 lc 17'15 (0'0,26'20] local-lis/les=25/26 n=0 ec=14/14 lis/c=25/14 les/c/f=26/15/0 sis=25) [3,0] r=0 lpr=25 pi=[14,25)/1 luod=26'21 lua=21'18 crt=26'21 mlcod 17'15 active+recovering+undersized+degraded ObjectContextLoader::with_head_obc: object 3:bd1211d5:::smithi05531420-40:head
DEBUG 2024-03-27 13:26:06,807 [shard 2:main] osd - pg_epoch 26 pg[3.d( v 26'20 lc 17'15 (0'0,26'20] local-lis/les=25/26 n=0 ec=14/14 lis/c=25/14 les/c/f=26/15/0 sis=25) [3,0] r=0 lpr=25 pi=[14,25)/1 luod=26'21 lua=21'18 crt=26'21 mlcod 17'15 active+recovering+undersized+degraded ObjectContextLoader::get_or_load_obc: cache hit on 3:bd1211d5:::smithi05531420-40:head
DEBUG 2024-03-27 13:26:06,807 [shard 2:main] osd - resolve_oid oid.snap=1,head snapset.seq=1
DEBUG 2024-03-27 13:26:06,807 [shard 2:main] osd - pg_epoch 26 pg[3.d( v 26'20 lc 17'15 (0'0,26'20] local-lis/les=25/26 n=0 ec=14/14 lis/c=25/14 les/c/f=26/15/0 sis=25) [3,0] r=0 lpr=25 pi=[14,25)/1 luod=26'21 lua=21'18 crt=26'21 mlcod 17'15 active+recovering+undersized+degraded ObjectContextLoader::get_or_load_obc: cache miss on 3:bd1211d5:::smithi05531420-40:1
DEBUG 2024-03-27 13:26:06,807 [shard 2:main] osd - load_metadata: object 3:bd1211d5:::smithi05531420-40:1 doesn't exist, returning empty metadata
DEBUG 2024-03-27 13:26:06,807 [shard 2:main] osd - pg_epoch 26 pg[3.d( v 26'20 lc 17'15 (0'0,26'20] local-lis/les=25/26 n=0 ec=14/14 lis/c=25/14 les/c/f=26/15/0 sis=25) [3,0] r=0 lpr=25 pi=[14,25)/1 luod=26'21 lua=21'18 crt=26'21 mlcod 17'15 active+recovering+undersized+degraded ObjectContextLoader::load_obc: loaded obs 3:bd1211d5:::smithi05531420-40:1(0'0 unknown.0.0:0 s 0 uv 0 alloc_hint [0 0 0]) for 3:bd1211d5:::smithi05531420-40:1
DEBUG 2024-03-27 13:26:06,807 [shard 2:main] osd - pg_epoch 26 pg[3.d( v 26'20 lc 17'15 (0'0,26'20] local-lis/les=25/26 n=0 ec=14/14 lis/c=25/14 les/c/f=26/15/0 sis=25) [3,0] r=0 lpr=25 pi=[14,25)/1 luod=26'21 lua=21'18 crt=26'21 mlcod 17'15 active+recovering+undersized+degraded ObjectContextLoader::load_obc: returning obc 3:bd1211d5:::smithi05531420-40:1(0'0 unknown.0.0:0 s 0 uv 0 alloc_hint [0 0 0]) for 3:bd1211d5:::smithi05531420-40:1
ceph-osd: /home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/19.0.0-2476-g56e21662/rpm/el9/BUILD/ceph-19.0.0-2476-g56e21662/src/crimson/osd/replicated_recovery_backend.cc:886: void ReplicatedRecoveryBackend::recalc_subsets(ObjectRecoveryInfo&, crimson::osd::SnapSetContextRef): Assertion `ssc' failed.
Aborting on shard 2.
Backtrace:
0# 0x00007F182BAA154C in /lib64/libc.so.6
1# raise in /lib64/libc.so.6
2# abort in /lib64/libc.so.6
3# 0x00007F182BA2871B in /lib64/libc.so.6
4# 0x00007F182BA4DCA6 in /lib64/libc.so.6
5# ReplicatedRecoveryBackend::recalc_subsets(ObjectRecoveryInfo&, boost::intrusive_ptr<crimson::osd::SnapSetContext>) in ceph-osd
</span></code></pre>
Ceph QA - QA Run #65202 (QA Testing): wip-yuri11-testing-2024-03-28-0753-reef
https://tracker.ceph.com/issues/65202
2024-03-28T14:59:35Z
Yuri Weinstein
yweinste@redhat.com
<p>--- done. these PRs were included:<br /><a class="external" href="https://github.com/ceph/ceph/pull/54729">https://github.com/ceph/ceph/pull/54729</a> - reef: qa/cephfs: improvements for name generators in test_volumes.py<br /><a class="external" href="https://github.com/ceph/ceph/pull/56359">https://github.com/ceph/ceph/pull/56359</a> - reef: mgr/dashboard/frontend:Ceph dashboard supports multiple languages<br /><a class="external" href="https://github.com/ceph/ceph/pull/56361">https://github.com/ceph/ceph/pull/56361</a> - reef: ceph.spec.in: add support for openEuler OS<br /><a class="external" href="https://github.com/ceph/ceph/pull/56479">https://github.com/ceph/ceph/pull/56479</a> - reef: pybind/mgr/devicehealth: skip legacy objects that cannot be loaded</p>
crimson - Bug #65201 (New): ReplicatedRecoveryBackend::prep_push_to_replica(const hobject_t&, eve...
https://tracker.ceph.com/issues/65201
2024-03-28T14:55:47Z
Matan Breizman
<p>osd.3: <a class="external" href="https://pulpito.ceph.com/matan-2024-03-27_13:02:57-crimson-rados-main-distro-crimson-smithi/7626293">https://pulpito.ceph.com/matan-2024-03-27_13:02:57-crimson-rados-main-distro-crimson-smithi/7626293</a></p>
<p>After adding OSD restarts to the thrash tests: <a class="external" href="https://github.com/ceph/ceph/pull/56511">https://github.com/ceph/ceph/pull/56511</a></p>
<pre><code class="text syntaxhl"><span class="CodeRay">DEBUG 2024-03-27 13:27:01,678 [shard 0:main] osd - pg_epoch 45 pg[3.0( v 37'19 (0'0,37'19] local-lis/les=44/45 n=6 ec=14/14 lis/c=44/14 les/c/f=45/15/0 sis=44) [3,2,1] r=0 lpr=44 pi=[14,44)/1 crt=37'19 lcod 0'0 mlcod 0'0 active+recovering+degraded ObjectContextLoader::load_obc: returning obc 3:0254ed2b:::smithi01231316-5:8(37'18 client.4225.0:19 s 2067228 uv 3 alloc_hint [0 0 0]) for 3:0254ed2b:::smithi01231316-5:8
DEBUG 2024-03-27 13:27:01,678 [shard 0:main] osd - recover_object: loaded obc: 3:0254ed2b:::smithi01231316-5:8
DEBUG 2024-03-27 13:27:01,678 [shard 0:main] osd - prep_push_to_replica: 3:0254ed2b:::smithi01231316-5:8, 37'18
ERROR 2024-03-27 13:27:01,678 [shard 0:main] none - /home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/19.0.0-2476-g56e21662/rpm/el9/BUILD/ceph-19.0.0-2476-g56e21662/src/crimson/osd/replicated_recovery_backend.cc:347 : In function 'RecoveryBackend::interruptible_future<PushOp> ReplicatedRecoveryBackend::prep_push_to_replica(const hobject_t&, eversion_t, pg_shard_t)', ceph_assert(%s)
ssc
Aborting on shard 0.
Backtrace:
0# 0x00007F96396A154C in /lib64/libc.so.6
1# raise in /lib64/libc.so.6
2# abort in /lib64/libc.so.6
3# ceph::__ceph_assert_fail(ceph::assert_data const&) in ceph-osd
4# ReplicatedRecoveryBackend::prep_push_to_replica(hobject_t const&, eversion_t, pg_shard_t) in ceph-osd
</span></code></pre>
crimson - Bug #65200 (New): PeeringState::get_peer_info(pg_shard_t) const: Assertion `it != peer_...
https://tracker.ceph.com/issues/65200
2024-03-28T14:54:17Z
Matan Breizman
<p>osd.1: <a class="external" href="https://pulpito.ceph.com/matan-2024-03-27_13:02:57-crimson-rados-main-distro-crimson-smithi/7626293">https://pulpito.ceph.com/matan-2024-03-27_13:02:57-crimson-rados-main-distro-crimson-smithi/7626293</a></p>
<p>After adding OSD restarts to the thrash tests: <a class="external" href="https://github.com/ceph/ceph/pull/56511">https://github.com/ceph/ceph/pull/56511</a></p>
<pre><code class="text syntaxhl"><span class="CodeRay">INFO 2024-03-27 13:27:01,801 [shard 0:main] osd - start_primary_recovery_ops recovering 0 in pg pg_epoch 45 pg[3.2( v 40'55 lc 36'54 (0'0,40'55] local-lis/les=44/45 n=0 ec=14/14 lis/c=44/14 les/c/f=45/15/0 sis=44) [1,0,3] r=0 lpr=44 pi=[14,44)/2 crt=40'55 mlcod 0'0 active+recovering , missing missing(1 may_include_deletes = 1)
INFO 2024-03-27 13:27:01,801 [shard 0:main] osd - start_primary_recovery_ops 3:48a442ac:::smithi01231316-12:head item.need 40'55 (missing) (missing head)
INFO 2024-03-27 13:27:01,801 [shard 0:main] osd - recover_missing 3:48a442ac:::smithi01231316-12:head v 40'55
INFO 2024-03-27 13:27:01,801 [shard 0:main] osd - recover_missing 3:48a442ac:::smithi01231316-12:head v 40'55, new recovery
DEBUG 2024-03-27 13:27:01,801 [shard 0:main] osd - recover_object: 3:48a442ac:::smithi01231316-12:head, 40'55
DEBUG 2024-03-27 13:27:01,801 [shard 0:main] osd - maybe_pull_missing_obj: 3:48a442ac:::smithi01231316-12:head, 40'55
DEBUG 2024-03-27 13:27:01,802 [shard 0:main] osd - pg_epoch 45 pg[3.2( v 40'55 lc 36'54 (0'0,40'55] local-lis/les=44/45 n=0 ec=14/14 lis/c=44/14 les/c/f=45/15/0 sis=44) [1,0,3] r=0 lpr=44 pi=[14,44)/2 crt=40'55 mlcod 0'0 active+recovering ObjectContextLoader::with_head_obc: object 3:48a442ac:::smithi01231316-12:head
INFO 2024-03-27 13:27:01,802 [shard 0:main] osd - start_primary_recovery_ops started 1 skipped 1
DEBUG 2024-03-27 13:27:01,802 [shard 0:main] osd - pg_epoch 45 pg[3.2( v 40'55 lc 36'54 (0'0,40'55] local-lis/les=44/45 n=0 ec=14/14 lis/c=44/14 les/c/f=45/15/0 sis=44) [1,0,3] r=0 lpr=44 pi=[14,44)/2 crt=40'55 mlcod 0'0 active+recovering ObjectContextLoader::get_or_load_obc: cache hit on 3:48a442ac:::smithi01231316-12:head
DEBUG 2024-03-27 13:27:01,802 [shard 0:main] osd - prepare_pull: 3:48a442ac:::smithi01231316-12:head, 40'55
ceph-osd: /home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/19.0.0-2476-g56e21662/rpm/el9/BUILD/ceph-19.0.0-2476-g56e21662/src/osd/PeeringState.h:2349: const pg_info_t& PeeringState::get_peer_info(pg_shard_t) const: Assertion `it != peer_info.end()' failed.
Aborting on shard 0.
Backtrace:
Reactor stalled for 159 ms on shard 0. Backtrace: 0x6bddd 0xb99f089 0xb871b50 0xb8730cc 0xb8732e2 0xb873438 0xb873901 0x54daf 0x118a06 0x118829 0x6efa70b 0x6efca58 0x6efd993 0x6efde84 0x6efe723 0x6efedaf 0x6efef6b 0x6ef9294 0x6ef9685 0x6ef994c 0x54daf 0xa154b 0x54d05 0x287f2 0x2871a 0x4dca5 0x3f96f7c 0x4f4503f 0x4f5601c 0x4f5771b 0x4f578fb 0x4f57a79 0x3f5e56c 0x460952c 0x4609732 0x46098e6 0x461333a 0x4613515 0x4613839 0x4613b44 0x4613caf 0x4613d38 0x46175d6 0x461789a 0x461792e 0x46179e6 0x462b282 0x462b4fa 0x462b76d 0x4688845 0xb8847d5 0xb89ea6f 0xb93fa6d 0xb9410bb 0xb61d823 0xb61e19f 0x368057a 0x3feaf 0x3ff5f 0x346c434
kernel callstack: 0xffffffffffffff80 0xffffffff8e781dc1 0xffffffff8e782126 0xffffffff8e505d94 0xffffffff8e505f31 0xffffffff8e50733f 0xffffffff8e50801b 0xffffffff8e5084d0 0xffffffff8f07e45c 0xffffffff8f2000ea
Reactor stalled for 303 ms on shard 0. Backtrace: 0x6bddd 0xb99f089 0xb871b50 0xb8730cc 0xb8732e2 0xb873438 0xb873901 0x54daf 0x195b59 0x6efa069 0x6efc6cb 0x6efd993 0x6efde84 0x6efe723 0x6efedaf 0x6efef6b 0x6ef9294 0x6ef9685 0x6ef994c 0x54daf 0xa154b 0x54d05 0x287f2 0x2871a 0x4dca5 0x3f96f7c 0x4f4503f 0x4f5601c 0x4f5771b 0x4f578fb 0x4f57a79 0x3f5e56c 0x460952c 0x4609732 0x46098e6 0x461333a 0x4613515 0x4613839 0x4613b44 0x4613caf 0x4613d38 0x46175d6 0x461789a 0x461792e 0x46179e6 0x462b282 0x462b4fa 0x462b76d 0x4688845 0xb8847d5 0xb89ea6f 0xb93fa6d 0xb9410bb 0xb61d823 0xb61e19f 0x368057a 0x3feaf 0x3ff5f 0x346c434
kernel callstack:
Reactor stalled for 539 ms on shard 0. Backtrace: 0x6bddd 0xb99f089 0xb871b50 0xb8730cc 0xb8732e2 0xb873438 0xb873901 0x54daf 0x195b53 0x6efa069 0x6efe1dd 0x6efe723 0x6efedaf 0x6efef6b 0x6ef9294 0x6ef9685 0x6ef994c 0x54daf 0xa154b 0x54d05 0x287f2 0x2871a 0x4dca5 0x3f96f7c 0x4f4503f 0x4f5601c 0x4f5771b 0x4f578fb 0x4f57a79 0x3f5e56c 0x460952c 0x4609732 0x46098e6 0x461333a 0x4613515 0x4613839 0x4613b44 0x4613caf 0x4613d38 0x46175d6 0x461789a 0x461792e 0x46179e6 0x462b282 0x462b4fa 0x462b76d 0x4688845 0xb8847d5 0xb89ea6f 0xb93fa6d 0xb9410bb 0xb61d823 0xb61e19f 0x368057a 0x3feaf 0x3ff5f 0x346c434
kernel callstack:
Reactor stalled for 975 ms on shard 0. Backtrace: 0x6bddd 0xb99f089 0xb871b50 0xb8730cc 0xb8732e2 0xb873438 0xb873901 0x54daf 0x195bc1 0x6efa069 0x6efc6cb 0x6efd006 0x6efd5f7 0x6efd7b2 0x6efdcdf 0x6efe723 0x6efedaf 0x6efef6b 0x6ef9294 0x6ef9685 0x6ef994c 0x54daf 0xa154b 0x54d05 0x287f2 0x2871a 0x4dca5 0x3f96f7c 0x4f4503f 0x4f5601c 0x4f5771b 0x4f578fb 0x4f57a79 0x3f5e56c 0x460952c 0x4609732 0x46098e6 0x461333a 0x4613515 0x4613839 0x4613b44 0x4613caf 0x4613d38 0x46175d6 0x461789a 0x461792e 0x46179e6 0x462b282 0x462b4fa 0x462b76d 0x4688845 0xb8847d5 0xb89ea6f 0xb93fa6d 0xb9410bb 0xb61d823 0xb61e19f 0x368057a 0x3feaf 0x3ff5f 0x346c434
kernel callstack:
0# 0x00007F0AE5EA154C in /lib64/libc.so.6
1# raise in /lib64/libc.so.6
2# abort in /lib64/libc.so.6
3# 0x00007F0AE5E2871B in /lib64/libc.so.6
4# 0x00007F0AE5E4DCA6 in /lib64/libc.so.6
5# PeeringState::get_peer_info(pg_shard_t) const in ceph-osd
6# ReplicatedRecoveryBackend::prepare_pull(boost::intrusive_ptr<crimson::osd::ObjectContext> const&, PullOp&, RecoveryBackend::pull_info_t&, hobject_t const&, eversion_t) in ceph-
</span></code></pre>
Ceph - Bug #65199 (New): autoscaler: Scale PGs based on number of objects
https://tracker.ceph.com/issues/65199
2024-03-28T12:42:52Z
Niklas Hambuechen
<p>Ceph's autoscaler scales PGs based on bytes stored; it seemingly ignores the number of objects. This creates problems for pools with many small files.</p>
<p>It creates even more problems for pools with an apparent byte size of 0 but millions of objects; such pools are created when following the CephFS-on-EC best practices in the docs.</p>
<p>Red Hat docs describe:</p>
<p><a class="external" href="https://access.redhat.com/documentation/de-de/red_hat_ceph_storage/4/html/storage_strategies_guide/placement_groups_pgs#viewing-placement-group-scaling-recommendations">https://access.redhat.com/documentation/de-de/red_hat_ceph_storage/4/html/storage_strategies_guide/placement_groups_pgs#viewing-placement-group-scaling-recommendations</a></p>
<blockquote>
<p><strong>BIAS</strong>, is a pool property that is used by the PG autoscaler to scale some pools faster than others, in terms of number of PGs. It is essentially a multiplier used to give more PG to a pool than the default number of PGs. This property is <strong>particularly used for metadata pools which might be small in size but have large number of objects</strong>, so scaling them faster is important for better performance.</p>
</blockquote>
<p>(Note these docs are better than the upstream Ceph docs on BIAS, which are much shorter: <a class="external" href="https://docs.ceph.com/en/reef/rados/operations/placement-groups/">https://docs.ceph.com/en/reef/rados/operations/placement-groups/</a>)</p>
<p>So this confirms that BIAS (pg_autoscale_bias) can be used to partially address the "many small objects" problem, using a constant factor.</p>
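For reference, the bias is an ordinary pool property; assuming a pool named `cephfs_data` (the name is illustrative), it can be raised with:

```shell
# Give this pool a 4x bias so the autoscaler assigns it more PGs
# than its byte size alone would suggest (pool name is illustrative).
ceph osd pool set cephfs_data pg_autoscale_bias 4.0
```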
<p>But the constant factor stops working when the objects are 0-sized.</p>
<p>This happens when following CephFS best practices: <a class="external" href="https://docs.ceph.com/en/reef/cephfs/createfs/#creating-pools">https://docs.ceph.com/en/reef/cephfs/createfs/#creating-pools</a></p>
<blockquote>
<p>The data pool used to create the file system is the “default” data pool and the location for storing all inode backtrace information, which is used for hard link management and disaster recovery. For this reason, all CephFS inodes have at least one object in the default data pool.<br /><strong>If erasure-coded pools are planned for file system data, it is best to configure the default as a replicated pool</strong> to improve small-object write and read performance when updating backtraces. Separately, another erasure-coded data pool can be added (see also Erasure code) that can be used on an entire hierarchy of directories and files (see also File layouts).</p>
</blockquote>
<p>If you do what is described here ("default" pool on replicated, directory File Layout on EC), you end up with pools like this in `ceph df`:</p>
<pre>
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
.mgr 1 1 203 MiB 26 609 MiB 90.00 5 GiB
data 2 32 0 B 112.23M 0 B 0 61 TiB
data_ec 3 168 124 TiB 115.30M 186 TiB 50.53 121 TiB
metadata 4 128 63 GiB 32.87k 189 GiB 90.00 5 GiB
</pre>
<p>Note how <strong>the `data` pool that stores the inodes has 112 M objects but 0 bytes stored</strong>. Apparently the inode backtrace objects carry no data payload at all.</p>
<p>Because the data size is low (0), the autoscaler assigns no more than 32 PGs.</p>
<p>This means that there are ~4 M objects per PG. If the objects are on HDD that can do 100 seeks per second, running scrubbing, recovery, or balancing (which needs to seek all objects in a PG) will <strong>take at least 11 hours</strong>. And this does not even take EC overhead factors into account.</p>
<p>If there were 1 B objects, handling a single PG would take > 100 hours.</p>
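The per-PG estimate above is simple arithmetic; a sketch under the assumptions stated in the text (100 seeks/s on HDD, objects spread uniformly across PGs). Using the exact 112.23 M objects over 32 PGs gives roughly 9.7 hours; rounding up to ~4 M objects per PG, as the text does, gives the quoted ~11 hours:

```python
def hours_to_scan_pg(objects_in_pool: float, pgs: int,
                     seeks_per_sec: float = 100.0) -> float:
    """Estimated hours to seek every object in one PG, assuming a
    uniform object distribution and one seek per object."""
    objects_per_pg = objects_in_pool / pgs
    return objects_per_pg / seeks_per_sec / 3600.0

# `data` pool from the `ceph df` output above: ~112.23M objects, 32 PGs
print(round(hours_to_scan_pg(112.23e6, 32), 1))  # -> 9.7
```

EC overhead and non-sequential access patterns would only push these numbers higher.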
<p>There seems to be nothing in Ceph that scales PGs based on the number of objects; this issue requests that such scaling be added.</p>
<p>This would:</p>
<ul>
<li>Make the CephFS EC recommendations operationally sound, so that following them no longer produces these problems.</li>
<li>Improve Ceph's default behaviour for many small files/objects, without the user having to set BIAS manually.</li>
</ul>
RADOS - Backport #65198 (In Progress): squid: Failed to encode map X with expected CRC
https://tracker.ceph.com/issues/65198
2024-03-28T10:29:12Z
Backport Bot