Fix #64573

singleton/ec-inconsistent-hinfo.yaml: Include a possible benign cluster log warning to ignorelist.

Added by Sridhar Seshasayee 2 months ago. Updated about 1 month ago.

Status:
Pending Backport
Priority:
Normal
Category:
Tests
Target version:
-
% Done:
0%

Source:
Tags:
backport_processed
Backport:
squid, reef
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The changes introduced as part of PR: https://github.com/ceph/ceph/pull/53524
made the randomized values of osd_op_queue and osd_op_queue_cut_off
consistent across all OSD shards. The teuthology test results did not flag
any related failure, and the PR was merged into the "main" branch.

However, it is quite likely that, because of the previously merged
PR: https://github.com/ceph/ceph/pull/47830, the following cluster log warning
from the singleton test 'qa/tasks/ec_inconsistent_hinfo.py' (along with cluster
warnings/errors from other tests) was missed:

"cluster [WRN] Error(s) ignored for 2:ad551702:::test:head enough copies available"

[ Note that there are a couple of PRs that fix the effect of
https://github.com/ceph/ceph/pull/47830 - See PRs:
1. https://github.com/ceph/ceph/pull/49730 and
2. https://github.com/ceph/ceph/pull/55455
]

The above warning showed up in Reef testing for the backport
PR: https://github.com/ceph/ceph/pull/54981, since
PR: https://github.com/ceph/ceph/pull/47830 has not been backported.

NOTE:
The Reef backport of this tracker must be tested and merged along with
the Reef PR: https://github.com/ceph/ceph/pull/54981

The above warning is benign, as the analysis below shows. In summary, the
warning is generated due to the difference in PG deletion rates between the
WPQ and mClock schedulers, and it therefore shows up only when the mClock
scheduler happens to be the randomly chosen op queue scheduler for the test.

With the WPQ scheduler, PG deletion is considerably slower because sleeps are
introduced between the delete transactions. With the mClock scheduler, there
are no such sleeps; the overall PG deletion cost determines the deletion rate,
and PG deletion is therefore much faster than with WPQ.
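
To make the pacing difference concrete, here is a small illustrative sketch in
Python (not Ceph code). The 1s sleep corresponds to osd_delete_sleep_ssd seen
in the analysis below, and the PGDelete cost of 15728640 appears in the mClock
log excerpt further down; the batch size and the mClock deletion throughput
used here are made-up values, so only the relative comparison matters.

# Illustrative toy model only -- not Ceph code. batch_size and
# objects_per_sec are hypothetical values; only the relative comparison matters.

def wpq_delete_estimate(num_objects, batch_size=30, delete_sleep_s=1.0):
    # WPQ paces PG deletion by sleeping osd_delete_sleep_ssd (1s in this run)
    # between delete iterations, so the sleeps dominate the total time.
    iterations = -(-num_objects // batch_size)   # ceiling division
    return iterations * delete_sleep_s

def mclock_delete_estimate(num_objects, objects_per_sec=50000):
    # mClock schedules the PGDelete work item by its cost (qos_cost 15728640
    # in the log excerpt below) with no inter-iteration sleeps, so deletion is
    # limited mainly by how fast the OSD can process the delete transactions.
    return num_objects / objects_per_sec

n = 24118   # object count of PG 2.0s1 in the WPQ run (n=24118 in the logs)
print(f"WPQ estimate   : ~{wpq_delete_estimate(n):.0f} s (sleep-paced)")
print(f"mClock estimate: ~{mclock_delete_estimate(n):.1f} s (cost-based)")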

ANALYSIS

Behavior with WPQ Scheduler (no cluster warning):

The analysis starts from the following test:

"remove test object hash info from primary osd.3 shard and test backfill"

The test corrupts an object on the primary osd.3 and marks osd.1 OUT to trigger a
rebalance/backfill. During the backfill phase, osd.3 recovers the inconsistent
object by reading from osd.1(1) shard as shown below:

2024-02-07T12:12:42.961+0000 7fd6033a1640 -1 osd.3 pg_epoch: 31 pg[2.0s0( v 23'24118 (23'24101,23'24118] local-lis/les=30/31 n=24118 ec=15/15 lis/c=30/27 les/c/f=31/28/0 sis=30) [3,0,2]/[3,1,2]p3(0) backfill=[0(1)] r=0 lpr=30 pi=[27,30)/1 crt=23'24118 lcod 0'0 mlcod 0'0 active+remapped+backfilling rops=1 mbc={0={},1={},2={}}] continue_recovery_op: 2:ad551702:::test:head has inconsistent hinfo
2024-02-07T12:12:42.961+0000 7fd6033a1640 -1 log_channel(cluster) log [ERR] : 2.0s0 missing primary copy of 2:ad551702:::test:head, will try copies on 1(1),2(2)
2024-02-07T12:12:42.961+0000 7fd6033a1640  1 -- v1:172.21.15.27:6805/3970993654 --> v1:172.21.15.27:6813/3711275643 -- pg_backfill(progress 2.0s1 e 31/31 lb 2:ad551702:::test:head) v3 -- 0x55e51bbb6100 con 0x55e51b627c00

On osd.1, the inconsistent object is read and recovered:

2024-02-07T12:12:42.961+0000 7f5244088640  1 -- v1:172.21.15.27:6809/1081303851 <== osd.3 v1:172.21.15.27:6805/3970993654 16262 ==== MOSDECSubOpRead(2.0s1 31/30 ECSubRead(tid=16239, to_read={2:ad551702:::test:head=0,524288,0}, subchunks={2:ad551702:::test:head=[0,1]}, attrs_to_read=2:ad551702:::test:head)) v3 ==== 258+0+0 (unknown 1787856326 0 0) 0x55d103beee00 con 0x55d0fd21bc00
2024-02-07T12:12:42.961+0000 7f5244088640 15 osd.1 31 enqueue_op MOSDECSubOpRead(2.0s1 31/30 ECSubRead(tid=16239, to_read={2:ad551702:::test:head=0,524288,0}, subchunks={2:ad551702:::test:head=[0,1]}, attrs_to_read=2:ad551702:::test:head)) v3 prio 3 type 110 cost 0 latency 0.000025 epoch 31 MOSDECSubOpRead(2.0s1 31/30 ECSubRead(tid=16239, to_read={2:ad551702:::test:head=0,524288,0}, subchunks={2:ad551702:::test:head=[0,1]}, attrs_to_read=2:ad551702:::test:head)) v3

...

2024-02-07T12:12:42.961+0000 7f5227659640 10 osd.1 pg_epoch: 31 pg[2.0s1( v 23'24118 (23'24101,23'24118] local-lis/les=30/31 n=24118 ec=15/15 lis/c=30/27 les/c/f=31/28/0 sis=30) [3,0,2]/[3,1,2]p3(0) r=1 lpr=30 pi=[27,30)/1 luod=0'0 crt=23'24118 mlcod 0'0 active+remapped mbc={}] _handle_message: MOSDECSubOpRead(2.0s1 31/30 ECSubRead(tid=16239, to_read={2:ad551702:::test:head=0,524288,0}, subchunks={2:ad551702:::test:head=[0,1]}, attrs_to_read=2:ad551702:::test:head)) v3
2024-02-07T12:12:42.961+0000 7f5227659640 15 bluestore(/var/lib/ceph/osd/ceph-1) read 2.0s1_head 1#2:ad551702:::test:head# 0x0~80000
2024-02-07T12:12:42.961+0000 7f5227659640 20 bluestore(/var/lib/ceph/osd/ceph-1).collection(2.0s1_head 0x55d0fc62d4a0) get_onode oid 1#2:ad551702:::test:head# key 0x818000000000000002AD551702'!test!='0xFFFFFFFFFFFFFFFEFFFFFFFFFFFFFFFF6F
2024-02-07T12:12:42.961+0000 7f5227659640 20 bluestore(/var/lib/ceph/osd/ceph-1).collection(2.0s1_head 0x55d0fc62d4a0)  r 0 v.len 415
2024-02-07T12:12:42.961+0000 7f5227659640 20 bluestore.blob(0x55d0ff704000) get_ref 0x0~1000 Blob(0x55d0ff704000 blob([0x12f000~1000] llen=0x1000 csum crc32c/0x1000/4) use_tracker(0x0 0x0) SharedBlob(0x55d0fcd0bcc0 sbid 0x0))
2024-02-07T12:12:42.961+0000 7f5227659640 20 bluestore.blob(0x55d0ff704000) get_ref init 0x1000, 1000
2024-02-07T12:12:42.961+0000 7f5227659640 20 bluestore.OnodeSpace(0x55d0fc62d5e0 in 0x55d0fc5ec000) add_onode 1#2:ad551702:::test:head# 0x55d0ff6fb8c0
2024-02-07T12:12:42.961+0000 7f5227659640 20 _add 0x55d0fc5ec000 1#2:ad551702:::test:head# added, num=21488
2024-02-07T12:12:42.961+0000 7f5227659640 20 bluestore(/var/lib/ceph/osd/ceph-1) _do_read 0x0~80000 size 0x1000 (4096)

But while the deletion of PG 2.0s1 on osd.1 is still in progress (between delete sleeps),
and before the PG can be removed completely, the advance map and activate map to the
next epoch reset the PG, and the ongoing PG deletion is terminated:

2024-02-07T12:13:18.461+0000 7f5227659640 10 osd.1 pg_epoch: 33 pg[2.0s1( v 23'24118 (23'24101,23'24118] lb MIN local-lis/les=30/31 n=24118 ec=15/15 lis/c=30/30 les/c/f=31/31/0 sis=32 pruub=9.050392151s) [3,0,2]p3(0) r=-1 lpr=32 DELETING pi=[30,32)/1 crt=23'24118 mlcod 0'0 unknown NOTIFY pruub 158.980331421s@ mbc={}] handle_advance_map: 34
2024-02-07T12:13:18.461+0000 7f5227659640 20 osd.1:0.update_pg_epoch 2.0s1 33 -> 34
...
2024-02-07T12:13:18.461+0000 7f5227659640  5 osd.1 pg_epoch: 34 pg[2.0s1( v 23'24118 (23'24101,23'24118] lb MIN local-lis/les=30/31 n=24118 ec=15/15 lis/c=30/30 les/c/f=31/31/0 sis=32 pruub=9.050392151s) [3,0,2]p3(0) r=-1 lpr=32 pi=[30,32)/1 crt=23'24118 mlcod 0'0 unknown NOTIFY pruub 158.980331421s@ mbc={}] exit Started/ToDelete 9.032702 0 0.000000

Since osd.1 is marked OUT, a new peer (osd.0) is added to the 'up' and 'acting' set:

2024-02-07T12:13:30.211+0000 7f5234e74640 10 osd.1 37 _add_heartbeat_peer: new peer osd.0 v1:172.21.15.27:6815/3663464490 v1:172.21.15.27:6814/3663464490
2024-02-07T12:13:30.211+0000 7f5227659640 10 osd.1 pg_epoch: 36 pg[2.0s1( v 23'24118 (23'24101,23'24118] lb MIN local-lis/les=30/31 n=24118 ec=15/15 lis/c=30/30 les/c/f=31/31/0 sis=34) [3,NONE,2]p3(0) r=-1 lpr=34 pi=[30,34)/2 crt=23'24118 mlcod 0'0 unknown NOTIFY pruub 158.980331421s@ mbc={}] PeeringState::advance_map handle_advance_map [3,0,2]/[3,0,2] -- 3/3
...
2024-02-07T12:13:30.211+0000 7f5227659640  5 osd.1 pg_epoch: 37 pg[2.0s1( v 23'24118 (23'24101,23'24118] lb MIN local-lis/les=30/31 n=24118 ec=15/15 lis/c=30/30 les/c/f=31/31/0 sis=37) [3,0,2]p3(0) r=-1 lpr=37 pi=[30,37)/1 crt=23'24118 mlcod 0'0 unknown NOTIFY mbc={}] enter Started/Stray

Therefore, PG 2.0s1 on osd.1 enters the Started/Stray state, while PG 2.0s0 on the
primary osd.3 is in the active+clean state. At this point, the test script initiates
the next test, which performs the following:

"remove test object hash info from non-primary osd.0 shard and test backfill"

As part of the test, the hinfo of a test object is corrupted on osd.0 and osd.1
is marked UP to trigger a rebalance/backfill. But before this, the primary osd.3,
as part of peering, instructs osd.1 to remove the stray PG (purge_strays) by
sending PGRemove to osd.1(1):

2024-02-07T12:13:31.222+0000 7fd6033a1640 10 osd.3 pg_epoch: 38 pg[2.0s0( v 23'24118 (23'24101,23'24118] local-lis/les=37/38 n=24118 ec=15/15 lis/c=37/37 les/c/f=38/38/0 sis=37) [3,0,2]p3(0) r=0 lpr=37 crt=23'24118 mlcod 0'0 active+clean] PeeringState::purge_strays purge_strays 1(1)
2024-02-07T12:13:31.222+0000 7fd6033a1640 10 osd.3 pg_epoch: 38 pg[2.0s0( v 23'24118 (23'24101,23'24118] local-lis/les=37/38 n=24118 ec=15/15 lis/c=37/37 les/c/f=38/38/0 sis=37) [3,0,2]p3(0) r=0 lpr=37 crt=23'24118 mlcod 0'0 active+clean] PeeringState::purge_strays sending PGRemove to osd.1(1)
...
2024-02-07T12:13:31.222+0000 7fd6033a1640  1 -- v1:172.21.15.27:6805/3970993654 --> v1:172.21.15.27:6809/1081303851 -- osd pg remove(epoch 38; pg2.0s1; ) v3 -- 0x55e5114a54a0 con 0x55e51b60d000
2024-02-07T12:13:31.222+0000 7fd6033a1640 10 osd.3 pg_epoch: 38 pg[2.0s0( v 23'24118 (23'24101,23'24118] local-lis/les=37/38 n=24118 ec=15/15 lis/c=37/37 les/c/f=38/38/0 sis=37) [3,0,2]p3(0) r=0 lpr=37 crt=23'24118 mlcod 0'0 active+clean] remove_stray_recovery_sources remove osd 1(1) from missing_loc

osd.1 receives the 'osd pg remove' message for PG 2.0s1 and initiates the PG deletion process:

2024-02-07T12:13:31.223+0000 7f5244088640  1 -- v1:172.21.15.27:6809/1081303851 <== osd.3 v1:172.21.15.27:6805/3970993654 24148 ==== osd pg remove(epoch 38; pg2.0s1; ) v3 ==== 32+0+0 (unknown 3731761614 0 0) 0x55d0fcdea6e0 con 0x55d0fd21bc00
2024-02-07T12:13:31.223+0000 7f5244088640  7 osd.1 38 handle_fast_pg_remove osd pg remove(epoch 38; pg2.0s1; ) v3 from osd.3
2024-02-07T12:13:31.223+0000 7f5244088640 15 osd.1 38 enqueue_peering_evt 2.0s1 epoch_sent: 38 epoch_requested: 38 DeleteStart

The PG deletion proceeds with the delete work scheduled between sleeps, as per osd_delete_sleep_ssd (1s);
as a result, PG deletion with the WPQ scheduler takes multiple seconds to complete:

2024-02-07T12:13:34.231+0000 7f522b661640 20 osd.1 pg_epoch: 38 pg[2.0s1( v 23'24118 (23'24101,23'24118] lb MIN local-lis/les=30/31 n=24118 ec=15/15 lis/c=30/30 les/c/f=31/31/0 sis=37) [3,0,2]p3(0) r=-1 lpr=37 DELETING pi=[30,37)/1 crt=23'24118 mlcod 0'0 unknown NOTIFY mbc={}] do_delete_work Delete scheduled at 2024-02-07T12:13:35.233026+0000
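
(Aside: which scheduler and delete sleep an OSD is actually running with can be
confirmed through its admin socket. The minimal sketch below assumes the 'ceph'
CLI is available on osd.1's host; it is purely illustrative and not part of the
test.)

# Confirm the scheduler and delete-sleep settings on a running OSD.
# Assumes the 'ceph' CLI and osd.1's admin socket are accessible on this host.
import subprocess

for opt in ("osd_op_queue", "osd_delete_sleep_ssd"):
    out = subprocess.run(
        ["ceph", "daemon", "osd.1", "config", "get", opt],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    print(out)    # JSON, e.g. {"osd_op_queue": "wpq"}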

But during the PG Deletion process, osd.1 is brought UP:

2024-02-07T12:13:34.237 INFO:teuthology.orchestra.run.smithi027.stderr:marked in osd.1.

PG 2.0s1 is advanced to the new osdmap epoch 39. As part of this, the PG is reset,
which terminates the ongoing PG deletion operation:

2024-02-07T12:13:34.232+0000 7f522b661640 20 osd.1 pg_epoch: 39 pg[2.0s1( v 23'24118 (23'24101,23'24118] lb MIN local-lis/les=30/31 n=24118 ec=15/15 lis/c=30/30 les/c/f=31/31/0 sis=37) [3,0,2]p3(0) r=-1 lpr=37 DELETING pi=[30,37)/1 crt=23'24118 mlcod 0'0 unknown NOTIFY mbc={}] PeeringState::should_restart_peering new interval newup [3,1,2] newacting [3,1,2]
2024-02-07T12:13:34.232+0000 7f522b661640 10 osd.1 pg_epoch: 39 pg[2.0s1( v 23'24118 (23'24101,23'24118] lb MIN local-lis/les=30/31 n=24118 ec=15/15 lis/c=30/30 les/c/f=31/31/0 sis=37) [3,0,2]p3(0) r=-1 lpr=37 DELETING pi=[30,37)/1 crt=23'24118 mlcod 0'0 unknown NOTIFY mbc={}] state<Started>: should_restart_peering, transitioning to Reset
2024-02-07T12:13:34.232+0000 7f522b661640  5 osd.1 pg_epoch: 39 pg[2.0s1( v 23'24118 (23'24101,23'24118] lb MIN local-lis/les=30/31 n=24118 ec=15/15 lis/c=30/30 les/c/f=31/31/0 sis=37) [3,0,2]p3(0) r=-1 lpr=37 DELETING pi=[30,37)/1 crt=23'24118 mlcod 0'0 unknown NOTIFY mbc={}] exit Started/ToDelete/Deleting 3.009682 9 0.002573
2024-02-07T12:13:34.232+0000 7f522b661640  5 osd.1 pg_epoch: 39 pg[2.0s1( v 23'24118 (23'24101,23'24118] lb MIN local-lis/les=30/31 n=24118 ec=15/15 lis/c=30/30 les/c/f=31/31/0 sis=37) [3,0,2]p3(0) r=-1 lpr=37 pi=[30,37)/1 crt=23'24118 mlcod 0'0 unknown NOTIFY mbc={}] exit Started/ToDelete 3.009817 0 0.000000

After the advmap, PG 2.0s1 is in the following state, without any missing/degraded objects:

2024-02-07T12:13:34.236+0000 7f522b661640  1 -- v1:172.21.15.27:6809/1081303851 --> v1:172.21.15.27:6805/3970993654 -- pg_notify2(2.0s0 2.0s0 (query:39 sent:39 2.0s1( v 23'24118 (23'24101,23'24118] lb MIN local-lis/les=30/31 n=24118 ec=15/15 lis/c=37/37 les/c/f=38/38/0 sis=39 pruub=12.985957146s) 1->0 ([30,38] all_participants={0(1), 1(1), 2(2), 3(0)} intervals=[([34,36] acting={2(2), 3(0)}), ([37,38] acting={0(1), 2(2), 3(0)})])) e39/39) v1 -- 0x55d1038e9c00 con 0x55d0fd21bc00

...

2024-02-07T12:13:36.213+0000 7f522b661640 10 osd.1 pg_epoch: 41 pg[2.0s1( v 23'24118 (23'24101,23'24118] lb MIN local-lis/les=30/31 n=24118 ec=15/15 lis/c=37/37 les/c/f=38/38/0 sis=40 pruub=11.009130478s) [3,1,2]/[3,0,2]p3(0) r=-1 lpr=40 pi=[30,40)/2 crt=23'24118 mlcod 0'0 remapped pruub 187.729278564s@ mbc={}] state<Started/ReplicaActive>: Activate Finished

The backfill/rebalance completes quickly since there are no missing/misplaced/degraded objects to recover.

2024-02-07T12:13:36.218+0000 7f5227659640  1 -- v1:172.21.15.27:6809/1081303851 --> v1:172.21.15.27:6805/3970993654 -- MBackfillReserve(2.0s0 GRANT e41/41) v5 -- 0x55d0ff223b00 con 0x55d0fd21bc00
...
2024-02-07T12:13:45.161+0000 7f5243086640  1 -- v1:172.21.15.27:6809/1081303851 <== osd.3 v1:172.21.15.27:6805/3970993654 25086 ==== MBackfillReserve(2.0s1 RELEASE e41/41) v5 ==== 46+0+0 (unknown 599379928 0 0) 0x55d10413d380 con 0x55d0fd21bc00
2024-02-07T12:13:45.161+0000 7f5243086640 15 osd.1 41 enqueue_peering_evt 2.0s1 epoch_sent: 41 epoch_requested: 41 RemoteReservationCanceled

The test passes without any cluster warning since the missing hinfo attribute was retrieved from osd.1(1) shard.

Behavior with mClock Scheduler (fails with cluster warning):

As before, the analysis starts from the following test:

"remove test object hash info from primary osd.3 shard and test backfill"

The test corrupts an object on the primary osd.3 and marks osd.1 OUT to trigger a
rebalance/backfill. During the backfill phase, osd.3 recovers the inconsistent
object by reading from osd.1(1) shard as expected.

But in this case, the PG deletion at the end of the backfill phase completes in
under a second:

2024-02-07T12:13:34.082+0000 7f406b4d5640  1 -- v1:172.21.15.42:6801/1255668816 <== osd.3 v1:172.21.15.42:6805/329163986 25349 ==== osd pg remove(epoch 32; pg2.0s1; ) v3 ==== 32+0+0 (unknown 1404619636 0 0) 0x5560d69f9600 con 0x5560d88a8400
2024-02-07T12:13:34.082+0000 7f406b4d5640  7 osd.1 32 handle_fast_pg_remove osd pg remove(epoch 32; pg2.0s1; ) v3 from osd.3

...

2024-02-07T12:13:34.932+0000 7f404eaa6640 20 osd.1 op_wq(0) _process OpSchedulerItem(2.0s1 PGDelete(2.0s1 e32) class_id 1 prio 5 qos_cost 15728640 cost 15728640 e32) pg 0x5560d7a65000
2024-02-07T12:13:34.932+0000 7f404eaa6640 10 osd.1 pg_epoch: 32 pg[2.0s1( v 23'25328 (23'25255,23'25328] lb MIN local-lis/les=29/30 n=25328 ec=15/15 lis/c=29/29 les/c/f=30/30/0 sis=31 pruub=8.184023857s) [3,0,2]p3(0) r=-1 lpr=31 DELETING pi=[29,31)/1 crt=23'25328 mlcod 0'0 unknown NOTIFY pruub 130.088516235s@ mbc={}] do_peering_event: epoch_sent: 32 epoch_requested: 32 DeleteSome
2024-02-07T12:13:34.932+0000 7f404eaa6640 10 osd.1 pg_epoch: 32 pg[2.0s1( v 23'25328 (23'25255,23'25328] lb MIN local-lis/les=29/30 n=25328 ec=15/15 lis/c=29/29 les/c/f=30/30/0 sis=31 pruub=8.184023857s) [3,0,2]p3(0) r=-1 lpr=31 DELETING pi=[29,31)/1 crt=23'25328 mlcod 0'0 unknown NOTIFY pruub 130.088516235s@ mbc={}] do_delete_work

...

osd.1 completes the delete work in under a second, as shown below. Note that
this is the key difference compared to the WPQ scheduler.

2024-02-07T12:13:34.969+0000 7f404eaa6640 20 osd.1 32 try_finish_pg_delete 2.0s1 0x5560d7a65000
2024-02-07T12:13:34.969+0000 7f404eaa6640 10 osd.1:0._detach_pg 2.0s1 0x5560d7a65000

The next test is initiated after the deletion of PG 2.0s1 on osd.1, with PG 2.0 active+clean on osd.3:

2024-02-07T12:13:44.544 INFO:tasks.ec_inconsistent_hinfo:remove test object hash info from non-primary osd.0 shard and test backfill

As part of this test, the test object's hinfo attribute is corrupted on the
non-primary osd.0 shard, and osd.1, which was marked OUT in the previous test, is brought UP.

PG 2.0s1 on osd.1 is empty at this point:

2024-02-07T12:13:46.108+0000 7f40582b9640 20 osd.1:0.identify_splits_and_merges identify_splits_and_merges 1 slots
2024-02-07T12:13:46.108+0000 7f40582b9640 20 osd.1:0.identify_splits_and_merges identify_splits_and_merges slot pgid:2.0s1slot:0x5560d7e3a160
2024-02-07T12:13:46.108+0000 7f40592bb640 20 bluestore.BufferSpace(0x5560dbc0cde0 in 0x5560d698a200) _clear
2024-02-07T12:13:46.108+0000 7f40582b9640 20 osd.1:0.identify_splits_and_merges identify_splits_and_merges slot 2.0s1 has no pg and waiting_for_split

...

2024-02-07T12:13:46.108+0000 7f40582b9640 20 osd.1:0.consume_map consume_map  2.0s1 empty, pruning

Later in the test, osd.1 is marked UP:

2024-02-07T12:13:57.182 INFO:teuthology.orchestra.run.smithi042.stderr:marked in osd.1.
2024-02-07T12:13:57.202 DEBUG:tasks.ec_inconsistent_hinfo:waiting for pg 2.0 backfill from osd.0 to osd.1 complete

PG 2.0s1 on osd.1 is empty (due to the earlier deletion), as shown by the pg_query response to osd.3 below:

2024-02-07T12:13:57.182+0000 7f404aa9e640 10 osd.1 37 handle_pg_query_nopg 2.0s1
2024-02-07T12:13:57.182+0000 7f404aa9e640 10 osd.1 37  pg 2.0s1 dne

Since the PG no longer exists, a new PG 2.0s1 is created (via OSD::handle_pg_query_nopg() /
_make_pg) as part of the pg_log query from the primary osd.3:

2024-02-07T12:13:59.194+0000 7f406b4d5640  1 -- v1:172.21.15.42:6801/1255668816 <== osd.3 v1:172.21.15.42:6805/329163986 25354 ==== PGlog(2.0s1 log log((23'25323,23'25328], crt=23'25328) pi ([31,37] all_participants={0(1), 1(1), 2(2), 3(0)} intervals=[([33,35] acting={2(2), 3(0)})]) pg_lease(ru 0.000000000s ub 101.196739197s int 16.000000000s) e39/38) v6 ==== 185001+0+0 (unknown 3238922912 0 0) 0x5560d92b4000 con 0x5560d88a8400
2024-02-07T12:13:59.194+0000 7f406b4d5640 15 osd.1 39 enqueue_peering_evt 2.0s1 epoch_sent: 39 epoch_requested: 38 MLogRec from 3(0) log log((23'25323,23'25328], crt=23'25328) pi ([31,37] all_participants={0(1), 1(1), 2(2), 3(0)} intervals=[([33,35] acting={2(2), 3(0)})]) pg_lease(ru 0.000000000s ub 101.196739197s int 16.000000000s) +create_info
2024-02-07T12:13:59.194+0000 7f406b4d5640 20 osd.1 op_wq(0) _enqueue OpSchedulerItem(2.0s1> PGPeeringEvent(epoch_sent: 39 epoch_requested: 38 MLogRec from 3(0) log log((23'25323,23'25328], crt=23'25328) pi ([31,37] all_participants={0(1), 1(1), 2(2), 3(0)} intervals=[([33,35] acting={2(2), 3(0)})]) pg_lease(ru 0.000000000s ub 101.196739197s int 16.000000000s) +create_info) class_id 2 prio 255 cost 10 e39

The new PG 2.0s1 is created and initialized, starting from the DNE (does not exist) state:

2024-02-07T12:13:59.195+0000 7f404aa9e640 10 osd.1 pg_epoch: 38 pg[2.0s1( DNE empty local-lis/les=0/0 n=0 ec=0/0 lis/c=0/0 les/c/f=0/0/0 sis=0) []p? r=-1 lpr=0 crt=0'0 mlcod 0'0 unknown mbc={}] PeeringState::init init role -1 up [3,1,2] acting [3,0,2] history ec=15/15 lis/c=33/31 les/c/f=34/32/0 sis=38 past_intervals ([31,37] all_participants={0(1), 1(1), 2(2), 3(0)} intervals=[([33,35] acting={2(2), 3(0)})])

The backfill phase to osd.1 is initiated by osd.3:

2024-02-07T12:13:59.210+0000 7f406b4d5640  1 -- v1:172.21.15.42:6801/1255668816 <== osd.3 v1:172.21.15.42:6805/329163986 25356 ==== MBackfillReserve(2.0s1 REQUEST prio: 110 e39/39) v5 ==== 46+0+0 (unknown 1921481940 0 0) 0x5560dfd02600 con 0x5560d88a8400
2024-02-07T12:13:59.210+0000 7f406b4d5640 15 osd.1 39 enqueue_peering_evt 2.0s1 epoch_sent: 39 epoch_requested: 39 RequestBackfillPrio: priority 110 primary bytes 207474518 local bytes 0

During the backfill to osd.1, osd.3 attempts to read the missing shards from osd.2
and osd.0. The read from osd.2 succeeds, but the read from osd.0 fails with an error,
as expected, since its hinfo is corrupted:

2024-02-07T12:14:35.132+0000 7effaa9aa640  5 osd.0 pg_epoch: 39 pg[2.0s1( v 23'25328 (23'25323,23'25328] local-lis/les=38/39 n=25328 ec=15/15 lis/c=38/31 les/c/f=39/32/0 sis=38) [3,1,2]/[3,0,2]p3(0) r=1 lpr=38 pi=[31,38)/1 luod=0'0 crt=23'25328 lcod 0'0 mlcod 0'0 active+remapped mbc={}] get_hash_info: getattr failed: (61) No data available
2024-02-07T12:14:35.132+0000 7effaa9aa640 -1 log_channel(cluster) log [ERR] : Corruption detected: object 2:ad551702:::test:head is missing hash_info
2024-02-07T12:14:35.132+0000 7effaa9aa640  5 osd.0 pg_epoch: 39 pg[2.0s1( v 23'25328 (23'25323,23'25328] local-lis/les=38/39 n=25328 ec=15/15 lis/c=38/31 les/c/f=39/32/0 sis=38) [3,1,2]/[3,0,2]p3(0) r=1 lpr=38 pi=[31,38)/1 luod=0'0 crt=23'25328 lcod 0'0 mlcod 0'0 active+remapped mbc={}] handle_sub_read: No hinfo for 2:ad551702:::test:head

At the same time, the primary osd.3 logs the following cluster warning:

2024-02-07T12:14:35.132+0000 7fed0072c640 20 osd.3 pg_epoch: 39 pg[2.0s0( v 23'25328 (23'25255,23'25328] local-lis/les=38/39 n=25328 ec=15/15 lis/c=38/31 les/c/f=39/32/0 sis=38) [3,1,2]/[3,0,2]p3(0) backfill=[1(1)] r=0 lpr=38 pi=[31,38)/1 crt=23'25328 mlcod 0'0 active+remapped+backfilling rops=1 mbc={0={},1={},2={}}] handle_sub_read_reply have shard=2
2024-02-07T12:14:35.132+0000 7fed0072c640 20 osd.3 pg_epoch: 39 pg[2.0s0( v 23'25328 (23'25255,23'25328] local-lis/les=38/39 n=25328 ec=15/15 lis/c=38/31 les/c/f=39/32/0 sis=38) [3,1,2]/[3,0,2]p3(0) backfill=[1(1)] r=0 lpr=38 pi=[31,38)/1 crt=23'25328 mlcod 0'0 active+remapped+backfilling rops=1 mbc={0={},1={},2={}}] handle_sub_read_reply have shard=0
2024-02-07T12:14:35.132+0000 7fed0072c640  0 log_channel(cluster) log [WRN] : Error(s) ignored for 2:ad551702:::test:head enough copies available
2024-02-07T12:14:35.132+0000 7fed0072c640 10 osd.3 pg_epoch: 39 pg[2.0s0( v 23'25328 (23'25255,23'25328] local-lis/les=38/39 n=25328 ec=15/15 lis/c=38/31 les/c/f=39/32/0 sis=38) [3,1,2]/[3,0,2]p3(0) backfill=[1(1)] r=0 lpr=38 pi=[31,38)/1 crt=23'25328 mlcod 0'0 active+remapped+backfilling rops=1 mbc={0={},1={},2={}}] handle_sub_read_reply Error(s) ignored for 2:ad551702:::test:head enough copies available

Since there are enough copies available (osd.3), the read continues and completes on the primary osd.3 (shard 0):

2024-02-07T12:14:35.132+0000 7fed0072c640 20 osd.3 pg_epoch: 39 pg[2.0s0( v 23'25328 (23'25255,23'25328] local-lis/les=38/39 n=25328 ec=15/15 lis/c=38/31 les/c/f=39/32/0 sis=38) [3,1,2]/[3,0,2]p3(0) backfill=[1(1)] r=0 lpr=38 pi=[31,38)/1 crt=23'25328 mlcod 0'0 active+remapped+backfilling rops=1 mbc={0={},1={},2={}}] handle_sub_read_reply Complete: ReadOp(tid=42514, to_read={2:ad551702:::test:head=read_request_t(to_read=[0,1048576,0], need={}, want_attrs=0)}, complete={2:ad551702:::test:head=read_result_t(r=0, errors={}, noattrs, returned=(0, 1048576, [2(2),4096, 3(0),4096]))}, priority=3, obj_to_source={2:ad551702:::test:head=0(1),2(2),3(0)}, source_to_obj={0(1)=2:ad551702:::test:head,2(2)=2:ad551702:::test:head,3(0)=2:ad551702:::test:head}, in_progress=)
...
2024-02-07T12:14:35.132+0000 7fed0072c640 10 osd.3 pg_epoch: 39 pg[2.0s0( v 23'25328 (23'25255,23'25328] local-lis/les=38/39 n=25328 ec=15/15 lis/c=38/31 les/c/f=39/32/0 sis=38) [3,1,2]/[3,0,2]p3(0) backfill=[1(1)] r=0 lpr=38 pi=[31,38)/1 crt=23'25328 mlcod 0'0 active+remapped+backfilling rops=1 mbc={0={},1={},2={}}] continue_recovery_op: READING return RecoveryOp(hoid=2:ad551702:::test:head v=18'2 missing_on=1(1) missing_on_shards=1 recovery_info=ObjectRecoveryInfo(2:ad551702:::test:head@18'2, size: 6047, copy_subset: [], clone_subset: {}, snapset: 0=[]:{}, object_exist: true) recovery_progress=ObjectRecoveryProgress(!first, data_recovered_to: 8192, data_complete: true, omap_recovered_to: , omap_complete: true, error: false) obc refcount=3 state=WRITING waiting_on_pushes=1(1) extent_requested=0,1048576)
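
The "enough copies available" decision comes down to simple erasure-code
arithmetic: with k data shards and m coding shards, any k healthy shards
suffice to reconstruct the object. A minimal sketch, assuming the 2+1
(k=2, m=1) EC profile suggested by the 3-shard acting set [3,0,2]:

# Why the error on osd.0(1) can be ignored. K below is an assumption based on
# the 3-shard acting set in the logs; the shard names mirror the log excerpt.
K = 2                                   # assumed number of data shards
acting = {"3(0)", "0(1)", "2(2)"}       # shards the recovery read was sent to
failed = {"0(1)"}                       # hinfo missing -> handle_sub_read error
healthy = acting - failed               # {"3(0)", "2(2)"}

if len(healthy) >= K:
    print(f"Error(s) ignored: {len(healthy)} healthy shard(s) >= k={K}")
else:
    print("Not enough copies available: the object would be unfound")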

The backfill process continues and completes on osd.1(1), as shown below.
All missing objects are recovered on the osd.1(1) shard:

2024-02-07T12:14:35.133+0000 7fed0072c640 20 osd.3 pg_epoch: 39 pg[2.0s0( v 23'25328 (23'25255,23'25328] local-lis/les=38/39 n=25328 ec=15/15 lis/c=38/31 les/c/f=39/32/0 sis=38) [3,1,2]/[3,0,2]p3(0) backfill=[1(1)] r=0 lpr=38 pi=[31,38)/1 crt=23'25328 mlcod 0'0 active+remapped+backfilling rops=1 mbc={0={},1={},2={}}] PeeringState::update_calc_stats shard 1(1) objects 17186 missing 8142

...

2024-02-07T12:14:53.670+0000 7fed0072c640 20 osd.3 pg_epoch: 39 pg[2.0s0( v 23'25328 (23'25255,23'25328] local-lis/les=38/39 n=25328 ec=15/15 lis/c=38/31 les/c/f=39/32/0 sis=38) [3,1,2]/[3,0,2]p3(0) backfill=[1(1)] r=0 lpr=38 pi=[31,38)/1 crt=23'25328 mlcod 0'0 active+remapped mbc={0={},1={},2={}}] PeeringState::update_calc_stats shard 1(1) objects 25328 missing 0

The test reports 0 unfound objects after the backfill:

2024-02-07T12:15:04.697 DEBUG:tasks.ec_inconsistent_hinfo:there are 0 unfound objects

Therefore, the cluster warning pertaining to "enough copies available" is benign
and can be added to the ignorelist.
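
For reference, the fix amounts to adding a matching entry to the suite's
log-ignorelist, whose entries are matched as regular expressions against
cluster log lines. The sketch below checks a candidate pattern against the
observed warning; the exact pattern added by the PR may be worded differently.

# Sanity-check a candidate ignorelist pattern against the observed warning.
import re

candidate = r"Error\(s\) ignored for .* enough copies available"
warning = ("cluster [WRN] Error(s) ignored for "
           "2:ad551702:::test:head enough copies available")

assert re.search(candidate, warning) is not None
print("candidate pattern matches the benign warning")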


Related issues (2): 1 open, 1 closed

Copied to RADOS - Backport #64881: reef: singleton/ec-inconsistent-hinfo.yaml: Include a possible benign cluster log warning to ignorelist. (Resolved, Sridhar Seshasayee)
Copied to RADOS - Backport #65151: squid: singleton/ec-inconsistent-hinfo.yaml: Include a possible benign cluster log warning to ignorelist. (In Progress, Sridhar Seshasayee)
#1

Updated by Sridhar Seshasayee 2 months ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 55764
#2

Updated by Sridhar Seshasayee about 2 months ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to reef
#3

Updated by Backport Bot about 2 months ago

  • Copied to Backport #64881: reef: singleton/ec-inconsistent-hinfo.yaml: Include a possible benign cluster log warning to ignorelist. added
#4

Updated by Backport Bot about 2 months ago

  • Tags set to backport_processed
#5

Updated by Sridhar Seshasayee about 1 month ago

  • Status changed from Pending Backport to Resolved
#6

Updated by Sridhar Seshasayee about 1 month ago

  • Backport changed from reef to reef, squid
#7

Updated by Sridhar Seshasayee about 1 month ago

  • Status changed from Resolved to Pending Backport
#8

Updated by Sridhar Seshasayee about 1 month ago

  • Backport changed from reef, squid to squid, reef
#9

Updated by Sridhar Seshasayee about 1 month ago

  • Copied to Backport #65151: squid: singleton/ec-inconsistent-hinfo.yaml: Include a possible benign cluster log warning to ignorelist. added