Bug #62119

open

timeout on reserving replicsa

Added by Ronen Friedman 10 months ago. Updated 3 months ago.

Status:
In Progress
Priority:
High
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

(or timeout on reserving replicas)

PGScrubResourcesOK event was queued but never handled.

See http://pulpito.front.sepia.ceph.com/rfriedma-2023-07-22_17:48:57-rados:thrash-wip-rf-cephx-init-distro-default-smithi/7348140/.

Actions #1

Updated by Ronen Friedman 10 months ago

  • Description updated (diff)

scheduler issue?

Actions #2

Updated by Neha Ojha 10 months ago

  • Assignee set to Aishwarya Mathuria
  • Priority changed from Normal to High
Actions #3

Updated by Radoslaw Zarzynski 10 months ago

  • Severity changed from 3 - minor to 1 - critical

Important but not urgent.

Actions #4

Updated by Aishwarya Mathuria 10 months ago

Reproduced in: /a/amathuri-2023-07-28_11:16:38-rados:thrash-main-distro-default-smithi/7355161/ with debug_osd 30 logs.

Actions #5

Updated by Aishwarya Mathuria 10 months ago

PGScrubResourcesOK gets queued but is dequeued very late (almost 6 seconds later). This causes the replica reservation to fail, since osd_scrub_reservation_timeout is set to 5 seconds.
Looking into why this is happening.

2023-07-28T12:26:09.182+0000 7f610f7ff640 20 osd.5 op_wq(1) _enqueue OpSchedulerItem(4.1 PGScrubResourcesOK(pgid=4.1 epoch_queued=156 scrub-token=0) class_id 1 prio 5 cost 409600 e156) 

2023-07-28T12:26:14.130+0000 7f61260e2640  0 log_channel(cluster) log [WRN] : osd.5 PgScrubber: 4.1 timeout on reserving replicsa (since 2023-07-28T12:26:09.062532+0000) 
2023-07-28T12:26:14.130+0000 7f61260e2640  1 osd.5 ep: 156 scrubber::ReplicaReservations pg[4.1]: handle_no_reply_timeout: timeout! no reply from [] 
2023-07-28T12:26:14.130+0000 7f61260e2640 15 osd.5 159 queue a scrub event (PGScrubDenied(pgid=4.1epoch_queued=159 scrub-token=0)) for pg[4.1( v 159'3534 (47'501,159'3534] local-lis/les=39/40 n=766 ec=24/24 lis/c=39/39 les/c/f=40/41/0 sis=39) [5,4] r=0 lpr=39 luod=159'3526 crt=159'3534 lcod 159'3525 mlcod 159'3525 active+clean+scrubbing [ 4.1:  ] ]. Epoch: 159 
2023-07-28T12:26:14.130+0000 7f61260e2640 20 osd.5 op_wq(1) _enqueue OpSchedulerItem(4.1 PGScrubDenied(pgid=4.1 epoch_queued=159 scrub-token=0) class_id 1 prio 5 cost 409600 e159)

2023-07-28T12:26:15.858+0000 7f610f7ff640 20 osd.5 op_wq(1) _process OpSchedulerItem(4.1 PGScrubDenied(pgid=4.1epoch_queued=159 scrub-token=0) class_id 1 prio 5 qos_cost 695182 cost 409600 e159) queued 
2023-07-28T12:26:15.858+0000 7f6113807640 20 osd.5 op_wq(1) _process 4.1 to_process <OpSchedulerItem(4.1 PGScrubResourcesOK(pgid=4.1epoch_queued=156 scrub-token=0) class_id 1 prio 5 qos_cost 695182 cost 409600 e156),OpSchedulerItem(4.1 PGScrubDenied(pgid=4.1epoch_queued=159 scrub-token=0) class_id 1 prio 5 qos_cost 695182 cost 409600 e159)> waiting <> waiting_peering {} 
2023-07-28T12:26:15.858+0000 7f6113807640 20 osd.5 op_wq(1) _process OpSchedulerItem(4.1 PGScrubResourcesOK(pgid=4.1epoch_queued=156 scrub-token=0) class_id 1 prio 5 qos_cost 695182 cost 409600 e156) pg 0x564fcbbd6000 
2023-07-28T12:26:15.858+0000 7f610f7ff640 20 osd.5 op_wq(1) _process 4.1 to_process <OpSchedulerItem(4.1 PGOpItem(op=osd_op(client.4683.0:14215 4.1 4.f8a0ef5d (undecoded) ondisk+read+rwordered+known_if_redirected+supports_pool_eio e159) v8) class_id 3 prio 63 qos_cost 285583 cost 0 e159),OpSchedulerItem(4.1 PGScrubResourcesOK(pgid=4.1epoch_queued=156 scrub-token=0) class_id 1 prio 5 qos_cost 695182 cost 409600 e156)> waiting <> waiting_peering {} 
2023-07-28T12:26:15.858+0000 7f610f7ff640 20 osd.5 op_wq(1) _process 4.1 to_process <OpSchedulerItem(4.1 PGScrubResourcesOK(pgid=4.1epoch_queued=156 scrub-token=0) class_id 1 prio 5 qos_cost 695182 cost 409600 e156)> waiting <> waiting_peering {}
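
For reference, a minimal, hypothetical Python helper (not part of Ceph or the teuthology tooling; the function name and defaults are made up) that takes the _enqueue and _process timestamps of the PGScrubResourcesOK lines above and checks the queue delay against the 5-second osd_scrub_reservation_timeout used in this run:

# Hypothetical helper, not Ceph code; name and defaults are illustrative.
from datetime import datetime

LOG_TS_FMT = "%Y-%m-%dT%H:%M:%S.%f%z"

def queue_delay_exceeds_timeout(enqueue_ts: str, dequeue_ts: str,
                                timeout_s: float = 5.0) -> bool:
    """Return True if the event sat in the op queue longer than timeout_s."""
    delay = (datetime.strptime(dequeue_ts, LOG_TS_FMT)
             - datetime.strptime(enqueue_ts, LOG_TS_FMT)).total_seconds()
    print(f"event dequeued after {delay:.3f}s (timeout {timeout_s}s)")
    return delay > timeout_s

# Timestamps taken from the PGScrubResourcesOK lines above:
queue_delay_exceeds_timeout("2023-07-28T12:26:09.182+0000",
                            "2023-07-28T12:26:15.858+0000")
# prints "event dequeued after 6.676s (timeout 5.0s)" and returns True
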
Actions #6

Updated by Radoslaw Zarzynski 10 months ago

  • Status changed from New to In Progress
Actions #7

Updated by Laura Flores 9 months ago

/a/yuriw-2023-08-15_18:58:56-rados-wip-yuri3-testing-2023-08-15-0955-distro-default-smithi/7369280
/a/yuriw-2023-08-15_18:58:56-rados-wip-yuri3-testing-2023-08-15-0955-distro-default-smithi/7369338

Actions #8

Updated by Radoslaw Zarzynski 8 months ago

Bumping this up.

Actions #9

Updated by Radoslaw Zarzynski 8 months ago

Per the core sync on 20 Sep, Aishwarya is working on this.

Actions #11

Updated by Aishwarya Mathuria 6 months ago

The logs of the recent failures point to a different issue from the one this tracker was originally raised for.
osd.1 tries to reserve replicas:

1:  REQ_SCRUB ]  MUST_DEEP_SCRUB MUST_SCRUB TIME_FOR_DEEP planned REQ_SCRUB] scrubber<ReservingReplicas>: FSM: -- state -->> ReservingReplicas
2023-11-05T22:15:55.739+0000 2d2a0640 10 osd.1 pg_epoch: 191 pg[15.1( v 90'10 (0'0,90'10] local-lis/les=26/28 n=3 ec=26/26 lis/c=26/26 les/c/f=28/33/0 sis=26) [1,6] r=0 lpr=26 crt=90'9 lcod 90'9 mlcod 90'9 active+clean+scrubbing+deep [ 15.1:  REQ_SCRUB ]  MUST_DEEP_SCRUB MUST_SCRUB TIME_FOR_DEEP planned REQ_SCRUB] scrubber<ReservingReplicas>: reserve_replicas
2023-11-05T22:15:55.740+0000 2d2a0640 20 osd.1 192 send_message_osd_cluster MOSDScrubReserve to osd.6 from_epoch 191
2023-11-05T22:15:55.740+0000 2d2a0640 20 osd.1 192 get_nextmap_reserved map_reservations: {192=1}
2023-11-05T22:15:55.740+0000 2d2a0640 10 osd.1 192 send_incremental_map 191 -> 192 to 0x18fd8b60 v2:172.21.15.129:6820/86500
2023-11-05T22:15:55.744+0000 2d2a0640  1 -- [v2:172.21.15.49:6810/86413,v1:172.21.15.49:6811/86413] --> [v2:172.21.15.129:6820/86500,v1:172.21.15.129:6821/86500] -- MOSDScrubReserve(15.1 REQUEST e191) v1 -- 0xd064d00 con 0x18fd8b60
2023-11-05T22:15:55.744+0000 2d2a0640 20 osd.1 192 release_map epoch: 192
2023-11-05T22:15:55.744+0000 2d2a0640 10 osd.1 ep: 191 scrubber::ReplicaReservations pg[15.1]: ReplicaReservations: reserve 6
2023-11-05T22:15:55.744+0000 2d2a0640 20  scrubberFSM  event: --^^^^---- StartScrub
2023-11-05T22:15:55.744+0000 2d2a0640 10 osd.1 pg_epoch: 191 pg[15.1( v 90'10 (0'0,90'10] local-lis/les=26/28 n=3 ec=26/26 lis/c=26/26 les/c/f=28/33/0 sis=26) [1,6] r=0 lpr=26 crt=90'9 lcod 90'9 mlcod 90'9 active+clean+scrubbing+deep [ 15.1:  REQ_SCRUB ]  MUST_DEEP_SCRUB MUST_SCRUB TIME_FOR_DEEP planned REQ_SCRUB] scrubber<ReservingReplicas>: scrubber event --<< StartScrub

osd.1 replica reservation timeout occurs 5 seconds later:

2023-11-05T22:16:00.743+0000 13a63640  0 log_channel(cluster) log [WRN] : osd.1 ep: 191 scrubber::ReplicaReservations pg[15.1]:  timeout on replica reservations (since 2023-11-05 22:15:55)
2023-11-05T22:16:00.743+0000 13a63640  1 osd.1 ep: 191 scrubber::ReplicaReservations pg[15.1]: handle_no_reply_timeout: timeout! no reply from [6]

The MOSDScrubReserve request was sent to osd.6; from osd.6's logs we can see that it received the reservation request only after the timeout had already expired.

2023-11-05T22:16:10.865+0000 8c79640  1 -- [v2:172.21.15.129:6820/86500,v1:172.21.15.129:6821/86500] <== osd.1 v2:172.21.15.49:6810/86413 642 ==== MOSDScrubReserve(15.1 REQUEST e191) v1 ==== 43+0+0 (unknown 0 0 0) 0xc9183e0 con 0x18e49910
2023-11-05T22:16:10.865+0000 8c79640 15 osd.6 192 enqueue_op MOSDScrubReserve(15.1 REQUEST e191) v1 prio 127 type 92 cost 0 latency 0.000248 epoch 191 MOSDScrubReserve(15.1 REQUEST e191) v1
2023-11-05T22:16:10.865+0000 8c79640 20 osd.6 op_wq(1) _enqueue OpSchedulerItem(15.1 PGOpItem(op=MOSDScrubReserve(15.1 REQUEST e191) v1) class_id 2 prio 127 cost 0 e191)
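
A similar hypothetical sketch (again, not Ceph code; the function names and pairing logic are illustrative) that pairs the MOSDScrubReserve REQUEST send line ("-->") in the primary's log with the matching receive line ("<==") in the replica's log to get the delivery latency:

# Hypothetical sketch; names are illustrative.
import re
from datetime import datetime

TS_RE = re.compile(r"^(\S+)")          # leading timestamp of each log line
LOG_TS_FMT = "%Y-%m-%dT%H:%M:%S.%f%z"

def first_ts(lines, needle):
    """Timestamp of the first line containing 'needle', or None."""
    for line in lines:
        if needle in line:
            return datetime.strptime(TS_RE.match(line).group(1), LOG_TS_FMT)
    return None

def reserve_delivery_latency(primary_log, replica_log, pgid="15.1", epoch="e191"):
    needle = f"MOSDScrubReserve({pgid} REQUEST {epoch})"
    sent = first_ts((l for l in primary_log if "-->" in l), needle)
    recv = first_ts((l for l in replica_log if "<==" in l), needle)
    if sent is None or recv is None:
        return None
    return (recv - sent).total_seconds()

# With the osd.1 and osd.6 excerpts above this comes out to ~15.1s, i.e. the
# request reached osd.6 roughly ten seconds after the 5-second reservation
# timeout had already fired on osd.1.
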

Judging by the time difference in its logs, it seems that osd.6 went down in between:

2023-11-05T22:15:55.615+0000 2eea3640 20 osd.6 op_wq(4) _process 62.4 to_process <> waiting <> waiting_peering {}
2023-11-05T22:16:10.783+0000 2eea3640 20 osd.6 op_wq(4) _process OpSchedulerItem(62.4 PGPeeringEvent(epoch_sent: 192 epoch_requested: 192 NullEvt) class_id 2 prio 255 cost 10 e192) queued
2023-11-05T22:16:10.783+0000 2eea3640 20 osd.6 op_wq(4) _process 62.4 to_process <OpSchedulerItem(62.4 PGPeeringEvent(epoch_sent: 192 epoch_request)
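
A small hypothetical log-scanning sketch (illustrative only) that flags silences longer than a threshold in a single OSD log, which is how the ~15-second gap in osd.6's op_wq activity above stands out:

# Hypothetical log-scanning sketch, not part of any Ceph tooling.
import re
from datetime import datetime

TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\S+)")
LOG_TS_FMT = "%Y-%m-%dT%H:%M:%S.%f%z"

def find_gaps(lines, threshold_s=5.0):
    """Yield (gap_seconds, previous_line, next_line) for large silences."""
    prev_ts, prev_line = None, None
    for line in lines:
        m = TS_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(1), LOG_TS_FMT)
        if prev_ts is not None and (ts - prev_ts).total_seconds() > threshold_s:
            yield (ts - prev_ts).total_seconds(), prev_line, line
        prev_ts, prev_line = ts, line

# Applied to the osd.6 excerpt above this flags a ~15.2s silence between the
# 22:15:55.615 and 22:16:10.783 op_wq lines.
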

From mon logs:

2023-11-05T22:15:55.544+0000 11f46640  7 mon.a@0(leader).log v260 update_from_paxos applying incremental log 260 2023-11-05T22:15:54.447871+0000 mon.a (mon.0) 731 : cluster [INF] Health check cleared: SLOW_OPS (was: 2 slow ops, oldest one blocked for 53 sec, daemons [osd.6,osd.7] have slow ops.)

I will look into this further and will probably raise a separate tracker for it: here the event is not being queued at all, so it does not look like the issue we saw earlier, where PGScrubResourcesOK was queued but never dequeued.

Actions #12

Updated by Ronen Friedman 4 months ago

The cause of the long delay is still unclear to me, but it is easy to reproduce in rados:thrash testing.
The handling of long delays in reservation requests was itself buggy (https://tracker.ceph.com/issues/64052),
and is fixed by https://github.com/ceph/ceph/pull/55217.

Actions #13

Updated by Ronen Friedman 3 months ago

@aishwarya - I think we can lower the severity, or maybe even close this bug.
It seems as though some specific tests cause messages to be delayed for many minutes.
