Project

General

Profile

Bug #40117

Updated by Samuel Just almost 5 years ago

osd.9 requests a switch to acting set=[5] from [9,5] which never shows up.    The teuthology test hangs waiting for that pg to go active+clean.    Below is the mon output from the receipt of the pg_temp message message.    I've attached the osd.9 logs grepped for pg[15.17( as well as the section from osd.9 requesting the acting change: ceph-mon.c log. 

 The suite output is sjust-2019-06-02_20:15:42-rados-wip-sjust-crimson-peering-distro-basic-smithi/3999331 

 <pre> 
 2019-06-03 04:59:20.564 7f6db6915700    1 -- v1:172.21.15.83:6789/0 <== osd.9 v1:172.21.15.83:6808/44419 382 ==== osd_pgtemp(e2675 {15.17=[5]} v2675) v2 ==== 52+0+0 (unknown 1235329701 0 0) 0x5582f1b69a00 con 0x5582f487ad00 
 2019-06-03 04:59:20.564 7f6db6915700 20 mon.c@0(leader) e1 _ms_dispatch existing session 0x5582eb9e3600 for osd.9 
 2019-06-03 04:59:20.564 7f6db6915700 20 mon.c@0(leader) e1    caps allow * 
 2019-06-03 04:59:20.564 7f6db6915700 10 mon.c@0(leader).paxosservice(osdmap 1501..2675) dispatch 0x5582f1b69a00 osd_pgtemp(e2675 {15.17=[5]} v2675) v2 from osd.9 v1:172.21.15.83:6808/44419 con 0x5582f487ad00 
 2019-06-03 04:59:20.564 7f6db6915700    5 mon.c@0(leader).paxos(paxos active c 6778..7521) is_readable = 1 - now=2019-06-03 04:59:20.569798 lease_expire=2019-06-03 04:59:25.536743 has v0 lc 7521 
 2019-06-03 04:59:20.564 7f6db6915700 10 mon.c@0(leader).osd e2675 preprocess_query osd_pgtemp(e2675 {15.17=[5]} v2675) v2 from osd.9 v1:172.21.15.83:6808/44419 
 2019-06-03 04:59:20.564 7f6db6915700 10 mon.c@0(leader).osd e2675 preprocess_pgtemp osd_pgtemp(e2675 {15.17=[5]} v2675) v2 
 2019-06-03 04:59:20.564 7f6db6915700 20 is_capable service=osd command= exec addr v1:172.21.15.83:6808/44419 on cap allow * 
 2019-06-03 04:59:20.564 7f6db6915700 20    allow so far , doing grant allow * 
 2019-06-03 04:59:20.564 7f6db6915700 20    allow all 
 2019-06-03 04:59:20.564 7f6db6915700 20 mon.c@0(leader).osd e2675    15.17[] -> [5] 
 2019-06-03 04:59:20.564 7f6db6915700    7 mon.c@0(leader).osd e2675 prepare_update osd_pgtemp(e2675 {15.17=[5]} v2675) v2 from osd.9 v1:172.21.15.83:6808/44419 
 2019-06-03 04:59:20.564 7f6db6915700    7 mon.c@0(leader).osd e2675 prepare_pgtemp e2675 from osd.9 v1:172.21.15.83:6808/44419 
 2019-06-03 04:59:20.564 7f6db6915700 10 mon.c@0(leader).osd e2675 should_propose 
 2019-06-03 04:59:20.564 7f6db6915700 10 mon.c@0(leader).paxosservice(osdmap 1501..2675)    proposal_timer already set 
 2019-06-03 04:59:20.624 7f6db911a700 10 mon.c@0(leader).paxosservice(mgrstat 1751..1961) propose_pending 
 2019-06-03 04:59:20.624 7f6db911a700 10 mon.c@0(leader).mgrstat    1962 
 2019-06-03 04:59:20.624 7f6db911a700 10 mon.c@0(leader) e1 log_health updated 0 previous 0 
 2019-06-03 04:59:20.624 7f6db911a700    5 mon.c@0(leader).paxos(paxos active c 6778..7521) queue_pending_finisher 0x5582f7b84800 
 2019-06-03 04:59:20.624 7f6db911a700 10 mon.c@0(leader).paxos(paxos active c 6778..7521) trigger_propose active, proposing now 
 2019-06-03 04:59:20.624 7f6db911a700 10 mon.c@0(leader).paxos(paxos active c 6778..7521) propose_pending 7522 10485 bytes 
 2019-06-03 04:59:20.624 7f6db911a700 10 mon.c@0(leader).paxos(paxos updating c 6778..7521) begin for 7522 10485 bytes 
 2019-06-03 04:59:20.624 7f6db911a700 10 mon.c@0(leader).paxos(paxos updating c 6778..7521)    sending begin to mon.1 
 2019-06-03 04:59:20.624 7f6db911a700    1 -- v1:172.21.15.83:6789/0 send_to--> mon v1:172.21.15.103:6789/0 -- paxos(begin lc 7521 fc 0 pn 500 opn 0) v4 -- ?+0 0x5582ec128c00 
 2019-06-03 04:59:20.624 7f6db911a700    1 -- v1:172.21.15.83:6789/0 --> v1:172.21.15.103:6789/0 -- paxos(begin lc 7521 fc 0 pn 500 opn 0) v4 -- 0x5582ec128c00 con 0x5582e9d5bf80 
 2019-06-03 04:59:20.624 7f6db911a700 10 mon.c@0(leader).paxos(paxos updating c 6778..7521)    sending begin to mon.2 
 2019-06-03 04:59:20.624 7f6db911a700    1 -- v1:172.21.15.83:6789/0 send_to--> mon v1:172.21.15.200:6789/0 -- paxos(begin lc 7521 fc 0 pn 500 opn 0) v4 -- ?+0 0x5582eb1a9600 
 2019-06-03 04:59:20.624 7f6db911a700    1 -- v1:172.21.15.83:6789/0 --> v1:172.21.15.200:6789/0 -- paxos(begin lc 7521 fc 0 pn 500 opn 0) v4 -- 0x5582eb1a9600 con 0x5582e9d5c400 
 2019-06-03 04:59:20.624 7f6db911a700 10 mon.c@0(leader).paxosservice(logm 3022..3661) propose_pending 
 2019-06-03 04:59:20.624 7f6db911a700 10 mon.c@0(leader).log v3661 encode_full log v 3661 
 </pre> 

 <pre> 
 2019-06-03 04:59:20.564 7f3b7848b700 10 osd.9 pg_epoch: 2675 pg[15.17( empty local-lis/les=0/0 n=0 ec=2638/2630 lis/c 2638/2638 les/c/f 2639/2639/0 2675/2675/2675) [9,8] r=0 lpr=2675 pi=[2638,2675)/1 crt=0'0 mlcod 0'0 peering mbc={}] choos 
 e_acting all_info osd.5 15.17( v 2674'3903 (0'0,2674'3903] local-lis/les=2638/2639 n=2508 ec=2638/2630 lis/c 2638/2638 les/c/f 2639/2639/0 2675/2675/2675) 
 2019-06-03 04:59:20.564 7f3b7848b700 10 osd.9 pg_epoch: 2675 pg[15.17( empty local-lis/les=0/0 n=0 ec=2638/2630 lis/c 2638/2638 les/c/f 2639/2639/0 2675/2675/2675) [9,8] r=0 lpr=2675 pi=[2638,2675)/1 crt=0'0 mlcod 0'0 peering mbc={}] choos 
 e_acting all_info osd.6 15.17( v 2674'3903 (0'0,2674'3903] local-lis/les=2638/2639 n=2508 ec=2638/2630 lis/c 2638/2638 les/c/f 2639/2639/0 2675/2675/2675) 
 2019-06-03 04:59:20.564 7f3b7848b700 10 osd.9 pg_epoch: 2675 pg[15.17( empty local-lis/les=0/0 n=0 ec=2638/2630 lis/c 2638/2638 les/c/f 2639/2639/0 2675/2675/2675) [9,8] r=0 lpr=2675 pi=[2638,2675)/1 crt=0'0 mlcod 0'0 peering mbc={}] choos 
 e_acting all_info osd.8 15.17( DNE empty local-lis/les=0/0 n=0 ec=0/0 lis/c 0/0 les/c/f 0/0/0 0/0/0) 
 2019-06-03 04:59:20.564 7f3b7848b700 10 osd.9 pg_epoch: 2675 pg[15.17( empty local-lis/les=0/0 n=0 ec=2638/2630 lis/c 2638/2638 les/c/f 2639/2639/0 2675/2675/2675) [9,8] r=0 lpr=2675 pi=[2638,2675)/1 crt=0'0 mlcod 0'0 peering mbc={}] choos 
 e_acting all_info osd.9 15.17( empty local-lis/les=0/0 n=0 ec=2638/2630 lis/c 2638/2638 les/c/f 2639/2639/0 2675/2675/2675) 
 2019-06-03 04:59:20.564 7f3b7848b700 10 osd.9 pg_epoch: 2675 pg[15.17( empty local-lis/les=0/0 n=0 ec=2638/2630 lis/c 2638/2638 les/c/f 2639/2639/0 2675/2675/2675) [9,8] r=0 lpr=2675 pi=[2638,2675)/1 crt=0'0 mlcod 0'0 peering mbc={}] calc_ 
 replicated_acting newest update on osd.5 with 15.17( v 2674'3903 (0'0,2674'3903] local-lis/les=2638/2639 n=2508 ec=2638/2630 lis/c 2638/2638 les/c/f 2639/2639/0 2675/2675/2675) 
 2019-06-03 04:59:20.564 7f3b7848b700 20 osd.9 pg_epoch: 2675 pg[15.17( empty local-lis/les=0/0 n=0 ec=2638/2630 lis/c 2638/2638 les/c/f 2639/2639/0 2675/2675/2675) [9,8] r=0 lpr=2675 pi=[2638,2675)/1 crt=0'0 mlcod 0'0 peering mbc={}] choos 
 e_async_recovery_replicated candidates by cost are: 3903,9 
 2019-06-03 04:59:20.564 7f3b7848b700 20 osd.9 pg_epoch: 2675 pg[15.17( empty local-lis/les=0/0 n=0 ec=2638/2630 lis/c 2638/2638 les/c/f 2639/2639/0 2675/2675/2675) [9,8] r=0 lpr=2675 pi=[2638,2675)/1 crt=0'0 mlcod 0'0 peering mbc={}] choos 
 e_async_recovery_replicated result want=[5] async_recovery=9 
 2019-06-03 04:59:20.564 7f3b7848b700 10 osd.9 pg_epoch: 2675 pg[15.17( empty local-lis/les=0/0 n=0 ec=2638/2630 lis/c 2638/2638 les/c/f 2639/2639/0 2675/2675/2675) [9,8] r=0 lpr=2675 pi=[2638,2675)/1 crt=0'0 mlcod 0'0 peering mbc={}] choos 
 e_acting want [5] != acting [9,8], requesting pg_temp change 
 2019-06-03 04:59:20.564 7f3b7848b700    5 osd.9 pg_epoch: 2675 pg[15.17( empty local-lis/les=0/0 n=0 ec=2638/2630 lis/c 2638/2638 les/c/f 2639/2639/0 2675/2675/2675) [9,8] r=0 lpr=2675 pi=[2638,2675)/1 crt=0'0 mlcod 0'0 peering mbc={}] exit  
 Started/Primary/Peering/GetLog 0.000134 0 0.000000 
 2019-06-03 04:59:20.564 7f3b7848b700 10 osd.9 pg_epoch: 2675 pg[15.17( empty local-lis/les=0/0 n=0 ec=2638/2630 lis/c 2638/2638 les/c/f 2639/2639/0 2675/2675/2675) [9,8] r=0 lpr=2675 pi=[2638,2675)/1 crt=0'0 mlcod 0'0 peering mbc={}] state 
 <Started/Primary/Peering>: Leaving Peering 
 2019-06-03 04:59:20.564 7f3b7848b700    5 osd.9 pg_epoch: 2675 pg[15.17( empty local-lis/les=0/0 n=0 ec=2638/2630 lis/c 2638/2638 les/c/f 2639/2639/0 2675/2675/2675) [9,8] r=0 lpr=2675 pi=[2638,2675)/1 crt=0'0 mlcod 0'0 peering mbc={}] exit  
 Started/Primary/Peering 0.003482 0 0.000000 
 2019-06-03 04:59:20.564 7f3b7848b700    5 osd.9 pg_epoch: 2675 pg[15.17( empty local-lis/les=0/0 n=0 ec=2638/2630 lis/c 2638/2638 les/c/f 2639/2639/0 2675/2675/2675) [9,8] r=0 lpr=2675 pi=[2638,2675)/1 crt=0'0 mlcod 0'0 unknown mbc={}] exit  
 Started/Primary 0.003498 0 0.000000 
 2019-06-03 04:59:20.564 7f3b7848b700 10 osd.9 pg_epoch: 2675 pg[15.17( empty local-lis/les=0/0 n=0 ec=2638/2630 lis/c 2638/2638 les/c/f 2639/2639/0 2675/2675/2675) [9,8] r=0 lpr=2675 pi=[2638,2675)/1 crt=0'0 mlcod 0'0 unknown mbc={}] clear 
 _primary_state 
 2019-06-03 04:59:20.564 7f3b7848b700 10 osd.9 pg_epoch: 2675 pg[15.17( empty local-lis/les=0/0 n=0 ec=2638/2630 lis/c 2638/2638 les/c/f 2639/2639/0 2675/2675/2675) [9,8] r=0 lpr=2675 pi=[2638,2675)/1 crt=0'0 mlcod 0'0 unknown mbc={}] relea 
 se_backoffs [15:e8000000::::0,15:f0000000::::head) 
 2019-06-03 04:59:20.564 7f3b7848b700 20 osd.9 pg_epoch: 2675 pg[15.17( empty local-lis/les=0/0 n=0 ec=2638/2630 lis/c 2638/2638 les/c/f 2639/2639/0 2675/2675/2675) [9,8] r=0 lpr=2675 pi=[2638,2675)/1 crt=0'0 mlcod 0'0 unknown mbc={}] agent 
 _stop 
 2019-06-03 04:59:20.564 7f3b7848b700    5 osd.9 pg_epoch: 2675 pg[15.17( empty local-lis/les=0/0 n=0 ec=2638/2630 lis/c 2638/2638 les/c/f 2639/2639/0 2675/2675/2675) [9,8] r=0 lpr=2675 pi=[2638,2675)/1 crt=0'0 mlcod 0'0 unknown mbc={}] enter 
  Started/Primary 
 2019-06-03 04:59:20.564 7f3b7848b700    5 osd.9 pg_epoch: 2675 pg[15.17( empty local-lis/les=0/0 n=0 ec=2638/2630 lis/c 2638/2638 les/c/f 2639/2639/0 2675/2675/2675) [9,8] r=0 lpr=2675 pi=[2638,2675)/1 crt=0'0 mlcod 0'0 unknown mbc={}] enter Started/Primary/WaitActingChange 
 2019-06-03 04:59:20.564 7f3b7c493700 10 osd.9 pg_epoch: 2675 pg[15.17( empty local-lis/les=0/0 n=0 ec=2638/2630 lis/c 2638/2638 les/c/f 2639/2639/0 2675/2675/2675) [9,8] r=0 lpr=2675 pi=[2638,2675)/1 crt=0'0 mlcod 0'0 unknown mbc={}] do_peering_event: epoch_sent: 2675 epoch_requested: 2675 MNotifyRec 15.17 from 6 notify: (query:2675 sent:2675 15.17( v 2674'3903 (0'0,2674'3903] local-lis/les=2638/2639 n=2508 ec=2638/2630 lis/c 2638/2638 les/c/f 2639/2639/0 2675/2675/2675)) features: 0x3f01cfb8ffacffff ([2638,2674] intervals=([2638,2674] acting 5,6)) +create_info 
 2019-06-03 04:59:20.564 7f3b7c493700 10 osd.9 pg_epoch: 2675 pg[15.17( empty local-lis/les=0/0 n=0 ec=2638/2630 lis/c 2638/2638 les/c/f 2639/2639/0 2675/2675/2675) [9,8] r=0 lpr=2675 pi=[2638,2675)/1 crt=0'0 mlcod 0'0 unknown mbc={}] state<Started/Primary/WaitActingChange>: In WaitActingChange, ignoring MNotifyRec 
 2019-06-03 04:59:20.664 7f3b7848b700 10 osd.9 2675 dequeue_op 0x562b9ac0edc0 prio 63 cost 65536 latency 0.000147 osd_op(client.108494.0:79368 15.17 15.c0b0eb57 (undecoded) ondisk+write+known_if_redirected e2675) v8 pg pg[15.17( empty local-lis/les=0/0 n=0 ec=2638/2630 lis/c 2638/2638 les/c/f 2639/2639/0 2675/2675/2675) [9,8] r=0 lpr=2675 pi=[2638,2675)/1 crt=0'0 mlcod 0'0 unknown mbc={}] 
 2019-06-03 04:59:20.664 7f3b7848b700 10 osd.9 pg_epoch: 2675 pg[15.17( empty local-lis/les=0/0 n=0 ec=2638/2630 lis/c 2638/2638 les/c/f 2639/2639/0 2675/2675/2675) [9,8] r=0 lpr=2675 pi=[2638,2675)/1 crt=0'0 mlcod 0'0 unknown mbc={}] can_handle_while_inactive: 0x562b9ac0edc0 
 2019-06-03 04:59:20.668 7f3b7c493700 10 osd.9 2675 dequeue_op 0x562bb07749a0 prio 63 cost 65536 latency 0.000147 osd_op(client.108494.0:79375 15.17 15.56224037 (undecoded) ondisk+write+known_if_redirected e2675) v8 pg pg[15.17( empty local-lis/les=0/0 n=0 ec=2638/2630 lis/c 2638/2638 les/c/f 2639/2639/0 2675/2675/2675) [9,8] r=0 lpr=2675 pi=[2638,2675)/1 crt=0'0 mlcod 0'0 unknown mbc={}] 
 2019-06-03 04:59:20.668 7f3b7c493700 10 osd.9 pg_epoch: 2675 pg[15.17( empty local-lis/les=0/0 n=0 ec=2638/2630 lis/c 2638/2638 les/c/f 2639/2639/0 2675/2675/2675) [9,8] r=0 lpr=2675 pi=[2638,2675)/1 crt=0'0 mlcod 0'0 unknown mbc={}] can_handle_while_inactive: 0x562bb07749a0 
 2019-06-03 04:59:21.540 7f3b7848b700 10 osd.9 pg_epoch: 2675 pg[15.17( empty local-lis/les=0/0 n=0 ec=2638/2630 lis/c 2638/2638 les/c/f 2639/2639/0 2675/2675/2675) [9,8] r=0 lpr=2675 pi=[2638,2675)/1 crt=0'0 mlcod 0'0 unknown mbc={}] handle_advance_map [9,8]/[9,8] -- 9/9 
 2019-06-03 04:59:21.540 7f3b7848b700 10 osd.9 pg_epoch: 2676 pg[15.17( empty local-lis/les=0/0 n=0 ec=2638/2630 lis/c 2638/2638 les/c/f 2639/2639/0 2675/2675/2675) [9,8] r=0 lpr=2675 pi=[2638,2675)/1 crt=0'0 mlcod 0'0 unknown mbc={}] state<Started/Primary/WaitActingChange>: verifying no want_acting [] targets didn't go down 
 </pre>

Back