Project

General

Profile

Bug #65491

Updated by Matan Breizman about 1 month ago

<pre><code class="diff"> 
 2024-04-11T12:39:43.111 INFO:tasks.rados.rados.0.smithi112.stderr:Error: racing read on 65 returned version 12 rather than version 11 
 </code></pre> 

 See requests 16777355 and 16777354 from osd.1: 

 Req 16777355 in in 'process_op' (checking already_complete) before 16777354 completing handle: 
 <pre><code class="diff"> 
 12:39:31,074 v 18'9 active+clean             ClientRequest::with_pg_int: client_request(id=16777353).0: pg active, entering process[_pg]_op 
 12:39:31,074 v 18'9 active+clean             ClientRequest::with_pg_int: client_request(id=16777353).0: pg active, entering process[_pg]_op 
 12:39:31,077 v 18'9 active+clean             ClientRequest::with_pg_int: client_request(id=16777354).0: pg active, entering process[_pg]_op 
 12:39:31,077 v 18'9 active+clean             ClientRequest::with_pg_int: client_request(id=16777355).0: pg active, entering process[_pg]_op 
 12:39:31,077 v 18'9 active+clean             ClientRequest::with_pg_int: client_request(id=16777355).0: pg active, entering process[_pg]_op 
 12:39:31,152 v 18'10 active+clean            ClientRequest::process_op:    client_request(id=16777353).0: client_request(id=16777353 checking already_complete 
 12:39:31,153 v 18'10 active+clean            ClientRequest::process_op:    client_request(id=16777354).0: client_request(id=16777354 checking already_complete 
 12:39:31,153 v 18'10 active+clean            ClientRequest::process_op:    client_request(id=16777353).0: entering process stage 
 12:39:31,154 v 18'10 active+clean            ClientRequest::process_op:    client_request(id=16777353).0: client_request(id=16777353 in process stage, calling do_process 
 12:39:31,220 v 18'11 active+clean            ClientRequest::with_pg_int: client_request(id=16777353).0: client_request(id=16777353 process[_pg]_op complete, completing handle 
 12:39:31,220 v 18'11 active+clean            ClientRequest::with_pg_int: client_request(id=16777353).0: client_request(id=16777353 process[_pg]_op complete, completing handle 
 12:39:31,221 v 18'11 active+clean            ClientRequest::with_pg_int: client_request(id=16777353).0: exit 
 12:39:31,221 v 18'11 active+clean            ClientRequest::process_op:    client_request(id=16777355).0: client_request(id=16777355 checking already_complete 
 12:39:31,221 v 18'11 active+clean            ClientRequest::process_op:    client_request(id=16777355).0: client_request(id=16777355 checking already_complete 
 12:39:31,221 v 18'11 active+clean            ClientRequest::process_op:    client_request(id=16777354).0: entering process stage 
 12:39:31,222 v 18'11 active+clean            ClientRequest::process_op:    client_request(id=16777354).0: client_request(id=16777354 in process stage, calling do_process 

 *************************************************** 
 ***** OSD.3 (not this) was killed and revived ***** 

 12:39:31.250 INFO:tasks.thrashosds.thrasher:Killing osd 3, live_osds are [0, 1, 2, 3] 
 12:39:36.322 INFO:tasks.thrashosds.thrasher:Reviving osd 3 

 *************************************************** 
 *************************************************** 

 12:39:40,928 v 18'12 unknown                 ClientRequest::Orderer::requeue: requeueing client_request(id=16777354) 
 12:39:40,928 v 18'12 unknown                 ClientRequest::Orderer::requeue: requeueing client_request(id=16777354) 
 12:39:40,929 v 18'12 unknown                 ClientRequest::Orderer::requeue: requeueing client_request(id=16777355) 
 12:39:40,935 v 18'12 peering                 ClientRequest::with_pg_int: client_request(id=16777355).0: interrupted due to crimson::common::actingset_changed (acting set changed) 

 12:39:40,937 v 18'12 peering                 ClientRequest::with_pg_int: client_request(id=16777354).0: interrupted due to crimson::common::actingset_changed (acting set changed) 
 12:39:40,937 v 18'12 peering                 ClientRequest::with_pg_int: client_request(id=16777354).0: interrupted due to crimson::common::actingset_changed (acting set changed) 
 12:39:40,956 v 18'12 peering                 ClientRequest::with_pg_int: client_request(id=16777354).0: exit 

 12:39:41,838 v 18'12 unknown                 ClientRequest::Orderer::requeue: requeueing client_request(id=16777354) 
 12:39:41,839 v 18'12 unknown                 ClientRequest::Orderer::requeue: requeueing client_request(id=16777355) 

 12:39:41,857 v 18'12 unknown                 ClientRequest::with_pg_int: client_request(id=16777354).1: interrupted due to crimson::common::actingset_changed (acting set changed) 
 12:39:41,858 v 18'12 unknown                 ClientRequest::with_pg_int: client_request(id=16777354).1: exit 

 12:39:41,871 v 18'12 peering                 ClientRequest::with_pg_int: client_request(id=16777355).1: interrupted due to crimson::common::actingset_changed (acting set changed) 
 12:39:41,872 v 18'12 peering                 ClientRequest::with_pg_int: client_request(id=16777355).1: exit 

 12:39:42,985 v 18'12 active+degraded         ClientRequest::with_pg_int: client_request(id=16777354).2: pg active, entering process[_pg]_op 
 12:39:42,986 v 18'12 active+degraded         ClientRequest::with_pg_int: client_request(id=16777355).2: pg active, entering process[_pg]_op 
 12:39:43,104 v 18'12 active+recovery_wait    ClientRequest::process_op:    client_request(id=16777354).2: client_request(id=16777354 checking already_complete 
 12:39:43,105 v 18'12 active+recovery_wait    ClientRequest::with_pg_int: client_request(id=16777354).2: client_request(id=16777354 process[_pg]_op complete, completing handle 
 12:39:43,106 v 18'12 active+recovery_wait    ClientRequest::process_op:    client_request(id=16777355).2: client_request(id=16777355 checking already_complete 
 12:39:43,106 v 18'12 active+recovery_wait    ClientRequest::with_pg_int: client_request(id=16777354).2: exit 
 12:39:43,106 v 18'12 active+recovery_wait    ClientRequest::process_op:    client_request(id=16777355).2: entering process stage 
 12:39:43,107 v 18'12 active+recovery_wait    ClientRequest::process_op:    client_request(id=16777355).2: client_request(id=16777355 in process stage, calling do_process 
 12:39:43,112 v 18'12 active+recovery_wait    ClientRequest::with_pg_int: client_request(id=16777355).2: client_request(id=16777355 process[_pg]_op complete, completing handle 
 12:39:43,113 v 18'12 active+recovery_wait    ClientRequest::with_pg_int: client_request(id=16777355).2: exit 
 </code></pre> 

 osd.1 handling the read req: 

 <pre><code class="diff"> 
 DEBUG 2024-04-11 12:39:43,107 [shard 2:main] osd - do_osd_ops_execute: object 3:414dc9d4:::smithi11232367-65:head - handling op read 
 </code></pre> 

 -See: See: https://gist.github.com/Matan-B/02e88c375760c1cb29f87d39de9122c9 
 - 
 

 https://pulpito.ceph.com/matan-2024-04-11_11:23:27-crimson-rados-wip-matanb-crimson-replicated-testing-distro-crimson-smithi/7652230 
 https://pulpito.ceph.com/matan-2024-04-11_11:23:27-crimson-rados-wip-matanb-crimson-replicated-testing-distro-crimson-smithi/7652234/ 


Back