Project

General

Profile

Bug #36096

Updated by Patrick Donnelly over 5 years ago

Our Ceph with version Mimic is crashing OSD nodes with the following error in the log: 

 <pre> 
 2018-09-20 02:09:49.160 7f28b1caa700    1 -- 192.168.31.115:6809/1731 <== osd.1 192.168.31.111:6801/1853 137528 ==== MOSDECSubOpWrite(6.107s2 1002/980 ECSubWrite(tid=135878, reqid=client.5246 
 4828.1:1451103, at_version=1002'211422, trim_to=918'208389, roll_forward_to=1002'211420)) v2 ==== 1402736+0+0 (1916543509 0 0) 0x5607398b0800 con 0x560738096e00 
   -158> 2018-09-20 02:09:49.160 7f289426c700    5 write_log_and_missing with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 1002'211422, trimmed: , trimmed_dups: , clear_ 
 divergent_priors: 0 
   -157> 2018-09-20 02:09:49.160 7f28b2cac700    5 -- 192.168.31.115:6809/1731 >> 192.168.31.113:6805/1853 conn(0x560738097500 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=383 cs=37 l=0). rx  
 osd.16 seq 118788 0x56073e2ed800 MOSDECSubOpWrite(6.24es4 1002/987 ECSubWrite(tid=174695, reqid=client.52464828.1:1451058, at_version=0'0, trim_to=907'100836, roll_forward_to=1002'103931)) v2 
   -156> 2018-09-20 02:09:49.160 7f28b2cac700    1 -- 192.168.31.115:6809/1731 <== osd.16 192.168.31.113:6805/1853 118788 ==== MOSDECSubOpWrite(6.24es4 1002/987 ECSubWrite(tid=174695, reqid=client.524 
 64828.1:1451058, at_version=0'0, trim_to=907'100836, roll_forward_to=1002'103931)) v2 ==== 873+0+0 (3852236216 0 0) 0x56073e2ed800 con 0x560738097500 
   -155> 2018-09-20 02:09:49.160 7f289fa83700    1 -- 192.168.31.115:6809/1731 --> 192.168.31.113:6805/1853 -- MOSDECSubOpWriteReply(6.24es0 1002/987 ECSubWriteReply(tid=174695, last_complete=1002'103 
 931, committed=1, applied=1)) v2 -- 0x56071f3ab440 con 0 
   -154> 2018-09-20 02:09:49.164 7f289426c700    1 get compressor lz4 = 0x560738442780 
   -153> 2018-09-20 02:09:49.164 7f289fa83700    1 -- 192.168.31.115:6809/1731 --> 192.168.31.111:6801/1853 -- MOSDECSubOpWriteReply(6.107s0 1002/980 ECSubWriteReply(tid=135878, last_complete=1002'211 
 422, committed=1, applied=1)) v2 -- 0x560723047200 con 0 
   -152> 2018-09-20 02:09:49.164 7f28b24ab700    5 -- 192.168.31.115:6808/1731 >> 192.168.31.11:0/1009857432 conn(0x56073a606000 :6808 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=151 cs=1 l=1).  
 rx client.52464828 seq 43192 0x56073dfe5040 osd_op(client.52464828.1:1451054 6.c5eabd3b 6:dcbd57a3:::10000243572.000026a1:head [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write e1002) v4 
   -151> 2018-09-20 02:09:49.164 7f28b24ab700    1 -- 192.168.31.115:6808/1731 <== client.52464828 192.168.31.11:0/1009857432 43192 ==== osd_op(client.52464828.1:1451054 6.c5eabd3b 6:dcbd57a3:::100002 
 43572.000026a1:head [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write e1002) v4 ==== 159+0+4194304 (780947299 0 4035575411) 0x56073dfe5040 con 0x56073a606000 
   -150> 2018-09-20 02:09:49.184 7f28b2cac700    5 -- 192.168.31.115:6809/1731 >> 192.168.31.113:6805/1853 conn(0x560738097500 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=383 cs=37 l=0). rx  
 osd.16 seq 118789 0x56073b63d800 MOSDECSubOpWrite(6.129s3 1002/980 ECSubWrite(tid=174696, reqid=client.52464828.1:1451063, at_version=1002'210278, trim_to=918'207202, roll_forward_to=1002'210277))  
 v2 
   -149> 2018-09-20 02:09:49.184 7f28b2cac700    1 -- 192.168.31.115:6809/1731 <== osd.16 192.168.31.113:6805/1853 118789 ==== MOSDECSubOpWrite(6.129s3 1002/980 ECSubWrite(tid=174696, reqid=client.524 
 64828.1:1451063, at_version=1002'210278, trim_to=918'207202, roll_forward_to=1002'210277)) v2 ==== 1402736+0+0 (2212463013 0 0) 0x56073b63d800 con 0x560738097500 
   -148> 2018-09-20 02:09:49.184 7f2894a6d700    5 write_log_and_missing with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 1002'210278, trimmed: , trimmed_dups: , clear_ 
 divergent_priors: 0 
   -147> 2018-09-20 02:09:49.184 7f2895a6f700    1 -- 192.168.31.115:6809/1731 --> 192.168.31.111:6801/1853 -- MOSDECSubOpWrite(6.13bs4 1002/987 ECSubWrite(tid=196644, reqid=client.52464828.1:1451054, 
  at_version=1002'213984, trim_to=992'210963, roll_forward_to=1002'213983)) v2 -- 0x560709cab800 con 0 
   -146> 2018-09-20 02:09:49.184 7f2895a6f700    1 -- 192.168.31.115:6809/1731 --> 192.168.31.112:6809/3750 -- MOSDECSubOpWrite(6.13bs2 1002/987 ECSubWrite(tid=196644, reqid=client.52464828.1:1451054, 
  at_version=1002'213984, trim_to=992'210963, roll_forward_to=1002'213983)) v2 -- 0x56074152d000 con 0 
   -145> 2018-09-20 02:09:49.184 7f2895a6f700    1 -- 192.168.31.115:6809/1731 --> 192.168.31.114:6807/1001865 -- MOSDECSubOpWrite(6.13bs3 1002/987 ECSubWrite(tid=196644, reqid=client.52464828.1:14510 
 54, at_version=1002'213984, trim_to=992'210963, roll_forward_to=1002'213983)) v2 -- 0x56073b7dc000 con 0 
   -144> 2018-09-20 02:09:49.184 7f2895a6f700    1 -- 192.168.31.115:6809/1731 --> 192.168.31.113:6804/1845 -- MOSDECSubOpWrite(6.13bs1 1002/987 ECSubWrite(tid=196644, reqid=client.52464828.1:1451054, 
  at_version=1002'213984, trim_to=992'210963, roll_forward_to=1002'213983)) v2 -- 0x560709d16800 con 0 
   -143> 2018-09-20 02:09:49.188 7f2895a6f700    5 write_log_and_missing with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 1002'213984, trimmed: , trimmed_dups: , clear_ 
 divergent_priors: 0 
   -142> 2018-09-20 02:09:49.188 7f2894a6d700    1 get compressor lz4 = 0x560738442780 
   -141> 2018-09-20 02:09:49.188 7f28b1caa700    5 -- 192.168.31.115:6809/1731 >> 192.168.31.111:6801/1853 conn(0x560738096e00 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=337 cs=44 l=0). rx  
 osd.1 seq 137529 0x56073b237000 MOSDECSubOpWrite(6.107s2 1002/980 ECSubWrite(tid=135880, reqid=client.52464828.1:1451103, at_version=0'0, trim_to=918'208389, roll_forward_to=1002'211422)) v2 
   -140> 2018-09-20 02:09:49.188 7f28b1caa700    1 -- 192.168.31.115:6809/1731 <== osd.1 192.168.31.111:6801/1853 137529 ==== MOSDECSubOpWrite(6.107s2 1002/980 ECSubWrite(tid=135880, reqid=client.5246 
 4828.1:1451103, at_version=0'0, trim_to=918'208389, roll_forward_to=1002'211422)) v2 ==== 873+0+0 (2773962764 0 0) 0x56073b237000 con 0x560738096e00 
   -139> 2018-09-20 02:09:49.188 7f28b2cac700    5 -- 192.168.31.115:6809/1731 >> 192.168.31.114:6807/1001865 conn(0x560738ce1c00 :6809 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=2816 cs=34 l=0 
 ). rx osd.15 seq 126801 0x560743144800 MOSDECSubOpWrite(6.c2s4 1002/980 ECSubWrite(tid=6776444, reqid=client.52464828.1:1451116, at_version=1002'210831, trim_to=918'207828, roll_forward_to=1002'210 
 830)) v2 
   -138> 2018-09-20 02:09:49.188 7f28b2cac700    1 -- 192.168.31.115:6809/1731 <== osd.15 192.168.31.114:6807/1001865 126801 ==== MOSDECSubOpWrite(6.c2s4 1002/980 ECSubWrite(tid=6776444, reqid=client. 
 52464828.1:1451116, at_version=1002'210831, trim_to=918'207828, roll_forward_to=1002'210830)) v2 ==== 6464+0+0 (791773677 0 0) 0x560743144800 con 0x560738ce1c00 
   -137> 2018-09-20 02:09:49.192 7f2893a6b700    5 write_log_and_missing with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 1002'210831, trimmed: , trimmed_dups: , clear_ 
 divergent_priors: 0 
   -136> 2018-09-20 02:09:49.192 7f289fa83700    1 -- 192.168.31.115:6809/1731 --> 192.168.31.111:6801/1853 -- MOSDECSubOpWriteReply(6.107s0 1002/980 ECSubWriteReply(tid=135880, last_complete=1002'211 
 422, committed=1, applied=1)) v2 -- 0x5607404e7cc0 con 0 
   -135> 2018-09-20 02:09:49.192 7f2893a6b700    1 get compressor lz4 = 0x560738442780 
   -134> 2018-09-20 02:09:49.192 7f2895a6f700    1 get compressor lz4 = 0x560738442780 
   -133> 2018-09-20 02:09:49.196 7f289fa83700    1 -- 192.168.31.115:6809/1731 --> 192.168.31.114:6807/1001865 -- MOSDECSubOpWriteReply(6.c2s0 1002/980 ECSubWriteReply(tid=6776444, last_complete=1002' 
 210831, committed=1, applied=1)) v2 -- 0x56073a125600 con 0 
   -132> 2018-09-20 02:09:49.200 7f289fa83700    1 -- 192.168.31.115:6809/1731 --> 192.168.31.113:6805/1853 -- MOSDECSubOpWriteReply(6.129s0 1002/980 ECSubWriteReply(tid=174696, last_complete=1002'210 
 278, committed=1, applied=1)) v2 -- 0x5607211be900 con 0 
   -131> 2018-09-20 02:09:49.208 7f28b24ab700    5 -- 192.168.31.115:6809/1731 >> 192.168.31.113:6804/1845 conn(0x560737f6a000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=322 cs=31 l=0). rx  
 osd.17 seq 117211 0x560743c32900 MOSDECSubOpWriteReply(6.13bs0 1002/987 ECSubWriteReply(tid=196644, last_complete=1002'213984, committed=1, applied=1)) v2 
   -130> 2018-09-20 02:09:49.208 7f28b24ab700    1 -- 192.168.31.115:6809/1731 <== osd.17 192.168.31.113:6804/1845 117211 ==== MOSDECSubOpWriteReply(6.13bs0 1002/987 ECSubWriteReply(tid=196644, last_c 
 omplete=1002'213984, committed=1, applied=1)) v2 ==== 95+0+0 (3771332997 0 0) 0x560743c32900 con 0x560737f6a000 
   -129> 2018-09-20 02:09:49.208 7f28b1caa700    5 -- 192.168.31.115:6809/1731 >> 192.168.31.111:6801/1853 conn(0x560738096e00 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=337 cs=44 l=0). rx  
 osd.1 seq 137530 0x5607404e7cc0 MOSDECSubOpWriteReply(6.13bs0 1002/987 ECSubWriteReply(tid=196644, last_complete=1002'213984, committed=1, applied=1)) v2 
   -128> 2018-09-20 02:09:49.208 7f28b1caa700    1 -- 192.168.31.115:6809/1731 <== osd.1 192.168.31.111:6801/1853 1 

 2018-09-20 02:09:49.288 7f289526e700 -1 /build/ceph-13.2.1/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::_do_alloc_write(BlueStore::TransContext*, BlueStore::CollectionRef, BlueStore::OnodeRef, BlueStore::WriteContext*)' thread 7f289526e700 time 2018-09-20 02:09:49.263904 
      0> 2018-09-20 02:09:49.288 7f289526e700 -1 /build/ceph-13.2.1/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::_do_alloc_write(BlueStore::TransContext*, BlueStore::CollectionRef, BlueStore::OnodeRef, BlueStore::WriteContext*)' thread 7f289526e700 time 2018-09-20 02:09:49.263904 
 </pre>

Back