Project

General

Profile

Bug #36096

osd: crashing with errors: "write_log_and_missing with: dirty_to" and "'int BlueStore::_do_alloc_write(BlueStore::TransContext"

Added by Greg Smith over 5 years ago. Updated over 5 years ago.

Status:
Need More Info
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
BlueStore
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Our Ceph with version Mimic is crashing OSD nodes with the following error in the log:

2018-09-20 02:09:49.160 7f28b1caa700  1 -- 192.168.31.115:6809/1731 <== osd.1 192.168.31.111:6801/1853 137528 ==== MOSDECSubOpWrite(6.107s2 1002/980 ECSubWrite(tid=135878, reqid=client.5246
4828.1:1451103, at_version=1002'211422, trim_to=918'208389, roll_forward_to=1002'211420)) v2 ==== 1402736+0+0 (1916543509 0 0) 0x5607398b0800 con 0x560738096e00
  -158> 2018-09-20 02:09:49.160 7f289426c700  5 write_log_and_missing with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 1002'211422, trimmed: , trimmed_dups: , clear_
divergent_priors: 0
  -157> 2018-09-20 02:09:49.160 7f28b2cac700  5 -- 192.168.31.115:6809/1731 >> 192.168.31.113:6805/1853 conn(0x560738097500 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=383 cs=37 l=0). rx 
osd.16 seq 118788 0x56073e2ed800 MOSDECSubOpWrite(6.24es4 1002/987 ECSubWrite(tid=174695, reqid=client.52464828.1:1451058, at_version=0'0, trim_to=907'100836, roll_forward_to=1002'103931)) v2
  -156> 2018-09-20 02:09:49.160 7f28b2cac700  1 -- 192.168.31.115:6809/1731 <== osd.16 192.168.31.113:6805/1853 118788 ==== MOSDECSubOpWrite(6.24es4 1002/987 ECSubWrite(tid=174695, reqid=client.524
64828.1:1451058, at_version=0'0, trim_to=907'100836, roll_forward_to=1002'103931)) v2 ==== 873+0+0 (3852236216 0 0) 0x56073e2ed800 con 0x560738097500
  -155> 2018-09-20 02:09:49.160 7f289fa83700  1 -- 192.168.31.115:6809/1731 --> 192.168.31.113:6805/1853 -- MOSDECSubOpWriteReply(6.24es0 1002/987 ECSubWriteReply(tid=174695, last_complete=1002'103
931, committed=1, applied=1)) v2 -- 0x56071f3ab440 con 0
  -154> 2018-09-20 02:09:49.164 7f289426c700  1 get compressor lz4 = 0x560738442780
  -153> 2018-09-20 02:09:49.164 7f289fa83700  1 -- 192.168.31.115:6809/1731 --> 192.168.31.111:6801/1853 -- MOSDECSubOpWriteReply(6.107s0 1002/980 ECSubWriteReply(tid=135878, last_complete=1002'211
422, committed=1, applied=1)) v2 -- 0x560723047200 con 0
  -152> 2018-09-20 02:09:49.164 7f28b24ab700  5 -- 192.168.31.115:6808/1731 >> 192.168.31.11:0/1009857432 conn(0x56073a606000 :6808 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=151 cs=1 l=1). 
rx client.52464828 seq 43192 0x56073dfe5040 osd_op(client.52464828.1:1451054 6.c5eabd3b 6:dcbd57a3:::10000243572.000026a1:head [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write e1002) v4
  -151> 2018-09-20 02:09:49.164 7f28b24ab700  1 -- 192.168.31.115:6808/1731 <== client.52464828 192.168.31.11:0/1009857432 43192 ==== osd_op(client.52464828.1:1451054 6.c5eabd3b 6:dcbd57a3:::100002
43572.000026a1:head [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write e1002) v4 ==== 159+0+4194304 (780947299 0 4035575411) 0x56073dfe5040 con 0x56073a606000
  -150> 2018-09-20 02:09:49.184 7f28b2cac700  5 -- 192.168.31.115:6809/1731 >> 192.168.31.113:6805/1853 conn(0x560738097500 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=383 cs=37 l=0). rx 
osd.16 seq 118789 0x56073b63d800 MOSDECSubOpWrite(6.129s3 1002/980 ECSubWrite(tid=174696, reqid=client.52464828.1:1451063, at_version=1002'210278, trim_to=918'207202, roll_forward_to=1002'210277)) 
v2
  -149> 2018-09-20 02:09:49.184 7f28b2cac700  1 -- 192.168.31.115:6809/1731 <== osd.16 192.168.31.113:6805/1853 118789 ==== MOSDECSubOpWrite(6.129s3 1002/980 ECSubWrite(tid=174696, reqid=client.524
64828.1:1451063, at_version=1002'210278, trim_to=918'207202, roll_forward_to=1002'210277)) v2 ==== 1402736+0+0 (2212463013 0 0) 0x56073b63d800 con 0x560738097500
  -148> 2018-09-20 02:09:49.184 7f2894a6d700  5 write_log_and_missing with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 1002'210278, trimmed: , trimmed_dups: , clear_
divergent_priors: 0
  -147> 2018-09-20 02:09:49.184 7f2895a6f700  1 -- 192.168.31.115:6809/1731 --> 192.168.31.111:6801/1853 -- MOSDECSubOpWrite(6.13bs4 1002/987 ECSubWrite(tid=196644, reqid=client.52464828.1:1451054,
 at_version=1002'213984, trim_to=992'210963, roll_forward_to=1002'213983)) v2 -- 0x560709cab800 con 0
  -146> 2018-09-20 02:09:49.184 7f2895a6f700  1 -- 192.168.31.115:6809/1731 --> 192.168.31.112:6809/3750 -- MOSDECSubOpWrite(6.13bs2 1002/987 ECSubWrite(tid=196644, reqid=client.52464828.1:1451054,
 at_version=1002'213984, trim_to=992'210963, roll_forward_to=1002'213983)) v2 -- 0x56074152d000 con 0
  -145> 2018-09-20 02:09:49.184 7f2895a6f700  1 -- 192.168.31.115:6809/1731 --> 192.168.31.114:6807/1001865 -- MOSDECSubOpWrite(6.13bs3 1002/987 ECSubWrite(tid=196644, reqid=client.52464828.1:14510
54, at_version=1002'213984, trim_to=992'210963, roll_forward_to=1002'213983)) v2 -- 0x56073b7dc000 con 0
  -144> 2018-09-20 02:09:49.184 7f2895a6f700  1 -- 192.168.31.115:6809/1731 --> 192.168.31.113:6804/1845 -- MOSDECSubOpWrite(6.13bs1 1002/987 ECSubWrite(tid=196644, reqid=client.52464828.1:1451054,
 at_version=1002'213984, trim_to=992'210963, roll_forward_to=1002'213983)) v2 -- 0x560709d16800 con 0
  -143> 2018-09-20 02:09:49.188 7f2895a6f700  5 write_log_and_missing with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 1002'213984, trimmed: , trimmed_dups: , clear_
divergent_priors: 0
  -142> 2018-09-20 02:09:49.188 7f2894a6d700  1 get compressor lz4 = 0x560738442780
  -141> 2018-09-20 02:09:49.188 7f28b1caa700  5 -- 192.168.31.115:6809/1731 >> 192.168.31.111:6801/1853 conn(0x560738096e00 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=337 cs=44 l=0). rx 
osd.1 seq 137529 0x56073b237000 MOSDECSubOpWrite(6.107s2 1002/980 ECSubWrite(tid=135880, reqid=client.52464828.1:1451103, at_version=0'0, trim_to=918'208389, roll_forward_to=1002'211422)) v2
  -140> 2018-09-20 02:09:49.188 7f28b1caa700  1 -- 192.168.31.115:6809/1731 <== osd.1 192.168.31.111:6801/1853 137529 ==== MOSDECSubOpWrite(6.107s2 1002/980 ECSubWrite(tid=135880, reqid=client.5246
4828.1:1451103, at_version=0'0, trim_to=918'208389, roll_forward_to=1002'211422)) v2 ==== 873+0+0 (2773962764 0 0) 0x56073b237000 con 0x560738096e00
  -139> 2018-09-20 02:09:49.188 7f28b2cac700  5 -- 192.168.31.115:6809/1731 >> 192.168.31.114:6807/1001865 conn(0x560738ce1c00 :6809 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=2816 cs=34 l=0
). rx osd.15 seq 126801 0x560743144800 MOSDECSubOpWrite(6.c2s4 1002/980 ECSubWrite(tid=6776444, reqid=client.52464828.1:1451116, at_version=1002'210831, trim_to=918'207828, roll_forward_to=1002'210
830)) v2
  -138> 2018-09-20 02:09:49.188 7f28b2cac700  1 -- 192.168.31.115:6809/1731 <== osd.15 192.168.31.114:6807/1001865 126801 ==== MOSDECSubOpWrite(6.c2s4 1002/980 ECSubWrite(tid=6776444, reqid=client.
52464828.1:1451116, at_version=1002'210831, trim_to=918'207828, roll_forward_to=1002'210830)) v2 ==== 6464+0+0 (791773677 0 0) 0x560743144800 con 0x560738ce1c00
  -137> 2018-09-20 02:09:49.192 7f2893a6b700  5 write_log_and_missing with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 1002'210831, trimmed: , trimmed_dups: , clear_
divergent_priors: 0
  -136> 2018-09-20 02:09:49.192 7f289fa83700  1 -- 192.168.31.115:6809/1731 --> 192.168.31.111:6801/1853 -- MOSDECSubOpWriteReply(6.107s0 1002/980 ECSubWriteReply(tid=135880, last_complete=1002'211
422, committed=1, applied=1)) v2 -- 0x5607404e7cc0 con 0
  -135> 2018-09-20 02:09:49.192 7f2893a6b700  1 get compressor lz4 = 0x560738442780
  -134> 2018-09-20 02:09:49.192 7f2895a6f700  1 get compressor lz4 = 0x560738442780
  -133> 2018-09-20 02:09:49.196 7f289fa83700  1 -- 192.168.31.115:6809/1731 --> 192.168.31.114:6807/1001865 -- MOSDECSubOpWriteReply(6.c2s0 1002/980 ECSubWriteReply(tid=6776444, last_complete=1002'
210831, committed=1, applied=1)) v2 -- 0x56073a125600 con 0
  -132> 2018-09-20 02:09:49.200 7f289fa83700  1 -- 192.168.31.115:6809/1731 --> 192.168.31.113:6805/1853 -- MOSDECSubOpWriteReply(6.129s0 1002/980 ECSubWriteReply(tid=174696, last_complete=1002'210
278, committed=1, applied=1)) v2 -- 0x5607211be900 con 0
  -131> 2018-09-20 02:09:49.208 7f28b24ab700  5 -- 192.168.31.115:6809/1731 >> 192.168.31.113:6804/1845 conn(0x560737f6a000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=322 cs=31 l=0). rx 
osd.17 seq 117211 0x560743c32900 MOSDECSubOpWriteReply(6.13bs0 1002/987 ECSubWriteReply(tid=196644, last_complete=1002'213984, committed=1, applied=1)) v2
  -130> 2018-09-20 02:09:49.208 7f28b24ab700  1 -- 192.168.31.115:6809/1731 <== osd.17 192.168.31.113:6804/1845 117211 ==== MOSDECSubOpWriteReply(6.13bs0 1002/987 ECSubWriteReply(tid=196644, last_c
omplete=1002'213984, committed=1, applied=1)) v2 ==== 95+0+0 (3771332997 0 0) 0x560743c32900 con 0x560737f6a000
  -129> 2018-09-20 02:09:49.208 7f28b1caa700  5 -- 192.168.31.115:6809/1731 >> 192.168.31.111:6801/1853 conn(0x560738096e00 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=337 cs=44 l=0). rx 
osd.1 seq 137530 0x5607404e7cc0 MOSDECSubOpWriteReply(6.13bs0 1002/987 ECSubWriteReply(tid=196644, last_complete=1002'213984, committed=1, applied=1)) v2
  -128> 2018-09-20 02:09:49.208 7f28b1caa700  1 -- 192.168.31.115:6809/1731 <== osd.1 192.168.31.111:6801/1853 1

2018-09-20 02:09:49.288 7f289526e700 -1 /build/ceph-13.2.1/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::_do_alloc_write(BlueStore::TransContext*, BlueStore::CollectionRef, BlueStore::OnodeRef, BlueStore::WriteContext*)' thread 7f289526e700 time 2018-09-20 02:09:49.263904
     0> 2018-09-20 02:09:49.288 7f289526e700 -1 /build/ceph-13.2.1/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::_do_alloc_write(BlueStore::TransContext*, BlueStore::CollectionRef, BlueStore::OnodeRef, BlueStore::WriteContext*)' thread 7f289526e700 time 2018-09-20 02:09:49.263904

History

#1 Updated by Patrick Donnelly over 5 years ago

  • Project changed from Ceph to RADOS
  • Subject changed from OSD nodes are craching with errors: "write_log_and_missing with: dirty_to" and "'int BlueStore::_do_alloc_write(BlueStore::TransContext" to osd: crashing with errors: "write_log_and_missing with: dirty_to" and "'int BlueStore::_do_alloc_write(BlueStore::TransContext"
  • Description updated (diff)
  • Category deleted (chef)
  • Target version deleted (v13.2.2)
  • Start date deleted (09/20/2018)
  • Source set to Community (user)

#2 Updated by Neha Ojha over 5 years ago

  • Component(RADOS) BlueStore added

#3 Updated by Neha Ojha over 5 years ago

  • Status changed from New to Need More Info

Could you please attach logs to this tracker?

Also available in: Atom PDF