Bug #53227

Updated by Patrick Donnelly over 2 years ago

GDB: 

 <pre> 
 #12 0x00007f72626352ab in ObjectCacher::C_RetryRead::finish (this=0x7f71dc001f80, r=<optimized out>) at /var/ws/ivan/nautilus-ceph-14.2.5/src/osdc/ObjectCacher.cc:82 
 82              r = oc->_readx(rd, oset, onfinish, false, &trace); 
 (gdb) p rd 
 $6 = (ObjectCacher::OSDRead *) 0x7f6a1a8ac7b0 

 (gdb) pmap hits loff_t BufferHead* 
 elem[0].left: $60 = 0 
 elem[0].right: $61 = (ObjectCacher::BufferHead *) 0x7f6a38fbaf60 
 elem[1].left: $62 = 131072 
 elem[1].right: $63 = (ObjectCacher::BufferHead *) 0x7f6a3d80fce0 
 elem[2].left: $64 = 262144 
 elem[2].right: $65 = (ObjectCacher::BufferHead *) 0x7f6a4e6e86c0 
 elem[3].left: $66 = 393216 
 elem[3].right: $67 = (ObjectCacher::BufferHead *) 0x7f6a4a43c370 
 elem[4].left: $68 = 524288 
 elem[4].right: $69 = (ObjectCacher::BufferHead *) 0x7f6a3d80f940 
 elem[5].left: $70 = 655360 
 elem[5].right: $71 = (ObjectCacher::BufferHead *) 0x7f6a571cfe70 
 elem[6].left: $72 = 786432 
 elem[6].right: $73 = (ObjectCacher::BufferHead *) 0x7f6a274e5570 
 elem[7].left: $74 = 917504 
 elem[7].right: $75 = (ObjectCacher::BufferHead *) 0x7f6a4ce46510 
 elem[8].left: $76 = 1048576 
 elem[8].right: $77 = (ObjectCacher::BufferHead *) 0x7f6a216f1a00 
 elem[9].left: $78 = 1179648 
 elem[9].right: $79 = (ObjectCacher::BufferHead *) 0x7f6a57fffcb0 
 elem[10].left: $80 = 1441792 
 elem[10].right: $81 = (ObjectCacher::BufferHead *) 0x7f6a33e64180 
 elem[11].left: $82 = 1572864 
 elem[11].right: $83 = (ObjectCacher::BufferHead *) 0x7f6a313543b0 
 elem[12].left: $84 = 1703936 
 elem[12].right: $85 = (ObjectCacher::BufferHead *) 0x7f6a19960540 
 elem[13].left: $86 = 1835008 
 elem[13].right: $87 = (ObjectCacher::BufferHead *) 0x7f6a2a7b05d0 
 elem[14].left: $88 = 1966080 
 elem[14].right: $89 = (ObjectCacher::BufferHead *) 0x7f6a47ac5ce0 
 elem[15].left: $90 = 2097152 
 elem[15].right: $91 = (ObjectCacher::BufferHead *) 0x7f6a37b8c840 
 elem[16].left: $92 = 2228224 
 elem[16].right: $93 = (ObjectCacher::BufferHead *) 0x7f6a5464e3d0 
 elem[17].left: $94 = 2490368 
 elem[17].right: $95 = (ObjectCacher::BufferHead *) 0x7f6f923edca0 
 elem[18].left: $96 = 2752512 
 elem[18].right: $97 = (ObjectCacher::BufferHead *) 0x7f6a24ad4300 
 elem[19].left: $98 = 2883584 
 elem[19].right: $99 = (ObjectCacher::BufferHead *) 0x7f6a351c37d0 
 Map size = 20 

 (gdb) p (*(BufferHead *)0x7f6a24ad4300).state 
 $310 = 5     //STATE_TX 
 (gdb) p (*(BufferHead *)0x7f6a24ad4300).error 
 $311 = 0 
 (gdb) p (*(BufferHead *)0x7f6a351c37d0).state   
 $312 = 6    //STATE_ERROR 
 (gdb) p (*(BufferHead *)0x7f6a351c37d0).error 
 $313 = 0    //should be eio 5 
 </pre> 

 In conclusion, bh 0x7f6a351c37d0 is in STATE_ERROR, but its error number is 0. 
 The preceding bh (0x7f6a24ad4300) is in STATE_TX, which indicates that a write triggered a bh split. 


 How the problem arises: 
 During readahead, something makes the OSD reply with EIO (error number 5). 
 The EIO is assigned to bh->error, and read_cond.Signal is then called. 
 However, a write then triggers a split of that bh. 
 The split allocates a new BufferHead and initializes it with error(0), so the EIO is lost. 
 When the read is woken up, it takes the !error branch and the client crashes. 
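
 The sequence above can be modelled with a minimal sketch. This is not the ObjectCacher code: the BufferHead struct here is a simplified stand-in, and split_losing_error, read_sees_error, and the start/length fields are hypothetical names introduced only for illustration. It assumes the split copies the state to the new bh but leaves its error at the default of 0.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>

// Simplified stand-in for ObjectCacher::BufferHead (assumption, not the real class).
struct BufferHead {
  // State values as seen in the GDB session above.
  enum { STATE_TX = 5, STATE_ERROR = 6 };

  int state = 0;
  int error = 0;        // a freshly constructed bh starts with error(0)
  int64_t start = 0;    // hypothetical field: offset of the bh in the object
  int64_t length = 0;   // hypothetical field: number of bytes covered
};

// Hypothetical split mirroring the behaviour described in the report:
// the new (right) bh inherits the state, but its error is never copied.
BufferHead split_losing_error(BufferHead &left, int64_t off) {
  BufferHead right;                       // right.error == 0 here
  right.state = left.state;               // state is propagated
  right.start = off;
  right.length = left.start + left.length - off;
  left.length = off - left.start;         // left now ends where right begins
  return right;                           // left.error was not propagated
}

// Hypothetical reader-side check: the failure is only noticed when the
// bh is in STATE_ERROR and also carries a nonzero error number.
bool read_sees_error(const BufferHead &bh) {
  return bh.state == BufferHead::STATE_ERROR && bh.error != 0;
}
```

 Splitting a bh that holds state STATE_ERROR and error EIO with this function yields a right half that is still in STATE_ERROR but has error 0, which matches what GDB shows for bh 0x7f6a351c37d0. A reader using a check like read_sees_error would then treat that bh as if the read had succeeded.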


 How to resolve: 
 Add a set_error function to BufferHead, and have split call set_error on the new bh in the same way it already calls set_state. 
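
 A minimal sketch of that proposal, under stated assumptions: the class below is a simplified model rather than the real ObjectCacher::BufferHead, set_error is the function proposed above, and get_error, split_keeping_error, and the start/length fields are hypothetical names added for illustration. The actual patch may look different.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>

// Simplified model of a buffer head (assumption, not the real class).
class BufferHead {
public:
  enum { STATE_ERROR = 6 };  // value as seen in the GDB session

  void set_state(int s) { state_ = s; }
  int get_state() const { return state_; }

  // Proposed setter, mirroring set_state.
  void set_error(int e) { error_ = e; }
  // Hypothetical getter used by the sketch below.
  int get_error() const { return error_; }

  int64_t start = 0;   // hypothetical field: offset of the bh
  int64_t length = 0;  // hypothetical field: number of bytes covered

private:
  int state_ = 0;
  int error_ = 0;      // still defaults to 0 for a new bh
};

// Hypothetical split that propagates the error along with the state.
BufferHead split_keeping_error(BufferHead &left, int64_t off) {
  BufferHead right;
  right.set_state(left.get_state());
  right.set_error(left.get_error());   // the step the report asks for
  right.start = off;
  right.length = left.start + left.length - off;
  left.length = off - left.start;
  return right;
}
```

 With the error copied during the split, both halves of a bh that failed with EIO keep the error number, so a reader that is woken up afterwards can still detect the failure instead of taking the !error branch.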
