Bug #21861
Updated by Patrick Donnelly about 6 years ago
ceph version: jewel 10.2.2

When one OSD is written past the full_ratio (default 0.95), the cluster goes into the full state. As a result, the client purges every ObjectSet that has dirty or tx buffers. But if a read op has come in first, created a new bh in an object, and sent the read to the OSD, and we then purge that same object while the read is still in flight, the purge hits an assertion failure.

At 16:55:20.32 the read op comes in and is sent to the OSD:

<pre>
2017-10-13 16:55:20.324566 7ff400ff9700  3 client.1444221 ll_read 0x7ff40c060920 1000001c635 0~131072
2017-10-13 16:55:20.324569 7ff400ff9700 10 client.1444221 get_caps 1000001c635.head(faked_ino=0 ref=8 ll_ref=5 cap_refs={1024=1,4096=0,8192=1} open={1=1,2=0} mode=100666 size=815191/4194304 mtime=2017-10-13 16:55:20.232085 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) flushing_caps=Fw objectset[1000001c635 ts 0/0 objects 1 dirty_or_tx 143801] parents=0x7ff40c0b4aa0 0x7ff40c3b2a90) have pAsxLsXsxFsxcrwb need Fr want Fc but not Fc revoking -
2017-10-13 16:55:20.324580 7ff400ff9700 10 client.1444221 _read_async 1000001c635.head(faked_ino=0 ref=8 ll_ref=5 cap_refs={1024=1,2048=1,4096=0,8192=1} open={1=1,2=0} mode=100666 size=815191/4194304 mtime=2017-10-13 16:55:20.232085 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) flushing_caps=Fw objectset[1000001c635 ts 0/0 objects 1 dirty_or_tx 143801] parents=0x7ff40c0b4aa0 0x7ff40c3b2a90) 0~131072
2017-10-13 16:55:20.324585 7ff400ff9700 10 client.1444221 min_bytes=131072 max_bytes=16777216 max_periods=4
2017-10-13 16:55:20.324620 7ff400ff9700  1 -- 192.168.12.216:0/3579173032 --> 192.168.12.216:6803/22142 -- osd_op(client.1444221.0:2102 452.9a6db107 1000001c635.00000000 [read 0~131072] snapc 0=[] ack+read+known_if_redirected e6766) v7 -- ?+0 0x7ff404019820 con 0x7ff40c04bef0   // send the op to the OSD
</pre>

At 16:55:25, object 1000001c635.head is purged while that read is still outstanding:

<pre>
2017-10-13 16:55:25.214192 7ff420ff9700  4 client.1444221 _handle_full_flag: FULL: inode 0x1000001c635.head has dirty objects, purging and setting ENOSPC
2017-10-13 16:55:25.785130 7ff420ff9700 -1 osdc/ObjectCacher.cc: In function 'void ObjectCacher::Object::truncate(loff_t)' thread 7ff420ff9700 time 2017-10-13 16:55:25.214197
osdc/ObjectCacher.cc: 491: FAILED assert(bh->waitfor_read.empty())

 ceph version attr-v1-file-share-op-code-28-g1411918 (1411918ff2a8358567e32bad78f22ee4974c5975)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7ff42e6ceeb5]
 2: (ObjectCacher::Object::truncate(long)+0x266) [0x7ff42e57de56]
 3: (ObjectCacher::purge(ObjectCacher::Object*)+0x7a) [0x7ff42e57df2a]
 4: (ObjectCacher::purge_set(ObjectCacher::ObjectSet*)+0xab) [0x7ff42e57e16b]
 5: (Client::_handle_full_flag(long)+0x1d0) [0x7ff42e4d0540]
 6: (Client::handle_osd_map(MOSDMap*)+0x17f) [0x7ff42e4d600f]
 7: (Client::ms_dispatch(Message*)+0x5e3) [0x7ff42e540a73]
 8: (DispatchQueue::entry()+0x78a) [0x7ff42e7dd1aa]
 9: (DispatchQueue::DispatchThread::entry()+0xd) [0x7ff42e6b386d]
 10: (()+0x7e25) [0x7ff42d337e25]
 11: (clone()+0x6d) [0x7ff42c21f34d]
</pre>

Even if the read op later comes back and fills the bh, it cannot affect the cluster, because the buffer is clean and will never be flushed to the OSD. So I think we can let go of this bh even if someone is waiting for it, instead of failing the assert. What do you think about it?