Bug #21861
osdc: truncating an Object and removing a bh that has a reader waiting on it triggers an assert failure
Status:
New
Priority:
High
Assignee:
-
Category:
Correctness/Safety
Target version:
-
% Done:
0%
Source:
Community (dev)
Tags:
crash
Backport:
luminous,mimic
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
fs
Component(FS):
Client, osdc
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
ceph version: jewel 10.2.2
When any OSD's utilization exceeds full_ratio (default 0.95), the cluster enters the full state.
As a result the client purges every oset that is dirty_or_tx. But if a read op comes in, allocates
a new bh in an object, and sends the read op to the OSD while we happen to be purging that same
object, an assert failure results.
//16.55.20.32 read op come in
2017-10-13 16:55:20.324566 7ff400ff9700 3 client.1444221 ll_read 0x7ff40c060920 1000001c635 0~131072
2017-10-13 16:55:20.324569 7ff400ff9700 10 client.1444221 get_caps 1000001c635.head(faked_ino=0 ref=8 ll_ref=5 cap_refs={1024=1,4096=0,8192=1} open={1=1,2=0} mode=100666 size=815191/4194304 mtime=2017-10-13 16:55:20.232085 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) flushing_caps=Fw objectset[1000001c635 ts 0/0 objects 1 dirty_or_tx 143801] parents=0x7ff40c0b4aa0 0x7ff40c3b2a90) have pAsxLsXsxFsxcrwb need Fr want Fc but not Fc revoking -
2017-10-13 16:55:20.324580 7ff400ff9700 10 client.1444221 *_read_async* 1000001c635.head(faked_ino=0 ref=8 ll_ref=5 cap_refs={1024=1,2048=1,4096=0,8192=1} open={1=1,2=0} mode=100666 size=815191/4194304 mtime=2017-10-13 16:55:20.232085 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) flushing_caps=Fw objectset[1000001c635 ts 0/0 objects 1 dirty_or_tx 143801] parents=0x7ff40c0b4aa0 0x7ff40c3b2a90) 0~131072
2017-10-13 16:55:20.324585 7ff400ff9700 10 client.1444221 min_bytes=131072 max_bytes=16777216 max_periods=4
2017-10-13 16:55:20.324620 7ff400ff9700 1 -- 192.168.12.216:0/3579173032 --> 192.168.12.216:6803/22142 -- osd_op(client.1444221.0:2102 452.9a6db107 1000001c635.00000000 [read 0~131072] snapc 0=[] ack+read+known_if_redirected e6766) v7 -- ?+0 0x7ff404019820 con 0x7ff40c04bef0 //*send the op to osd*
//16.55.25 purge object 1000001c635.head
2017-10-13 16:55:25.214192 7ff420ff9700 4 client.1444221 *_handle_full_flag:* FULL: inode 0x1000001c635.head has dirty objects, purging and setting ENOSPC
2017-10-13 16:55:25.785130 7ff420ff9700 -1 osdc/ObjectCacher.cc: In function 'void ObjectCacher::Object::truncate(loff_t)' thread 7ff420ff9700 time 2017-10-13 16:55:25.214197
osdc/ObjectCacher.cc: 491: *FAILED assert(bh->waitfor_read.empty())*
ceph version attr-v1-file-share-op-code-28-g1411918 (1411918ff2a8358567e32bad78f22ee4974c5975)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7ff42e6ceeb5]
2: (ObjectCacher::Object::truncate(long)+0x266) [0x7ff42e57de56]
3: (ObjectCacher::purge(ObjectCacher::Object*)+0x7a) [0x7ff42e57df2a]
4: (ObjectCacher::purge_set(ObjectCacher::ObjectSet*)+0xab) [0x7ff42e57e16b]
5: (Client::_handle_full_flag(long)+0x1d0) [0x7ff42e4d0540]
6: (Client::handle_osd_map(MOSDMap*)+0x17f) [0x7ff42e4d600f]
7: (Client::ms_dispatch(Message*)+0x5e3) [0x7ff42e540a73]
8: (DispatchQueue::entry()+0x78a) [0x7ff42e7dd1aa]
9: (DispatchQueue::DispatchThread::entry()+0xd) [0x7ff42e6b386d]
10: (()+0x7e25) [0x7ff42d337e25]
11: (clone()+0x6d) [0x7ff42c21f34d]
Even when the read op comes back and fills the bh, it cannot affect the cluster: the bh is clean, so it will never be flushed to the OSD.
So I think we can let go of this bh if someone is waiting for it, instead of producing an assert failure. What do you think
about it?
History
#1 Updated by Greg Farnum over 5 years ago
- Project changed from Ceph to CephFS
- Category deleted (Objecter)
- Assignee set to Zheng Yan
Zheng, is this what your recent master PR for ObjectCacher fixed?
#2 Updated by Patrick Donnelly about 5 years ago
- Subject changed from ceph-fuse truncate Object and remove the bh which have someone wait for read on it occur a assert fail. to osdc: truncate Object and remove the bh which have someone wait for read on it occur a assert fail.
- Description updated (diff)
- Category set to Correctness/Safety
- Priority changed from Normal to High
- Target version changed from v12.2.2 to v14.0.0
- Tags set to crash
- Backport set to luminous,mimic
- Component(FS) Client, osdc added
#3 Updated by Zheng Yan over 4 years ago
I think this bug still exists in master
#4 Updated by Patrick Donnelly about 4 years ago
- Target version changed from v14.0.0 to v15.0.0
#5 Updated by Patrick Donnelly over 3 years ago
- Assignee deleted (Zheng Yan)
- Target version deleted (v15.0.0)