Bug #53354

rgw_file: concurrent multi-threaded requests crash librgw.so

Added by Zhiwei Dai about 1 year ago. Updated 11 months ago.

Status: Need More Info
Priority: Normal
Assignee:
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The crash stack is unpredictable; three different stacks have been observed:
1,
[librgw.so.2+0x21952c] cohort::lru::LRU<std::mutex>::unref(cohort::lru::Object*, unsigned int)+0xcc
[librgw.so.2+0x20e494] rgw_close+0x44

2,
[librgw.so.2+0x2195fc] cohort::lru::LRU<std::mutex>::unref(cohort::lru::Object*, unsigned int)+0x19c
[librgw.so.2+0x20a5c8] rgw::RGWFileHandle::~RGWFileHandle()+0xd8
[librgw.so.2+0x20a5f4] rgw::RGWFileHandle::~RGWFileHandle()+0x14

3,
[librgw.so.2+0x203560] rgw_user::to_str(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) const+0x20
[librgw.so.2+0x209ee8] rgw::RGWFileHandle::make_fhk(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x58
[librgw.so.2+0x22c538] rgw::RGWLibFS::lookup_fh(rgw::RGWFileHandle*, char const*, unsigned int)+0x100
[librgw.so.2+0x20ff00] rgw::RGWLibFS::unlink(rgw::RGWFileHandle*, char const*, unsigned int)+0x58

History

#1 Updated by Zhiwei Dai about 1 year ago

While an RGWFileHandle is being unreferenced (rgw::RGWLibFS::unref), it may be in use by another operation at the same time.
When the LRU length exceeds rgw_nfs_lru_lane_hiwat, the LRU object is deleted outside the LRU lane lock. The concurrent operation can then end up working on an object that has already been deleted, and the process crashes.
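
To illustrate the kind of locking this needs, here is a minimal sketch. It is not the Ceph code and not the proposed fix: `Handle`, `Lane`, and `run_demo` are hypothetical names, and the lane is simplified to a single map. The only point it shows is that the "last reference dropped" check and the removal from the lane happen under the same lock, so no other thread can look up an object that is about to be deleted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a cached file handle (not RGWFileHandle).
struct Handle {
  std::string name;
  uint32_t refcnt = 0;  // guarded by Lane::mtx_
};

// Hypothetical single LRU lane, simplified to a map with a high-water mark.
class Lane {
 public:
  explicit Lane(std::size_t hiwat) : hiwat_(hiwat) {}

  // Find or create a handle and return it with one reference taken.
  Handle* ref(const std::string& name) {
    std::lock_guard<std::mutex> guard(mtx_);
    auto it = map_.find(name);
    if (it == map_.end()) {
      auto h = std::make_unique<Handle>();
      h->name = name;
      it = map_.emplace(name, std::move(h)).first;
    }
    ++it->second->refcnt;
    return it->second.get();
  }

  // Drop one reference. If it was the last one and the lane is over the
  // high-water mark, unlink the handle while still holding the lock.
  void unref(Handle* h) {
    std::unique_ptr<Handle> doomed;
    {
      std::lock_guard<std::mutex> guard(mtx_);
      assert(h->refcnt > 0);
      if (--h->refcnt == 0 && map_.size() > hiwat_) {
        auto it = map_.find(h->name);
        assert(it != map_.end());
        doomed = std::move(it->second);
        map_.erase(it);
      }
    }
    // `doomed` is destroyed here, outside the lock. That is safe only
    // because the handle was already unreachable and unreferenced.
  }

  std::size_t size() {
    std::lock_guard<std::mutex> guard(mtx_);
    return map_.size();
  }

 private:
  std::mutex mtx_;
  std::size_t hiwat_;
  std::unordered_map<std::string, std::unique_ptr<Handle>> map_;
};

// Hammer one name from several threads; returns the final lane size.
std::size_t run_demo(unsigned nthreads, unsigned iters) {
  Lane lane(0);  // high-water mark 0: evict as soon as refcnt hits zero
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < nthreads; ++t) {
    workers.emplace_back([&lane, iters] {
      for (unsigned i = 0; i < iters; ++i) {
        Handle* h = lane.ref("obj");
        lane.unref(h);
      }
    });
  }
  for (auto& w : workers) w.join();
  return lane.size();
}
```

If the decrement and the eviction decision were instead made in separate critical sections, a second thread could take a new reference in between and then use a handle that the first thread goes on to delete.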

#2 Updated by Casey Bodley 12 months ago

there was a fix to cohort lru in https://github.com/ceph/ceph/pull/43563 that may be related

#3 Updated by Casey Bodley 12 months ago

  • Assignee set to Matt Benjamin

#4 Updated by Zhiwei Dai 12 months ago

Thanks for your reply. After applying the PR, the crash still occurs. I will submit a fix for this.

#5 Updated by Casey Bodley 12 months ago

  • Status changed from New to Need More Info

#6 Updated by Zhiwei Dai 11 months ago

A fix for the core dump:
https://github.com/ceph/ceph/pull/44326
