Bug #47235

rgw/rgw_file: incorrect lru object eviction in lookup_fh

Added by Rixin Luo over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Target version:
% Done: 0%
Source:
Tags:
Backport: octopus, nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

In lookup_fh, when an RGWFileHandle is not found in the cache, the function
must recycle an object and create a new RGWFileHandle. When multiple threads
call lookup_fh concurrently, the cache lookup and the LRU eviction can happen
at the same time. Imagine this case:
Thread A has obtained object A from fh_cache under filename f1, while thread B is
evicting object A in order to reuse it for filename f2 in another partition. Two
incorrect outcomes are possible:
1) ceph_assert(o->lru_refcnt == SENTINEL_REFCNT) fails, because thread A increments
object A's lru_refcnt while thread B is evicting object A.
2) Thread A looks up filename f1 and gets object A from fh_cache, but before lookup_fh
returns, thread B replaces object A's contents with filename f2, so thread A receives
object A with filename f2 as the return value of lookup_fh.
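
The race described above can be illustrated with a minimal C++ sketch. This is not the code from rgw_file or cohort_lru.h, and it is not the fix that was merged for this bug. The names Handle, HandleCache, lookup, release and try_recycle are hypothetical, and the partitioned cache and LRU lanes are collapsed into a single map behind one mutex. It only shows one way to avoid both outcomes: take the reference under the same lock that the recycling path holds when it checks the sentinel refcount and rekeys the object.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for RGWFileHandle: a recyclable cache entry.
constexpr uint32_t SENTINEL_REFCNT = 1;  // held only by the cache itself

struct Handle {
    std::string name;                    // key the entry currently represents
    uint32_t refcnt = SENTINEL_REFCNT;
};

// Hypothetical, simplified cache: one map and one mutex instead of
// separate cache partitions and LRU lanes.
class HandleCache {
public:
    void insert(Handle* h) {
        std::lock_guard<std::mutex> guard(mtx_);
        map_[h->name] = h;
    }

    // Find `name` and pin the entry while the lock is held, so a
    // concurrent recycle cannot see it at the sentinel refcount.
    Handle* lookup(const std::string& name) {
        std::lock_guard<std::mutex> guard(mtx_);
        auto it = map_.find(name);
        if (it == map_.end())
            return nullptr;
        ++it->second->refcnt;
        return it->second;
    }

    // Drop a reference previously taken by lookup().
    void release(Handle* h) {
        std::lock_guard<std::mutex> guard(mtx_);
        assert(h->refcnt > SENTINEL_REFCNT);
        --h->refcnt;
    }

    // Reuse `h` for `new_name` only if no caller holds a reference.
    // Returns false instead of asserting when the entry is pinned.
    bool try_recycle(Handle* h, const std::string& new_name) {
        std::lock_guard<std::mutex> guard(mtx_);
        if (h->refcnt != SENTINEL_REFCNT)
            return false;
        map_.erase(h->name);
        h->name = new_name;
        map_[new_name] = h;
        return true;
    }

private:
    std::mutex mtx_;
    std::unordered_map<std::string, Handle*> map_;
};
```

In this sketch a handle returned by lookup("f1") keeps the name f1 until it is released, because try_recycle refuses to rekey a pinned entry. The real code uses separate locks for the cache partitions and the LRU, so the actual fix may be structured quite differently.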


Related issues

Copied to rgw - Backport #47850: octopus: rgw/rgw_file: incorrect lru object eviction in lookup_fh Resolved
Copied to rgw - Backport #47851: nautilus: rgw/rgw_file: incorrect lru object eviction in lookup_fh Resolved

History

#1 Updated by Casey Bodley over 3 years ago

rosin luo wrote:

In lookup_fh, when an RGWFileHandle is not found in the cache, the function
must recycle an object and create a new RGWFileHandle. When multiple threads
call lookup_fh concurrently, the cache lookup and the LRU eviction can happen
at the same time. Imagine this case:

What actual behavior are you observing as a result? Is there a crash? Can you provide more details?

#2 Updated by Casey Bodley over 3 years ago

  • Assignee set to Matt Benjamin

#3 Updated by Rixin Luo over 3 years ago

2020-07-24 18:24:20.220 fff5033df200 -1 /home/pgy/ceph/src/common/cohort_lru.h: In function 'cohort::lru::Object* cohort::lru::LRU<LK>::evict_block() [with LK = std::mutex]' thread fff5033df200 time 2020-07-24 18:24:20.233245
/home/pgy/ceph/src/common/cohort_lru.h: 160: FAILED ceph_assert(o->lru_refcnt == SENTINEL_REFCNT)
ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x150) [0xfff4ebc7a438]
2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0xfff4ebc7a5f8]
3: (()+0x2426c4) [0xfff4fbfc26c4]
4: (rgw::RGWLibFS::lookup_fh(rgw::RGWFileHandle*, char const*, unsigned int)+0x280) [0xfff4fbfce8d0]
5: (rgw::RGWLibFS::stat_leaf(rgw::RGWFileHandle*, char const*, rgw_fh_type, unsigned int)+0x368) [0xfff4fbfaec50]
6: (rgw_lookup()+0x154) [0xfff4fbfb235c]
7: (Java_org_apache_hadoop_fs_s3a_CephRgwFileSystem_rgwLookup()+0x74) [0xfff4fc513198]
8: [0xffff9408f3a0]

#4 Updated by Daniel Gryniewicz over 3 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 36942

#5 Updated by Matt Benjamin over 3 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to octopus, nautilus

#6 Updated by Nathan Cutler over 3 years ago

  • Copied to Backport #47850: octopus: rgw/rgw_file: incorrect lru object eviction in lookup_fh added

#7 Updated by Nathan Cutler over 3 years ago

  • Copied to Backport #47851: nautilus: rgw/rgw_file: incorrect lru object eviction in lookup_fh added

#8 Updated by Nathan Cutler over 3 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
