Bug #22736

rgw file deadlock on lru evicting

Added by Zongyou Yao 12 months ago. Updated 10 months ago.

Status: Resolved
Priority: High
Assignee: -
Target version: -
Start date: 01/19/2018
Due date:
% Done: 0%
Source:
Tags:
Backport: jewel, luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

Hi,

A deadlock occurs when uploading files over NFS through nfs-ganesha. Here is my nfs-ganesha configuration:
LOG {
components {
FSAL = INFO;
}
}

CACHEINODE {
Entries_HWMark = 60;
Cache_FDs = FALSE;
LRU_Run_Interval = 1;
FD_HWMark_Percent = 20;
FD_LWMark_Percent = 10;
FD_Limit_Percent = 30;
Attr_Expiration_Time = 0;
}

EXPORT {
Export_ID=1;

Path = "testbucket";
Pseudo = "/";
Access_Type = RW;
Protocols = 4;
Transports = TCP;
# Exporting FSAL
FSAL {
Name = RGW;
User_Id = "testid";
Access_Key_Id = "0555b35654ad1656d804";
Secret_Access_Key = "h7GhxuBLTrlhVUyxSPUKUV8r/2EI4ngqJxD7iBdBYLhwluN30JaT3Q==";
}
}

RGW {
ceph_conf = "/path/to/ceph.conf";
init_args = "--debug-rgw=5 --rgw_enable_lc_threads=false";
}

I mount the export with the following command:
mount -t nfs -o sync 127.0.0.1:/ /mnt
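
The upload step itself is not shown in this report. A minimal sketch of one way to drive it, assuming numbered file names (the backtrace shows a create for the name "61") and a one-byte payload; the helper name `upload_numbered_files` and its parameters are hypothetical:

```python
import os


def upload_numbered_files(directory, count, payload=b"x"):
    """Create files named "1" .. str(count) in directory.

    Hypothetical helper: the naming scheme and payload are assumptions,
    not taken from the report. Returns the list of names created.
    """
    created = []
    for i in range(1, count + 1):
        name = str(i)
        # Each create on the NFS mount is expected to reach the server
        # as an NFSv4 OPEN with create semantics.
        with open(os.path.join(directory, name), "wb") as f:
            f.write(payload)
        created.append(name)
    return created
```

Calling `upload_numbered_files("/mnt", 100)` against the mount point above would attempt more creates than the configured Entries_HWMark of 60; the count of 100 is arbitrary.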

The upload hangs after the first 60 objects have been uploaded. Here is the backtrace from nfs-ganesha:
Thread 132 (Thread 0x7fa6e3f77700 (LWP 23149)):
#0 0x00007fa77713f1bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fa77713ad02 in _L_lock_791 () from /lib64/libpthread.so.0
#2 0x00007fa77713ac08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007fa767756c83 in __gthread_mutex_lock (__mutex=<optimized out>) at /data/home/richardyao/bin/gcc72/include/c++/7.2.0/x86_64-pc-linux-gnu/bits/gthr-default.h:748
#4 std::mutex::lock (this=<optimized out>) at /data/home/richardyao/bin/gcc72/include/c++/7.2.0/bits/std_mutex.h:103
#5 0x00007fa767768d8d in cohort::lru::TreeX<rgw::RGWFileHandle, boost::intrusive::rbtree<rgw::RGWFileHandle, boost::intrusive::compare<rgw::RGWFileHandle::FhLT>, boost::intrusive::member_hook<rgw::RGWFileHandle, boost::intrusive::set_member_hook<boost::intrusive::link_mode<(boost::intrusive::link_mode_type)1>, void, void, void>, &rgw::RGWFileHandle::fh_hook>, void, void, void, void>, rgw::RGWFileHandle::FhLT, rgw::RGWFileHandle::FhEQ, rgw::fh_key, std::mutex>::remove (this=0x7fa76a0cee10, hk=16837666083334884419, v=v@entry=0x7fa695d40200, flags=flags@entry=1) at /data/home/richardyao/workspace/github/ceph/src/common/cohort_lru.h:444
#6 0x00007fa76774a368 in rgw::RGWFileHandle::reclaim (this=0x7fa695d40200) at /data/home/richardyao/workspace/github/ceph/src/rgw/rgw_file.cc:1012
#7 0x00007fa76776b631 in evict_block (this=<optimized out>) at /data/home/richardyao/workspace/github/ceph/src/common/cohort_lru.h:146
#8 insert (flags=1, edge=cohort::lru::MRU, fac=<synthetic pointer>, this=<optimized out>) at /data/home/richardyao/workspace/github/ceph/src/common/cohort_lru.h:238
#9 rgw::RGWLibFS::lookup_fh (this=this@entry=0x7fa76a0cec00, parent=parent@entry=0x7fa76a0cec20, name=name@entry=0x7fa758c27058 "61", flags=flags@entry=68) at /data/home/richardyao/workspace/github/ceph/src/rgw/rgw_file.h:1064
#10 0x00007fa76774d97e in rgw::RGWLibFS::create (this=0x7fa76a0cec00, parent=0x7fa76a0cec20, name=0x7fa758c27058 "61", st=0x7fa6e3f72a80, mask=7, flags=0) at /data/home/richardyao/workspace/github/ceph/src/rgw/rgw_file.cc:645
#11 0x00007fa76774dc14 in rgw_create (rgw_fs=<optimized out>, parent_fh=<optimized out>, name=<optimized out>, st=<optimized out>, mask=<optimized out>, fh=0x7fa6e3f72998, posix_flags=193, flags=0) at /data/home/richardyao/workspace/github/ceph/src/rgw/rgw_file.cc:1665
#12 0x00007fa767dcb4c3 in rgw_fsal_open2 (obj_hdl=0x7fa765608200, state=0x7fa758c39d40, openflags=2, createmode=FSAL_EXCLUSIVE, name=0x7fa758c27058 "61", attrib_set=0x7fa6e3f736e0, verifier=0x7fa6e3f737c8 "\212\263\244)\257\\", new_obj=0x7fa6e3f72d00, attrs_out=0x7fa6e3f72c10, caller_perm_check=0x7fa6e3f72daf) at /data/home/richardyao/workspace/github/nfs-ganesha/src/FSAL/FSAL_RGW/handle.c:1060
#13 0x0000000000535a48 in mdcache_open2 (obj_hdl=0x7fa76557e738, state=0x7fa758c39d40, openflags=2, createmode=FSAL_EXCLUSIVE, name=0x7fa758c27058 "61", attrs_in=0x7fa6e3f736e0, verifier=0x7fa6e3f737c8 "\212\263\244)\257\\", new_obj=0x7fa6e3f737e0, attrs_out=0x0, caller_perm_check=0x7fa6e3f72daf) at /data/home/richardyao/workspace/github/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_file.c:672
#14 0x000000000042fda3 in open2_by_name (in_obj=0x7fa76557e738, state=0x7fa758c39d40, openflags=2, createmode=FSAL_EXCLUSIVE, name=0x7fa758c27058 "61", attr=0x7fa6e3f736e0, verifier=0x7fa6e3f737c8 "\212\263\244)\257\\", obj=0x7fa6e3f737e0, attrs_out=0x0) at /data/home/richardyao/workspace/github/nfs-ganesha/src/FSAL/fsal_helper.c:406
#15 0x0000000000432a8c in fsal_open2 (in_obj=0x7fa76557e738, state=0x7fa758c39d40, openflags=2, createmode=FSAL_EXCLUSIVE, name=0x7fa758c27058 "61", attr=0x7fa6e3f736e0, verifier=0x7fa6e3f737c8 "\212\263\244)\257\\", obj=0x7fa6e3f737e0, attrs_out=0x0) at /data/home/richardyao/workspace/github/nfs-ganesha/src/FSAL/fsal_helper.c:1788
#16 0x000000000047054d in open4_ex (arg=0x7fa6a19f7e98, data=0x7fa6e3f73950, res_OPEN4=0x7fa758c81128, clientid=0x7fa6a1019100, owner=0x7fa69981d200, file_state=0x7fa6e3f73898, new_state=0x7fa6e3f73897) at /data/home/richardyao/workspace/github/nfs-ganesha/src/Protocols/NFS/nfs4_op_open.c:1440
#17 0x000000000047191a in nfs4_op_open (op=0x7fa6a19f7e90, data=0x7fa6e3f73950, resp=0x7fa758c81120) at /data/home/richardyao/workspace/github/nfs-ganesha/src/Protocols/NFS/nfs4_op_open.c:1845
#18 0x000000000045c791 in nfs4_Compound (arg=0x7fa6a1a28a30, req=0x7fa6a1a28228, res=0x7fa69740e280) at /data/home/richardyao/workspace/github/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:743
#19 0x000000000044b19d in nfs_rpc_execute (reqdata=0x7fa6a1a28200) at /data/home/richardyao/workspace/github/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1290
#20 0x000000000044b9b7 in worker_run (ctx=0x7fa7287df880) at /data/home/richardyao/workspace/github/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1562
#21 0x0000000000504bc5 in fridgethr_start_routine (arg=0x7fa7287df880) at /data/home/richardyao/workspace/github/nfs-ganesha/src/support/fridgethr.c:550
#22 0x00007fa777138dc5 in start_thread () from /lib64/libpthread.so.0
#23 0x00007fa77649774d in clone () from /lib64/libc.so.6
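
Frames #5 to #9 above show lookup_fh calling insert, which calls evict_block, then reclaim, then remove, which blocks in std::mutex::lock. One reading of this (an interpretation of the backtrace, not a statement about the Ceph source or the eventual fix) is that the eviction path tries to take a non-recursive lock that the same thread already holds. The Python sketch below illustrates that pattern only. `Partition`, `insert_with_eviction`, and the `FLAG_*` values are hypothetical names, and a non-blocking acquire is used so the sketch reports the problem instead of hanging.

```python
import threading

FLAG_NONE = 0  # hypothetical: caller already holds the partition lock
FLAG_LOCK = 1  # hypothetical: remove() must take the partition lock itself


class Partition:
    """One cache partition guarded by a non-recursive lock."""

    def __init__(self):
        self.lock = threading.Lock()  # non-recursive, like std::mutex
        self.entries = {}

    def remove(self, key, flags):
        """Remove key. Returns False if the lock could not be taken."""
        if flags & FLAG_LOCK:
            # A blocking acquire here would never return if the calling
            # thread already holds self.lock.
            if not self.lock.acquire(blocking=False):
                return False
            try:
                self.entries.pop(key, None)
            finally:
                self.lock.release()
        else:
            self.entries.pop(key, None)
        return True


def insert_with_eviction(part, key, victim, evict_flags):
    """Insert key while holding the partition lock, evicting victim first.

    Returns True if the eviction succeeded, False if it would have
    blocked on the lock this function already holds.
    """
    with part.lock:
        evicted = part.remove(victim, evict_flags)
        part.entries[key] = object()
        return evicted
```

With `FLAG_LOCK`, `insert_with_eviction` returns False because the eviction cannot take the lock its caller holds; with `FLAG_NONE` the eviction completes. Whether the real code should skip the lock, release it before evicting, or evict from a different partition is a design decision this sketch does not settle.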


Related issues

Copied to rgw - Backport #22773: luminous: rgw file deadlock on lru evicting (Resolved)
Copied to rgw - Backport #22774: jewel: rgw file deadlock on lru evicting (Resolved)

History

#2 Updated by Matt Benjamin 12 months ago

  • Status changed from New to In Progress
  • Priority changed from Normal to High
  • Backport set to jewel, luminous

#3 Updated by Nathan Cutler 12 months ago

  • Status changed from In Progress to Pending Backport

#4 Updated by Nathan Cutler 12 months ago

  • Copied to Backport #22773: luminous: rgw file deadlock on lru evicting added

#5 Updated by Nathan Cutler 12 months ago

  • Copied to Backport #22774: jewel: rgw file deadlock on lru evicting added

#6 Updated by Nathan Cutler 10 months ago

  • Status changed from Pending Backport to Resolved
